Combine indices and check them both intelligently by glesica · Pull Request #75 · code-for-montana/nonprofit-data

glesica · 2020-02-02T19:12:51Z

Previously we were ignoring the BMF index, but we need it for indexed location data; otherwise we'd have to download every single filing document. This PR refactors the code to integrate both the Annual and BMF indices into a single Index implementation. Filters can be added to the index and will operate on the correct underlying index data.

This PR also makes some quality-of-life improvements.

Resolves #69

smai-f · 2020-02-03T03:00:57Z

pyrs990/pyrs990/_options.py

 # -------------------- #

-for indexFieldName in IndexRecord._fields:
+for index_filter_field_name in IndexRecord.field_names():


This is confusing to me again. Is an "index filter" a filter on an "annual record" and a "filing filter" is a filter on a "BMF record" or are they different? If they are the same, can we rename these to match so we use "annual" and "BMF" consistently everywhere?

An "index filter" is a filter that applies to either the annual index or the BMF index. It can use an "annual record" or a "BMF record", respectively, to access the data it needs for filtering, or it can even use both at once. A "filing filter" is totally different: it applies to the XML data once it has been downloaded and relies only on the "filing record".
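To make the distinction concrete, here is a rough sketch of the two kinds of filters. Every class and function name in it (`BMFRecord`, `FilingRecord`, `in_state`, `name_contains`, `min_revenue`, and the fields on them) is a hypothetical stand-in for illustration, not the actual code in this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical stand-ins for the three record types discussed above.
@dataclass(frozen=True)
class AnnualRecord:
    ein: str
    taxpayer_name: str


@dataclass(frozen=True)
class BMFRecord:
    ein: str
    state: str


@dataclass(frozen=True)
class FilingRecord:
    ein: str
    total_revenue: int


# An index filter sees the index records for one EIN; either may be missing.
IndexFilter = Callable[[Optional[AnnualRecord], Optional[BMFRecord]], bool]
# A filing filter sees only the parsed filing, after the XML is downloaded.
FilingFilter = Callable[[FilingRecord], bool]


def in_state(state: str) -> IndexFilter:
    """Index filter that relies only on the BMF record."""
    def check(annual: Optional[AnnualRecord], bmf: Optional[BMFRecord]) -> bool:
        return bmf is not None and bmf.state == state
    return check


def name_contains(text: str) -> IndexFilter:
    """Index filter that relies only on the annual record."""
    def check(annual: Optional[AnnualRecord], bmf: Optional[BMFRecord]) -> bool:
        return annual is not None and text.lower() in annual.taxpayer_name.lower()
    return check


def min_revenue(amount: int) -> FilingFilter:
    """Filing filter: needs data that only exists in the downloaded filing."""
    def check(filing: FilingRecord) -> bool:
        return filing.total_revenue >= amount
    return check


annual = AnnualRecord(ein="123456789", taxpayer_name="Example Food Bank")
bmf = BMFRecord(ein="123456789", state="MT")
filing = FilingRecord(ein="123456789", total_revenue=50_000)

# Index filters run first, so filings that fail them never need downloading.
index_filters = [in_state("MT"), name_contains("food")]
passes_index = all(f(annual, bmf) for f in index_filters)
passes_filing = passes_index and min_revenue(10_000)(filing)
```

In this sketch the index filters are cheap because they only touch index data, while the filing filter is the only one that requires the XML document.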

smai-f · 2020-02-03T03:07:33Z

pyrs990/pyrs990/index.py

+        if self._length is not None:
+            return self._length
+
+        # TODO: This is stupid, don't instantiate the tuples, just count


.......What tuples?

The records themselves. We don't need to create them to count how many there are; we can just count the things that would normally be turned into records. If you want to know how many bread sandwiches you can make, you can just count the slices of bread; you don't have to actually make the sandwiches and then count them.
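Roughly what the TODO is getting at, as a sketch. It assumes the index is CSV-like text and uses a hypothetical two-field `IndexRecord`; the real index loading in `index.py` may look quite different.

```python
import csv
import io
from typing import NamedTuple


class IndexRecord(NamedTuple):
    # Hypothetical two-field record; the real one has more fields.
    ein: str
    taxpayer_name: str


# Made-up sample data standing in for an index file.
RAW = "ein,taxpayer_name\n111111111,Org A\n222222222,Org B\n333333333,Org C\n"


def count_by_building_records(raw: str) -> int:
    """Builds every record tuple just to count them (the wasteful way)."""
    reader = csv.DictReader(io.StringIO(raw))
    return len([IndexRecord(**row) for row in reader])


def count_rows_only(raw: str) -> int:
    """Counts the rows that would become records without building any."""
    reader = csv.reader(io.StringIO(raw))
    next(reader, None)  # skip the header row
    return sum(1 for _ in reader)
```

Both functions return the same number; the second just never allocates the tuples.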

smai-f · 2020-02-03T03:16:51Z

pyrs990/pyrs990/annual_record.py

+        return self.__str__()
+
+    def __str__(self):
+        return f"AnnualRecord(taxpayer_name='{self.taxpayer_name}', ein='{self.ein}')"


Wanna add the rest of the things to this?

I kept it short so it would be easy to read in a REPL. In theory the __repr__ version is supposed to use correct Python syntax to instantiate a new instance of the thing, if possible, so maybe we should add the rest of the fields to that... but, on the other hand, I think the REPL uses __repr__ by default, so then that defeats the point. I'm not sure how I feel.
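One way the trade-off could go, sketched with a trimmed-down hypothetical class. The `tax_period` field is an assumption added only to give `__repr__` something extra to show. Here `__repr__` is the full, constructor-style form and `__str__` stays short, which means the REPL would echo the long form and `print()` would show the short one.

```python
class AnnualRecord:
    """Hypothetical, trimmed-down record used only for illustration."""

    def __init__(self, taxpayer_name: str, ein: str, tax_period: str) -> None:
        self.taxpayer_name = taxpayer_name
        self.ein = ein
        self.tax_period = tax_period

    def __repr__(self) -> str:
        # Full form: valid Python that rebuilds the instance.
        return (
            f"AnnualRecord(taxpayer_name={self.taxpayer_name!r}, "
            f"ein={self.ein!r}, tax_period={self.tax_period!r})"
        )

    def __str__(self) -> str:
        # Short form for print() and logging.
        return f"AnnualRecord(taxpayer_name={self.taxpayer_name!r}, ein={self.ein!r})"


record = AnnualRecord("Org A", "111111111", "201812")
rebuilt = eval(repr(record))  # round-trips because repr is constructor syntax
```

Using `!r` instead of hand-written quotes also keeps the output valid when a name contains an apostrophe.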

Combine indices and check them both intelligently

fefa9ca

glesica added the enhancement (New feature or request) and extractor (Stuff related to the ETL bits) labels Feb 2, 2020

glesica requested a review from smai-f February 2, 2020 19:12

glesica added 4 commits February 2, 2020 12:36

Spruce up import

62d0e6c

Move setup.py config forward

ab78076

Fix bugs in filters and spruce up setup

d5c8260

Log when we skip an EIN

cf7e0a7

smai-f reviewed Feb 3, 2020

glesica added 2 commits February 3, 2020 18:03

Fix pipfile

46a7b6e

Spruce up tests and CLI

7c9811c

glesica wants to merge 7 commits into master from combine-index
