Skip to content

Combine indices and check them both intelligently#75

Open
glesica wants to merge 7 commits intomasterfrom
combine-index
Open

Combine indices and check them both intelligently#75
glesica wants to merge 7 commits intomasterfrom
combine-index

Conversation

@glesica
Copy link
Contributor

@glesica glesica commented Feb 2, 2020

Previously we were ignoring the BMF index but we need that for indexed location data, otherwise we'd have to download every single filing document. This PR refactors to integrate both the Annual and BMF indices into a single Index implementation. Filters can be added to the index and will operate on the correct underlying index data.

This PR also makes some quality of life improvements.

Resolves #69

@glesica glesica added enhancement New feature or request extractor Stuff related to the ETL bits labels Feb 2, 2020
@glesica glesica requested a review from smai-f February 2, 2020 19:12
# -------------------- #

for indexFieldName in IndexRecord._fields:
for index_filter_field_name in IndexRecord.field_names():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is confusing to me again. Is an "index filter" a filter on an "annual record" and a "filing filter" is a filter on a "BMF record" or are they different? If they are the same, can we rename these to match so we use "annual" and "BMF" consistently everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"index filter" is a filter that applies to either the annual index or the bmf index, but it can use either an "annual record" or a "bmf record", respectively, to access the data it needs to use for filtering, or it can even use both at once... a "filing filter" is totally different, it applies to the xml data once it has been downloaded and relies only on the "filing record"

if self._length is not None:
return self._length

# TODO: This is stupid, don't instantiate the tuples, just count
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.......What tuples?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The records themselves, we don't need to create them to count how many there are, we can just count how many things that would normally be turned into records there are. Like if you want to know how many bread sandwiches you can make you can just count the slices of break, you don't have to actually make sandwiches and then count them.

return self.__str__()

def __str__(self):
return f"AnnualRecord(taxpayer_name='{self.taxpayer_name}', ein='{self.ein}')"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanna add the rest of the things to this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it short so it would be easy to read in a REPL. In theory the __repr__ version is supposed to use correct Python syntax to instantiate a new instance of the thing, if possible, so maybe we should add the rest of the fields to that... but, on the other hand, I think the REPL uses __repr__ by default, so then that defeats the point. I'm not sure how I feel.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request extractor Stuff related to the ETL bits

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Combine Index implementations

2 participants