Skip to content

Update dtmmodel.py#806

Merged
tmylk merged 6 commits into
piskvorky:developfrom
Eickho:patch-1
Aug 11, 2016
Merged

Update dtmmodel.py#806
tmylk merged 6 commits into
piskvorky:developfrom
Eickho:patch-1

Conversation

@Eickho
Copy link
Copy Markdown
Contributor

@Eickho Eickho commented Jul 25, 2016

Adds a check if any documents in the supplied corpus are empty, which breaks the DIM model without providing a usable error description to the user. It should be noted that this only seems to cause errors with the DIM model and not the DTM model. However, I don't think there is any upside to having empty documents in the DTM anyway?

Adds a check if any documents in the supplied corpus are empty, which breaks the DIM model without providing a usable error description to the user.
@bhargavvader
Copy link
Copy Markdown
Contributor

Yeah, this is a documented problem with the DIM code. This should be useful.

@piskvorky piskvorky added the bug Issue described a bug label Jul 27, 2016
Comment thread gensim/models/wrappers/dtmmodel.py Outdated
lencorpus = sum(1 for _ in corpus)
if lencorpus == 0:
raise ValueError("cannot compute DTM over an empty corpus")
if any([i == 0 for i in [len(text) for text in corpus.get_texts()]]):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be and model=='fixed' as well?

@tmylk
Copy link
Copy Markdown
Contributor

tmylk commented Aug 5, 2016

@Eickho Thanks for the PR. Please add a changelog and a check to only affect model=='fixed'

@bhargavvader
Copy link
Copy Markdown
Contributor

@Eickho will you be taking this up?

Adds a check for empty (not a single word) documents in the corpus supplied to the DTM implementation only if the DIM mode (model = "fixed") is used.
Comment thread gensim/models/wrappers/dtmmodel.py Outdated
lencorpus = sum(1 for _ in corpus)
if lencorpus == 0:
raise ValueError("cannot compute DTM over an empty corpus")
if any([i == 0 for i in [len(text) for text in corpus.get_texts()]]) and model == "fixed:
Copy link
Copy Markdown
Contributor

@bhargavvader bhargavvader Aug 10, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Eickho, Wouldn't it be better to do if model == "fixed" and any([i == 0 for i in [len(text) for text in corpus.get_texts()]]):
so that it won't bother checking the condition if it isn't in fixed mode?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that would be better, updating.

Eickho added 4 commits August 10, 2016 11:38
Inversed conditions as proposed by @bhargavvader
Added description of check for empty (no words) documents in the DIM mode of the DTM wrapper.
Added issue no.
@tmylk tmylk merged commit d234b3c into piskvorky:develop Aug 11, 2016
lencorpus = sum(1 for _ in corpus)
if lencorpus == 0:
raise ValueError("cannot compute DTM over an empty corpus")
if model == "fixed" and any([i == 0 for i in [len(text) for text in corpus.get_texts()]]):
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace with any(not text for text in corpus.get_texts()) (more Pythonic). This looks unnecessarily complicated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Issue described a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants