Skip to content

Fix OverflowError when loading a large term-document matrix in MatrixMarket format. Fix #1998#2001

Merged
menshikh-iv merged 4 commits into
piskvorky:developfrom
arlenk:fix-1998
Mar 28, 2018
Merged

Fix OverflowError when loading a large term-document matrix in MatrixMarket format. Fix #1998#2001
menshikh-iv merged 4 commits into
piskvorky:developfrom
arlenk:fix-1998

Conversation

@arlenk

@arlenk arlenk commented Mar 27, 2018

Copy link
Copy Markdown
Contributor

cython mmreader was (incorrectly) assuming that num_docs, num_terms, etc. would fit in a c int.

This version uses long longs for all related types.

Related issue from @KartikTaskhuman: #1998

@menshikh-iv menshikh-iv changed the title Bug-fix #1998: OverflowError when loading a large term-document matrix in Matrix Market format Fix OverflowError when loading a large term-document matrix in MatrixMarket format. Fix #1998 Mar 27, 2018
@menshikh-iv

menshikh-iv commented Mar 27, 2018

Copy link
Copy Markdown
Contributor

@arlenk please have a look to error - https://ci.appveyor.com/project/piskvorky/gensim-431bq/build/job/87sjiebimbr726ya#L306 (looks like a problem with python2 and L postfix for long long values)

Descriptions of reason (python2)

>>> isinstance(2, int)
True
>>> isinstance(2L, int)
False

@arlenk

arlenk commented Mar 28, 2018

Copy link
Copy Markdown
Contributor Author

@menshikh-iv

@arlenk please have a look to error - https://ci.appveyor.com/project/piskvorky/gensim-431bq/build/job/87sjiebimbr726ya#L306 (looks like a problem with python2 and L postfix for long long values)

unfortunately I can't seem to replicate this on osx, but I think you are right, the problem is that python 2 still treats ints and longs differently. I updated the tests to use isinstance(val, numbers.Integral) instead of isinstance(val, int). It seems like most other tests were already using number.Integral instead of int already.

Hopefully this will fix the issue on the windows build.

@menshikh-iv

Copy link
Copy Markdown
Contributor

Thanks for fast fix @arlenk 👍

@menshikh-iv menshikh-iv merged commit 10a3dab into piskvorky:develop Mar 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants