Accessing vector model vocabulary broken in Gensim 3.3 when loading from word2vec format #1882

@akutuzov

Description

After upgrading to 3.3.0, the model's vocabulary can no longer be accessed through the model.wv.vocab attribute if the model is loaded from a text or binary word2vec file. It still works for models saved in the Gensim native format.
I suppose this is related to the redesign of the vector model implementations in #1777. In any case, it is not good to break compatibility this way without even notifying users.

Steps to Reproduce

import gensim, logging
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
model = gensim.models.KeyedVectors.load_word2vec_format('ANY_MODEL.bin.gz', binary=True)
WORD = 'example'  # placeholder: any word to look up
WORD in model.wv.vocab  # raises AttributeError in 3.3.0

Expected Results

True or False, as in Gensim 3.2

Actual Results

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Word2VecKeyedVectors' object has no attribute 'wv'
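
The traceback indicates that the object returned by load_word2vec_format is itself a keyed-vectors object and has no wv attribute. A possible workaround can be sketched as below, assuming that this object exposes vocab directly (the error message suggests this, but the report does not confirm it). The helper names keyed_vectors and has_word, and the two stand-in classes, are hypothetical and only illustrate the lookup pattern; they are not part of Gensim.

```python
def keyed_vectors(model):
    """Return the object that is assumed to hold the vocabulary.

    Full models keep their vectors under `.wv`; an object that is already
    a keyed-vectors instance (no `.wv` attribute) is returned unchanged.
    """
    return getattr(model, "wv", model)


def has_word(model, word):
    """Check vocabulary membership for either kind of object."""
    return word in keyed_vectors(model).vocab


# Stand-in classes (hypothetical, for demonstration only; not Gensim classes).
class FakeKeyedVectors:
    def __init__(self, words):
        # Assumption: the vocabulary is a dict-like attribute named `vocab`.
        self.vocab = {w: i for i, w in enumerate(words)}


class FakeFullModel:
    def __init__(self, words):
        self.wv = FakeKeyedVectors(words)


loaded = FakeKeyedVectors(["cat", "dog"])  # mimics an object without `.wv`
trained = FakeFullModel(["cat", "dog"])    # mimics a model that exposes `.wv`

print(has_word(loaded, "cat"))
print(has_word(trained, "dog"))
print(has_word(loaded, "bird"))
```

With a real model, the same helper would be called as has_word(model, WORD), so the calling code does not need to know which loading path produced the object.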

Versions

Linux-4.13.0-32-generic-x86_64-with-LinuxMint-18.2-sonya
Python 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609]
NumPy 1.14.0
SciPy 1.0.0
gensim 3.3.0
FAST_VERSION 1

Labels

bug: Issue described a bug
difficulty easy: Easy issue: required small fix