Tests for the evaluate_word_pairs function #1061

Merged
tmylk merged 62 commits into piskvorky:develop from akutuzov:develop
Dec 28, 2016
@akutuzov
Contributor

Test for evaluating model against semantic similarity datasets (#1047).
Also fixes an error in the function call.

tmylk and others added 30 commits November 5, 2015 19:07
Conflicts:
	CHANGELOG.txt
	gensim/models/word2vec.py
@akutuzov
Contributor Author

@tmylk the tests are ready.

Contributor

@tmylk tmylk left a comment

Thanks for the tests. An oov_ratio sanity test would be great.

pearson = correlation[0][0]
spearman = correlation[1][0]
self.assertTrue(0.1 < pearson < 1.0)
self.assertTrue(0.1 < spearman < 1.0)
Contributor

Could we please test for oov_ratio in correlation[2] too?

@akutuzov
Contributor Author

Sure, done.
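
The sanity check requested above could look roughly like the sketch below. It assumes the result has the shape implied by the snippet under review, `((pearson, p_value), (spearman, p_value), oov_ratio)`, and that `oov_ratio` is reported as a percentage; both the helper name `is_sane` and the 90% ceiling are hypothetical choices for illustration, not taken from the merged test.

```python
def is_sane(correlation, min_corr=0.1, max_oov=90.0):
    """Return True if a correlation result looks plausible.

    `correlation` is assumed to be
    ((pearson, p_value), (spearman, p_value), oov_ratio),
    with oov_ratio given as a percentage (an assumption).
    """
    pearson = correlation[0][0]
    spearman = correlation[1][0]
    oov_ratio = correlation[2]
    return (
        min_corr < pearson < 1.0
        and min_corr < spearman < 1.0
        and 0.0 <= oov_ratio < max_oov  # hypothetical ceiling
    )


# Hand-made stand-ins for a real evaluation result.
good_result = ((0.45, 0.001), (0.48, 0.0005), 12.5)
bad_result = ((0.45, 0.001), (0.48, 0.0005), 150.0)  # impossible percentage
```

Here `is_sane(good_result)` holds while `is_sane(bad_result)` does not, which is the kind of regression an oov_ratio assertion is meant to catch.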

@tmylk tmylk merged commit 88d032b into piskvorky:develop Dec 28, 2016
@tmylk
Contributor

tmylk commented Dec 28, 2016

Thanks for the improvement!

@tmylk
Contributor

tmylk commented Dec 30, 2016

By the way, how is it better than using https://github.com/mfaruqui/eval-word-vectors ?
@anmol01gulati what code did you use to convert gensim word2vec to that format? A short script for that would be useful

@akutuzov
Contributor Author

It's better in that this code works directly from Gensim :)
In fact, my code is simpler as it uses Scipy functions for Pearson and Spearman coefficients (eval-word-vectors implements Spearman from scratch). Also, it features some useful options, like case-(in)sensitivity and smart handling of OOV pairs.
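
To make the options mentioned above concrete, here is a dependency-free sketch of this style of evaluation. It is not the gensim implementation: the comment above says that code relies on Scipy for the coefficients, whereas this sketch computes Pearson by hand and Spearman as Pearson over average ranks so that it runs with the standard library alone. The function names, the toy vectors, and the choice to skip OOV pairs and report their share as a percentage are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def pearson(xs, ys):
    """Pearson correlation (needs at least two points and non-zero variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation as Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def evaluate_pairs(vectors, gold_pairs, case_insensitive=True):
    """Return (pearson, spearman, oov_ratio) for (word1, word2, score) triples."""
    if case_insensitive:
        vectors = {word.lower(): vec for word, vec in vectors.items()}
    gold, predicted, oov = [], [], 0
    for a, b, score in gold_pairs:
        if case_insensitive:
            a, b = a.lower(), b.lower()
        if a not in vectors or b not in vectors:
            oov += 1  # assumption: OOV pairs are skipped, only counted
            continue
        gold.append(score)
        predicted.append(cosine(vectors[a], vectors[b]))
    oov_ratio = 100.0 * oov / (len(gold) + oov)  # percentage of skipped pairs
    return pearson(gold, predicted), spearman(gold, predicted), oov_ratio


# Toy data: "Cat" only matches the gold pairs when case is ignored,
# and "unicorn" is out of vocabulary.
toy_vectors = {"Cat": [1.0, 0.0], "dog": [0.8, 0.6], "car": [0.0, 1.0]}
toy_gold = [("cat", "dog", 8.0), ("cat", "car", 1.0),
            ("dog", "car", 4.0), ("cat", "unicorn", 5.0)]
toy_pearson, toy_spearman, toy_oov = evaluate_pairs(toy_vectors, toy_gold)
```

On the toy data one of the four pairs is out of vocabulary, so a quarter of the pairs is skipped; the remaining three are ranked identically by the gold scores and the cosine similarities.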

jayantj pushed a commit to jayantj/gensim that referenced this pull request Jan 4, 2017
@anmolgulati
Contributor

I agree with @akutuzov. The code currently in gensim for Pearson and Spearman coefficients is shorter. But I feel we could also include the whole collection of datasets for evaluating word vectors from https://github.com/mfaruqui/eval-word-vectors. It's just 205 KB and contains all the major gold standards, so it would be good to integrate them into gensim itself and have one method to evaluate word2vec models directly, right inside gensim. What do you think?

The script I used to convert word2vec into the format for evaluating word vectors is quite small actually:

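import gensim

# Load the pretrained vectors. Newer gensim releases expose this loader on
# KeyedVectors; older ones used Word2Vec.load_word2vec_format.
model = gensim.models.KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin', binary=True)

# The first whitespace-separated token of each non-empty line is a word.
with open('eval-word-vectors/vocab.txt', 'r') as vocab_file:
    words = [line.split()[0] for line in vocab_file if line.strip()]

# Write one "word v1 v2 ... vN" line per in-vocabulary word.
# Text mode ('w') is used because the script writes str, not bytes.
with open('output_vecs.txt', 'w') as f:
    for word in words:
        if word in model:
            vector = ' '.join(str(x) for x in model[word])
            f.write('%s %s\n' % (word, vector))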
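
The text format produced by the script above (one word per line, followed by its space-separated vector components) can be checked with a small round trip. This is only a sketch on toy data with an in-memory buffer; `write_vectors` and `read_vectors` are hypothetical helper names, and nothing here is claimed about how eval-word-vectors itself parses the file.

```python
import io


def write_vectors(stream, vectors):
    """Write one 'word v1 v2 ... vN' line per entry."""
    for word, vector in vectors.items():
        stream.write("%s %s\n" % (word, " ".join(str(x) for x in vector)))


def read_vectors(stream):
    """Parse 'word v1 v2 ... vN' lines back into a dict of float lists."""
    vectors = {}
    for line in stream:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Toy vectors standing in for a real model.
toy = {"cat": [0.1, -0.2, 0.3], "dog": [0.4, 0.5, -0.6]}
buffer = io.StringIO()
write_vectors(buffer, toy)
buffer.seek(0)
restored = read_vectors(buffer)
```

The same two helpers work on a file opened in text mode instead of the `io.StringIO` buffer.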

@akutuzov
Contributor Author

akutuzov commented Jan 7, 2017

I am not sure it's a good idea to overload Gensim with various semantic similarity datasets included in the distribution.
Most people would use their own gold datasets anyway, either because they deal with non-English data or because their text preprocessing differs from the preprocessing in SimLex999 or WS353 (lemmatization/stemming, POS-tagging, etc).
So I think it's better to leave WS353 as an example (and for testing), and maybe put a couple of links to other datasets in the documentation.

@anmolgulati
Contributor

Yeah, you are right. Sounds good.

5 participants