Fix the comment of Translation Matrix by robotcator · Pull Request #1594 · piskvorky/gensim

robotcator · 2017-09-19T14:34:20Z

No description provided.

menshikh-iv · 2017-09-20T05:15:02Z

        Args:
-            `word_pair` (list): a list pair of words
+            `word_pairs` (list): a list pair of words
            `source_space` (Space object): source language space


The train method uses only word_pairs; what are the source/target spaces doing here?

menshikh-iv · 2017-09-20T05:15:53Z

-        self.source_space = Space.build(self.source_lang_vec, set(self.source_word))
-        self.target_space = Space.build(self.target_lang_vec, set(self.target_word))
+        self.source_word, self.target_word = zip(*word_pairs)
+        if self.translation_matrix is None:


But if I call train twice, the model is not fitted the second time.
Please remove this if.
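
To illustrate the request, here is a minimal sketch of a train method that refits on every call. It is not gensim's implementation: the class name TinyTranslationMatrix, the dict-based vector lookup, and the use of numpy.linalg.lstsq are all assumptions made for the example.

```python
import numpy as np


class TinyTranslationMatrix:
    """Hypothetical stand-in for the class under review, not gensim's code."""

    def __init__(self, source_vecs, target_vecs):
        # Both arguments are plain dicts mapping word -> 1-D numpy array.
        self.source_vecs = source_vecs
        self.target_vecs = target_vecs
        self.translation_matrix = None

    def train(self, word_pairs):
        # No `if self.translation_matrix is None` guard: every call refits.
        source_words, target_words = zip(*word_pairs)
        X = np.vstack([self.source_vecs[w] for w in source_words])
        Y = np.vstack([self.target_vecs[w] for w in target_words])
        # Least-squares solution W of X @ W ~= Y.
        self.translation_matrix = np.linalg.lstsq(X, Y, rcond=None)[0]
        return self.translation_matrix


# Toy 2-D vectors, invented for the example.
source = {
    "one": np.array([1.0, 0.0]),
    "two": np.array([0.0, 1.0]),
    "three": np.array([1.0, 1.0]),
}
target = {
    "uno": np.array([2.0, 0.0]),
    "due": np.array([0.0, 2.0]),
    "tre": np.array([0.0, 3.0]),
}

model = TinyTranslationMatrix(source, target)
first = model.train([("one", "uno"), ("two", "due")]).copy()
second = model.train([("one", "uno"), ("three", "tre")]).copy()
```

With the guard in place, the second call would have returned the matrix fitted on the first set of pairs; without it, `second` reflects the new pairs.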

menshikh-iv · 2017-09-20T05:29:48Z


-        with utils.smart_open(self.train_file, "r") as f:
-            self.word_pair = [tuple(utils.to_unicode(line).strip().split()) for line in f]
+        self.word_pairs = [("one", "uno"), ("two", "due"), ("three", "tre"),


Please use hanging indents
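
For reference, a hanging-indent layout for a list like this might look as follows. Only the pairs visible in the diff are used; the variable is a plain local name here rather than the test's attribute.

```python
# Hanging indent: nothing follows the opening bracket on its line, the
# items are indented one level, and the closing bracket gets its own line.
word_pairs = [
    ("one", "uno"), ("two", "due"), ("three", "tre"),
    ("grape", "acino"), ("banana", "banana"), ("mango", "mango"),
]
```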

menshikh-iv · 2017-09-20T05:31:00Z

+                           ("grape", "acino"), ("banana", "banana"), ("mango", "mango")
+        ]
+
+        self.test_word_pairs = [("ten", "dieci"), ("dog", "cane"), ("cat", "gatto")]


Remove ("dog", "cane") from self.word_pairs
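
The comment does not state a reason, but presumably the point is to keep the test pairs out of the training pairs. A quick way to check for such overlap, using shortened toy lists rather than the real ones from the test file:

```python
# Toy lists for illustration; the real lists in the test are longer.
train_pairs = [("one", "uno"), ("two", "due"), ("dog", "cane")]
test_pairs = [("ten", "dieci"), ("dog", "cane"), ("cat", "gatto")]

# Pairs that appear in both lists would leak test data into training.
overlap = set(train_pairs) & set(test_pairs)

# Drop the overlapping pairs from the training list, keeping order.
held_out_train = [pair for pair in train_pairs if pair not in overlap]
```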

gojomo · 2017-09-20T22:38:34Z


    def normalize(self):
-        """ normalized the word vector's matrix """
+        """ Normalized the word vector's matrix """


'Normalize…' (imperative rather than past-tense)

gojomo · 2017-09-21T00:51:39Z

    def apply_transmat(self, words_space):
        """
-        mapping the source word vector to the target word vector using translation matrix
+        Mapping the source word vector to the target word vector using translation matrix


'Map…' (imperative rather than '-ing' form)

gojomo · 2017-09-21T01:08:20Z

The handling of word_pairs in __init__() and train() now makes sense, thanks. The comments have been improved but still may benefit from a deep review for clarity/wording.

Though I know I requested the Doc2Vec-related example, in its current form the motivations/benefits are muddled. Really it shouldn't require a separate helper class (BackMappingTranslationMatrix), and the notebook section ("Tranlation Matrix Revisit") is hard-to-follow, and includes a bunch of improper practices. (For example: using an imbalanced set of docs for the mapping 'overlap'; using the slow-and-iffy dm_concat mode; calling train() multiple times with a sawtooth alpha progression, etc.)

The word-translation example can presumably be evaluated based on real datasets in the original context that motivated the approach, while the doc-vec example will need more novel design/evaluation – so I'd recommend splitting them into separate notebooks.

robotcator · 2017-09-21T02:38:59Z

Thanks. You reminded me of the imbalanced-set problem in the example. The code for training the document vectors is borrowed from doc2vec-imdb.ipynb, and I will re-train the document vectors.
As for the imbalanced data, how should I sample documents for the 'overlap': according to sentiment, or according to whether a document is in the train or test set? (Sampling according to sentiment seems more logical.)

For the BackMappingTranslationMatrix class, I didn't find a good way to integrate this function into my TranslationMatrix class, so I separated it into two classes, because word2vec and doc2vec have different methods for accessing a vector.

For word2vec, use model[word] to get the word vector.
For doc2vec, use model.docvecs['doc_tag'] to get the document vector.

If BackMappingTranslationMatrix were integrated into TranslationMatrix, I would handle the two cases separately according to the model type (using isinstance). Is that appropriate?
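
A minimal sketch of the isinstance-based dispatch being proposed. The classes FakeWord2Vec and FakeDoc2Vec are hypothetical stand-ins so the example runs without gensim, and the docvecs attribute name is an assumption about the gensim version in use.

```python
class FakeWord2Vec:
    """Hypothetical stand-in: vectors are looked up as model[word]."""

    def __init__(self, vectors):
        self._vectors = vectors

    def __getitem__(self, word):
        return self._vectors[word]


class FakeDoc2Vec:
    """Hypothetical stand-in: vectors are looked up as model.docvecs[tag]."""

    def __init__(self, docvecs):
        self.docvecs = docvecs


def get_vector(model, key):
    # Pick the lookup path from the model type, so a single class could
    # serve both word and document vectors.
    if isinstance(model, FakeDoc2Vec):
        return model.docvecs[key]
    return model[key]


word_model = FakeWord2Vec({"one": [1.0, 0.0]})
doc_model = FakeDoc2Vec({"doc_tag": [0.5, 0.5]})
```

An alternative that avoids type checks is to pass in an already-built mapping from key to vector, leaving the lookup to the caller.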

I didn't follow the point that "the word-translation example can be evaluated based on real datasets in the original context". Can you please explain in more detail?

gojomo · 2017-09-21T19:20:01Z

I meant there are published papers about using word-vector transformations for language-translation - the original Google paper, the Dinu paper – so there are specific datasets & procedures to mimic – and similar results would indicate everything is working. The Doc2Vec use is novel so requires more experimentation/thought.
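
One common way to check that "everything is working" is to score held-out word pairs with precision@1: map each source vector through the learned matrix and see whether the nearest target word is the expected translation. The sketch below is an illustration under assumptions (toy 2-D vectors, cosine similarity, a hand-written matrix W), not the exact procedure of either paper.

```python
import numpy as np


def precision_at_1(W, source_vecs, target_vecs, test_pairs):
    """Fraction of test pairs whose nearest mapped neighbour is correct."""
    target_words = list(target_vecs)
    T = np.vstack([target_vecs[w] for w in target_words])
    # Unit-normalize rows so the dot product below is cosine similarity.
    T = T / np.linalg.norm(T, axis=1, keepdims=True)
    hits = 0
    for src, expected in test_pairs:
        mapped = source_vecs[src] @ W
        mapped = mapped / np.linalg.norm(mapped)
        best = target_words[int(np.argmax(T @ mapped))]
        hits += best == expected
    return hits / len(test_pairs)


# Toy data: W swaps the two coordinates.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
source_vecs = {"dog": np.array([1.0, 0.0]), "cat": np.array([0.0, 1.0])}
target_vecs = {"cane": np.array([0.0, 1.0]), "gatto": np.array([1.0, 0.0])}

score = precision_at_1(
    W, source_vecs, target_vecs, [("dog", "cane"), ("cat", "gatto")]
)
```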

robotcator · 2017-09-22T01:47:16Z

The word pairs used in this experiment are extracted from OPUS (http://opus.lingfil.uu.se/), the same source as in Dinu's paper. I plot a visualization to show the linear relationship between the two language vector spaces, and use word translation to show that the transformation works. Reproducing more experiments from Mikolov's and Dinu's papers would help support this transformation. But I still cannot find any language-translation experiment (does that mean sentence translation?) in the two papers I mentioned. Can you tell me if I missed something?

I added an explicit "unstable/experimental" warning tag in the notebook for the doc2vec transformation part, as Ivan suggested.

I also found a paper ("Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax") that is related to this experiment. I have only skimmed it so far and need to dig deeper.

menshikh-iv · 2017-09-25T07:07:32Z

Thank you @robotcator for the fast fixes.
Thank you @gojomo for the review :+1:

menshikh-iv · 2017-09-25T07:17:08Z

Some typos/PEP 8 issues in the notebook still need fixing, but I can't wait any longer: it's release time.

robotcator added 2 commits September 19, 2017 22:28

fix the comments

21841ad

Merge branch 'develop' of https://github.com/RaRe-Technologies/gensim …

cfc750b

…into mydevelop

robotcator changed the title ~~Fix the comment of Tr~~ Fix the comment of Translation Matrix Sep 19, 2017

robotcator mentioned this pull request Sep 19, 2017

[MRG] Implement 'Translation Matrix' #1434

Merged

remove print function

54ce6ab

menshikh-iv suggested changes Sep 20, 2017

robotcator added 3 commits September 20, 2017 14:29

update the notebook

be73aac

fix the train method

8d22786

remove some words for sample

dc9418e

gojomo reviewed Sep 20, 2017

gojomo reviewed Sep 21, 2017

fix the tense

97b32c2

add warning for the translation matrix revist part

423ca98

menshikh-iv added the style checking label Sep 25, 2017

menshikh-iv merged commit 33a3ef2 into piskvorky:develop Sep 25, 2017

robotcator deleted the mydevelop branch September 25, 2017 12:09

menshikh-iv removed the style checking label Oct 6, 2017

robotcator mentioned this pull request Nov 17, 2017

Fixing the comment in Implement 'Translation Matrix' #1593

Closed

menshikh-iv merged 8 commits into piskvorky:develop from robotcator:mydevelop
