Fix the comment of Translation Matrix #1594
| Args: | ||
| `word_pair` (list): a list pair of words | ||
| `word_pairs` (list): a list pair of words | ||
| `source_space` (Space object): source language space |
The train method uses only word_pairs. What is the source/target space here?
| self.source_space = Space.build(self.source_lang_vec, set(self.source_word)) | ||
| self.target_space = Space.build(self.target_lang_vec, set(self.target_word)) | ||
| self.source_word, self.target_word = zip(*word_pairs) | ||
| if self.translation_matrix is None: |
But if I call train twice, the model is not fitted the second time.
Please remove this `if`.
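To illustrate the concern with a toy model (this is not the gensim class; `ToyScaleModel`, its `guard` flag, and the scalar fit are made up for this sketch): with the guard in place, a second call to `train` silently keeps the first fit.

```python
class ToyScaleModel:
    """Hypothetical stand-in: fits y ~ scale * x by least squares."""

    def __init__(self, guard):
        self.guard = guard  # True mimics the `if ... is None` check
        self.scale = None

    def train(self, pairs):
        if self.guard and self.scale is not None:
            return  # already fitted once, so new pairs are ignored
        num = sum(x * y for x, y in pairs)
        den = sum(x * x for x, _ in pairs)
        self.scale = num / den


first_pairs = [(1.0, 2.0), (2.0, 4.0)]   # best scale is 2.0
second_pairs = [(1.0, 3.0), (2.0, 6.0)]  # best scale is 3.0

guarded = ToyScaleModel(guard=True)
guarded.train(first_pairs)
guarded.train(second_pairs)    # no refit: scale stays at 2.0

unguarded = ToyScaleModel(guard=False)
unguarded.train(first_pairs)
unguarded.train(second_pairs)  # refit: scale becomes 3.0
```

Without the guard, every call refits on the pairs it is given, which is the behavior I would expect from `train`.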
| with utils.smart_open(self.train_file, "r") as f: | ||
| self.word_pair = [tuple(utils.to_unicode(line).strip().split()) for line in f] | ||
| self.word_pairs = [("one", "uno"), ("two", "due"), ("three", "tre"), |
Please use hanging indents
| ("grape", "acino"), ("banana", "banana"), ("mango", "mango") | ||
| ] | ||
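A hanging indent in the PEP 8 sense leaves nothing after the opening bracket and indents the continuation lines one level. A sketch of the fixture in that style, shortened to the pairs visible in this diff (the module-level name is only for illustration):

```python
# Shortened to the pairs visible in the diff; the real fixture has more.
word_pairs = [
    ("one", "uno"), ("two", "due"), ("three", "tre"),
    ("grape", "acino"), ("banana", "banana"), ("mango", "mango"),
]
```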
| self.test_word_pairs = [("ten", "dieci"), ("dog", "cane"), ("cat", "gatto")] |
Remove ("dog", "cane") from self.word_pairs
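Presumably the point is that a test pair should not also appear among the training pairs. A minimal sketch of such a check, where the training list is a hypothetical short stand-in and not the real fixture:

```python
# Hypothetical short training list, for illustration only.
train_pairs = [("one", "uno"), ("two", "due"), ("dog", "cane")]
test_pairs = [("ten", "dieci"), ("dog", "cane"), ("cat", "gatto")]

# Pairs present in both lists would make the evaluation look better than it is.
overlap = set(train_pairs) & set(test_pairs)

# Drop the overlapping pairs from the training list, keeping the original order.
train_pairs = [pair for pair in train_pairs if pair not in overlap]
```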
| def normalize(self): | ||
| """ normalized the word vector's matrix """ | ||
| """ Normalized the word vector's matrix """ |
'Normalize…' (imperative rather than past-tense)
| def apply_transmat(self, words_space): | ||
| """ | ||
| mapping the source word vector to the target word vector using translation matrix | ||
| Mapping the source word vector to the target word vector using translation matrix |
'Map…' (imperative rather than '-ing' form)
|
The handling of Though I know I requested the Doc2Vec-related example, in its current form the motivations/benefits are muddled. Really it shouldn't require a separate helper class ( The word-translation example can presumably be evaluated based on real datasets in the original context that motivated the approach, while the doc-vec example will need more novel design/evaluation – so I'd recommend splitting them into separate notebooks.
|
Thanks. You remind me of the imbalanced-set problem in the example. The code for training the document vectors is borrowed from the For the for If I didn't catch that
|
I meant there are published papers about using word-vector transformations for language-translation - the original Google paper, the Dinu paper – so there are specific datasets & procedures to mimic – and similar results would indicate everything is working. The Doc2Vec use is novel so requires more experimentation/thought.
|
The word pairs used in this experiment are extracted from OPUS (http://opus.lingfil.uu.se/), the same source as in Dinu's paper. I plot the visualization to show the linear relationship between the two languages' vector spaces, and I use word translation to show that this transformation works. Reproducing more experiments from Mikolov's and Dinu's papers would help support this transformation. But I still cannot find any experiment on language translation (does that mean sentence translation?) in the two papers I mentioned here. Can you point it out if I missed something? I added an explicit "unstable/experimental" warning tag in the notebook for I also found a related paper (Offline Bilingual Word Vectors, Orthogonal Transformations and the Inverted Softmax). I have only skimmed it and need to dig deeper.
|
Thank you @robotcator for the fast fixes.
|
Some typos/PEP 8 issues in the notebook still need fixing, but I can't wait any longer; it's release time.