Convert to absolute paths in wordrank by parulsethi · Pull Request #1503 · piskvorky/gensim

parulsethi · 2017-07-24T17:08:49Z

Converted relative paths to absolute for every wordrank command.

menshikh-iv · 2017-07-24T18:22:46Z


        logger.info("Deleting frequencies from vocab file")
-        with smart_open(vocab_file, 'wb') as w:
+        with smart_open(join(meta_dir, vocab_file), 'wb') as w:


Please move the join to the definition of vocab_file (line 91), and make the same change for all smart_open arguments.
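
A minimal sketch of what this suggestion might look like, assuming a hypothetical temporary `meta_dir` and a hypothetical file name `vocab.txt` (the actual values in the wordrank wrapper may differ). Built-in `open()` stands in for `smart_open` so the snippet is self-contained:

```python
import os
import tempfile

# Hypothetical directory and file name, for illustration only.
meta_dir = tempfile.mkdtemp()

# Build the full path once, at the point where the variable is defined.
vocab_file = os.path.join(meta_dir, 'vocab.txt')

# Later code uses the variable as-is, with no join() at the call site.
# Built-in open() is used here in place of smart_open.
with open(vocab_file, 'wb') as w:
    w.write(b'word\n')

with open(vocab_file, 'rb') as r:
    contents = r.read()
```

With the path built at the definition, every later read or write refers to the same location without repeating the join.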

piskvorky

Minor code style suggestions.

piskvorky · 2017-07-25T02:35:11Z

        # prepare training data (cooccurrence matrix and vocab)
-        model_dir = os.path.join(wr_path, out_name)
-        meta_dir = os.path.join(model_dir, 'meta')
+        model_dir = join(wr_path, out_name)


Using the full namespace, os.path.join, is preferable.

There are many joins in Python and its various libraries, and the explicit context makes the code immediately easier for other readers to understand.

piskvorky · 2017-07-25T02:37:08Z


        commands = [cmd_vocab_count, cmd_cooccurence_count, cmd_shuffle_cooccurences]
-        input_fnames = [corpus_file.split('/')[-1], corpus_file.split('/')[-1], cooccurrence_file]
+        input_fnames = [join(meta_dir, corpus_file.split('/')[-1]), join(meta_dir, corpus_file.split('/')[-1]), cooccurrence_file]


string.split('/') is not portable -- see os.path.split, os.path.basename etc.
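
To illustrate the portability concern, here is a small sketch using the standard `ntpath` and `posixpath` modules, which are the Windows and POSIX implementations behind `os.path`. Both example paths are hypothetical:

```python
import ntpath
import posixpath

windows_path = r'C:\data\corpus.txt'  # hypothetical Windows-style path
posix_path = '/data/corpus.txt'       # hypothetical POSIX-style path

# Splitting on '/' finds no separator in the Windows-style path,
# so the "file name" is the whole path.
naive = windows_path.split('/')[-1]

# os.path resolves to ntpath on Windows and posixpath elsewhere,
# so os.path.basename uses the right separator for the platform.
win_name = ntpath.basename(windows_path)
posix_name = posixpath.basename(posix_path)
```

Calling `ntpath` and `posixpath` directly is only for demonstration; in real code, `os.path.basename` selects the appropriate implementation.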

piskvorky · 2017-07-25T02:37:55Z

            numlines = sum(1 for line in f)
        with smart_open(meta_file, 'wb') as f:
-            meta_info = "{0} {1}\n{2} {3}\n{4} {5}".format(numwords, numwords, numlines, cooccurrence_shuf_file, numwords, vocab_file)
+            meta_info = "{0} {1}\n{2} {3}\n{4} {5}".format(numwords, numwords, numlines, cooccurrence_shuf_file.split('/')[-1], numwords, vocab_file.split('/')[-1])


Ditto on split.

Elsewhere in the file (and in gensim) the standard C-style %s %d %f string formatting is used; best to keep it consistent here as well.

@piskvorky formatting with {}.format is preferred in Python now. I think we should use the format method instead of C-style formatting.

Kept {}.format for now.
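
For reference, a short sketch of the two formatting styles discussed in this thread, applied to the same template. The counts and file names are hypothetical placeholders, not values from the actual wrapper:

```python
# Hypothetical values, for illustration only.
numwords, numlines = 100, 2500
cooc_name, vocab_name = 'wiki.toy', 'vocab.txt'

# str.format style, as kept in the PR.
with_format = "{0} {1}\n{2} {3}\n{4} {5}".format(
    numwords, numwords, numlines, cooc_name, numwords, vocab_name
)

# C-style % formatting, as used elsewhere in the file.
with_percent = "%d %d\n%d %s\n%d %s" % (
    numwords, numwords, numlines, cooc_name, numwords, vocab_name
)
```

Both produce the same string here, so the choice is about consistency with the surrounding code rather than behavior.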

piskvorky

Minor code style comments.

piskvorky · 2017-07-26T13:02:41Z


        commands = [cmd_vocab_count, cmd_cooccurence_count, cmd_shuffle_cooccurences]
-        input_fnames = [join(meta_dir, corpus_file.split('/')[-1]), join(meta_dir, corpus_file.split('/')[-1]), cooccurrence_file]
+        input_fnames = [os.path.join(meta_dir, os.path.split(corpus_file)[-1]), os.path.join(meta_dir, os.path.split(corpus_file)[-1]), cooccurrence_file]


This line is a little hard to navigate -- any way to restructure the logic to make it more readable? Maybe factor out some of the arguments into separate lines?
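
One possible restructuring, sketched under the assumption that the repeated expression can be named once. The name `corpus_copy` and all paths below are hypothetical:

```python
import os

# Hypothetical paths, for illustration only.
meta_dir = os.path.join('model', 'meta')
corpus_file = os.path.join('data', 'corpus.txt')
cooccurrence_file = os.path.join(meta_dir, 'cooccurrence')

# Name the repeated expression once instead of building it twice inline.
corpus_copy = os.path.join(meta_dir, os.path.basename(corpus_file))

input_fnames = [corpus_copy, corpus_copy, cooccurrence_file]
```

This keeps the list itself short and makes it obvious that the first two commands read the same file.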

piskvorky · 2017-07-26T13:05:05Z

        os.makedirs(meta_dir)
        logger.info("Dumped data will be stored in '%s'", model_dir)
-        copyfile(corpus_file, join(meta_dir, corpus_file.split('/')[-1]))
+        copyfile(corpus_file, os.path.join(meta_dir, corpus_file.split('/')[-1]))


Isn't os.path.split()[-1] simply os.path.basename()?
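
A quick check of that equivalence on a few hypothetical sample paths, including a trailing separator and an empty string:

```python
import os

# Hypothetical sample paths, including edge cases.
paths = ['data/corpus.txt', 'corpus.txt', 'data/', '']

# Compare the last element of os.path.split() with os.path.basename().
same = all(os.path.split(p)[-1] == os.path.basename(p) for p in paths)
```

The Python documentation describes `os.path.basename(path)` as the second element of the pair returned by `os.path.split(path)`, so the two agree for these inputs.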

parulsethi added 2 commits July 24, 2017 22:32

convert to absolute paths for every command

5802bdb

use sorted in directory structure test

53158e1

menshikh-iv reviewed Jul 24, 2017

move join() to var definition

7c13de9

piskvorky requested changes Jul 25, 2017

parulsethi added 2 commits July 25, 2017 15:31

made requested changes

7345023

change gensim pin to develop in dockerfile

5bbe888

menshikh-iv merged commit 7a9e98e into piskvorky:develop Jul 25, 2017

piskvorky reviewed Jul 26, 2017
