Convert to absolute paths in wordrank#1503
Conversation
|
|
||
| logger.info("Deleting frequencies from vocab file") | ||
| with smart_open(vocab_file, 'wb') as w: | ||
| with smart_open(join(meta_dir, vocab_file), 'wb') as w: |
There was a problem hiding this comment.
Please move join to definition of vocab_file (line 91) and same changes for all smart_open arguments
piskvorky
left a comment
There was a problem hiding this comment.
Minor code style suggestions.
| # prepare training data (cooccurrence matrix and vocab) | ||
| model_dir = os.path.join(wr_path, out_name) | ||
| meta_dir = os.path.join(model_dir, 'meta') | ||
| model_dir = join(wr_path, out_name) |
There was a problem hiding this comment.
Using full namespace os.path.join is preferable.
There are many joins in Python and its various libraries, and the context makes the code immediately easier to read and understand for other readers.
|
|
||
| commands = [cmd_vocab_count, cmd_cooccurence_count, cmd_shuffle_cooccurences] | ||
| input_fnames = [corpus_file.split('/')[-1], corpus_file.split('/')[-1], cooccurrence_file] | ||
| input_fnames = [join(meta_dir, corpus_file.split('/')[-1]), join(meta_dir, corpus_file.split('/')[-1]), cooccurrence_file] |
There was a problem hiding this comment.
string.split('/') is not portable -- see os.path.split, os.path.basename etc.
| numlines = sum(1 for line in f) | ||
| with smart_open(meta_file, 'wb') as f: | ||
| meta_info = "{0} {1}\n{2} {3}\n{4} {5}".format(numwords, numwords, numlines, cooccurrence_shuf_file, numwords, vocab_file) | ||
| meta_info = "{0} {1}\n{2} {3}\n{4} {5}".format(numwords, numwords, numlines, cooccurrence_shuf_file.split('/')[-1], numwords, vocab_file.split('/')[-1]) |
There was a problem hiding this comment.
Dtto on split.
Elsewhere in the file (and in gensim) the standard C-style %s %d %f string formatting is used; best to keep it consistent here as well.
There was a problem hiding this comment.
@piskvorky formatting with {}.format more preferable for Python now. I think we should use format method instead of C-style formatting.
There was a problem hiding this comment.
kept {}.format for now
|
|
||
| commands = [cmd_vocab_count, cmd_cooccurence_count, cmd_shuffle_cooccurences] | ||
| input_fnames = [join(meta_dir, corpus_file.split('/')[-1]), join(meta_dir, corpus_file.split('/')[-1]), cooccurrence_file] | ||
| input_fnames = [os.path.join(meta_dir, os.path.split(corpus_file)[-1]), os.path.join(meta_dir, os.path.split(corpus_file)[-1]), cooccurrence_file] |
There was a problem hiding this comment.
This line is a little hard to navigate -- any way to restructure the logic to make it more readable? Maybe factor out some of the arguments into separate lines?
| os.makedirs(meta_dir) | ||
| logger.info("Dumped data will be stored in '%s'", model_dir) | ||
| copyfile(corpus_file, join(meta_dir, corpus_file.split('/')[-1])) | ||
| copyfile(corpus_file, os.path.join(meta_dir, corpus_file.split('/')[-1])) |
There was a problem hiding this comment.
Isn't os.path.split()[-1] simply os.path.basename()?
Converted relative paths to absolute for every wordrank command.