Added function to replace gensim check_output with subprocess.check_output#1182
Added function to replace gensim check_output with subprocess.check_output#1182kirit93 wants to merge 17 commits into
Conversation
|
@kirit93 Thanks for the PR. Could you please replace the actual There was some reason I added extra |
|
@tmylk I ran the tests for mallet, dtm, fasttext and wordrank wrappers. All pass except for fasttext, but the error I get with my I'll get started with the rest of the things you mentioned. |
|
Looks like a bad test -- floats should never be compared for bit equality. Instead, there's |
|
Should I fix those tests? I'll change it to : I made the change and ran the test and there are no failures, the assertEqual is used in a couple of other places in this test. If this fix is okay, I'll make the changes everywhere. |
|
@kirit93 Confirm that using Does the error output get printed to the log with this new change? |
|
@tmylk |
tmylk
left a comment
There was a problem hiding this comment.
Please address comments about exception handling
| raise | ||
| except: | ||
| error = "Error in check_output while trying to execute: \n ' " + str(args) + " '\nthis command was not found" | ||
| print(error) |
There was a problem hiding this comment.
Is there a particular log file I should be logging output to? If not STDOUT, what is the right way to send out the error message? In the earlier function the output and error were being returned from the function. Should I do the same? I could output it to STDERR as the error generated by the shell when it tries executing the incorrect command is also written to STDERR.
| process.terminate() | ||
| raise | ||
| except: | ||
| error = "Error in check_output while trying to execute: \n ' " + str(args) + " '\nthis command was not found" |
There was a problem hiding this comment.
Could there be exceptions even when command is found? if yes, then please remove "command was not found"
There was a problem hiding this comment.
Yes, I overlooked that detail. Should I change the error message to something more generic like " xyz command could not be executed "?
| return res | ||
| except KeyboardInterrupt: | ||
| process.terminate() | ||
| raise |
There was a problem hiding this comment.
what is the purpose of catching and immediately throwing the exception? this line doesn't add anything as is
There was a problem hiding this comment.
It makes sure the catch-all except: below doesn't apply to KeyboardInterrupt.
But it's not a good practice anyway, because there's also SystemExit etc. At least limit the catch-all to Exception.
There was a problem hiding this comment.
Would it be a good idea to catch the subprocess.CalledProcessError and output an error message that is easy for the user to understand? This is the error raised when check_output fails.
For any other error including KeyboardInterrupt I can use a catch-all Exception and raise it.
| return res | ||
| except KeyboardInterrupt: | ||
| process.terminate() | ||
| raise |
There was a problem hiding this comment.
It makes sure the catch-all except: below doesn't apply to KeyboardInterrupt.
But it's not a good practice anyway, because there's also SystemExit etc. At least limit the catch-all to Exception.
| """ | ||
| selected_keys = random.sample(list(d), min(len(d), n)) if use_random else itertools.islice(iterkeys(d), n) | ||
| return [(key, d[key]) for key in selected_keys] | ||
| return [(key, d[key]) for key in selected_keys] No newline at end of file |
| raise | ||
| except: | ||
| error = "Error in check_output while trying to execute: \n ' " + str(args) + " '\nthis command was not found" | ||
| print(error) |
There was a problem hiding this comment.
The original exception was more useful, because it included stdout/stderr, which useful for error debugging.
Is there any way to do the same here?
There was a problem hiding this comment.
I'm sorry, I didn't fully understand what you mean here. Which original exception are you talking about?
There was a problem hiding this comment.
I meant subprocess.CalledProcessError, a few lines above.
I don't have any strong opinion on what this function should do on failure, or how to capture the subprocess stdout/stderr. Except for sure we don't want to be printing anything -- logging preferred.
| In case args generates an error | ||
| >>> test_checkoutput(args=['/usr/bin/pythons', '-ve']) #Incorrect argument | ||
| /bin/sh: /usr/bin/pythons: No such file or directory | ||
| *Error in args : /usr/bin/pythons -ve |
There was a problem hiding this comment.
Is this something that is printed? Or logged?
In general, we don't want to pollute stdout at all, unless the user explicitly requested printing.
There was a problem hiding this comment.
The No such file or directory line gets outputted to STDERR as it is the error returned by the shell when it tries to execute the command in args.
The Error in args: /usr/bin/pythons -ve message gets printed to STDOUT because I'm using a simple print command. Is there a log file you'd like me to log the output to or should I print to STDERR if we don't want STDOUT polluted?
| Instead of raising the error, output a more specific error message | ||
| """ | ||
| error = "subprocess.check_output could not execute command ' " + str(args) + " '" | ||
| print(error, file=sys.stderr) |
There was a problem hiding this comment.
Please log instead of printing, see examples in the code
There was a problem hiding this comment.
Please output to log instead of printing.
tmylk
left a comment
There was a problem hiding this comment.
Please add tests for the exception handling
| error = "subprocess.check_output could not execute command ' " + str(args) + " '" | ||
| print(error, file=sys.stderr) | ||
| return error | ||
| except Exception: |
There was a problem hiding this comment.
this code is redundant. It will be raised anyway
| self.assertTrue(True) | ||
|
|
||
| class TestOutput(unittest.TestCase): | ||
| def test_check_output(self): |
There was a problem hiding this comment.
Please add a test that exception is raised
|
|
||
| def test_check_output_exception(self): | ||
| error = utils.check_output(args=['ldfs']) | ||
| self.assertEqual(error, "subprocess.check_output could not execute command ' ldfs '") |
There was a problem hiding this comment.
test_utils.py imports utils.py from the installed gensim. So this test case is still running the old version of check_output and therefore the exception test fails. However, if you change the import to use the current version of utils.check_output, all tests pass.
There was a problem hiding this comment.
you can install your new version of gensim with python setup.py install
|
Please add unit test that exception is indeed raised. |
… replaced deprecated assertEquals with assertEqual
| """ | ||
| error = "subprocess.check_output could not execute command ' " + str(args) + " '" | ||
| logger.error(error) | ||
| return error |
There was a problem hiding this comment.
what is the reason to return instead of previous raise?
There was a problem hiding this comment.
The reason I've chosen to simple return the error message is because raise results in the following large and not user friendly message.
Traceback (most recent call last):
File "test.py", line 51, in test_check_out
res = subprocess.check_output(args, shell=flag)
File "/Users/kirit/anaconda/lib/python3.5/subprocess.py", line 626, in check_output
**kwargs).stdout
File "/Users/kirit/anaconda/lib/python3.5/subprocess.py", line 708, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'ldfs' returned non-zero exit status 127
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test.py", line 76, in <module>
test_check_out(args=["ldfs"])
File "test.py", line 60, in test_check_out
raise error
TypeError: exceptions must derive from BaseException
There was a problem hiding this comment.
Raising a string (instead of some exception) is definitely not a good idea.
There was a problem hiding this comment.
How would we benefit from raising an exception here? Once the subprocess.CalledProcessError occurs and is caught, an appropriate message is returned to the user indicating why the check_output command failed. Wouldn't that be enough for the user to proceed correctly?
There was a problem hiding this comment.
Your raise error already raises an exception. It's just that proper exceptions inherit from Exception (or appropriate subclasses), rather than being just strings.
There was a problem hiding this comment.
I could do a raise Exception(error).
Would that be okay?
There was a problem hiding this comment.
I haven't looked into the actual logic. Just commenting on the discussion above, where you say The reason I've chosen to simple return the error message is because raise results in the following large and not user friendly message., but that message is only due to raising a string instead of a proper exception :)
There was a problem hiding this comment.
My 2c would be to use the same error-handling logic that the function employed previously, before your changes. The least surprise to users.
|
@tmylk @piskvorky I've modified the function to handle errors just as the earlier function would. I hope this is okay. Please let me know if there are any further changes to be made so that the PR can be merged. Thanks! |
| """ | ||
| error = "subprocess.check_output could not execute command ' " + str(args) + " '" | ||
| logger.error(error) | ||
| raise |
There was a problem hiding this comment.
Please raise the CalledProcessError exception as before.
There was a problem hiding this comment.
I may be missing something, but why exactly do we need to raise the CalledProcessError at all. As long as we tell the user why check_output failed, shouldn't that be fine? Using shell=true with subprocess.check_output, in case of erroneous input, the same error message gets displayed as it would if the user were to enter the command on the terminal. This serves the purpose of telling the user why check_output failed.
If I pass ldfs to check_output, stderr will have /bin/sh: ldfs: command not found. Similarly, if you were to type the same command on the terminal, the output would be bash: ldfs: command not found.
In the earlier code stderr would have had a huge error message indicating a FileNotFound. If required CalledProcessError can be captured and a similar error message can be logged.
In case I've missed something and there is some reason that an implementation similar to the earlier one is preferred, I will look into that.
There was a problem hiding this comment.
First of all we need the actual output to be written to log, not to stderr.
Secondly, the tests should check that the return message is indeed for ldfs: command not found. The current message they test for is not informative
There was a problem hiding this comment.
Would it be okay for the log to have the error message as follows
/bin/sh: ldfs: command not found
Command '['ldfs']' returned non-zero exit status 127.
along with a subprocess.CalledProcessError being raised?
In the test case, I'm unable to figure out how to use assertRaises to verify that it was in fact a subprocess.CalledProcessError that was raised.
|
It seems that tests didn't run for the last commit, Is it correct that the only difference between old code and this improvement is that the error is printed to log here: However this doesn't answer the original purpose of the issue in #703 :
Please add code and tests that actually do that. |
|
@tmylk, could you check out my latest push and let me know if that works for us? |
|
@menshikh-iv, could you let me know if this PR is okay to merge? |
| Python 2.6.2 | ||
| Added extra KeyboardInterrupt handling | ||
| def check_output(args, flag=True): | ||
| r""" |
| >>> check_output(args=['/usr/bin/python', '--version']) | ||
| Python 2.6.2 | ||
| Added extra KeyboardInterrupt handling | ||
| def check_output(args, flag=True): |
There was a problem hiding this comment.
A new interface breaks down many wrappers from gensim. First, please merge develop to your branch (it needs to correct test run).
Second, run tests locally with all wrappers like this
FT_HOME=/home/ivan/release/test/fastText WR_HOME=/home/ivan/release/test/wordrank VOWPAL_WABBIT_PATH=/home/ivan/release/test/vowpal_wabbit/vowpalwabbit/vw DTM_PATH=/home/ivan/release/test/dtm/dtm/main MALLET_HOME=/home/ivan/release/test/Mallet python setup.py test
Another variant is PR#1368 docker image. You can use it for full test run
There was a problem hiding this comment.
My branch is upto date with develop. I ran python setup.py test with the appropriate env variable. Two tests failed but these are tests unrelated to what I am working on.
What should I do now?
There was a problem hiding this comment.
@kirit93 hm, strange, I'll run all the tests and write the results here.
There was a problem hiding this comment.
@menshikh-iv, It seems like the travis build succeeded after develop was merged into issue. Is the PR good to go?
|
I run all tests and get 39 errors, but they are all similar to each other: First type Second type Third type For this reason, I can't merge your PR. |
|
@menshikh-iv for the second two error types I can fix it by changing the definition of my This way the way the function is called will not have to be changed in the other files in the code. And the first type of error is not related to |
|
@kirit93 First type error is strange, I will investigate it later. Could you please do a few things:
After this fixes, I will run all tests again and write a result to this PR. |
|
@menshikh-iv I made the changes you requested and pushed them. Travis built successfully. Please let me know if the PR is okay now. |
|
Sorry for late response @kirit93, Now, I get a new kind of error, look at command |
|
@menshikh-iv, I reverted to the old implementation of Could the cause of the error be something unrelated to the |
|
@kirit93 unless your system user is |
|
@piskvorky, I ran the following command |
|
@kirit93 Where did you get that command from, do those paths really exist? |
|
No they don't exist on my system, but @menshikh-iv asked me to run that command to test whether the PR works or not. |
|
@kirit93 It's a path in my filesystem. For the full test run, you should:
|
|
@kirit93 I just ran all the tests on the development branch and made sure that everything works with the implementation from |
|
@menshikh-iv, any idea what could be the cause of this error? |
|
In my opinion, it is now easier to recreate PR and slightly change the code from |
@tmylk I've added a function above the existing
check_outputfunction in an attempt to solve this issue. Please let me know if there's anything I need to change. This is with reference to issue #703