Make killable threads more robust to race condition by kdsudac · Pull Request #740 · google/openhtf

kdsudac · 2018-03-20T03:36:26Z

There is a very small chance that a killable thread will end on it's own before it is killed. Make us robust to this race condition by not raising an error if the thread is no longer alive.

This change is

There is a very small chance that a killable thread will end on it's own before it is killed. Make us robust to this race condition by not raising an error if the thread is no longer alive.

googlebot · 2018-03-20T03:36:29Z

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.

What to do if you already signed the CLA

Individual signers

It's possible we don't have your GitHub username or you're using a different email address on your commit. Check your existing CLA data and verify that your email is set on your git commits.

Corporate signers

Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
The email used to register you as an authorized contributor must also be attached to your GitHub account.

kdsudac · 2018-03-20T03:42:23Z

I signed it!

Kenadia · 2018-03-20T04:29:54Z

openhtf/util/threads.py

@@ -95,7 +95,7 @@ def async_raise(self, exc_type):
    assert self.is_alive(), 'Only running threads have a thread identity'


Makes me wonder if this assert belongs here. Maybe assert self.ident is not None is appropriate, seems like that's actually guaranteed to be true, and not subject to a race condition.

assert self.ident is not None, 'Thread is missing identifier since it was never started.'

Kenadia · 2018-03-20T04:44:12Z

openhtf/util/threads.py

        ctypes.c_long(self.ident), ctypes.py_object(exc_type))
-    if result == 0:
+    if result == 0 and self.is_alive():
      raise ValueError('Thread ID was invalid.', self.ident)


Some interesting notes in the documentation of PyThreadState_SetAsyncExc:

To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held.

I wonder what's the worst that could happen? Your change already seems like an improvement--I can't tell in what situation this ValueError would ever be raised anymore. The only loophole I've thought of is that since thread identifiers can be “recycled,” I wonder if there's a very small chance that we end up killing some arbitrary other thread that was not meant to be killed.

Yeah, I think the C extension that holds the GIL is the only way to completely eliminate possibility of a race condition here. I don't want to take that on now though but did add a note in the class docstring.

From documentation of threading.Thread on ident:

The ‘thread identifier’ of this thread or None if the thread has not been started. This is a nonzero integer. See the thread.get_ident() function. Thread identifiers may be recycled when a thread exits and another thread is created. The identifier is available even after the thread has exited.

So the assertion message is a little misleading. Since it's not running threads but started threads that have an ident. I changed the message.

I also put a check on is_alive before calling PyThreadState_SetAsyncExc. This might be a bit of overkill considering we also check on result == 0 and self.is_alive(), but it seems to me that it would be better to not call at all if the thread is not alive.

The GIL is held when running Python code, if you want to ensure that it's held throughout a function, you can temporarily set the check interval (sys.setcheckinterval) to a very large number. That will ensure that Python doesn't even attempt to switch out the GIL, since the check interval is how often the interpreter checks things, like if other threads are ready to run.

https://docs.python.org/2/c-api/init.html#thread-state-and-the-global-interpreter-lock

As far as I can tell, @fahhem's solution should work. But I know there are also some concerns about using sys.setcheckinterval in this way. This article, A Neat Python Hack? No, Broken Code, points out that code in the critical section may be releasing the GIL unbeknownst to you. This doesn't seem to be an issue in our case, if we're completely sure about what ctypes.pythonapi.PyThreadState_SetAsyncExc is doing (which I'm not).

Pending an airtight solution, I'm fine with @kdsudac's existing improvement, which I believe leaves only the remote, probably-never-gonna-happen possibility of killing the wrong thread

Kenadia · 2018-03-20T04:49:09Z

openhtf/util/threads.py

+    if result == 0 and self.is_alive():
      raise ValueError('Thread ID was invalid.', self.ident)
    elif result != 1:
      # Something bad happened, call with a NULL exception to undo.


From the PyThreadState_SetAsyncExc documentation:

Returns the number of thread states modified; this is normally one, but will be zero if the thread id isn’t found.

Not sure when this would happen. But SystemError doesn't seem correct. Maybe RuntimeError?

Sure changed it to a RuntimeError.

Also noticed that result=0 would go into the elif, so change to greater than.

googlebot · 2018-03-20T16:58:52Z

CLAs look good, thanks!

fahhem · 2018-03-20T18:09:40Z

openhtf/util/threads.py

  Based on recipe available at http://tomerfiliba.com/recipes/Thread2/
+
+  Important Note:
+    From PyThreadState_SetAsyncExc documentation:  "To prevent naive misuse,


The naive misuse comment here was likely from before the ctypes.pythonapi module was created, or it was an intentional red herring. The GIL is held when running Python, so what they really should have said is that the GIL must be held from when the thread is checked to when it's async-raised, which setcheckinterval can fix.

Let's be more specific in this note, because as Farz points out, it's not all relevant/correct. I'd say something like:

Note: To fully address race conditions involved with the use of PyThreadState_SetAsyncExc, the GIL must be held from when the thread is checked to when it's async-raised. In this case, we're not doing that, and there remains the remote possibility that a thread identifier is reused and we accidentally kill the wrong thread.

Kenadia · 2018-03-23T00:37:39Z

openhtf/util/threads.py

    """Raise the exception."""
-    assert self.is_alive(), 'Only running threads have a thread identity'
+    # Should only be called on a started thread so raise otherwise
+    assert self.ident is not None, 'Only started threads have a thread identity'


“identity” --> “identifier”

Kenadia · 2018-03-23T17:38:13Z

openhtf/util/threads.py

+    # If the thread has died we don't want to raise an exception so log.
+    if not self.is_alive():
+      _LOG.debug('Not raising %s because thread %s (%s) is not alive',
+                 exc_type, self.name, self.ident)


needs return here

I don't think we should print self.ident since as far as the documentation indicates it's a Python-specific magic cookie to be used e.g. to index a dictionary of thread-specific data. I notice this is the only file where it is used. self.name should be sufficient.

it's printed out in case it is special or odd. I think it can be None for threads that aren't started yet

The none-check is on the previous line, I'm not sure if there are other odd cases

Added return.

As farz pointed out, including ident could potentially be useful debugging information and there is no harm to including it at debug level.

Kenadia · 2018-03-23T17:53:41Z

openhtf/util/threads.py

+    # Should only be called on a started thread so raise otherwise
+    assert self.ident is not None, 'Only started threads have a thread identity'
+
+    # If the thread has died we don't want to raise an exception so log.


I don't think calling PyThreadState_SetAsyncExc on a thread that is already dead is an issue—it seems to happily return 0 in that case.

The reason I can see why we might want to make this check is to minimize the chance of accidentally killing some other thread, so that's what I would mention in the comment.

FYI, idents are large (64 bit ints on 64-bit machines) so rollover/reuse is unlikely. You'd have to spawn 2^64 threads, and kill 2^(50+) before it rolls over since you can't have that many running anyway

for i in xrange(10): t = threading.Thread(target=infinite_loop) t.setDaemon(1) t.start() print t.ident

Returns monotonically increasing IDs, spanning a range of 38 million, for just 10 threads. I agree it's unlikely, hopefully virtually impossible, but not sure about the implementation details.

Kenadia · 2018-03-23T17:59:47Z

openhtf/util/threads.py

    result = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(self.ident), ctypes.py_object(exc_type))
-    if result == 0:
+    if result == 0 and self.is_alive():


Seems like you actually got bit by this before so I'd leave a comment here so it's emphasized that this isn't redundant with the earlier check: # Do not raise an error unnecessarily if the thread is dead.

Kenadia · 2018-03-23T18:07:03Z

openhtf/util/threads.py

        ctypes.c_long(self.ident), ctypes.py_object(exc_type))
-    if result == 0:
+    if result == 0 and self.is_alive():
      raise ValueError('Thread ID was invalid.', self.ident)


As far as I can tell, @fahhem's solution should work. But I know there are also some concerns about using sys.setcheckinterval in this way. This article, A Neat Python Hack? No, Broken Code, points out that code in the critical section may be releasing the GIL unbeknownst to you. This doesn't seem to be an issue in our case, if we're completely sure about what ctypes.pythonapi.PyThreadState_SetAsyncExc is doing (which I'm not).

Pending an airtight solution, I'm fine with @kdsudac's existing improvement, which I believe leaves only the remote, probably-never-gonna-happen possibility of killing the wrong thread

Kenadia · 2018-03-23T18:14:51Z

openhtf/util/threads.py

      ctypes.pythonapi.PyThreadState_SetAsyncExc(self.ident, None)
-      raise SystemError('PyThreadState_SetAsyncExc failed.', self.ident)
+      raise RuntimeError('PyThreadState_SetAsyncExc %s failed: %s',
+                         self.ident, result)


Technically should be something like below, since the command didn't “fail”

'Error: PyThreadState_SetAsyncExc on %s affected %d threads.' % (self.name, result)

No one really knows when this case is reached.

As noted above I don't think we should print self.ident

Used your text, I'm including all the information we have, thread name, exc_type, thread name and result. These failure conditions are rare so somewhat hard to replicate so we should acquire as much debugging info as possible.

Kenadia · 2018-03-23T18:24:40Z

openhtf/util/threads.py

  Based on recipe available at http://tomerfiliba.com/recipes/Thread2/
+
+  Important Note:
+    From PyThreadState_SetAsyncExc documentation:  "To prevent naive misuse,


Let's be more specific in this note, because as Farz points out, it's not all relevant/correct. I'd say something like:

Note: To fully address race conditions involved with the use of PyThreadState_SetAsyncExc, the GIL must be held from when the thread is checked to when it's async-raised. In this case, we're not doing that, and there remains the remote possibility that a thread identifier is reused and we accidentally kill the wrong thread.

kdsudac · 2018-03-23T21:16:26Z

Believe, I addressed all of your concerns.

Main goal of this PR is to 1) make KillableThreads more robust to case of an exception being raised in a dead thread; 2) Get additional debugging information for those rare cases where a problem is detected.

Follow-up work to improve or document limitations on KillableThreads may be necessary.

Make killable threads more robust to race condition

9b04704

There is a very small chance that a killable thread will end on it's own before it is killed. Make us robust to this race condition by not raising an error if the thread is no longer alive.

Kenadia suggested changes Mar 20, 2018

View reviewed changes

cleaning up based on code review comments

b75f787

kdsudac added 2 commits March 20, 2018 10:06

Adding extra protection around async_raise on dead thread

22d0ceb

adding comment about possible race conditions

8daf31b

fahhem reviewed Mar 20, 2018

View reviewed changes

Kenadia suggested changes Mar 23, 2018

View reviewed changes

better documenting changes

22da6c9

Kenadia approved these changes Mar 23, 2018

View reviewed changes

Kenadia merged commit d876cae into master Mar 23, 2018

Kenadia deleted the kdsudac-patch-3 branch June 15, 2018 01:31

		@@ -95,7 +95,7 @@ def async_raise(self, exc_type):
		assert self.is_alive(), 'Only running threads have a thread identity'

Conversation

kdsudac commented Mar 20, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

googlebot commented Mar 20, 2018

What to do if you already signed the CLA

Individual signers

Corporate signers

Uh oh!

kdsudac commented Mar 20, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kenadia Mar 23, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

googlebot commented Mar 20, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Kenadia Mar 23, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kdsudac commented Mar 23, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

kdsudac commented Mar 20, 2018 •

edited

Loading

Kenadia Mar 23, 2018 •

edited

Loading

Kenadia Mar 23, 2018 •

edited

Loading