Skip to content

Make killable threads more robust to race condition#740

Merged
Kenadia merged 5 commits intomasterfrom
kdsudac-patch-3
Mar 23, 2018
Merged

Make killable threads more robust to race condition#740
Kenadia merged 5 commits intomasterfrom
kdsudac-patch-3

Conversation

@kdsudac
Copy link
Contributor

@kdsudac kdsudac commented Mar 20, 2018

There is a very small chance that a killable thread will end on it's own before it is killed. Make us robust to this race condition by not raising an error if the thread is no longer alive.


This change is Reviewable

There is a very small chance that a killable thread will end on it's own before it is killed.  Make us robust to this race condition by not raising an error if the thread is no longer alive.
@googlebot
Copy link

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers
  • Your company has a Point of Contact who decides which employees are authorized to participate. Ask your POC to be added to the group of authorized contributors. If you don't know who your Point of Contact is, direct the project maintainer to go/cla#troubleshoot.
  • The email used to register you as an authorized contributor must be the email used for the Git commit. Check your existing CLA data and verify that your email is set on your git commits.
  • The email used to register you as an authorized contributor must also be attached to your GitHub account.

@kdsudac
Copy link
Contributor Author

kdsudac commented Mar 20, 2018

I signed it!

@@ -95,7 +95,7 @@ def async_raise(self, exc_type):
assert self.is_alive(), 'Only running threads have a thread identity'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes me wonder if this assert belongs here. Maybe assert self.ident is not None is appropriate, seems like that's actually guaranteed to be true, and not subject to a race condition.

assert self.ident is not None, 'Thread is missing identifier since it was never started.'

ctypes.c_long(self.ident), ctypes.py_object(exc_type))
if result == 0:
if result == 0 and self.is_alive():
raise ValueError('Thread ID was invalid.', self.ident)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some interesting notes in the documentation of PyThreadState_SetAsyncExc:

To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held.

I wonder what's the worst that could happen? Your change already seems like an improvement--I can't tell in what situation this ValueError would ever be raised anymore. The only loophole I've thought of is that since thread identifiers can be “recycled,” I wonder if there's a very small chance that we end up killing some arbitrary other thread that was not meant to be killed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the C extension that holds the GIL is the only way to completely eliminate possibility of a race condition here. I don't want to take that on now though but did add a note in the class docstring.

From documentation of threading.Thread on ident:

The ‘thread identifier’ of this thread or None if the thread has not been started. This is a nonzero integer. See the thread.get_ident() function. Thread identifiers may be recycled when a thread exits and another thread is created. The identifier is available even after the thread has exited.

So the assertion message is a little misleading. Since it's not running threads but started threads that have an ident. I changed the message.

I also put a check on is_alive before calling PyThreadState_SetAsyncExc. This might be a bit of overkill considering we also check on result == 0 and self.is_alive(), but it seems to me that it would be better to not call at all if the thread is not alive.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GIL is held when running Python code, if you want to ensure that it's held throughout a function, you can temporarily set the check interval (sys.setcheckinterval) to a very large number. That will ensure that Python doesn't even attempt to switch out the GIL, since the check interval is how often the interpreter checks things, like if other threads are ready to run.

https://docs.python.org/2/c-api/init.html#thread-state-and-the-global-interpreter-lock

Copy link
Contributor

@Kenadia Kenadia Mar 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, @fahhem's solution should work. But I know there are also some concerns about using sys.setcheckinterval in this way. This article, A Neat Python Hack? No, Broken Code, points out that code in the critical section may be releasing the GIL unbeknownst to you. This doesn't seem to be an issue in our case, if we're completely sure about what ctypes.pythonapi.PyThreadState_SetAsyncExc is doing (which I'm not).

Pending an airtight solution, I'm fine with @kdsudac's existing improvement, which I believe leaves only the remote, probably-never-gonna-happen possibility of killing the wrong thread

if result == 0 and self.is_alive():
raise ValueError('Thread ID was invalid.', self.ident)
elif result != 1:
# Something bad happened, call with a NULL exception to undo.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the PyThreadState_SetAsyncExc documentation:

Returns the number of thread states modified; this is normally one, but will be zero if the thread id isn’t found.

Not sure when this would happen. But SystemError doesn't seem correct. Maybe RuntimeError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure changed it to a RuntimeError.

Also noticed that result=0 would go into the elif, so change to greater than.

@googlebot
Copy link

CLAs look good, thanks!

Based on recipe available at http://tomerfiliba.com/recipes/Thread2/

Important Note:
From PyThreadState_SetAsyncExc documentation: "To prevent naive misuse,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naive misuse comment here was likely from before the ctypes.pythonapi module was created, or it was an intentional red herring. The GIL is held when running Python, so what they really should have said is that the GIL must be held from when the thread is checked to when it's async-raised, which setcheckinterval can fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more specific in this note, because as Farz points out, it's not all relevant/correct. I'd say something like:

Note: To fully address race conditions involved with the use of PyThreadState_SetAsyncExc, the GIL must be held from when the thread is checked to when it's async-raised. In this case, we're not doing that, and there remains the remote possibility that a thread identifier is reused and we accidentally kill the wrong thread.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"""Raise the exception."""
assert self.is_alive(), 'Only running threads have a thread identity'
# Should only be called on a started thread so raise otherwise
assert self.ident is not None, 'Only started threads have a thread identity'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

“identity” --> “identifier”

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

# If the thread has died we don't want to raise an exception so log.
if not self.is_alive():
_LOG.debug('Not raising %s because thread %s (%s) is not alive',
exc_type, self.name, self.ident)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs return here

I don't think we should print self.ident since as far as the documentation indicates it's a Python-specific magic cookie to be used e.g. to index a dictionary of thread-specific data. I notice this is the only file where it is used. self.name should be sufficient.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's printed out in case it is special or odd. I think it can be None for threads that aren't started yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The none-check is on the previous line, I'm not sure if there are other odd cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added return.

As farz pointed out, including ident could potentially be useful debugging information and there is no harm to including it at debug level.

# Should only be called on a started thread so raise otherwise
assert self.ident is not None, 'Only started threads have a thread identity'

# If the thread has died we don't want to raise an exception so log.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think calling PyThreadState_SetAsyncExc on a thread that is already dead is an issue—it seems to happily return 0 in that case.

The reason I can see why we might want to make this check is to minimize the chance of accidentally killing some other thread, so that's what I would mention in the comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, idents are large (64 bit ints on 64-bit machines) so rollover/reuse is unlikely. You'd have to spawn 2^64 threads, and kill 2^(50+) before it rolls over since you can't have that many running anyway

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for i in xrange(10):
  t = threading.Thread(target=infinite_loop)
  t.setDaemon(1)
  t.start()
  print t.ident

Returns monotonically increasing IDs, spanning a range of 38 million, for just 10 threads. I agree it's unlikely, hopefully virtually impossible, but not sure about the implementation details.

result = ctypes.pythonapi.PyThreadState_SetAsyncExc(
ctypes.c_long(self.ident), ctypes.py_object(exc_type))
if result == 0:
if result == 0 and self.is_alive():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like you actually got bit by this before so I'd leave a comment here so it's emphasized that this isn't redundant with the earlier check: # Do not raise an error unnecessarily if the thread is dead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

ctypes.c_long(self.ident), ctypes.py_object(exc_type))
if result == 0:
if result == 0 and self.is_alive():
raise ValueError('Thread ID was invalid.', self.ident)
Copy link
Contributor

@Kenadia Kenadia Mar 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, @fahhem's solution should work. But I know there are also some concerns about using sys.setcheckinterval in this way. This article, A Neat Python Hack? No, Broken Code, points out that code in the critical section may be releasing the GIL unbeknownst to you. This doesn't seem to be an issue in our case, if we're completely sure about what ctypes.pythonapi.PyThreadState_SetAsyncExc is doing (which I'm not).

Pending an airtight solution, I'm fine with @kdsudac's existing improvement, which I believe leaves only the remote, probably-never-gonna-happen possibility of killing the wrong thread

ctypes.pythonapi.PyThreadState_SetAsyncExc(self.ident, None)
raise SystemError('PyThreadState_SetAsyncExc failed.', self.ident)
raise RuntimeError('PyThreadState_SetAsyncExc %s failed: %s',
self.ident, result)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically should be something like below, since the command didn't “fail”

'Error: PyThreadState_SetAsyncExc on %s affected %d threads.' % (self.name, result)

No one really knows when this case is reached.

As noted above I don't think we should print self.ident

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used your text, I'm including all the information we have, thread name, exc_type, thread name and result. These failure conditions are rare so somewhat hard to replicate so we should acquire as much debugging info as possible.

Based on recipe available at http://tomerfiliba.com/recipes/Thread2/

Important Note:
From PyThreadState_SetAsyncExc documentation: "To prevent naive misuse,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's be more specific in this note, because as Farz points out, it's not all relevant/correct. I'd say something like:

Note: To fully address race conditions involved with the use of PyThreadState_SetAsyncExc, the GIL must be held from when the thread is checked to when it's async-raised. In this case, we're not doing that, and there remains the remote possibility that a thread identifier is reused and we accidentally kill the wrong thread.

@kdsudac
Copy link
Contributor Author

kdsudac commented Mar 23, 2018

Believe, I addressed all of your concerns.

Main goal of this PR is to 1) make KillableThreads more robust to case of an exception being raised in a dead thread; 2) Get additional debugging information for those rare cases where a problem is detected.

Follow-up work to improve or document limitations on KillableThreads may be necessary.

@Kenadia Kenadia merged commit d876cae into master Mar 23, 2018
@Kenadia Kenadia deleted the kdsudac-patch-3 branch June 15, 2018 01:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants