Options to aid zero-downtime deployment / process supervision #132

Merged: nevans merged 4 commits into resque:master from ShippingEasy:upstream-pr on Oct 14, 2015

Conversation

@brasic (Contributor) commented Oct 6, 2015

This PR includes the remaining differences in ShippingEasy's resque-pool fork, both of which are optional command line flags that tweak startup behavior to support zero-downtime deployment strategies and running resque-pool inside a process supervisor.

The normal approach to deploying new worker code is to signal resque-pool to gracefully shut down with QUIT or INT, then start up a new instance. The problem with that approach is that long-running jobs and slow app startup can lead to a dramatic reduction in worker throughput during a deploy, which in our case is unacceptable.

The technique we use to maintain full throughput during deploys is to start a second resque-pool instance running the new code; it loads the Rails environment and gracefully shuts down the previous instance only once its own children are ready to accept work. This requires two new command line options:

  1. --no-pidfile: Launching a second resque-pool instance in daemon mode was not possible because it detects the prior instance by its pidfile and refuses to start. This option causes the daemon to skip pidfile creation so multiple instances can run at the same time. Of course, this should only be used if the daemon is run under a supervisor like upstart or systemd that can automatically detect the pids of managed processes.

  2. --lock FILE: Even though upstart and systemd can detect the pids of processes they manage, neither copes well with code redeploys, where an entirely new process replaces the existing one. For this we rely on an approach adapted from this post, in which resque-pool opens a shared filesystem lock on a designated file at startup. Deploys cause the new process to open an additional shared lock on the same file before the old process exits, meaning that as long as at least one instance is running, the file will always be locked in shared mode. The process supervisor starts resque-pool and then immediately attempts to open an exclusive lock on the same file. This attempt will block until all shared locks are cleared, so the supervisor will only try to restart resque-pool when there are truly no instances running. In this way the shared lock file acts as a surrogate pid file, except that it can be shared across redeployed processes (see the sketch below).
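
To make that concrete, here is a minimal Ruby sketch of the locking pattern, assuming a hypothetical lock path; it is illustrative only, not the implementation added by this PR:

require "fileutils"

LOCK_PATH = "/var/run/resque-pool/pool.lock" # hypothetical location

# Pool side: take a shared lock at startup and hold it for the life of
# the process; the OS releases it automatically when the process exits.
def hold_shared_lock(path = LOCK_PATH)
  FileUtils.mkdir_p(File.dirname(path))
  file = File.open(path, File::RDWR | File::CREAT, 0644)
  file.flock(File::LOCK_SH)
  file # keep a reference so the file descriptor stays open
end

# Supervisor side: an exclusive lock blocks while any shared lock is
# held, i.e. until every old and new pool instance has exited.
def wait_for_all_pools_to_exit(path = LOCK_PATH)
  File.open(path, File::RDWR | File::CREAT, 0644) do |file|
    file.flock(File::LOCK_EX)
  end
end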

This approach is a little unusual but it works well for us - we are able to deploy many times per day under heavy load with zero impact on job throughput or latency. We would like to switch from our fork to the official version, so I hope these contributions will be useful to others.

@joshuaflanagan (Contributor) commented:

👍 This code has been running reliably for us for almost a year.

@nevans (Collaborator) commented Oct 6, 2015

Nice, and simple too!

  1. Would you mind adding some documentation? Just a short blurb in README.md and some example upstart (or systemd) and capistrano and whatever else is necessary to make it all work. At some point, I'd like to directly support capistrano via require "resque/pool/capistrano" and clean up the examples dir. But for today, a sentence and code snippets in README.md will do. :)
  2. How does the new pool obtain the pid of the old pool? How do you know when the new workers are ready to accept work? Did you make a callback for that, or do you use some existing callback to trigger signaling the old master? Is it possible to package that code up as well, perhaps enabled via another command line option? E.g.: resque-pool --restart-with-shared-lock could set --no-pidfile --lock tmp/resque-pool.lock (sensible default lockfile location) and then signal the original master after startup is complete (see the sketch after this list).
  3. This could get folks in trouble if they are tightly memory constrained (since the new pool doesn't wait for the old pool to shut down its workers before it starts up the new ones), but for most users that's less of an issue than the downtime. Besides, I think that another fork might already have a solution for this. But it's probably worth mentioning that in the README.md.
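
As a rough illustration of the suggestion in item 2 (not an API that exists in resque-pool today), the composite option might simply expand into the two underlying options before normal option handling runs:

# Hypothetical expansion of the suggested --restart-with-shared-lock
# option into the two flags added by this PR.
def expand_restart_option(opts)
  return opts unless opts[:restart_with_shared_lock]
  opts[:no_pidfile] = true
  opts[:lock] ||= "tmp/resque-pool.lock" # suggested default location
  opts
end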

@brasic (Contributor, Author) commented Oct 6, 2015

Sure, I can add some documentation in the readme.

We handle old-pool termination inside our application with a utility that parses the output of ps. So our resque:pool:setup task looks like:

task "resque:pool:setup" do
  # [snip activerecord reconnect stuff]
  obtain_shared_lock
  ResquePoolReaper.shutdown_other_pools
end

It isn't ideal to trust the application to do this; ideally resque-pool would have first-class support for this step itself, but we didn't want to patch it too much, and the shutdown_other_pools functionality isn't tested on anything other than Linux.
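
For illustration only, a ps-based reaper along these lines might look roughly like the following; this is hypothetical code, not the actual ShippingEasy implementation, and it assumes the pool master's process title contains "resque-pool-master":

module ResquePoolReaper
  # Pids of other resque-pool master processes on this host.
  def self.other_pool_pids
    `ps -eo pid,command`.each_line.map do |line|
      pid, command = line.strip.split(" ", 2)
      next unless command.to_s.include?("resque-pool-master")
      pid.to_i
    end.compact.reject { |pid| pid.zero? || pid == Process.pid }
  end

  # Ask every other pool master to shut down gracefully.
  def self.shutdown_other_pools
    other_pool_pids.each { |pid| Process.kill("QUIT", pid) }
  end
end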

If you're open to it I wouldn't mind taking a stab at cleaning up and integrating this functionality into resque-pool itself so that it would be easier for other users to get working.

@brasic (Contributor, Author) commented Oct 6, 2015

Regarding 3), it's true that this method has the potential to use more memory than a more 'hands-on' approach, e.g. one where the new instance somehow communicated with the old instance and waited for it to release slots before forking each new managed worker. But I think that would be pretty complicated, and in practice the memory bump is not very substantial on Linux due to copy-on-write.

I'd be interested to see the approach taken by the other fork you mentioned.

@nevans (Collaborator) commented Oct 7, 2015

I'm not a huge fan of ps parsing, but it's better than doing nothing. If it's been working for you for a year on both Linux and Mac OS X, then it'll be fine (at least as a first pass). So go ahead and post what you have. We could also consider using a different approach, e.g. pid-dir instead of pid-file, or register pool pids in redis.

@nevans (Collaborator) commented Oct 7, 2015

backupify's fork detects the orphaned worker count via a ps | grep | awk monstrosity (with plenty of other ps usage for memory management elsewhere in their fork) and then uses that as a rough delta from the configured worker count. I'd personally prefer a slightly different approach, e.g. create a new custom config_loader that looks at the workers set in redis and diffs against the workers in the current pool.

# Sketch: group this host's registered workers by their queue list.
# (`hostname` is assumed to be defined elsewhere.)
Resque.redis.smembers("workers").
  select  { |w| w.start_with?("#{hostname}:") }.
  map     { |w| w.split(":", 3) }.
  sort_by(&:last).
  each_with_object(Hash.new { |h, k| h[k] = [] }) do |(_host, pid, queues), h|
    h[queues] << Integer(pid)
  end

That approach doesn't work quite so well for multiple pools per hostname. This can be worked around by keeping a separate registry. Alternatively, resqued manages this by having another process layer above the pool manager (their top layer is their "manager" and our manager layer is their "listener"), and they use a socket to communicate all worker starts/stops up from the listeners to the manager and broadcast them down from the manager to all other listeners. I'd just as soon use a socket file as create a new process.

@brasic (Contributor, Author) commented Oct 7, 2015

I would also like to stop parsing ps. I like the idea of a pid directory - a similar option that would simplify the code might be to replace --no-pidfile with an --allow-multi flag that causes the existing pidfile to be treated as a newline-separated list of unique pids (sketched below).
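
Purely to illustrate the --allow-multi idea (nothing like this exists in resque-pool), appending to a shared pidfile might look like this, with an exclusive flock guarding concurrent writers:

# Hypothetical --allow-multi behavior: append our pid to a shared,
# newline-separated pidfile instead of refusing to start.
def register_pid(pidfile)
  File.open(pidfile, File::RDWR | File::CREAT, 0644) do |f|
    f.flock(File::LOCK_EX)                  # guard against concurrent writers
    pids = f.read.split("\n").reject(&:empty?)
    pids |= [Process.pid.to_s]              # keep the list unique
    f.rewind
    f.truncate(0)
    f.write(pids.join("\n") + "\n")
  end
end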

@nevans (Collaborator) commented Oct 8, 2015

For speed of getting this in, let's see your ps implementation before we spend a lot of time playing around with alternate approaches. My gut is that ps is okay if all we're doing is detecting another pool master on the same host and signalling it to shut down.

Once we get into balancing total running workers (and carefully transitioning from one pool to the other to constrain the max total workers), we'll want something better than ps for that, and we might as well pick an approach that works well for both use cases. Using a shared pid-dir or socket can work for both, but I'm strongly biased towards registering both the pool master and the pool workers in redis for the simple reason that we can also then display info about running pools in resque-web.
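
A back-of-the-envelope sketch of that redis registration idea; the "pool:masters" key and the cleanup strategy here are made up for illustration:

require "socket"
require "resque"

# Hypothetical registration of a pool master in redis, so other pools
# (and, eventually, resque-web) could discover it.
hostname = Socket.gethostname
member   = "#{hostname}:#{Process.pid}"

Resque.redis.sadd("pool:masters", member)              # on startup
at_exit { Resque.redis.srem("pool:masters", member) }  # best-effort cleanup

# Another pool on the same host could then find masters to signal:
other_masters = Resque.redis.smembers("pool:masters").
  select { |m| m.start_with?("#{hostname}:") }.
  map    { |m| Integer(m.split(":").last) }.
  reject { |pid| pid == Process.pid }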

@nevans (Collaborator) commented Oct 8, 2015

Also, I'd want to avoid multiple processes writing comma-delimited pids to the same file, for the simple reason that I don't want to deal with race-conditions and/or locks when it's possible to design around them.

@nevans mentioned this pull request on Oct 8, 2015
@joshuaflanagan (Contributor) commented:

Just wanted to give an update that I'm working on a cleaned-up version of our ps implementation that @brasic and I will test and likely submit as a separate PR.

joshuaflanagan added a commit to ShippingEasy/resque-pool that referenced this pull request Oct 11, 2015
Adds the `--kill-others` command line option.
When this option is set, resque-pool will send a graceful
shutdown signal to all other running resque-pools.

This is useful in "no downtime deploy" scenarios,
where you want the pool running the new code to
completely load its environment and be ready to
process work *before* shutting down the old code.
Once the new code is ready, it will shut down the
old processes for you.

See also resque#132
brasic added a commit to ShippingEasy/resque-pool that referenced this pull request Oct 12, 2015
This explains functionality from resque#132 and resque#137 and provides example
upstart config files that will allow for zero-downtime deploys.
This change makes it possible to run multiple copies of resque-pool on
the same server using the --daemon flag, which was previously not
allowed since the server refuses to start if its configured pidfile
already exists.  Process supervisors like upstart and systemd already
know what pids they are managing, so in this context a pidfile is
unnecessary.

Without the ability to run concurrent daemon instances, zero-downtime
deployment (in which a new resque-pool instance starts up and slowly
replaces the previous instance until it is done) is not feasible.
This is to support running resque-pool inside of upstart while allowing
for zero-downtime restarts.  It enables the use of this strategy:

http://orchestrate.io/blog/2014/03/26/tip-seamless-restarts-with-unicorn-and-upstart/

The idea is that when the pool starts, it opens a shared lock that it
will hold forever.

In the envisioned use-case, upstart is responsible for restarting a
resque-pool if it fails.  When resque-pool daemonizes, upstart keeps
track of the process id.  However, to ensure no downtime, we do not stop
and start the pool.  Instead, when a deploy happens, a new resque-pool
instance will start and kill the old one after it is ready to fork
workers.  This would ordinarily cause upstart to detect that the
original process has died and relaunch it, which is not desired
behavior.  If the upstart init script tries to obtain an exclusive lock,
this attempt will block while any pool instance is still running,
including ones other than the original process.
@brasic (Contributor, Author) commented Oct 13, 2015

I've added sample upstart configs and a readme section that documents the zero-downtime use case, including an explanation of the flag @joshuaflanagan introduced in #137.

@nevans (Collaborator) commented:

--pidfile FILE should also override --no-pidfile; opts.delete(:no_pidfile)

Time to add specs for the option parsing, now that
it is gaining some non-trivial logic.

CLI#parse_options can accept an array of arguments,
to simplify testing.
@joshuaflanagan (Contributor) commented:

@nevans I've addressed your concern regarding --pidfile overriding --no-pidfile

nevans added a commit that referenced this pull request Oct 14, 2015
Options to aid zero-downtime deployment / process supervision
@nevans merged commit 1f409a4 into resque:master on Oct 14, 2015
@nevans (Collaborator) commented Oct 14, 2015

I tweaked the --pidfile --no-pidfile interplay a little bit in 021056d. Thanks, again.

@jrochkind (Contributor) commented:

A systemd example would be super helpful.

@brasic (Contributor, Author) commented Jan 31, 2019

@jrochkind For what it's worth, under upstart running two instances was necessary but now that we use systemd we run a single instance and use SIGINT to orphan existing jobs on redeploy. Here's the unit definition we have been running for about two years now:

[Unit]
Description=resque pool manager
Documentation=https://github.com/nevans/resque-pool/
Requires=network.target

[Service]
Type=forking
User=<USER>
Group=<GROUP>
WorkingDirectory=<APP DIR>
EnvironmentFile=<ENV FILE>
PIDFile=<PATH TO PIDFILE>
# Allow resque to adjust its nice value (and the value of child processes)
LimitNICE=40
ExecStart=/usr/local/rvm/bin/rvm-shell -c 'bundle exec resque-pool --daemon --pidfile <PATH_TO_PIDFILE>'
# Only kill the main process.  INT will cause resque-pool-master to reparent
# children under PID 1 and exit, leaving existing jobs to run until complete.
KillMode=process
KillSignal=INT
SendSIGHUP=no
SendSIGKILL=no
TimeoutStopSec=60
TimeoutSec=60
Restart=on-failure
StandardOutput=syslog
StandardError=syslog
SyslogIdentifier=%n

[Install]
WantedBy=multi-user.target

@nevans (Collaborator) commented Mar 8, 2019

Will gladly (quickly?!) merge any PRs updating the incredibly out-of-date examples directory. 😉
I've been deploying under docker for a couple of years now, so I ought to put something in there for that!
