The new WebStatus page gets rebuilt on each reconfig (because it's too hard
to figure out whether it has changed or not). This has the unfortunate
side-effect of exposing a problem with cached (persistent) connections: any
browser which was talking to the WebStatus before the reconfig will continue
to talk to the *old* WebStatus, and won't see the new one. This problem will
persist until either the server times out the persistent connection
(twisted.web does this after 12 hours), or the browser decides to drop the
connection on its own (from 2 to 5 minutes, in my experiments).
To deal with this, I'm adding some code to the web-page classes to keep
track of all the HTTPChannels that have been used by a given WebStatus object
(using a WeakKeyDictionary). When the WebStatus shuts down (because it's
being replaced by a new one), it goes through the list and kills those
connections first.
The rest of this ticket holds my random notes on this topic.
Browsers will cache connections, and if we've recently reloaded the
config file, a browser might still be talking to the previous Site,
which will work for some things, but will break when a page tries to
reach through our now-empty .parent attribute (usually via
HtmlResource.getStatus(), which does
request.site.buildbot_service.parent). This results in a big ugly
exception on pretty much any page in the following situation: the
browser hits a buildbot page, then the buildbot is reconfigured,
then the user tells the browser to reload (or visits another page on
the same buildmaster). The fact that we use a new WebStatus
instance for every reconfig (not just those which modify the
WebStatus parameters) makes this even worse.
This is most annoying when you're hacking on your config and want to
see the changes you've just made.
The connection will be kept open until either the server or the
browser decides to close it. My copy of Firefox appears to keep it
alive for about two minutes. The twisted.web.server.Site (an
HTTPFactory subclass) sets a server-side timeout, which drops the
connection if it has been up for more than 12 hours.
Unfortunately, the factory doesn't keep a reference to the
HTTPChannels that it creates, so we don't have anything to track
down and break at reconfig time (unless we were willing to use
gc.get_referrers() on the WebStatus.site that we just removed, and
I'm not).
I can think of the following ways to deal with this:
- keep the old .parent link alive, allowing the cached connection
to continue to work. However, if the reconfig changed the
WebStatus, the browser will continue to show the old behavior,
which will be confusing and annoying.
- use gc.get_referrers() on the old WebStatus.site to find all
the HTTPChannels that refer to it, and force them to shut down
their connections. Not likely.
- subclass HTTPChannel and override checkPersistence() to disable
persistent connections entirely. Seems heavy-handed.
- in render(), set request.channel.persistent = False. Also
heavy-handed. I'm ok with persistence, as long as it stops when we
shut down the WebStatus.
- lower the HTTPFactory timeout from 12 hours to more like 30
seconds. It is important to make the timeout longer than it takes
to render any actual page, since the timeout will sever the
connection even if it is still in use by page rendering.
I'm going to use weakrefs to allow the WebStatus to keep track of all the
channels that are still open, and then have its stopService() method shut
them all down.
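
A rough sketch of that plan, to make the shape of it concrete. This is not the actual patch: FakeChannel and FakeTransport are stand-ins for twisted.web.http.HTTPChannel and its transport so the example runs without Twisted, and WebStatusSketch and registerChannel() are hypothetical names. The only things taken from the notes above are the WeakKeyDictionary and the idea of closing the tracked channels in stopService().

```python
import weakref


class FakeTransport:
    """Stand-in for a Twisted transport; only loseConnection() is modeled."""

    def __init__(self):
        self.connected = True

    def loseConnection(self):
        self.connected = False


class FakeChannel:
    """Stand-in for twisted.web.http.HTTPChannel (assumption for this sketch)."""

    def __init__(self):
        self.transport = FakeTransport()


class WebStatusSketch:
    """Hypothetical, simplified WebStatus that remembers its channels."""

    def __init__(self):
        # Keys are channels. An entry disappears on its own once the channel
        # is garbage collected, so closed connections do not pile up here.
        self.channels = weakref.WeakKeyDictionary()

    def registerChannel(self, channel):
        # Hypothetical hook: the page-rendering code would call this with
        # request.channel each time it serves a request.
        self.channels[channel] = 1

    def stopService(self):
        # Copy the keys first, since the dictionary may shrink mid-iteration.
        for channel in list(self.channels.keys()):
            channel.transport.loseConnection()
        self.channels.clear()


old_status = WebStatusSketch()
browser_conn = FakeChannel()
old_status.registerChannel(browser_conn)

# A reconfig replaces the WebStatus, so the old one is stopped.
old_status.stopService()
print(browser_conn.transport.connected)  # False
```

With the old connection dropped, the browser has to open a new one on its next request, and that one lands on the new WebStatus. Using weak references means the tracking itself never keeps a finished channel alive.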