Ticket #176 (new defect)

Opened 10 months ago

Last modified 3 months ago

'buildbot reconfig' causes WebStatus to give tracebacks for a while

Reported by: bhearsum
Assigned to:
Priority: major
Milestone: undecided
Component: statusplugins-web
Version: 0.7.6
Keywords:
Cc: dustin, thatch, ijon

Description

After doing a reconfig, even one that doesn't change anything, WebStatus stops working for a few minutes. It then magically starts working again. There's nothing in the log to indicate how it recovered. Here's the traceback:

  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/web/server.py", line 160, in process
    self.render(resrc)
  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/web/server.py", line 167, in render
    body = resrc.render(self)
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 210, in render
    data = self.content(request)
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 245, in content
    data += self.fillTemplate(s.header, request)
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 239, in fillTemplate
    values['title'] = self.getTitle(request)
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/waterfall.py", line 417, in getTitle
    status = self.getStatus(request)
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/base.py", line 220, in getStatus
    return request.site.buildbot_service.getStatus()
  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/status/web/baseweb.py", line 458, in getStatus
    return self.parent.getStatus()
<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'getStatus'

Change History

02/06/08 13:01:23 changed by bhearsum

It turns out that I can't consistently reproduce this. It only seems to happen with one of my Buildbots.

02/17/08 10:05:07 changed by dustin

  • cc set to dustin.

The reconfig operation is pretty dark magic. It involves divorcing 'old' objects from the object graph, but if they are still in use (e.g., by the web), then problems will ensue. In this case, for example, the web service has been divorced from its parent service. I'm not sure there's a good fix for this problem.
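The failure mode described above can be reproduced in a minimal sketch. This is not buildbot or Twisted code: the `Service`, `Master`, and `WebService` classes and their method names are hypothetical stand-ins, written for current Python rather than the 2.5 interpreter in the traceback. Only the `self.parent.getStatus()` delegation is taken from the last traceback frame.

```python
class Service:
    """Hypothetical stand-in for a service that keeps a link to its parent."""

    def __init__(self, name):
        self.name = name
        self.parent = None

    def set_parent(self, parent):
        self.parent = parent
        parent.children.append(self)

    def disown_parent(self):
        # Detach from the hierarchy; the parent link becomes None.
        self.parent.children.remove(self)
        self.parent = None


class Master(Service):
    """Hypothetical root of the service hierarchy."""

    def __init__(self):
        super().__init__("master")
        self.children = []

    def getStatus(self):
        return "status-ok"


class WebService(Service):
    def getStatus(self):
        # Same delegation as the last frame of the traceback.
        return self.parent.getStatus()


master = Master()
old_web = WebService("web")
old_web.set_parent(master)
before = old_web.getStatus()

# "Reconfig": the old web service is detached, but something still holds
# a reference to it and keeps calling into it.
old_web.disown_parent()
try:
    old_web.getStatus()
    error = None
except AttributeError as exc:
    error = str(exc)
```

Any request that still reaches the detached object fails in the same way as the reported traceback, while requests that reach a freshly attached replacement would work.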

03/12/08 06:23:35 changed by thatch

  • cc changed from dustin to dustin, thatch.

03/18/08 17:42:07 changed by ijon

  • cc changed from dustin, thatch to dustin, thatch, ijon.

Identical to #139.

04/11/08 07:20:12 changed by bhearsum

I've now noticed that if I reload a ton of times (by holding down the keyboard shortcut for 'reload'), it comes back immediately.

04/28/08 15:49:30 changed by warner

I'm seeing this a lot at work too.

07/02/08 02:56:38 changed by dbailey

I get this relatively consistently.

The most recent occurrence was when I updated the master.cfg file to change the FileUpload step on the 3 builders defined to use a WithProperties to set the filename.

The only solution in most of the cases I encounter is to completely restart the buildbot master.

08/20/08 06:53:44 changed by dustin

I see this too. My theory is that my browser is using HTTP/1.1 with connection caching, and I'm still connected to the old status object. I'm not sure there's a good solution to this.

08/21/08 03:08:27 changed by dbailey

A Cache-Control directive may solve the problem.

http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html Look for 14.9

I haven't checked to see if the necessary HTTP headers can be set by buildbot, but since it's using twisted for its own web server, I'm assuming it should be possible.

I haven't read the options in detail to see if there is a nice option to inform browsers that they should ignore any cached output prior to a given time/date (i.e., update that value after any reconfig).

The alternative is to request the browser to disable caching.
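For reference, a rough sketch of what asking clients not to reuse cached output looks like on the wire. It uses Python's standard-library HTTP server rather than buildbot's Twisted-based one, and the `NoCacheHandler` class and `fetch_headers` helper are hypothetical names. In Twisted the equivalent header would be set through the request object's `setHeader` method. Whether this header has any effect on the bug in this ticket is not established here.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class NoCacheHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that marks every response as non-cacheable."""

    def do_GET(self):
        body = b"status page placeholder\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        # RFC 2616 section 14.9: caches must revalidate before reuse.
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep the example quiet.
        pass


def fetch_headers():
    """Start a throwaway server on a free local port and fetch one page."""
    server = HTTPServer(("127.0.0.1", 0), NoCacheHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        response = conn.getresponse()
        response.read()
        headers = {key.lower(): value for key, value in response.getheaders()}
        conn.close()
        return headers
    finally:
        server.shutdown()
        server.server_close()


headers = fetch_headers()
```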

08/21/08 06:17:12 changed by dustin

Hmm, I don't like the idea of disabling connection caching altogether just to fix this bug. If anything, this is a bug in twisted -- not terminating existing connections when the service is shut down.

Another solution may be to delay removing the old WebStatus object from the service hierarchy for some longish time like 5 minutes.
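A minimal sketch of that delayed-removal idea, under stated assumptions: `FakeClock`, `Master`, `WebStatus`, `replace_web_status`, and `GRACE_SECONDS` are all hypothetical names, and the clock is a hand-rolled stand-in for a reactor-style "call this later" facility so the example runs without Twisted. It ignores practical concerns such as two web services sharing one listening port.

```python
import heapq

GRACE_SECONDS = 5 * 60  # the "longish time like 5 minutes" from the comment


class FakeClock:
    """Hypothetical scheduler: queue calls, run them when time is advanced."""

    def __init__(self):
        self.now = 0.0
        self._calls = []
        self._seq = 0  # tie-breaker so queued functions are never compared

    def call_later(self, delay, func, *args):
        heapq.heappush(self._calls, (self.now + delay, self._seq, func, args))
        self._seq += 1

    def advance(self, seconds):
        self.now += seconds
        while self._calls and self._calls[0][0] <= self.now:
            _, _, func, args = heapq.heappop(self._calls)
            func(*args)


class WebStatus:
    parent = None

    def getStatus(self):
        return self.parent.getStatus()


class Master:
    def __init__(self, clock):
        self.clock = clock
        self.children = []

    def getStatus(self):
        return "status-ok"

    def attach(self, child):
        child.parent = self
        self.children.append(child)

    def detach(self, child):
        self.children.remove(child)
        child.parent = None

    def replace_web_status(self, old, new):
        # Attach the new object now, but keep the old one parented for a
        # grace period so lingering connections still get an answer.
        self.attach(new)
        self.clock.call_later(GRACE_SECONDS, self.detach, old)


clock = FakeClock()
master = Master(clock)
old, new = WebStatus(), WebStatus()
master.attach(old)

master.replace_web_status(old, new)
during_grace = old.getStatus()      # old object still works during the grace period

clock.advance(GRACE_SECONDS)
old_detached = old.parent is None   # detached once the grace period has elapsed
```

The trade-off is that the old object stays alive and reachable for the whole grace period, so a connection that outlives the delay would still hit the original error.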