Ticket #68 (assigned defect)

Opened 1 year ago

Last modified 8 months ago

is buildbot slave timeout too short?

Reported by: joduinn
Assigned to: warner (accepted)
Priority: major
Milestone: 0.8.0
Component: configuration
Version: 0.7.5
Keywords:
Cc: joduinn, bhearsum@mozilla.com

Description

The default buildbot slave timeout can be too short for us. Sometimes 5 seconds isn't enough, and we get the following:

% cltbld$ buildbot start /Users/cltbld/macosx-slave1

Following twistd.log until startup finished.. The buildmaster took more than 5 seconds to start, so we were unable to confirm that it started correctly. Please 'tail twistd.log' and look for a line that says 'configuration update complete' to verify correct startup.

%

... even though the slave started "normally", responds fine to pings from the buildbot master, and handles jobs just fine.

Change History

07/30/07 18:29:24 changed by warner

  • component changed from buildprocess to configuration.

Could you take a look at your logs and estimate how long startup actually took? Since twistd doesn't record seconds in the logfiles, you'll have to do this with 'tail -f' and a stopwatch (or some clever programming): measure the elapsed time between the "Loading buildbot.tac" line and the "configuration update complete" line.
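The "clever programming" route could look roughly like the following. This is a minimal sketch, not part of buildbot: the helper names `follow` and `startup_elapsed`, the polling interval, and the give-up limit are all assumptions made for illustration. Only the two marker strings come from the comment above. It stamps each log line with the time it shows up in the file, which works around the missing seconds in the log itself.

```python
import time

START_MARKER = "Loading buildbot.tac"
DONE_MARKER = "configuration update complete"


def startup_elapsed(stamped_lines):
    """Return seconds between the start and done markers, or None.

    stamped_lines is an iterable of (arrival_time, line) pairs.
    """
    started = None
    for arrival, line in stamped_lines:
        if started is None and START_MARKER in line:
            started = arrival
        elif started is not None and DONE_MARKER in line:
            return arrival - started
    return None


def follow(path, poll=0.1, give_up_after=120.0):
    """Yield (arrival_time, line) for lines appended to path, like 'tail -f'.

    Hypothetical helper: starts at the current end of the file, so it
    must be running before the process being measured is started.
    """
    deadline = time.monotonic() + give_up_after
    pending = ""
    with open(path) as f:
        f.seek(0, 2)  # skip whatever is already in the log
        while time.monotonic() < deadline:
            chunk = f.readline()
            if not chunk:
                time.sleep(poll)
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit complete lines
                yield time.monotonic(), pending
                pending = ""
```

Run in a second terminal just before 'buildbot start', a call such as `print(startup_elapsed(follow("twistd.log")))` would print the elapsed seconds, or `None` if the second marker never appears before the give-up limit. The resolution is limited by the polling interval.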

If it's less than 10 or 15 seconds, I'll just bump up the timeout. If it's more than that, I'd be inclined to add a --timeout option to the 'buildbot start' command (and restart and reconfig), since I want to provide earlier feedback about broken startups in the most common case.
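The trade-off described here (fast feedback by default, a longer wait on request) amounts to a polling loop with an adjustable deadline. The sketch below is only an illustration of that idea under stated assumptions: `wait_for` and its parameters are hypothetical names, this is not buildbot's startup code, and the --timeout option is only a proposal at this point in the ticket.

```python
import time


def wait_for(check, timeout=10.0, poll=0.2,
             clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or timeout seconds have passed.

    Returns True if check() succeeded in time, False otherwise.
    clock and sleep are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll)
```

A caller would pass a `check` that looks for the startup marker in the log, and a command-line option could simply feed its value into `timeout`, leaving the short default in place for the common case.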

And if it is slow for your buildmaster, any idea what's taking so long? It shouldn't be reading any status from disk or interacting with buildslaves at all during startup, so the time it takes should be linear with the complexity of your configuration and with the speed of your machine. Is there something weird going on that's making it slower than usual?

09/29/07 01:06:08 changed by joduinn

  • cc set to joduinn.

09/29/07 01:12:08 changed by joduinn

1) Changing the timeout to 10 seconds for this 0.7.6 release. Let's see if that makes things better.

2) Found a bug in how the error message is generated. It parses the newly generated logfiles for "Creating BuildSlave". If that string is found, it assumes a slave is running; if not, it assumes a master. However, there are situations where no logs are present yet on a slave, and this logic then incorrectly identifies it as a master. This is not fixed in 0.7.6.
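To make the failure mode in (2) concrete, here is a small sketch of the logic as described, next to one possible guard. It is not the actual buildbot source: both function names and the "unknown" result are assumptions for illustration, and only the "Creating BuildSlave" string comes from the description above.

```python
SLAVE_MARKER = "Creating BuildSlave"


def guess_kind_naive(log_lines):
    """Logic as described in the ticket: no marker means 'buildmaster'."""
    if any(SLAVE_MARKER in line for line in log_lines):
        return "buildslave"
    return "buildmaster"


def guess_kind_guarded(log_lines):
    """Hypothetical guard: refuse to guess when nothing has been logged."""
    lines = list(log_lines)
    if not lines:
        return "unknown"
    return guess_kind_naive(lines)
```

With an empty log, `guess_kind_naive([])` reports a master, which matches the misleading "The buildmaster took more than 5 seconds" message seen on a slave. The guarded version only covers the empty-log case; a log that has some lines but has not yet reached the marker would still be misclassified.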

09/29/07 01:41:26 changed by warner

  • milestone changed from undecided to 0.7.6.

09/29/07 15:35:28 changed by warner

  • status changed from new to assigned.
  • milestone changed from 0.7.6 to 0.7.7.

OK, I've bumped the timeout to 10 seconds in [458]. I'll push the rest of this issue to the next release.

12/28/07 00:56:05 changed by warner

  • milestone changed from 0.7.7 to 0.7.8.

No progress on this yet; bumping to 0.7.8.

03/19/08 10:14:09 changed by bhearsum

  • cc changed from joduinn to joduinn, bhearsum@mozilla.com.