Ticket #192 (new defect)

Opened 9 months ago

Last modified 9 months ago

Buildslave fails to detect command completion promptly and hangs for 10-20 minutes

Reported by: hans
Assigned to:
Priority: major
Milestone: undecided
Component: buildprocess
Version: 0.7.6
Keywords:
Cc:

Description

I am running buildbot 0.7.6 on FreeBSD 6.3 with Python 2.5.1.

The buildslave often fails to detect promptly that a command it started has completed. In these cases, it takes about 20 minutes before the buildslave continues:

2008/02/24 21:28 +0200 [Broker,client] SlaveBuilder.remote_print(bknr-fbsd-ccl-amd64): message from master: ping
2008/02/24 21:28 +0200 [Broker,client] SlaveBuilder.remote_ping(<SlaveBuilder 'bknr-fbsd-ccl-amd64' at 16127976>)
2008/02/24 21:28 +0200 [Broker,client] <SlaveBuilder 'bknr-fbsd-ccl-amd64' at 16127976>.startBuild
2008/02/24 21:28 +0200 [Broker,client]  startCommand:svn [id 46]
2008/02/24 21:28 +0200 [Broker,client] ShellCommand._startCommand
2008/02/24 21:28 +0200 [Broker,client]  /usr/local/bin/svn update --revision HEAD --non-interactive
2008/02/24 21:28 +0200 [Broker,client]   in dir /home/buildslave/builds/bknr-fbsd-ccl-amd64/build (timeout 1200 secs)
2008/02/24 21:28 +0200 [Broker,client]   watching logfiles {}
2008/02/24 21:28 +0200 [Broker,client]   argv: ['/usr/local/bin/svn', 'update', '--revision', 'HEAD', '--non-interactive']
2008/02/24 21:28 +0200 [Broker,client]  environment: {'USERNAME': 'buildslave', 'SUDO_COMMAND': '/usr/local/bin/buildbot start /home/buildslave/builds/', 'TERM': 'xterm', 'SHELL': '/bin/tcsh', 'MAIL': '/var/mail/hans', 'SUDO_UID': '1000', 'SUDO_GID': '1000', 'LOGNAME': 'buildslave', 'USER': 'buildslave', 'HOME': '/home/hans', 'PATH': '/home/hans/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin:/home/hans/bin', 'SUDO_USER': 'hans', 'DISPLAY': 'localhost:10.0', 'TMPDIR': '/tmp'}
2008/02/24 21:48 +0200 [-] command finished with signal None, exit code 0

In this log excerpt, the spawned process exits after about 30 seconds, yet the "command finished" entry is logged 20 minutes later. The spawned process stays in the zombie state until it is eventually reaped.
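To illustrate the zombie state described above, here is a minimal sketch, independent of buildbot and Twisted, of a child process that has exited but has not yet been collected by its parent. The function name zombie_then_reap and the 0.5-second delay are made up for this illustration; it assumes a POSIX system where os.fork is available.

```python
import os
import time


def zombie_then_reap():
    """Fork a child that exits at once, then reap it later from the parent."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately with status 0, skipping cleanup handlers.
        os._exit(0)

    # Parent: once the child has exited, the kernel keeps its process-table
    # entry as a zombie until the parent waits for it. The delay here is
    # arbitrary and only stands in for the parent being busy elsewhere.
    time.sleep(0.5)

    # Waiting on the child collects its exit status and removes the zombie.
    reaped_pid, status = os.waitpid(pid, 0)
    return pid, reaped_pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child, reaped, code = zombie_then_reap()
    print("child %d reaped as %d with exit code %d" % (child, reaped, code))
```

While the parent sleeps, ps should list the child as a zombie (defunct). This would correspond to what I see with the buildslave, except that there the delay is about 20 minutes.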

The problem could be related to http://twistedmatrix.com/trac/ticket/791. If there is a workaround for buildslave, I'd happily use it.

Change History

03/06/08 01:36:21 changed by hans

I sometimes see the problem on Linux, too, so it does not seem to be FreeBSD-specific.

The workaround I found was to set the keepaliveInterval and keepaliveTimeout parameters to low values (for example, 10 and 5 seconds, respectively) in the buildslave configuration.
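As a rough sketch of that workaround, the values could be set in the slave's buildbot.tac along these lines. The variable name keepalive_timeout is made up for this example, and the BuildSlave call in the comment is an assumption about the 0.7.x constructor, so check it against the installed version before relying on it.

```python
# Hypothetical excerpt from a buildslave's buildbot.tac (sketch only).

# Seconds between keepalive pings sent to the master (keepaliveInterval).
keepalive = 10

# Seconds to wait for the master's reply before giving up (keepaliveTimeout).
# The variable name is illustrative, not part of a stock buildbot.tac.
keepalive_timeout = 5

# In a real buildbot.tac these values would be handed to the slave service.
# Assumed signature, not verified against 0.7.6:
#   s = BuildSlave(buildmaster_host, port, slavename, passwd, basedir,
#                  keepalive, usepty, keepaliveTimeout=keepalive_timeout,
#                  umask=umask)
```

The buildslave needs to be restarted for changed values to take effect.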