Ticket #26 (new enhancement)

Opened 2 years ago

Last modified 4 months ago

compress buildstep logs

Reported by: warner Assigned to: warner
Priority: minor Milestone: 0.9.0
Component: statusplugins Version:
Keywords: Cc: s.lipnevich@gmail.com, pw@openplans.org, dustin, Pike

Description

Darragh Bailey suggests that the buildstep logfiles could be compressed on disk to good effect. This is probably not too difficult to implement: basically in LogFile.init we'd use gzip.open() instead of the regular open(). The tricky part would be compatibility with existing logfiles: this might require a special extension for the new compressed logs, which would require checking for both kinds of filenames when looking for logs.

Attachments

buildbot-gzip.patch (2.6 kB) - added by slinkp on 05/20/08 14:43:37.
first attempt at gzip support patch
buildbot-gzip2.patch (3.7 kB) - added by slinkp on 05/20/08 15:57:18.
second attempt, more complex, still many test failures

Change History

(follow-up: ↓ 2 ) 03/05/07 17:31:20 changed by warner

Darragh had another good idea: just compress the logfile after it is closed, rather than trying to do something incremental. When looking for a logfile, take the .gz form if present, otherwise fall back to the non-.gz form.

There might be a tricky bit w.r.t. windows buildmasters, if someone is trying to read from the status file while we're writing the compressed form, but I have a feeling that this problem either doesn't actually exist or is completely masked by some other problem.

(in reply to: ↑ 1 ) 06/22/07 08:12:45 changed by bear

Replying to warner:

Darragh had another good idea: just compress the logfile after it is closed, rather than trying to do something incremental. When looking for a logfile, take the .gz form if present, otherwise fall back to the non-.gz form. There might be a tricky bit w.r.t. windows buildmasters, if someone is trying to read from the status file while we're writing the compressed form, but I have a feeling that this problem either doesn't actually exist or is completely masked by some other problem.

Could this be as simple as changing line 236 to read:

self.openfile = gzip.open(fn, 'w+')

and line 269:

return gzip.open(self.getFilename(), "r")

within buildbot/status/builder.py ?

the gzip.open() returns a fully functional file handle so it should be able to replace the two existing open() calls.

07/02/07 09:50:38 changed by warner

possibly. If gzip.open() is given a non-gzipped file, does it fall back to behaving just like regular open()? (we need to keep working with old logfiles). If not, it might be sufficient to wrap the gzip.open()s in a try: block that falls back to regular open() in case of some hypothetical NotAGZipFileError exception.

Also, it would be nice if the files on disk were obviously compressed, but since they aren't entirely plain text that probably isn't too big of a deal.

There are two other open() calls, at lines 269 (LogFile?.getFile) and 450 (LogFile?.upgrade) that need the same treatment.

Could somebody see if the unit tests pass with this change? Note that we must still manually test the handle-old-non-compressed-logfiles properly, since there are no unit tests to verify this. If it works, we should write a unit test to check the backwards-compatibility (by creating some logfiles, closing them, sneaking around and uncompressing one of them, then reading their contents back).

10/01/07 13:04:33 changed by warner

  • milestone set to 0.8.0.

10/01/07 13:07:31 changed by warner

  • milestone changed from 0.8.0 to 0.9.0.

12/02/07 12:36:48 changed by sergey

  • cc set to s.lipnevich@gmail.com.

05/14/08 16:42:18 changed by slinkp

  • cc changed from s.lipnevich@gmail.com to s.lipnevich@gmail.com, pw@openplans.org.

I would love to see this feature. gzip.open() claims to work on non-gzipped files, but you get an IOError when you call its read method. So you may want something like:

>>> def open2(fname):
...     f = gzip.open(fname)
...     try:
...          f.read(1)
...          f.seek(0)
...          return f
...     except IOError:
...          del f
...          return open(fname)
... 
>>> open2('stdio2.gz').read() == open2('stdio').read()
True

05/15/08 08:17:34 changed by dustin

  • cc changed from s.lipnevich@gmail.com, pw@openplans.org to s.lipnevich@gmail.com, pw@openplans.org, dustin.

05/15/08 15:57:00 changed by dustin

Let's someone post a patch so that others can improve on it..

05/15/08 23:24:14 changed by Pike

  • cc changed from s.lipnevich@gmail.com, pw@openplans.org, dustin to s.lipnevich@gmail.com, pw@openplans.org, dustin, Pike.

05/20/08 14:43:37 changed by slinkp

  • attachment buildbot-gzip.patch added.

first attempt at gzip support patch

05/20/08 15:26:44 changed by slinkp

Patch against current CVS attached; it doesn't work, lots of tests fail due to calls like f.seek(0, 2) which gzip files don't support.

I'm gonna leave it at that; anybody else, feel free to find a solution.

(follow-up: ↓ 15 ) 05/20/08 15:53:58 changed by slinkp

I played around with another patch, but there are two problems that look insurmountable without totally rethinking the logging system:

- GzipFile? doesn't support seeking backward when opened in write mode; eg. you can never seek(0) if you've already seeked to any positive integer.

- GzipFile? doesn't really support "w+" or "r+" mode; it pretends to, but in "w+" mode it raises errors if you call read(), and in "r+" mode it raises errors if you call write().

05/20/08 15:57:18 changed by slinkp

  • attachment buildbot-gzip2.patch added.

second attempt, more complex, still many test failures

05/20/08 15:58:11 changed by slinkp

I should note that I have only python 2.4 handy. Is the gzip module any more compatible with normal file objects in 2.5?

05/20/08 16:47:23 changed by Pike

Here is what I was thinking:

Keep the logfile structure as it is, and just glue the chunks together and gzip or bz2 them, and to do that after the log is completed.

Not exactly sure if there's a trap to go into if the logs are actually already compressed data themselves, at that point, autodetecting by just trying wouldn't work.

Regarding python versions, I think that buildbot still aims at being compatible with python 2.3.

(in reply to: ↑ 12 ) 08/05/08 10:56:04 changed by bbl

Replying to slinkp:

I played around with another patch, but there are two problems that look insurmountable without totally rethinking the logging system: - GzipFile? doesn't support seeking backward when opened in write mode; eg. you can never seek(0) if you've already seeked to any positive integer.

This is obvious beacause seeking backwards and writing something to that place would force to recompress that whole file starting from the position you seeked to - but this seems not to be implemented in GzipFile? because it would probably force duplicating the file opened which is very ineffective

- GzipFile? doesn't really support "w+" or "r+" mode; it pretends to, but in "w+" mode it raises errors if you call read(), and in "r+" mode it raises errors if you call write().

This also seems to be very ineffective to implement, too.

Therefore you should probably write the logs the way you do now and create a compressed version of them (and delete the uncompressed one) on closing the log...

Another idea is to save the compressed logs with .gz/.bz2 extension so that you can easily distinguish if a log is compressed or not because if not, as Pike said, in very rare cases, an uncompressed log might be recognised as a compressed log...