Re: CRC-checksum failed in gzip

2012-08-02 Thread Ulrich Eckhardt
On 01.08.2012 at 19:57, Laszlo Nagy wrote:
    ## Open file
    lock = threading.Lock()
    fin = gzip.open(file_path...)
    # Now you can share the file object between threads.

    # and do this inside any thread:
    ## data needed. block until the file object becomes usable.
    with lock:
        data = fin.read()
    #
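The locking pattern quoted above can be sketched as a small self-contained program. This is only an illustration, assuming Python 3; the helper names write_sample and read_shared, the sample file name, and the chunk size are made up for the example and do not come from the thread.

```python
import gzip
import os
import tempfile
import threading


def write_sample(path, payload):
    """Write payload to a gzip file (hypothetical helper for the demo)."""
    with gzip.open(path, "wb") as fout:
        fout.write(payload)


def read_shared(path, n_threads=4, chunk_size=1024):
    """Read one gzip file object from several threads, one read at a time."""
    lock = threading.Lock()
    chunks = []

    def worker(fin):
        while True:
            # The read and the append happen under the same lock, so the
            # chunks are stored in the order in which they were read.
            with lock:
                data = fin.read(chunk_size)
                if not data:
                    return
                chunks.append(data)

    with gzip.open(path, "rb") as fin:
        threads = [threading.Thread(target=worker, args=(fin,))
                   for _ in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return b"".join(chunks)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.gz")
        original = b"line of text\n" * 5000
        write_sample(sample, original)
        assert read_shared(sample) == original
```

Because every read happens inside the lock, no two threads can advance the decompressor state at the same moment, which is the situation suspected of causing the CRC error.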

Re: CRC-checksum failed in gzip

2012-08-02 Thread andrea crotti
2012/8/1 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: When you start using threads, you have to expect these sorts of intermittent bugs unless you are very careful. My guess is that you have a bug where two threads read from the same file at the same time. Since each read shares

Re: CRC-checksum failed in gzip

2012-08-02 Thread Laszlo Nagy
Technically, that is correct, but IMHO it's complete nonsense to share the file object between threads in the first place. If you need the data in two threads, just read the file once and then share the read-only, immutable content. If the file is small or too large to be held in memory at
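The "read once, share the immutable content" suggestion could look roughly like the sketch below. It assumes Python 3 and a file small enough to hold in memory; load_once and count_lines_in_threads are hypothetical names chosen for the illustration.

```python
import gzip
import os
import tempfile
import threading


def load_once(path):
    """Decompress the whole file a single time and return immutable bytes."""
    with gzip.open(path, "rb") as fin:
        return fin.read()


def count_lines_in_threads(content, n_threads=4):
    """Let several threads work on the same bytes object without a lock."""
    results = [None] * n_threads

    def worker(index):
        # bytes objects are immutable, so concurrent reads are safe.
        results[index] = len(content.splitlines())

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.gz")
        with gzip.open(sample, "wb") as fout:
            fout.write(b"a\nb\nc\n")
        print(count_lines_in_threads(load_once(sample)))
```

Only one thread ever touches the file object here, so the decompressor state is never shared.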

Re: CRC-checksum failed in gzip

2012-08-02 Thread Laszlo Nagy
One last thing I would like to do before I add this fix is to actually be able to reproduce this behaviour, and I thought I could just do the following:
    import gzip
    import threading

    class OpenAndRead(threading.Thread):
        def run(self):
            fz = gzip.open('out2.txt.gz')

Re: CRC-checksum failed in gzip

2012-08-02 Thread andrea crotti
2012/8/2 Laszlo Nagy gand...@shopzeus.com: Your example did not share the file object between threads. Here is an example that does that:
    class OpenAndRead(threading.Thread):
        def run(self):
            global fz
            fz.read(100)

    if __name__ == '__main__':
        fz =

Re: CRC-checksum failed in gzip

2012-08-02 Thread andrea crotti
2012/8/2 andrea crotti andrea.crott...@gmail.com: Ok sure that makes sense, but then this explanation is maybe not right anymore, because I'm quite sure that the file object is *not* shared between threads, everything happens inside a thread.. I managed to get some errors doing this with a

CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
We're having some really obscure problems with gzip. There is a program running with python2.7 on a 2.6.18-128.el5xen (red hat I think) kernel. Now this program does the following:
    if filename == 'out2.txt':
        out2 = open('out2.txt')
    elif filename == 'out2.txt.gz':
        out2 =

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
On 2012-08-01 12:39, andrea crotti wrote: We're having some really obscure problems with gzip. There is a program running with python2.7 on a 2.6.18-128.el5xen (red hat I think) kernel. Now this program does the following:
    if filename == 'out2.txt':
        out2 = open('out2.txt')
    elif filename

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Laszlo Nagy gand...@shopzeus.com: On 2012-08-01 12:39, andrea crotti wrote: We're having some really obscure problems with gzip. There is a program running with python2.7 on a 2.6.18-128.el5xen (red hat I think) kernel. Now this program does the following: if filename ==

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
very simple right? But sometimes we get a checksum error. Do you have a traceback showing the actual error?
- CRC is at the end of the file and is computed against the whole file (last 8 bytes)
- after the CRC there is the \ marker for the EOF
- readline() doesn't trigger the
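To make the "last 8 bytes" remark concrete: per RFC 1952, a gzip member ends with the CRC-32 of the uncompressed data followed by the uncompressed size modulo 2**32, both stored little-endian. The sketch below assumes Python 3 and a single-member gzip stream; read_trailer is a hypothetical helper name.

```python
import gzip
import struct
import zlib


def read_trailer(raw):
    """Return (crc32, isize) stored in the final 8 bytes of a gzip stream."""
    crc32, isize = struct.unpack("<II", raw[-8:])
    return crc32, isize


if __name__ == "__main__":
    payload = b"some example text\n" * 100
    raw = gzip.compress(payload)
    stored_crc, stored_size = read_trailer(raw)
    # The stored CRC covers the uncompressed payload, not the compressed bytes.
    print(stored_crc == (zlib.crc32(payload) & 0xFFFFFFFF))
    print(stored_size == len(payload))
```

A reader can only compare against this stored value once it has decompressed the whole member, which is why a mismatch surfaces at the end of the read.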

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
Full traceback:
Exception in thread Thread-8:
Traceback (most recent call last):
  File "/user/sim/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 67, in run
    self.processJobData(jobData, logger)

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
- The file is written with the linux gzip program.
- No, I can't reproduce the error with the exact same file that failed; that's what is really puzzling.
How do you make sure that no process is reading the file before it is fully flushed to disk? Possible way of testing for this kind of
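One common way to keep readers from ever seeing a half-written file is to write under a temporary name and rename it into place once it is complete. This is a sketch under assumptions that differ from the thread, where the file is produced by the linux gzip program: it assumes the writer is a Python 3 program you control and that both names live on the same filesystem. The function name and the ".tmp" suffix are hypothetical.

```python
import gzip
import os
import tempfile


def write_gzip_atomically(path, payload):
    """Write payload to path so that readers never observe a partial file."""
    tmp_path = path + ".tmp"
    with gzip.open(tmp_path, "wb") as fout:
        fout.write(payload)
    # os.replace swaps the finished file into place in one step when
    # source and destination are on the same filesystem.
    os.replace(tmp_path, path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "out2.txt.gz")
        write_gzip_atomically(target, b"hello\n")
        with gzip.open(target, "rb") as fin:
            print(fin.read())
```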

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Laszlo Nagy gand...@shopzeus.com: there seems to be no clear pattern and it just randomly fails. The file is also just open for read from this program, so in theory no way that it can be corrupted. Yes, there is. Gzip stores CRC for compressed *blocks*. So if the file is not

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
Thanks a lot, someone writing to the file while it is being read might be an explanation; the problem is that everyone claims that they are only reading the file. If that is true, then make that file system read only. Soon it will turn out who is writing them. ;-) Apparently this file is

Re: CRC-checksum failed in gzip

2012-08-01 Thread Steven D'Aprano
On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote: Full traceback: Exception in thread Thread-8: DANGER DANGER DANGER WILL ROBINSON!!! Why didn't you say that there were threads involved? That puts a completely different perspective on the problem. I *was* going to write back and

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Steven D'Aprano steve+comp.lang.pyt...@pearwood.info: On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote: Full traceback: Exception in thread Thread-8: DANGER DANGER DANGER WILL ROBINSON!!! Why didn't you say that there were threads involved? That puts a completely

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
Thanks a lot, that makes a lot of sense.. I haven't given this detail before because I didn't write this code, and I forgot that there were threads involved completely, I'm just trying to help to fix this bug. Your explanation makes a lot of sense, but it's still surprising that even just

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Laszlo Nagy gand...@shopzeus.com: Thanks a lot, that makes a lot of sense.. I haven't given this detail before because I didn't write this code, and I forgot that there were threads involved completely, I'm just trying to help to fix this bug. Your explanation makes a lot of sense,

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
Make sure that file objects are not shared between threads. If that is possible. It will probably solve the problem (if that is related to threads). Well I just have to create a lock I guess right? That is also a solution. You need to call file.read() inside an acquired lock. with lock:
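The first option, not sharing file objects between threads at all, might be sketched as follows: each thread opens its own handle, so no decompressor state is shared and no lock is needed. This assumes Python 3; the path and data attributes are additions made for the illustration.

```python
import gzip
import os
import tempfile
import threading


class OpenAndRead(threading.Thread):
    """Thread that opens its own gzip handle instead of sharing one."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        self.data = None

    def run(self):
        # The file object is local to this thread for its whole lifetime.
        with gzip.open(self.path, "rb") as fz:
            self.data = fz.read()


def read_in_threads(path, n_threads=4):
    """Start n_threads independent readers and return what each one read."""
    threads = [OpenAndRead(path) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [t.data for t in threads]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "out2.txt.gz")
        with gzip.open(sample, "wb") as fout:
            fout.write(b"payload\n" * 1000)
        results = read_in_threads(sample)
        print(all(r == b"payload\n" * 1000 for r in results))
```

The trade-off is that every thread decompresses the file again, which costs CPU time but removes the need for any synchronisation.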