Re: CRC-checksum failed in gzip

2012-08-02 Thread andrea crotti
2012/8/2 andrea crotti : > > Ok sure that makes sense, but then this explanation is maybe not right > anymore, because I'm quite sure that the file object is *not* shared > between threads, everything happens inside a thread.. > > I managed to get some errors doing this with a big file > class Open

Re: CRC-checksum failed in gzip

2012-08-02 Thread andrea crotti
2012/8/2 Laszlo Nagy :
> Your example did not share the file object between threads. Here is an example
> that does that:
>
> class OpenAndRead(threading.Thread):
>     def run(self):
>         global fz
>         fz.read(100)
>
> if __name__ == '__main__':
>     fz = gzip.open('out2.txt.gz')
>

Re: CRC-checksum failed in gzip

2012-08-02 Thread Laszlo Nagy
One last thing I would like to do before I add this fix is to actually be able to reproduce this behaviour, and I thought I could just do the following:

    import gzip
    import threading

    class OpenAndRead(threading.Thread):
        def run(self):
            fz = gzip.open('out2.txt.gz')
            fz.rea
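For comparison, here is a minimal sketch of the pattern in the snippet above carried through to completion, where every thread opens its own GzipFile so no file position or CRC state is shared. It is written for Python 3 (the thread itself uses 2.7), and the helper names `make_sample` and `run_readers` plus the generated test data are hypothetical, not from the thread.

```python
import gzip
import os
import tempfile
import threading


def make_sample(path, n_lines=1000):
    # Hypothetical test data: a gzip file with n_lines short text lines.
    with gzip.open(path, "wt") as f:
        for i in range(n_lines):
            f.write("line %d\n" % i)


class OpenAndRead(threading.Thread):
    """Each thread opens its own GzipFile, so no position or CRC state is shared."""

    def __init__(self, path):
        super().__init__()
        self.path = path
        self.line_count = 0

    def run(self):
        with gzip.open(self.path, "rt") as fz:
            for _ in fz:  # iterating to the end also triggers the CRC check
                self.line_count += 1


def run_readers(path, n_threads=8):
    threads = [OpenAndRead(path) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [t.line_count for t in threads]


if __name__ == "__main__":
    sample = os.path.join(tempfile.mkdtemp(), "out2.txt.gz")
    make_sample(sample)
    print(run_readers(sample))  # one line count per thread
```

Because each reader has private state, this version is not expected to reproduce the error; a failure would need the shared-object variant quoted earlier in the thread.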

Re: CRC-checksum failed in gzip

2012-08-02 Thread Laszlo Nagy
Technically, that is correct, but IMHO it's complete nonsense to share the file object between threads in the first place. If you need the data in two threads, just read the file once and then share the read-only, immutable content. If the file is small or too large to be held in memory at onc
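The "read once, share the immutable content" suggestion can be sketched like this in Python 3, assuming the decompressed file fits in memory. The `count_in_threads` workload is a hypothetical stand-in for whatever the real threads do with the data.

```python
import gzip
import os
import tempfile
import threading


def load_once(path):
    # Decompress the whole file a single time. A bytes object is immutable,
    # so any number of threads can read it without a lock.
    with gzip.open(path, "rb") as f:
        return f.read()


def count_in_threads(data, needles):
    # Hypothetical workload: each thread counts one needle in the shared data.
    results = {}

    def worker(needle):
        # Each thread stores under a distinct key; a single dict store is atomic in CPython.
        results[needle] = data.count(needle)

    threads = [threading.Thread(target=worker, args=(n,)) for n in needles]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "out2.txt.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"spam eggs spam\n" * 10)
    print(count_in_threads(load_once(path), [b"spam", b"eggs"]))
```

Only one thread ever touches the GzipFile, so the decompressor and its CRC bookkeeping are never used concurrently.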

Re: CRC-checksum failed in gzip

2012-08-02 Thread andrea crotti
2012/8/1 Steven D'Aprano : > > When you start using threads, you have to expect these sorts of > intermittent bugs unless you are very careful. > > My guess is that you have a bug where two threads read from the same file > at the same time. Since each read shares state (the position of the file >
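Steven's point about shared state can be seen without any threads at all: two callers holding the same GzipFile share one read position, so the second read starts where the first stopped. This Python 3 sketch uses sequential calls (and an in-memory stream) so the result is deterministic; with real threads the interleaving is unpredictable, and the decompressor's internal state is shared as well. The gzip module does not document its file objects as safe for concurrent use.

```python
import gzip
import io


def demo_shared_position():
    # Build an in-memory gzip stream containing b"0123456789".
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb") as g:
        g.write(b"0123456789")
    buf.seek(0)

    shared = gzip.GzipFile(fileobj=buf, mode="rb")
    # Two "readers" that both hold a reference to the same object.
    first = shared.read(4)   # advances the shared position to 4
    second = shared.read(4)  # continues from 4, not from 0
    shared.close()
    return first, second


if __name__ == "__main__":
    print(demo_shared_position())  # (b'0123', b'4567')
```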

Re: CRC-checksum failed in gzip

2012-08-02 Thread Ulrich Eckhardt
On 01.08.2012 19:57, Laszlo Nagy wrote:

    ## Open file
    lock = threading.Lock()
    fin = gzip.open(file_path...)
    # Now you can share the file object between threads.

    # and do this inside any thread:
    ## data needed. block until the file object becomes usable.
    with lock:
        data = fin.read() # o
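A slightly fuller Python 3 sketch of the lock-around-every-read idea, assuming several threads pull chunks from one shared file. The wrapper class `LockedGzipReader` and the `drain` helper are hypothetical names; the quoted snippet reads the whole file under the lock in one call, while this version reads in chunks to show that every access needs the lock, not just the first.

```python
import gzip
import os
import tempfile
import threading


class LockedGzipReader:
    """Hypothetical wrapper: every operation on the shared GzipFile holds one lock."""

    def __init__(self, path):
        self._lock = threading.Lock()
        self._fin = gzip.open(path, "rb")

    def read_chunk(self, size):
        # Only one thread at a time moves the file position or touches the
        # decompressor/CRC state; the others block here until it is released.
        with self._lock:
            return self._fin.read(size)

    def close(self):
        with self._lock:
            self._fin.close()


def drain(reader, n_threads=4, chunk=64):
    # Several threads pull chunks until EOF; returns the total bytes read.
    totals = [0] * n_threads

    def worker(idx):
        while True:
            data = reader.read_chunk(chunk)
            if not data:
                break
            totals[idx] += len(data)  # each thread updates only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "out2.txt.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"x" * 10000)
    reader = LockedGzipReader(path)
    print(drain(reader))  # 10000
    reader.close()
```

The lock serialises the reads, so the threads gain nothing in throughput on the file itself; it only makes sharing safe.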

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
Make sure that file objects are not shared between threads, if that is possible. It will probably solve the problem (if it is related to threads). Well, I just have to create a lock, I guess, right? That is also a solution: you need to call file.read() inside an acquired lock. with lock:

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Laszlo Nagy : > >> Thanks a lot, that makes a lot of sense.. I haven't given this detail >> before because I didn't write this code, and I forgot that there were >> threads involved completely, I'm just trying to help to fix this bug. >> >> Your explanation makes a lot of sense, but it's

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
Thanks a lot, that makes a lot of sense.. I haven't given this detail before because I didn't write this code, and I forgot that there were threads involved completely, I'm just trying to help to fix this bug. Your explanation makes a lot of sense, but it's still surprising that even just read

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Steven D'Aprano : > On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote: > >> Full traceback: >> >> Exception in thread Thread-8: > > "DANGER DANGER DANGER WILL ROBINSON!!!" > > Why didn't you say that there were threads involved? That puts a > completely different perspective on the p

Re: CRC-checksum failed in gzip

2012-08-01 Thread Steven D'Aprano
On Wed, 01 Aug 2012 14:01:45 +0100, andrea crotti wrote: > Full traceback: > > Exception in thread Thread-8: "DANGER DANGER DANGER WILL ROBINSON!!!" Why didn't you say that there were threads involved? That puts a completely different perspective on the problem. I *was* going to write back an

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
Thanks a lot, someone that writes to the file while reading might be an explanation; the problem is that everyone claims that they are only reading the file. If that is true, then make that file system read only. Soon it will turn out who is writing them. ;-) Apparently this file is generat

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Laszlo Nagy : >>there seems to be no clear pattern and it just randomly fails. The file >> is also only opened for reading by this program, >>so in theory there is no way that it can be corrupted. > > Yes, there is. Gzip stores CRC for compressed *blocks*. So if the file is > not flushed to the d

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
- The file is written with the Linux gzip program.
- No, I can't reproduce the error with the exact same file that failed; that's what is really puzzling.

How do you make sure that no process is reading the file before it is fully flushed to disk? Possible way of testing for this kind of e
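One common way to guarantee that no reader ever sees a half-written file is to write to a temporary file in the same directory, flush and fsync it, then rename it over the final name. This is a Python 3 sketch, assuming the producer is a Python program and the directory is on a POSIX filesystem; `publish_gzip` is a hypothetical helper. Since the thread says the file is produced by the gzip command-line tool, the shell equivalent would be `gzip -c out2.txt > out2.txt.gz.tmp && mv out2.txt.gz.tmp out2.txt.gz` on the same filesystem.

```python
import gzip
import os
import tempfile


def publish_gzip(final_path, chunks):
    """Hypothetical helper: readers see either the old file or the complete new one."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw:
            with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
                for chunk in chunks:
                    gz.write(chunk)
            # Closing the GzipFile wrote the CRC32/ISIZE trailer; now force it to disk.
            raw.flush()
            os.fsync(raw.fileno())
        os.replace(tmp_path, final_path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "out2.txt.gz")
    publish_gzip(target, [b"first line\n", b"second line\n"])
    with gzip.open(target, "rb") as f:
        print(f.read())
```

Note that `tempfile.mkstemp` creates the file with mode 0600, so an `os.chmod` may be needed if other users must read the result.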

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
Full traceback:

Exception in thread Thread-8:
Traceback (most recent call last):
  File "/user/sim/python/lib/python2.7/threading.py", line 530, in __bootstrap_inner
    self.run()
  File "/user/sim/tests/llif/AutoTester/src/AutoTester2.py", line 67, in run
    self.processJobData(jobData, logger)
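To see the checksum failure on demand rather than intermittently, one can damage the stored CRC of an otherwise valid gzip stream. In Python 2.7 this surfaces as `IOError("CRC check failed ...")`; in Python 3.8+ it is `gzip.BadGzipFile`, a subclass of `OSError`. This Python 3 sketch assumes a single-member stream, and `corrupt_crc`, `read_all` and `crc_error_for` are hypothetical helpers.

```python
import gzip
import io


def corrupt_crc(gz_bytes):
    # Flip every bit of the stored CRC32: bytes -8..-5 of a single-member file.
    b = bytearray(gz_bytes)
    for i in range(len(b) - 8, len(b) - 4):
        b[i] ^= 0xFF
    return bytes(b)


def read_all(gz_bytes):
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes), mode="rb") as f:
        return f.read()


def crc_error_for(gz_bytes):
    # Return the exception raised while reading, or None if the read succeeded.
    try:
        read_all(gz_bytes)
    except OSError as exc:  # gzip.BadGzipFile (3.8+) subclasses OSError
        return exc
    return None


if __name__ == "__main__":
    good = gzip.compress(b"hello world\n" * 100)
    print(crc_error_for(good))               # None
    print(crc_error_for(corrupt_crc(good)))  # e.g. CRC check failed 0x... != 0x...
```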

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
very simple, right? But sometimes we get a checksum error.

Do you have a traceback showing the actual error?

 - CRC is at the end of the file and is computed against the whole file (last 8 bytes)
 - after the CRC there is the \ marker for the EOF
 - readline() doesn't trigger the check
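For reference, RFC 1952 defines the 8-byte trailer of each gzip member as a CRC32 of the *uncompressed* data followed by ISIZE, the uncompressed length modulo 2^32, both little-endian 32-bit integers; the second field is a length, not an end-of-file marker. In CPython's gzip module the check runs when the reader reaches the end of the member, so a loop that stops reading early never triggers it. The Python 3 sketch below reads the trailer of a single-member file and compares it with values computed from the data; `trailer_fields` and `expected_fields` are hypothetical names.

```python
import gzip
import struct
import zlib


def trailer_fields(gz_bytes):
    # Last 8 bytes of a single-member file: CRC32 then ISIZE, little-endian.
    crc32, isize = struct.unpack("<II", gz_bytes[-8:])
    return crc32, isize


def expected_fields(uncompressed):
    # The mask only matters on Python 2, where zlib.crc32 can return a negative int.
    return zlib.crc32(uncompressed) & 0xFFFFFFFF, len(uncompressed) % (1 << 32)


if __name__ == "__main__":
    payload = b"some log line\n" * 1000
    blob = gzip.compress(payload)
    print(trailer_fields(blob) == expected_fields(payload))  # True
```

For a multi-member file (for example, several gzip outputs concatenated) the last 8 bytes describe only the final member.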

Re: CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
2012/8/1 Laszlo Nagy : > On 2012-08-01 12:39, andrea crotti wrote: >> >> We're having some really obscure problems with gzip. >> There is a program running with python2.7 on a 2.6.18-128.el5xen (red >> hat I think) kernel. >> >> Now this program does the following: >> if filename == 'out2.txt': >>

Re: CRC-checksum failed in gzip

2012-08-01 Thread Laszlo Nagy
On 2012-08-01 12:39, andrea crotti wrote: We're having some really obscure problems with gzip. There is a program running with python2.7 on a 2.6.18-128.el5xen (red hat I think) kernel. Now this program does the following: if filename == 'out2.txt': out2 = open('out2.txt') elif filename ==

CRC-checksum failed in gzip

2012-08-01 Thread andrea crotti
We're having some really obscure problems with gzip.
There is a program running with python2.7 on a 2.6.18-128.el5xen (Red Hat, I think) kernel.

Now this program does the following:

    if filename == 'out2.txt':
        out2 = open('out2.txt')
    elif filename == 'out2.txt.gz':
        out2 = open('out2.txt.gz'
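The snippet above is cut off, so it is not clear which opener the `.gz` branch really uses. A minimal Python 3 sketch, assuming the intent is to read both variants the same way, picks `gzip.open` by extension so both branches return a text-mode file object; `open_maybe_gzip` is a hypothetical helper, not code from the thread.

```python
import gzip
import os
import tempfile


def open_maybe_gzip(filename):
    # Hypothetical helper: choose the opener from the extension so both
    # branches return a text-mode file object with the same interface.
    if filename.endswith(".gz"):
        return gzip.open(filename, "rt")
    return open(filename, "r")


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    plain = os.path.join(d, "out2.txt")
    packed = os.path.join(d, "out2.txt.gz")
    with open(plain, "w") as f:
        f.write("same content\n")
    with gzip.open(packed, "wt") as f:
        f.write("same content\n")
    for name in (plain, packed):
        with open_maybe_gzip(name) as out2:
            print(out2.read() == "same content\n")  # True for both
```

A plain `open()` on the `.gz` file would return the compressed bytes rather than the text, which is why the extension check matters.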