On Jul 18, 2012, at 10:33 PM, Ryan Waples wrote:

> Thanks for the replies, I'll try to address the questions raised and
> spur further conversation.
> 
>> "those numbers (4GB and 64M lines) look suspiciously close to the file and 
>> record pointer limits to a 32-bit file system.  Are you sure you aren't 
>> bumping into wrap around issues of some sort?"
> 
> My understanding is that I am taking the files in a stream, one line
> at a time and never loading them into memory all at once.  I would
> like (and expect) my script to be able to handle files up to at least
> 50GB.  If this would cause a problem, let me know.

[Again, stripping out everything else…]

I don't think you understood my concern.  The issue isn't whether the files 
are being read as a stream; the issue is that at sizes like those, a 32-bit 
file system can fail silently.  If the pointers chaining allocation blocks 
together (or whatever Windows calls them) can't index to sufficiently large 
offsets, then you WILL get garbage included in the file stream.
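To see why those particular numbers are suspicious, here is the arithmetic (a quick sketch, not anything from your script): an unsigned 32-bit value tops out at 2**32 bytes, which is exactly 4 GiB, and 64M is exactly 2**26.

```python
# An unsigned 32-bit pointer can address at most 2**32 distinct bytes.
max_offset = 2 ** 32
print(max_offset)              # 4294967296 bytes
print(max_offset // 1024**3)   # 4 -- i.e. exactly 4 GiB

# "64M lines" is equally suspicious: 64 * 2**20 == 2**26,
# another power-of-two boundary where a narrow counter could wrap.
print(64 * 2 ** 20 == 2 ** 26)  # True
```

Hitting trouble right at a power-of-two boundary is the classic signature of a counter or offset wrapping around.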

If you copy those files to a different device (one that has just been scrubbed 
and reformatted), then copy them back and get different results with your 
application, you've found your problem.

-Bill
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
