On Mon, Nov 04, 2013 at 07:00:29PM +0530, Amal Thomas wrote:

> Yes, I have found that loading to RAM and then reading line by line
> saves a huge amount of time, since my text files are very huge.
This is remarkable, and quite frankly incredible. I wonder whether you are misinterpreting what you are seeing? Under normal circumstances, on all but quite high-end machines, reading a 50 GB file into memory all at once is effectively impossible.

Suppose your computer has 24 GB of RAM. The OS and other running applications can be expected to use some of that, but even ignoring them, it is impossible to hold a 50 GB file in memory all at once with only 24 GB. What I would expect is that unless you have *at least* double the size of the file in RAM (in this case, at least 100 GB), either Python will give you a MemoryError, or the operating system will start paging memory out to swap-space, which is *painfully* slooooow.

I've been in the situation where I accidentally tried reading a file bigger than the installed RAM: it ran overnight (14+ hours), the machine locked up and stopped responding, and I finally had to unplug the power and restart it.

So unless you have 100+ GB in your computer, which would put it in seriously high-end server class, I find it difficult to believe that you are actually reading the entire file into memory.

Please try this little bit of code, replacing the file name with the actual name of your 50 GB data file:

    import os

    filename = "YOUR FILE NAME HERE"
    # Size of the file on disk, in bytes.
    print("File size:", os.stat(filename).st_size)
    f = open(filename)
    content = f.read()
    # How much of the file read() actually returned, and where the
    # file pointer ended up afterwards.
    print("Length of content actually read:", len(content))
    print("Current file position:", f.tell())
    f.close()

and send us the output.

Thanks,

-- 
Steven
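P.S. If it turns out you are not actually holding the whole file in memory, the usual memory-friendly pattern is to iterate over the file object itself, which yields one line at a time from a small internal buffer rather than slurping the whole file. A rough sketch only; the filename "huge.txt" and the character count are placeholders for your real file and your real per-line work:

    # Sketch: process a huge file line by line without loading it
    # into RAM. "huge.txt" is a hypothetical placeholder.
    total = 0
    with open("huge.txt") as f:
        for line in f:          # the file object yields lines lazily
            total += len(line)  # replace with your real processing
    print("Total characters:", total)

Memory use stays roughly constant no matter how large the file is, because only one line (plus the buffer) is ever held at a time.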