On 4/11/2013 11:26, Amal Thomas wrote:
> @Dave: thanks. By the way, I am running my code on a server with about
> 100GB RAM, but I can't afford my code to use 4-5 times the size of the
> text file. Now I am using read() / readlines(); these seem to be more
> efficient in memory usage than io.StringIO(f.read()).
Sorry, I misspoke about read() on a large file; I was confusing it with something else.

However, note that in any environment, if you have a large buffer and you force the system to copy it, you'll be using (temporarily, at least) twice the space. And usually the original can't be freed, for various technical reasons.

The real question is how you're going to be addressing the data, and what constraints are on that data. Since you think you need it all in memory, you clearly are planning to access it randomly. Since the data is apparently ASCII characters, and you're running at least 3.3, you won't be paying the penalty if it turns out to be strings. But there may be alternate ways of encoding each line which save space and/or make it faster to use. One big buffer imaging the file is likely to be one of the worst.

Are the lines variable length? Do you ever deal randomly with a portion of a line, or only with the whole thing? If a line is multiple ASCII characters, is their order significant? How many different symbols can appear in a single line? How many different ones in total (probably excluding the newline)? What's the average line length? Each of these questions may lead to exploring different optimization strategies. But I've done enough speculating.

-- 
DaveA

_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
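[Editor's note: a minimal sketch of the two points above — that since Python 3.3 an all-ASCII str costs roughly one byte per character, and that building one big in-memory copy (as io.StringIO(f.read()) does) is avoidable by iterating the file. The sample line, the filename "data.txt", and process() are illustrative assumptions, not from the original thread.]

```python
import sys

# Hypothetical 80-character ASCII line (placeholder data, not from
# the original thread).
line = "ACGT" * 20
as_bytes = line.encode("ascii")   # same content, stored as bytes

# Since Python 3.3 (PEP 393), an all-ASCII str is stored one byte
# per character, so str and bytes are close in size; bytes still
# has a slightly smaller fixed overhead per object.
print(sys.getsizeof(line), sys.getsizeof(as_bytes))

# Forcing a copy of one big buffer doubles peak memory use.
# io.StringIO(f.read()) pays that cost, because f.read() first
# builds the whole file as one string.  Iterating line by line
# never holds more than one line at a time:
#
# with open("data.txt") as f:      # "data.txt" is a placeholder name
#     for line in f:
#         process(line)            # process() is hypothetical
```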