On Mon, Nov 04, 2013 at 11:27:52AM -0500, Joel Goldstick wrote:
> If you are new to python why are you so concerned about the speed of
> your code.
Amal is new to Python, but he's not new to biology; he's a 4th year student. With a 50GB file, I expect he is analysing something to do with DNA sequencing, which, depending on exactly what he is trying to do, could involve O(N) or even O(N**2) algorithms.

An O(N) algorithm on a 50GB file, assuming 100,000 steps per second, will take over 5 days to complete. An O(N**2) algorithm, well, it's nearly unthinkable: nearly 800 million years. You *really* don't want O(N**2) algorithms with big data.

I would expect that with a big DNA sequencing problem, running time would be measured in days rather than minutes or hours. So yes, this is probably a case where optimizing for speed is not premature.

We really don't know enough about his problem to advise him on how to speed it up. If the data file is guaranteed to contain nothing but GCTA bases and newlines, it may be better to read it into memory as a bytearray rather than a string, especially if he needs to modify it in place. But this is getting into some fairly advanced territory; I wouldn't like to predict what will be faster without testing on real data.

--
Steven
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
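For anyone following along, here is a rough sketch (not Steven's code) of the back-of-envelope arithmetic and the bytearray idea from the post. It assumes one "step" per byte, a 50GB file counted as 50 * 10**9 bytes, and the post's rate of 100,000 steps per second. `load_bases` is a hypothetical helper, and stripping the newlines is an assumption about what the analysis needs.

```python
import os
import tempfile

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def estimated_runtime_seconds(n_steps, steps_per_second=100_000):
    """Back-of-envelope running time, assuming a fixed number of steps per second."""
    return n_steps / steps_per_second


# Assumption: one step per byte of a 50GB file.
N = 50 * 10**9
linear_days = estimated_runtime_seconds(N) / SECONDS_PER_DAY
quadratic_years = estimated_runtime_seconds(N**2) / SECONDS_PER_YEAR
print(f"O(N):    {linear_days:.1f} days")                      # O(N):    5.8 days
print(f"O(N**2): {quadratic_years / 1e6:.0f} million years")   # O(N**2): 792 million years


def load_bases(path):
    """Hypothetical helper: read a file of GCTA bases and newlines into a bytearray.

    Reading in binary mode avoids decoding to str, and a bytearray can be
    modified in place. The whole file must fit in memory.
    """
    with open(path, "rb") as f:
        data = bytearray(f.read())
    # Assumption: newlines are only line breaks, so drop them and keep bases.
    return data.replace(b"\n", b"")


# Tiny demo file standing in for the real data.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"GATTACA\nCCGT\n")
    path = tmp.name

seq = load_bases(path)
os.remove(path)
print(seq)              # bytearray(b'GATTACACCGT')

seq[0] = ord("C")       # in-place edit: no new string object is built
print(seq.count(b"C"))  # 4
```

As Steven says, whether this is actually faster than working with strings is something to measure on real data (for example with the standard `timeit` module) rather than assume.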