Thanks Walter and Steven for the insight. I guess I will post my question to the main Python mailing list and see if people have anything to say.
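
For reference, here is a minimal sketch of the seek-based chunking asked about in the quoted question below. It assumes a line-oriented file. The names `chunk_ranges`, `count_lines` and `parallel_line_count` are made up for illustration, and counting lines only stands in for whatever per-chunk processing is really needed. As Walter points out below, this cannot make the disk deliver data any faster; any gain would have to come from per-chunk work that is itself CPU-bound.

```python
import os
from multiprocessing import Pool


def chunk_ranges(path, n_chunks):
    """Split a file into up to n_chunks (start, end) byte ranges.

    Every boundary is moved forward to the start of the next line,
    so no line is ever split between two chunks.
    """
    size = os.path.getsize(path)
    if size == 0:
        return []
    step = max(1, size // n_chunks)
    bounds = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            target = i * step
            if target >= size:
                break
            if target <= bounds[-1]:
                continue  # a long line already carried us past this target
            f.seek(target)
            f.readline()  # read to the end of the line containing `target`
            pos = f.tell()
            if pos >= size:
                break
            bounds.append(pos)
    bounds.append(size)
    return list(zip(bounds[:-1], bounds[1:]))


def count_lines(task):
    """Placeholder per-chunk work: count the lines in bytes [start, end)."""
    path, start, end = task
    count = 0
    pos = start
    with open(path, "rb") as f:
        f.seek(start)
        while pos < end:
            line = f.readline()
            if not line:
                break
            pos += len(line)
            count += 1
    return count


def parallel_line_count(path, n_workers=4):
    """Map count_lines over the chunks in worker processes and merge the results."""
    tasks = [(path, start, end) for start, end in chunk_ranges(path, n_workers)]
    with Pool(n_workers) as pool:
        return sum(pool.map(count_lines, tasks))
```

Each worker opens its own file handle and seeks to its own offset, so no handle is shared between processes. On platforms that spawn rather than fork worker processes, `parallel_line_count` should be called from under an `if __name__ == "__main__":` guard.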
-Abhi

On Mon, Mar 26, 2012 at 3:28 PM, Walter Prins <wpr...@gmail.com> wrote:
> Abhi,
>
> On 26 March 2012 19:05, Abhishek Pratap <abhishek....@gmail.com> wrote:
>> I want to utilize the power of cores on my server and read big files
>> (> 50GB) simultaneously by seeking to N locations. Process each
>> separate chunk and merge the output. Very similar to the MapReduce
>> concept.
>>
>> What I want to know is the best way to read a file concurrently. I
>> have read about file-handle.seek(), os.lseek() but not sure if that's
>> the way to go. Any use cases would be of help.
>
> Your idea won't work. Reading from disk is not a CPU-bound process,
> it's an I/O-bound process. Meaning, the speed at which you can read
> from a conventional mechanical hard disk drive is not constrained by
> how fast your CPU is, but generally by how fast your disk(s) can read
> data from the disk surface, which is limited by the rotation speed and
> areal density of the data on the disk (and the seek time), and by how
> fast it can shovel the data down its I/O bus. And *that* speed is
> still orders of magnitude slower than your RAM and your CPU. So, in
> reality even just one of your cores will spend the vast majority of
> its time waiting for the disk when reading your 50GB file. There's
> therefore __no__ way to make your file reading faster by increasing
> your __CPU cores__ -- the only way is by improving your disk I/O
> throughput. You can for example stripe several hard disks together in
> RAID0 (but that increases the risk of data loss due to data being
> spread over multiple drives) and/or ensure you use a faster I/O
> subsystem (move to SATA3 if you're currently using SATA2 for example),
> and/or use faster hard disks (use 10,000 or 15,000 RPM instead of
> 7,200, or switch to SSD [solid state] disks.) Most of these options
> will cost you a fair bit of money though, so consider these thoughts
> in that light.
>
> Walter
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor