On 11/27/2014 04:01 PM, Albert-Jan Roskam wrote:
> I made a comparison between multiprocessing and threading.  In the code
> below (it's also here: http://pastebin.com/BmbgHtVL), multiprocessing is
> more than 100 (yes: one hundred) times slower than threading! That is
> I-must-be-doing-something-wrong-ishly slow. Any idea whether I am doing
> something wrong? I can't believe the difference is so big.

The bulk of the time is spent marshalling the data into the dictionary self.lookup. You can speed it up somewhat by using a list there (it also makes the code much simpler), but the real trick is to communicate less often between the processes.

    def mp_create_lookup(self):
        # Collect offsets locally and push them to the shared list in
        # batches, instead of making one proxy call per record.
        local_lookup = []
        record_start = 0
        for line in self.data:
            if not line:
                break
            local_lookup.append(record_start)
            if len(local_lookup) > 100:
                self.lookup.extend(local_lookup)
                local_lookup = []
            record_start += len(line)
        # Flush whatever remains in the final partial batch.
        self.lookup.extend(local_lookup)

It's faster because it passes a larger list across the process boundary once every 100 records, instead of a single value for every record.

Note that the return statement was never needed, and you don't need a lino variable; just use append.

I still have to emphasize that record_start is just wrong. You must use tell() if you're planning to use seek() on a text file; summing len(line) counts characters, not the bytes (or opaque positions) that seek() expects.
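
Here's a minimal sketch of that approach (create_lookup and path are placeholder names; it assumes the file is opened in text mode). It uses readline() instead of iterating the file object, because Python disables tell() while a text file is being iterated:

    def create_lookup(path):
        # Record the tell() position of each line, so every offset is
        # guaranteed to be a valid seek() target later.
        offsets = []
        with open(path) as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                offsets.append(offset)
        return offsets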

You can also probably speed the process up a good deal by passing the filename to the other process, rather than opening the file in the original process. That eliminates sharing self.data across the process boundary.
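
For example, something like this (just a sketch: it assumes the shared list is a multiprocessing.Manager().list(), and worker() and the filename are made up). Only the filename string crosses the process boundary; the child opens the file itself, and it still batches its updates:

    import multiprocessing

    def worker(path, lookup):
        # The child opens the file itself; only the filename was sent.
        local = []
        with open(path) as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                local.append(offset)
                if len(local) >= 100:
                    lookup.extend(local)   # one proxy call per batch
                    local = []
        lookup.extend(local)               # flush the final partial batch

    if __name__ == '__main__':
        manager = multiprocessing.Manager()
        lookup = manager.list()
        p = multiprocessing.Process(target=worker,
                                    args=('data.txt', lookup))
        p.start()
        p.join()
        print(len(lookup))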



--
DaveA