On Dec 6, 2008, at 12:41 AM, Lie Ryan wrote:

In most processing that involves networking, the bottleneck is the
network speed itself. Optimizing your own code might not make your
download significantly faster (saving 60 seconds is great for a script
that usually runs for 70 seconds, but is a waste of development time
for a script that usually runs for 1 hour).

Usually a multi-threaded downloader is a better bet for improvement,
especially when 1) downloading from different sites, 2) the remote
sites have speed limits, or 3) your download link is faster than the
server can supply.
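
To make the multi-threaded suggestion concrete, here is a minimal sketch using the standard library's thread pool. The names fetch and download_all, the 10 second timeout, and the pool size of 4 are illustrative assumptions, not anything from the original script.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its body as bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetcher=fetch, max_workers=4):
    """Run fetcher over urls on a small thread pool.

    Results come back in the same order as the input URLs. The pool size
    of 4 is an arbitrary assumption; tune it to the servers' limits.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetcher, urls))
```

Threads help here because each worker spends most of its time waiting on the network, so the waits overlap. Passing a different fetcher makes it possible to try the function without touching the network.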


In this particular case everything is on the local network. This is actually part of a Hadoop map/reduce system I am learning, so reducing CPU usage is of high value. If network pull times become an issue, the cluster can be expanded and the time between pulls can be reduced. As of this morning I am being directed to make the reducer usable both in the mapper and then again as a reducer. This has forced me to rework everything so that it can be called as a module.
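
A rough sketch of what "callable as a module" could look like, assuming tab-separated key/value lines as in Hadoop streaming and a reducer that simply sums integer values per key. Both the record format and the summing are assumptions for illustration, as are the names reduce_pairs, parse_lines and main; the real reducer logic would go in their place.

```python
import sys
from collections import defaultdict


def reduce_pairs(pairs):
    """Sum integer values per key over any iterable of (key, value) pairs.

    Because it takes plain pairs, a mapper can call it on its own output
    before emitting, and the reducer can call it on parsed input lines.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += int(value)
    return dict(totals)


def parse_lines(lines):
    """Yield (key, value) from tab-separated lines, skipping lines with no tab."""
    for line in lines:
        key, sep, value = line.rstrip("\n").partition("\t")
        if sep:
            yield key, value


def main(stream=sys.stdin, out=sys.stdout):
    """Reducer entry point: read key<TAB>value lines, write key<TAB>total."""
    for key, total in sorted(reduce_pairs(parse_lines(stream)).items()):
        out.write(f"{key}\t{total}\n")
```

The reducer script would just call main(), while the mapper would import the module and call reduce_pairs on its own (key, value) pairs before writing them out, so the same code does the work in both places.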

I have never learned Java, so that wasn't an option, and the more I work with it, the more Python seems to be the perfect fit for Hadoop-type work. Really fun stuff.

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
