Re: Parallell maps

2009-07-03 Thread Ted Dunning
I don't understand this statement. Basic PageRank in map-reduce is normally a simple undergraduate class assignment: http://www.ics.uci.edu/~abehm/class.../uci/.../Behm-Shah_PageRank.ppt http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/exercises/pagerank.html What is it about your problem that

Re: Parallell maps

2009-07-03 Thread Ted Dunning
Do you want random access for web presentation? What is your required update time? What about search index delay? Or batch sequential access for large-scale computation like PageRank? These have very different answers. The first is likely to be a standard sharded profile database with

Re: HDFS and long-running processes

2009-07-03 Thread Todd Lipcon
Hi David, I'm unaware of any issue that would cause memory leaks when a file is open for read for a long time. There are some issues currently with write pipeline recovery when a file is open for writing for a long time and the datanodes to which it's writing fail. So, I would not recommend

Re: Parallell maps

2009-07-03 Thread Marcus Herou
Anyway, why would it slow things down if it converges, let's say, 100 times faster (in terms of iterations) and you are able to run memcached or some other shared system (e.g. Voldemort) with as many servers as there are MR hosts, i.e. a memcached server on each one of them? I understand what you are