I don't understand this statement. Basic PageRank in MapReduce is
normally a simple undergraduate class assignment:
http://www.ics.uci.edu/~abehm/class.../uci/.../Behm-Shah_PageRank.ppt
http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/exercises/pagerank.html
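For reference, the naive approach those assignments describe can be sketched in plain Python — a single PageRank iteration simulated in MapReduce style. This is a toy illustration, not from either linked assignment; the graph and damping factor are made up.

```python
from collections import defaultdict

def pagerank_iteration(graph, ranks, damping=0.85):
    # Map phase: each node emits rank/out-degree to each of its out-links.
    contributions = defaultdict(float)
    for node, out_links in graph.items():
        for target in out_links:
            contributions[target] += ranks[node] / len(out_links)
    # Reduce phase: sum contributions per node and apply the damping factor.
    n = len(graph)
    return {node: (1 - damping) / n + damping * contributions[node]
            for node in graph}

# Toy graph for illustration (no dangling nodes, so total rank stays 1).
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {node: 1.0 / len(graph) for node in graph}
for _ in range(20):
    ranks = pagerank_iteration(graph, ranks)
```

In a real Hadoop job, the map and reduce phases above become separate tasks and the driver re-submits the job once per iteration, which is exactly why each iteration is so cheap to express but so expensive to run.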
What is it about your problem that
Do you want random access for web presentation? What is your required
update time? What about search-index delay?
Or batch sequential access for large-scale computation like PageRank?
These call for very different answers.
The first is likely to be a standard sharded profile database with
Hi David,
I'm unaware of any issue that would cause memory leaks when a file is open
for reading for a long time.
There are some issues currently with write pipeline recovery when a file is
open for writing for a long time and the datanodes to which it's writing
fail. So, I would not recommend
Anyway, why would it slow things down if it converges, say, 100 times
faster (in terms of iterations) and you are able to have memcached or
some other shared system (e.g., Voldemort) sized to match the number of MR hosts,
i.e., a memcached server on each one of them?
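One reading of the convergence claim: with a shared store, each update can read the freshest ranks instead of the previous iteration's snapshot (Gauss-Seidel-style instead of Jacobi-style sweeps), which typically converges in fewer passes. A minimal sketch of that contrast, assuming a plain dict standing in for memcached/Voldemort and a made-up toy graph:

```python
def iterations_to_converge(graph, in_place, damping=0.85, tol=1e-8):
    # Count sweeps until the largest per-node rank change drops below tol.
    ranks = {n: 1.0 / len(graph) for n in graph}
    for it in range(1, 1000):
        # in_place=True: updates are visible immediately within the sweep
        # (what a shared store allows). in_place=False: read a snapshot,
        # as in a plain per-iteration MapReduce job.
        src = ranks if in_place else dict(ranks)
        delta = 0.0
        for node in graph:
            incoming = sum(src[u] / len(outs)
                           for u, outs in graph.items() if node in outs)
            new = (1 - damping) / len(graph) + damping * incoming
            delta = max(delta, abs(new - ranks[node]))
            ranks[node] = new
        if delta < tol:
            return it
    return 1000

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
jacobi = iterations_to_converge(graph, in_place=False)
shared = iterations_to_converge(graph, in_place=True)
```

The trade-off the thread is circling is that the in-place variant needs low-latency random reads and writes during the computation, which is exactly what vanilla MapReduce does not give you without bolting on an external store.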
I understand what you are