On 02/27/2013 01:42 PM, Adam Phelps wrote:
We have a job that uses a large lookup structure that gets created as a
static class during the map setup phase (and we have the JVM reused so
this only takes place once). However, of late this structure has grown
drastically (due to items beyond our con
Have you looked at things like CDB (http://cr.yp.to/cdb.html), which would
allow you to keep most of the file on disk and cache hot parts in memory?
That really depends on your access pattern.
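To make the "keep it on disk, cache the hot parts" idea concrete, here is a minimal sketch in Java. It is not taken from the job discussed in this thread: the class name HotEntryCache and the diskLookup function are hypothetical, and diskLookup only stands in for whatever actually reads the on-disk file (a CDB reader, for instance). The cache itself is a plain LinkedHashMap in access order, which the JDK documents as suitable for building LRU caches.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Small LRU cache placed in front of a slower, disk-backed lookup. */
class HotEntryCache<K, V> {
    private final int capacity;
    // Hypothetical stand-in for a read against the on-disk structure.
    private final Function<K, V> diskLookup;
    private final LinkedHashMap<K, V> cache;
    private long misses = 0;

    HotEntryCache(int capacity, Function<K, V> diskLookup) {
        this.capacity = capacity;
        this.diskLookup = diskLookup;
        // accessOrder = true: iteration order runs from least to most recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least recently used entry once we exceed capacity.
                return size() > HotEntryCache.this.capacity;
            }
        };
    }

    /** Returns the cached value, falling back to the disk lookup on a miss. */
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            misses++;
            value = diskLookup.apply(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    long misses() { return misses; }

    int cachedEntries() { return cache.size(); }
}
```

Counting misses gives a cheap way to measure whether the cache is earning its memory: if nearly every get() is a miss, the access pattern is too random for this approach and the heap is better spent elsewhere. The class is not thread-safe, which is fine for a single-threaded map task but would need synchronization otherwise.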
Alternatively, you could give yourself more heap and take up two slots for
your map task.
Also if it is big e
We actually use CDBs a good bit outside of M/R. This is something worth
looking into, but the big structure we're currently using is a giant
tree-based lookup table whose access pattern is pretty random, so I
don't think caching would be of much use. There is a lesser (but still
large) structure