This is a somewhat late announcement, but I thought it might be interesting to
people on this list. We're holding the first user meetup for Spark
(www.spark-project.org), the in-memory cluster computing framework that lets
you do interactive and iterative data mining on Hadoop data, in San Francisco.
Spark (http://www.spark-project.org) aims to provide a higher-level programming
interface as well as higher performance than Hadoop.
Matei
On Jan 30, 2012, at 2:24 PM, Ronald Petty wrote:
> R.V.,
>
> Are you looking for the platforms that do distributed computation or the
> larger ecosystems?
Hi Virajith,
The default FIFO scheduler just isn't optimized for locality for small jobs.
You should be able to get substantially more locality even with 1 replica if
you use the fair scheduler, although the version of the scheduler in 0.20
doesn't contain the locality optimization. Try the Clo
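For reference, switching the JobTracker over to the fair scheduler comes down to a mapred-site.xml entry along these lines, assuming the contrib fairscheduler jar is on the JobTracker's classpath (the allocation-file path below is just a hypothetical example):

    <!-- mapred-site.xml on the JobTracker -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>
    <!-- optional: pool definitions for the fair scheduler (path is hypothetical) -->
    <property>
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/etc/hadoop/conf/fair-scheduler.xml</value>
    </property>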
What does the memory load look like on them? The one situation where I've seen
behavior like this happen regularly is when too much memory is in use.
Matei
On Jul 5, 2011, at 9:36 PM, Kai Ju Liu wrote:
> Over the past week or two, I've been seeing an issue where hard-to-reach
> (i.e. hard to ssh to) instances
You can have a new TaskTracker or DataNode join the cluster by just starting
that daemon on the slave (e.g. bin/hadoop-daemon.sh start tasktracker) and
making sure it is configured to connect to the right JobTracker or NameNode
(through the mapred.job.tracker and fs.default.name properties in the
configuration files on that node).
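As a rough sketch (the host names and ports below are hypothetical), the steps on the new slave would look like this:

    <!-- core-site.xml on the new slave: point HDFS at the existing NameNode -->
    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000</value>
    </property>

    <!-- mapred-site.xml on the new slave: point MapReduce at the existing JobTracker -->
    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker-host:9001</value>
    </property>

    # then start the daemons on the slave
    bin/hadoop-daemon.sh start datanode
    bin/hadoop-daemon.sh start tasktracker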
You can implement the configure() method of the Reducer interface and look at
the properties in the JobConf. In particular, "mapred.reduce.tasks" is the
number of reduce tasks and "mapred.job.tracker" will be set to "local" when
running in local mode.
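A minimal sketch using the old (org.apache.hadoop.mapred) API; the class name and key/value types here are hypothetical, the property names are the ones above:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class CountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {

      private int numReduceTasks;
      private boolean localMode;

      @Override
      public void configure(JobConf job) {
        // Number of reduce tasks configured for this job
        numReduceTasks = job.getInt("mapred.reduce.tasks", 1);
        // "local" means the job is running under the LocalJobRunner
        localMode = "local".equals(job.get("mapred.job.tracker"));
      }

      @Override
      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }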
Matei
On Jun 22, 2011, at 3:12 PM, Steve L
Hi Adam,
It looks like map output records are indeed serialized before being combined
and written out. I'm not really sure why this is, except perhaps to simplify
the code for the case where you don't know the size of the records. Maybe
someone more familiar with this part of Hadoop can explain.
Hi Michael,
The Fair Scheduler's LoadManager was indeed put in place to allow for
resource-aware scheduling in the future. Actually, Scott Chen from Facebook is
currently working towards this feature. His latest patch related to it is
https://issues.apache.org/jira/browse/MAPREDUCE-1218, which
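For context, here is a purely hypothetical sketch of what a resource-aware LoadManager could look like; it is not the MAPREDUCE-1218 patch, it assumes the canAssignMap/canAssignReduce hooks of the 0.20-era fair scheduler (other overridable methods in some versions are omitted), and the fixed cap is made up for illustration:

    import org.apache.hadoop.mapred.LoadManager;
    import org.apache.hadoop.mapred.TaskTrackerStatus;

    public class MemoryAwareLoadManager extends LoadManager {
      // Hypothetical fixed cap; a real resource-aware implementation would use
      // memory or load figures reported by the TaskTracker instead.
      private static final int MAX_TASKS_PER_TRACKER = 4;

      @Override
      public boolean canAssignMap(TaskTrackerStatus tracker,
          int totalRunnableMaps, int totalMapSlots) {
        return tracker.countMapTasks() < MAX_TASKS_PER_TRACKER;
      }

      @Override
      public boolean canAssignReduce(TaskTrackerStatus tracker,
          int totalRunnableReduces, int totalReduceSlots) {
        return tracker.countReduceTasks() < MAX_TASKS_PER_TRACKER;
      }
    }

Such a class would be plugged in through the mapred.fairscheduler.loadmanager property in mapred-site.xml.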