Hi All,
Does anyone know of any tweaks to map-reduce joins that optimize them further,
in the sense of moving to the reduce phase only those tuples that actually join
across the two tables? There are replicated-join and semi-join strategies, but
those are more database techniques than map-reduce ones.
Thanks,
Richa Khandelwal
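(Not from the thread, just to illustrate the semi-join idea in plain map-reduce: load the smaller table's join keys into each mapper and drop non-joining tuples before the shuffle, so only tuples that can actually join ever reach the reduce phase. A rough sketch against the old mapred API; the "smalltable.keys.path" property, file layout, and column positions are made up, and it assumes the small side's key set fits in memory (a Bloom filter is the usual substitute when it does not).)

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SemiJoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Join keys of the smaller table, loaded once per map task.
  private final Set<String> joinKeys = new HashSet<String>();

  public void configure(JobConf conf) {
    String keyFile = conf.get("smalltable.keys.path");  // hypothetical property name
    try {
      FileSystem fs = FileSystem.get(conf);
      BufferedReader in =
          new BufferedReader(new InputStreamReader(fs.open(new Path(keyFile))));
      String line;
      while ((line = in.readLine()) != null) {
        joinKeys.add(line.trim());
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("Could not load join keys", e);
    }
  }

  public void map(LongWritable offset, Text record,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Assume tab-separated records with the join key in the first column;
    // tuples whose key is not in the small table are dropped before the shuffle.
    String[] fields = record.toString().split("\t", 2);
    if (fields.length == 2 && joinKeys.contains(fields[0])) {
      output.collect(new Text(fields[0]), new Text(fields[1]));
    }
  }
}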
Hrmmm. I can tell init/execution at the job level, but I don't know how to
figure that out at the individual map task level. What would be the best way
for me to determine that?
-Sean
On Wed, Mar 4, 2009 at 12:13 PM, Runping Qi runping...@gmail.com wrote:
Do you know the breakdown of times?
The task (job) tracker log should show when a task was scheduled.
The log for an individual task should show when it finished initialization.
On Wed, Mar 4, 2009 at 12:29 PM, Sean Laurent organicveg...@gmail.com wrote:
Hrmmm. I can tell init/execution at the job level, but I don't know how to
I'm using Hadoop 0.19.0 with S3 Native. Up until a few days ago I was able to
use the various shell functions successfully; e.g.,
hadoop dfs -ls .
To ensure access to my Amazon S3 Native data store I set the following
environment variables: AMAZON_ACCESS_KEY_ID and
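(For reference, the S3 Native filesystem can also pick its credentials up from the Hadoop configuration instead of environment variables; a sketch of the relevant hadoop-site.xml entries, with placeholder values rather than real keys:)

<!-- hadoop-site.xml: placeholder values, not real credentials -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>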
Hello all,
For the sake of benchmarking, I ran the standard hadoop wordcount example on
an input file using 2, 4, and 8 mappers and reducers for my job.
In other words, I do:
time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2
sample.txt output
time -p bin/hadoop jar
Hi,
I'm trying out an example of importing data from MySQL into Hadoop
(something like
http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/DBCountPageView.java).
I'm connecting to MySQL and hence am providing the mysql-jdbc-connector jar
via the '-libjar'
Put it into the $HADOOP_HOME/lib folder. To be on the safe side, I
generally include it in the job jar as well.
Don't forget to put Class.forName(driverClassName); in your job code.
Amandeep
Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
On Wed, Mar 4, 2009
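(A rough sketch, not from the thread, of wiring DBInputFormat up against MySQL with the old mapred API, along the lines of the DBCountPageView example; the class, table, column, host, and credential names below are all made up, and the driver jar still has to be on the classpath as discussed above:)

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class MySqlImportSketch {

  // Minimal one-column record; DBInputFormat wants both Writable and DBWritable.
  static class UrlRecord implements Writable, DBWritable {
    String url;
    public void readFields(ResultSet rs) throws SQLException { url = rs.getString(1); }
    public void write(PreparedStatement ps) throws SQLException { ps.setString(1, url); }
    public void readFields(DataInput in) throws IOException { url = Text.readString(in); }
    public void write(DataOutput out) throws IOException { Text.writeString(out, url); }
  }

  public static void configure(JobConf job) throws ClassNotFoundException {
    // Load the JDBC driver explicitly, as suggested above.
    Class.forName("com.mysql.jdbc.Driver");

    // Connection settings (host/db/user/password are placeholders).
    DBConfiguration.configureDB(job, "com.mysql.jdbc.Driver",
        "jdbc:mysql://dbhost/mydb", "user", "password");

    // Read the "url" column of table "access"; no WHERE clause, ordered by "url".
    DBInputFormat.setInput(job, UrlRecord.class, "access",
        null, "url", new String[] { "url" });
    job.setInputFormat(DBInputFormat.class);
  }
}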
On Wed, 2009-03-04 at 23:14 +0100, Lukáš Vlček wrote:
Sorry for the off-topic question
It is very off topic.
Any ideas, best practices, book recommendations, papers, tech talk links ...
I found this a nice little book:
http://developer.yahoo.net/blog/archives/2008/11/allspaw_capacityplanning.html
Hi Tim,
Thanks for the links.
I know this may sound off topic. On the other hand, if you look, for example,
at the eBay architecture (http://highscalability.com/ebay-architecture), then
you can see that some concepts are close to a Hadoop-like system (I mean, when
you want to build something like eBay then
I think this may not be related to whether you are using pseudo-distributed
mode or truly distributed mode.
The speed is related not only to the number of mappers and reducers but
also to the problem size and problem type.
A simple example is word count: assume we only have 1
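(To put rough, made-up numbers on that: if each map task costs on the order of a few seconds of JVM start-up and scheduling overhead, then on a tiny input 8 mappers pay roughly four times the overhead of 2 mappers while the actual counting finishes almost instantly either way, so the 8-way run can easily be slower; only when the per-task work is large enough to dwarf that overhead does adding mappers and reducers pay off.)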