Repartitioned Joins

2009-03-04 Thread Richa Khandelwal
Hi All, Does anyone know of a way to tweak map-reduce joins so that only the tuples that actually join across the two tables are moved to the reduce phase? There are replicated-join and semi-join strategies, but those come more from the database world than from map-reduce. Thanks, Richa Khandelwal
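The semi-join idea does carry over to map-reduce: build a filter of join keys from one table and have the map phase drop tuples whose key cannot possibly join, so only joining tuples are shuffled to the reducers. Below is a minimal in-memory sketch in plain Java; the class and method names are illustrative, not Hadoop API, and in a real job the key filter would typically be a Bloom filter shipped to mappers via the DistributedCache.

```java
import java.util.*;

// In-memory sketch of a repartitioned (reduce-side) join with semi-join
// style pruning: tuples whose key is absent from the other side are
// dropped in the "map" phase and never reach the "reduce" phase.
// Illustrative names only -- this is not Hadoop API.
class SemiJoinSketch {

    // Join two single-column "tables" on their keys. The key filter is
    // built from the left table (assumed to be the smaller side).
    static Map<String, List<String>> join(Map<String, String> left,
                                          Map<String, String> right) {
        Set<String> keyFilter = left.keySet();

        // "Map" phase: tag each surviving tuple with its source relation.
        Map<String, List<String>> shuffled = new TreeMap<>();
        for (Map.Entry<String, String> e : left.entrySet())
            shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add("L:" + e.getValue());
        for (Map.Entry<String, String> e : right.entrySet())
            if (keyFilter.contains(e.getKey()))   // semi-join pruning
                shuffled.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                        .add("R:" + e.getValue());

        // "Reduce" phase: keep only keys seen on both sides.
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffled.entrySet()) {
            boolean hasL = e.getValue().stream().anyMatch(v -> v.startsWith("L:"));
            boolean hasR = e.getValue().stream().anyMatch(v -> v.startsWith("R:"));
            if (hasL && hasR) result.put(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

The saving comes entirely from the pruning step: tuples with non-matching keys are filtered before the shuffle, which is exactly the traffic a plain repartitioned join would otherwise pay for.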

Re: Jobs run slower and slower

2009-03-04 Thread Sean Laurent
Hrmmm. I can tell init/execution time at the job level, but I don't know how to figure that out at the individual map-task level. What would be the best way for me to determine that? -Sean On Wed, Mar 4, 2009 at 12:13 PM, Runping Qi runping...@gmail.com wrote: Do you know the break down of times

Re: Jobs run slower and slower

2009-03-04 Thread Runping Qi
The task (job) tracker log should show when a task was scheduled. The log for an individual task should show when it finished initialization. On Wed, Mar 4, 2009 at 12:29 PM, Sean Laurent organicveg...@gmail.com wrote: Hrmmm. I can tell init/execution at the job level, but I don't know how to

Hadoop FS shell no longer working with S3 Native

2009-03-04 Thread S D
I'm using Hadoop 0.19.0 with S3 Native. Up until a few days ago I was able to use the various shell functions successfully; e.g., hadoop dfs -ls . To ensure access to my Amazon S3 Native data store I set the following environment variables: AMAZON_ACCESS_KEY_ID and
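If the environment variables are no longer being picked up, one common alternative is to put the credentials directly into the Hadoop configuration. The property names below are the ones the s3native filesystem reads; the values are placeholders to substitute with your own keys (a sketch of a core-site.xml fragment, assuming a standard 0.19 setup):

```xml
<!-- core-site.xml fragment: S3 Native credentials, as an alternative
     to environment variables. Replace the placeholder values. -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_ACCESS_KEY</value>
</property>
```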

wordcount getting slower with more mappers and reducers?

2009-03-04 Thread Sandy
Hello all, For the sake of benchmarking, I ran the standard hadoop wordcount example on an input file using 2, 4, and 8 mappers and reducers for my job. In other words, I do: time -p bin/hadoop jar hadoop-0.18.3-examples.jar wordcount -m 2 -r 2 sample.txt output time -p bin/hadoop jar

Importing data from mysql into hadoop

2009-03-04 Thread anand
Hi, I'm trying out an example of importing data from mysql into hadoop (something like http://svn.apache.org/repos/asf/hadoop/core/trunk/src/examples/org/apache/hadoop/examples/DBCountPageView.java). I'm connecting to Mysql and hence am providing the mysql-jdbc-connector jar via the '-libjar'

Re: Importing data from mysql into hadoop

2009-03-04 Thread Amandeep Khurana
Put it into the $HADOOP_HOME/lib folder. To be on the safer side, I generally include it in the job jar. Don't forget to put Class.forName(driverClassName); in your job code. Amandeep Amandeep Khurana Computer Science Graduate Student University of California, Santa Cruz On Wed, Mar 4, 2009
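The Class.forName call matters because JDBC only sees drivers whose classes have actually been loaded; it is also a cheap way to verify the connector jar really made it onto the classpath before the job runs. A small probe along these lines (the helper name is mine, not Hadoop or JDBC API):

```java
// Classpath probe: returns true if the named class can be loaded.
// For a JDBC driver, a successful Class.forName also registers the
// driver with DriverManager as a side effect.
class DriverCheck {
    static boolean driverAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;  // jar missing from the classpath
        }
    }
}
```

Calling `DriverCheck.driverAvailable("com.mysql.jdbc.Driver")` at the top of the job's main() fails fast with a clear answer instead of a later, harder-to-read stack trace from inside the tasks.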

Re: Off topic: web framework for high traffic service

2009-03-04 Thread Tim Wintle
On Wed, 2009-03-04 at 23:14 +0100, Lukáš Vlček wrote: Sorry for the off-topic question It is very off topic. Any ideas, best practices, book recommendations, papers, tech talk links ... I found this to be a nice little book: http://developer.yahoo.net/blog/archives/2008/11/allspaw_capacityplanning.html

Re: Off topic: web framework for high traffic service

2009-03-04 Thread Lukáš Vlček
Hi Tim, Thanks for the links. I know this may sound off topic. On the other hand, if you look for example at the eBay architecture (http://highscalability.com/ebay-architecture) then you can see that some concepts are close to a Hadoop-like system (I mean when you want to build something like eBay then

Re: wordcount getting slower with more mappers and reducers?

2009-03-04 Thread Nick Cen
I think this may not be related to whether you are using pseudo-distributed mode or truly distributed mode. The speed is related not only to the number of mappers and reducers but also to the problem size and problem type. A simple example is the word count; assume we only have 1
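The point about problem size can be made concrete with a toy cost model (an illustration under assumed numbers, not measured Hadoop behavior): if every map/reduce task carries a fixed startup overhead while the useful work is divided evenly among tasks, then adding tasks makes a small job slower and a large job faster.

```java
// Toy cost model: total time = fixed per-task overhead * tasks
//                            + useful work / tasks.
// Both parameters are illustrative constants, not Hadoop measurements.
class ScalingSketch {
    static double totalSeconds(double workSeconds,
                               double perTaskOverhead,
                               int tasks) {
        return perTaskOverhead * tasks + workSeconds / tasks;
    }
}
```

With, say, 5 seconds of overhead per task, a job with only 4 seconds of real work gets slower as -m/-r grow, while a job with hours of work speeds up. A single small sample.txt mostly benchmarks the task overhead, which matches the slowdown observed at 4 and 8 mappers/reducers.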