Re: JobTracker URL showing fewer nodes available

2012-01-24 Thread alo alt
+common-user, BCC'd the original list. Please post to the correct mailing lists; I've added common-user. That means some DataNode daemons are not running. The first place to look is the logs of the DNs. What do they say? - Alex -- Alexander Lorenz http://mapredit.blogspot.com On Jan 24, 2012, at 7:55 AM, hadoop hive wrote: >

Hadoop Terasort Error- "File _partition.lst does not exist"

2012-01-24 Thread Utkarsh Rathore
I have a Hadoop cluster on which I have generated some data using Teragen. But while running Terasort on this data, it gives the following error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org


Parallel CSV loader

2012-01-24 Thread Edmon Begoli
I am looking to use Hadoop for parallel loading of CSV files into a non-Hadoop, parallel database. Is there an existing utility that lets one pick entries row by row, synchronized and in parallel, and load them into a database? Thank you in advance, Edmon
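To make the question concrete: absent an off-the-shelf tool, the row-by-row parallel load Edmon describes boils down to partitioning the rows and handing each partition to a worker. The sketch below is a hypothetical, dependency-free illustration of that partitioning idea only; `loadBatch` is a placeholder for the real sink (e.g. a JDBC batch insert), and all names are invented for this example. The replies below point at Sqoop, which does this (and the HDFS side) properly.

```java
import java.util.*;
import java.util.concurrent.*;

public class ParallelCsvLoad {
    // Stand-in for the real sink (e.g. a JDBC batch insert into the
    // target database); here it just reports how many rows it "loaded".
    static int loadBatch(List<String> rows) {
        return rows.size();
    }

    // Split the rows into contiguous chunks, one per worker, and load
    // the chunks concurrently. Returns the total number of rows loaded.
    static int loadInParallel(List<String> rows, int nWorkers) {
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);
        try {
            int chunk = (rows.size() + nWorkers - 1) / nWorkers;
            List<Future<Integer>> pending = new ArrayList<>();
            for (int i = 0; i < rows.size(); i += chunk) {
                List<String> slice =
                    rows.subList(i, Math.min(i + chunk, rows.size()));
                pending.add(pool.submit(() -> loadBatch(slice)));
            }
            int total = 0;
            for (Future<Integer> f : pending) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a,1", "b,2", "c,3", "d,4", "e,5");
        System.out.println(loadInParallel(rows, 2)); // prints 5
    }
}
```

Contiguous chunking keeps each worker's inserts ordered, which matters if the target database prefers sequential batch appends.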

Re: Parallel CSV loader

2012-01-24 Thread Prashant Kommireddi
I am assuming you want to move data between Hadoop and a database. Please take a look at Sqoop. Thanks, Prashant Sent from my iPhone On Jan 24, 2012, at 9:19 AM, Edmon Begoli wrote: > I am looking to use Hadoop for parallel loading of CSV file into a > non-Hadoop, parallel database. > > Is there

Re: Parallel CSV loader

2012-01-24 Thread Harsh J
Agree. Apache Sqoop is what you're looking for: http://incubator.apache.org/sqoop/ On Tue, Jan 24, 2012 at 10:51 PM, Prashant Kommireddi wrote: > I am assuming you want to move data between Hadoop and database. > Please take a look at Sqoop. > > Thanks, > Prashant > > Sent from my iPhone > > On J

Re: When to use a combiner?

2012-01-24 Thread Sameer Farooqui
Hi Steve, Yeah, you're right in your suspicions that a combiner may not be useful in your use case. It's mainly used to reduce network traffic between the mappers and the reducers. Hadoop may apply the combiner zero, one or multiple times to the intermediate output from the mapper, so it's hard

Re: When to use a combiner?

2012-01-24 Thread Raj V
Just to add to Sameer's response - you cannot use a combiner if you are finding the average temperature. The combiner running on each mapper will produce the average for that mapper's output, and the reducer will then find the average of the combiner outputs, which in this case will be the average
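Raj's point can be shown with plain arithmetic: averaging per-mapper averages weights each mapper equally regardless of how many records it saw, while emitting (sum, count) pairs from the combiner lets the reducer compute the true average. The numbers below are invented for illustration.

```java
public class AvgCombinerPitfall {
    static double sum(double[] xs) {
        double s = 0;
        for (double x : xs) s += x;
        return s;
    }

    static double avg(double[] xs) {
        return sum(xs) / xs.length;
    }

    public static void main(String[] args) {
        // Temperatures seen by two hypothetical mappers (uneven split).
        double[] mapper1 = {10, 20, 30, 40};
        double[] mapper2 = {100, 200};

        // Broken: combiner emits each mapper's average, reducer averages those.
        double avgOfAvgs = (avg(mapper1) + avg(mapper2)) / 2;       // (25 + 150) / 2 = 87.5

        // Correct: combiner emits (sum, count), reducer divides the totals.
        double trueAvg = (sum(mapper1) + sum(mapper2))
                       / (mapper1.length + mapper2.length);          // 400 / 6 = 66.67

        System.out.println(avgOfAvgs + " vs " + trueAvg);
    }
}
```

The (sum, count) formulation is associative and commutative, which is exactly the property a combiner needs since Hadoop may apply it zero, one, or many times.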

hadoop dfs -ls hdfs://Atlas.sollix.com:8020/ fails

2012-01-24 Thread Zaki Syed
Hi Everybody I installed Hadoop on a 5-node cluster using Cloudera's SCM. The following is the setup. Atlas.prodigy.com -- Namenode and Data Node Pan.prodigy.com -- Secondary Namenode and Data Node Prometheus.prodigy.com-- Data Node Ulysys.prodigy.com--Data Node Blade.prodigy.com -- Data

Re: hadoop dfs -ls hdfs://Atlas.sollix.com:8020/ fails

2012-01-24 Thread Raj V
Do you have a firewall enabled on any of these systems? Raj > > From: Zaki Syed >To: common-user@hadoop.apache.org >Sent: Tuesday, January 24, 2012 3:34 PM >Subject: hadoop dfs -ls hdfs://Atlas.sollix.com:8020/ fails > >Hi Everybody > >I installed the hadoop on a

Sqoop and Teradata

2012-01-24 Thread Srinivas Surasani
Hi All, I'm working on Hadoop CDH3 U0 and Sqoop CDH3 U2. I'm trying to export CSV files from HDFS to Teradata. It works well with the number of mappers set to "1" (with batch loading of 1000 records at a time), but when I tried increasing the number of mappers to more than one I got the following error

Re: Parallel CSV loader

2012-01-24 Thread Srinivas Surasani
Edmon, Parallel databases (Teradata, Netezza, ..)? I believe if you use Sqoop (with JDBC) for loading you cannot achieve parallelism, since the table gets deadlocked when you specify more mappers. But you can use Sqoop + a parallel database connector (you can find them on the Cloudera site) to achieve the native
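One commonly suggested mitigation for the multi-mapper deadlock Srinivas describes (and which the "Sqoop and Teradata" thread above hits) is Sqoop's staging-table option, which has each task write to an intermediate table before a single final move into the target. A hedged sketch of such an invocation, with the connect string, table names, and paths all being placeholders for your environment:

```
# Hypothetical sqoop export; all names below are placeholders.
# --staging-table routes task output through an intermediate table,
# reducing write contention on the target table.
sqoop export \
  --connect jdbc:teradata://tdhost/DATABASE=mydb \
  --table TARGET_TBL \
  --staging-table TARGET_TBL_STG --clear-staging-table \
  --export-dir /user/hive/warehouse/mydata \
  -m 4
```

Note that staging tables are not supported by every connector; whether this works with a given Teradata connector is something to verify against its documentation.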

Hadoop Terasort Error- "File _partition.lst does not exist"

2012-01-24 Thread rathore87
Folks, I have a Hadoop cluster on which I have generated some data using Teragen. But while running Terasort on this data, it gives the following error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)

Re: Hadoop Terasort Error- "File _partition.lst does not exist"

2012-01-24 Thread Harsh J
Apparently, you are running terasort with a local job runner as explained by the presence of "org.apache.hadoop.fs.RawLocalFileSystem" and "LocalJobRunner" in your provided log message. Ensure mapred.job.tracker is properly set in your mapred-site.xml, for your job to reach the MapReduce cluster.
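Harsh's fix amounts to pointing `mapred.job.tracker` at the actual JobTracker so the job does not fall back to the `LocalJobRunner`. A minimal `mapred-site.xml` entry might look like the following, where the host and port are placeholders for your own cluster:

```xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```

If this property is unset or set to `local`, jobs run in-process on the client, which is why the log shows `LocalJobRunner` and `RawLocalFileSystem` instead of the cluster.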

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2012-01-24 Thread srinivas
Harsh J writes: > > Hello RX, > > Could you paste your DFS configuration and the DN end-to-end log into > a mail/pastebin-link? > > On Fri, May 27, 2011 at 5:31 AM, Xu, Richard wrote: > > Hi Folks, > > > > We try to get hbase and hadoop running on clusters, take 2 Solaris servers for now.

Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2012-01-24 Thread Harsh J
Hey Srinivas, Best to start your own new thread for questions instead of digging up an older one, but it seems you already have a JobTracker running, or something else bound to port 5102 on your machine. I'd check whether the JobTracker is already running; that is most likely it. On Wed, Jan