Re: Deduplication Effort in Hadoop

2011-07-14 Thread C.V.Krishnakumar Iyer
Hi, I guess by "system" you meant HDFS. In that case, HBase might help. HBase needs unique keys. They are just bytes, so you can concatenate multiple columns into a single HBase key (if your primary key spans more than one column), so that duplicates don
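The concatenated-key idea above can be sketched as follows; the column values and the `|` delimiter are hypothetical, and the sketch assumes the delimiter never occurs in the data:

```shell
# Build a composite HBase row key from two logical key columns.
# Assumption: '|' never appears in either value; otherwise pick a
# different separator or use fixed-width encoding.
region="us-east"
user_id="42"
rowkey="${region}|${user_id}"
echo "$rowkey"   # us-east|42
```

Since HBase stores row keys as raw bytes, any unambiguous concatenation scheme works, as long as the resulting keys sort the way your scans expect.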

Command Line Arguments for Client

2011-02-22 Thread C.V.Krishnakumar Iyer
Hi, could anyone tell me how to set the command-line arguments (like -Xmx and -Xms) for the client (not for the map/reduce tasks) from the command that is usually used to launch the job? Thanks, Krishnakumar
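One way to do this (a sketch, assuming a stock bin/hadoop script, which honors HADOOP_CLIENT_OPTS from hadoop-env.sh; the jar and class names are placeholders) is to set the client JVM flags in the environment before launching:

```shell
# Heap settings for the *client* JVM only; task JVMs still take their
# options from mapred.child.java.opts in the job configuration.
export HADOOP_CLIENT_OPTS="-Xms256m -Xmx1024m"
bin/hadoop jar myjob.jar com.example.MyJob /input /output
```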

Re: libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
file (can view via a JT Web UI)? -- Alex Kozlov, Solutions Architect, Cloudera, Inc; twitter: alexvk2009 <http://www.cloudera.com/company/press-center/hadoop-world-nyc/> On Tue, Jan 11, 2011 a

Re: libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
Solutions Architect, Cloudera, Inc; twitter: alexvk2009 <http://www.cloudera.com/company/press-center/hadoop-world-nyc/> On Tue, Jan 11, 2011 at 11:49 AM, C.V.Krishnakumar Iyer <f2004...@gmail.com> wrote:

Re: libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
Hi, I have tried that as well, using -files, but it still gives the exact same error. Anything else I could try? Thanks, Krishna. On Jan 11, 2011, at 10:23 AM, Ted Yu wrote: > Refer to Alex Kozlov's answer on 12/11/10 > On Tue, Jan 11, 2011 at 10:10 AM, C.V.Kri

libjars options

2011-01-11 Thread C.V.Krishnakumar Iyer
Hi, could anyone please guide me on how to use the -libjars option in HDFS? I have added the necessary jar file (the HBase jar, to be precise) to the classpath of the node where I am starting the job. The following is the format that I am invoking: bin/hadoop jar -libjars bin/hadoo
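For reference, the usual shape of a -libjars invocation is sketched below (jar and class names are placeholders). The option is parsed by GenericOptionsParser, so the job's main class must run through ToolRunner or otherwise invoke it, and -libjars goes after the main class name and before the job's own arguments:

```shell
# -libjars ships the listed jars to the cluster and adds them to the
# task classpath; comma-separate multiple jars.
bin/hadoop jar myjob.jar com.example.MyJob \
  -libjars /path/to/hbase.jar \
  /input /output
```

A common failure mode is putting -libjars before the jar or class name, where nothing parses it, which leaves the jars off the task classpath.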

-libjars option

2011-01-10 Thread C.V.Krishnakumar Iyer
Hi, could anyone please guide me on how to use the -libjars option in HDFS? I have added the necessary jar file (the HBase jar, to be precise) to the classpath of the node where I am starting the job. The following is the format that I am invoking: bin/hadoop jar -libjars bin/hadoo

Re: IOException in TaskRunner (Error Code :134)

2010-09-21 Thread C.V.Krishnakumar
datanode, and namenode logs. > On Sep 21, 2010, at 12:30 PM, C.V.Krishnakumar wrote: >> Hi, just wanted to know if anyone has any idea about this one? This happens every time I run a job. Is this issue hardware related? >>

IOException in TaskRunner (Error Code :134)

2010-09-21 Thread C.V.Krishnakumar
Hi, just wanted to know if anyone has any idea about this one? This happens every time I run a job. Is this issue hardware related? Thanks in advance, Krishnakumar. Begin forwarded message: > From: "C.V.Krishnakumar" > Date: September 17, 2010 1:32:49 PM PDT

Tasks Failing : IOException in TaskRunner (Error Code :134)

2010-09-17 Thread C.V.Krishnakumar
Hi all, I am facing a problem with the TaskRunner. I have a small Hadoop cluster in fully distributed mode. However, when I submit a job, it never seems to proceed beyond the "map 0% reduce 0%" stage. Soon after, I get this error: java.io.IOException: Task process exit with nonzero stat
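Exit code 134 is not Hadoop-specific: a process killed by signal N exits with status 128 + N, and 134 - 128 = 6 is SIGABRT, meaning the child JVM aborted (often a native crash or a resource limit) rather than failing in Java code. A quick local demonstration:

```shell
# Reproduce status 134 by sending SIGABRT (signal 6) to a child shell.
sh -c 'kill -ABRT $$' || status=$?
echo "exit status: $status"           # 134
echo "signal: $((status - 128))"      # 6 (SIGABRT)
```

So the task and datanode logs (and the JVM's hs_err_pid*.log, if one was written) are the right place to look, not the job's Java stack trace alone.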

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-27 Thread C.V.Krishnakumar
Hi Deepak, maybe I did not make my mail clear: I had tried the instructions in the blog you mentioned, and they are working for me. Did you change the /etc/hosts file at any point? Regards, Krishna On Jul 27, 2010, at 2:30 PM, C.V.Krishnakumar wrote: > Hi Deepak, > > You co

Re: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.

2010-07-27 Thread C.V.Krishnakumar
Hi Deepak, you could refer to this too: http://markmail.org/message/mjq6gzjhst2inuab#query:MAX_FAILED_UNIQUE_FETCHES+page:1+mid:ubrwgmddmfvoadh2+state:results I tried those instructions and they are working for me. Regards, Krishna On Jul 27, 2010, at 12:31 PM, Deepak Diwakar wrote: > Hey friends
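Since MAX_FAILED_UNIQUE_FETCHES usually traces back to reducers being unable to resolve or reach the nodes serving map output, a common sanity check (the hostnames below are hypothetical) is:

```shell
# Every node must resolve every other node's hostname to its real,
# routable IP; a cluster hostname aliased to 127.0.0.1 (or 127.0.1.1)
# in /etc/hosts commonly breaks shuffle fetches.
getent hosts master slave1 slave2
grep -n '127\.0\.' /etc/hosts
```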

Re: using 'fs -put' from datanode: all data written to that node's hdfs and not distributed

2010-07-13 Thread C.V.Krishnakumar
>> NameNode, and then put only on the DataNode where I ran the put command >> On Tue, Jul 13, 2010 at 9:32 AM, C.V.Krishnakumar wrote: >>> Hi, I am a newbie. I am curious to know how you discovered that all the blocks

Re: using 'fs -put' from datanode: all data written to that node's hdfs and not distributed

2010-07-13 Thread C.V.Krishnakumar
Hi, I am a newbie. I am curious to know how you discovered that all the blocks are written to that datanode's HDFS? I thought replication by the namenode was transparent. Am I missing something? Thanks, Krishna On Jul 12, 2010, at 4:21 PM, Nathan Grice wrote: > We are trying to load data into hdfs fr
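One way to see where the blocks actually landed (the path below is a placeholder) is HDFS's fsck tool. When the writing client runs on a datanode, the default placement policy puts the first replica on that local node, which is why data loaded via fs -put from a datanode looks concentrated there even though the remaining replicas are spread across the cluster:

```shell
# List each block of the file and the datanodes holding its replicas.
bin/hadoop fsck /user/krishna/input.txt -files -blocks -locations
```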