Re: Job stuck in attempt loop on LocalJobRunner, produces no errors

2012-10-22, Bai Shen
> ...ck how many maps your job is resulting
> into. Multiple attempts have IDs like attempt_local_0001_m_04_1.
>
> Thanks,
> +Vinod
>
> On Oct 22, 2012, at 7:05 AM, Bai Shen wrote:
>
> > attempt_local_0001_m_04_0
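Vinod's point is that retries of the same map task share everything in the attempt ID except the last field. As a rough aid for reading such IDs out of logs, here is a minimal Python sketch that splits attempt IDs and groups attempt numbers per task. It assumes the field layout attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>, matching the IDs quoted above; the helpers parse_attempt_id and attempts_per_task are hypothetical and not part of Hadoop.

```python
import re
from collections import defaultdict

# Assumed layout, matching the IDs quoted in this thread:
# attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>
ATTEMPT_RE = re.compile(r"^attempt_([^_]+)_(\d+)_([mr])_(\d+)_(\d+)$")


def parse_attempt_id(attempt_id):
    """Hypothetical helper: split an attempt ID into its fields."""
    m = ATTEMPT_RE.match(attempt_id)
    if m is None:
        raise ValueError(f"not an attempt ID: {attempt_id!r}")
    tracker, job, kind, task, attempt = m.groups()
    return tracker, int(job), kind, int(task), int(attempt)


def attempts_per_task(attempt_ids):
    """Hypothetical helper: group attempt numbers by (tracker, job, type, task)."""
    groups = defaultdict(list)
    for aid in attempt_ids:
        tracker, job, kind, task, attempt = parse_attempt_id(aid)
        groups[(tracker, job, kind, task)].append(attempt)
    return {key: sorted(nums) for key, nums in groups.items()}


if __name__ == "__main__":
    ids = ["attempt_local_0001_m_04_0", "attempt_local_0001_m_04_1"]
    for task, attempts in attempts_per_task(ids).items():
        print(task, attempts)
```

A task whose list holds more than one attempt number was retried; if a job really is looping on one task, that list would presumably keep growing.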

Re: Referencing files in job file from code

2012-10-11, Bai Shen
...t.
On Thu, Oct 11, 2012 at 10:26 AM, Harsh J wrote:
> Hi Bai,
>
> What exactly do you mean by a 'job file' and have you considered using
> DistributedCache, as detailed at
> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#DistributedCache
> ?
>
> On T...

Re: copyFromLocal: File does not exist.

2012-10-09, Bai Shen
> ...Hortonworks Inc.
> http://hortonworks.com/
>
> On Oct 9, 2012, at 11:40 AM, Bai Shen wrote:
>
> > I have a CDH3 cluster up and running. I'm on the namenode and trying to
> > copy a file into HDFS. However, whenever I run copyFromLocal, I get a file
> > does not...

copyFromLocal: File does not exist.

2012-10-09, Bai Shen
I have a CDH3 cluster up and running. I'm on the namenode and trying to copy a file into HDFS. However, whenever I run copyFromLocal, I get a "file does not exist" error:
[root@node1-0 ~]# sudo -u hdfs hadoop fs -copyFromLocal /root/url.txt /
copyFromLocal: File /root/url.txt does not exist.
What...

org.apache.hadoop.mapred.Merger merge bug

2012-01-17, Bai Shen
I think I've found a bug in the Merger code for Hadoop. When the map task runs, it creates spill files based on io.sort.mb. It then merges io.sort.factor files at a time in order to create the output file that's passed to the reduce task. The higher these two settings are configured, the more memory...
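The spill-then-merge behavior described in this post can be modeled in a few lines. This is a simplified Python sketch, not Hadoop's Merger: it treats each spill file as an in-memory sorted list and merges at most `factor` of them per pass (the role the post assigns to io.sort.factor) until one sorted run remains. As far as I know, Hadoop's real Merger sizes its first pass differently, so the pass counts here are illustrative only; merge_in_passes is a hypothetical name.

```python
import heapq
from typing import List, Tuple


def merge_in_passes(runs: List[List[int]], factor: int) -> Tuple[List[int], int]:
    """Hypothetical model: merge sorted runs, at most `factor` at a time.

    Simplified: every pass uses the same group size, unlike Hadoop's Merger.
    Returns the final merged run and the number of merge passes performed.
    """
    if factor < 2:
        raise ValueError("factor must be at least 2")
    runs = [list(r) for r in runs]
    passes = 0
    while len(runs) > 1:
        next_runs = []
        # Each group of up to `factor` runs is open simultaneously while merging.
        for i in range(0, len(runs), factor):
            group = runs[i:i + factor]
            next_runs.append(list(heapq.merge(*group)))
        runs = next_runs
        passes += 1
    return (runs[0] if runs else []), passes


if __name__ == "__main__":
    # Five sorted "spill files" standing in for the map-side spills.
    spills = [[1, 5, 9], [2, 6], [3, 7], [4, 8], [0]]
    for factor in (2, 10):
        merged, passes = merge_in_passes(spills, factor)
        print(f"factor={factor}: {passes} pass(es) -> {merged}")
```

With these five spills, factor=2 needs three passes while factor=10 finishes in one, but that single pass holds all five inputs open at once. In this model that is the trade-off a larger merge factor makes: fewer passes in exchange for more simultaneously open inputs.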

Task location determination

2012-01-04, Bai Shen
I have a test Hadoop cluster set up using Cloudera. It consists of the NameNode and three DataNodes. When I submit jobs, they end up piling up on one node instead of being spread round-robin across the different nodes. I understand that Hadoop tries to run tasks where the data is located, but with on...

Cloudera Free

2011-12-08, Bai Shen
Does anyone know of a good tutorial for Cloudera Free? I found installation instructions, but there doesn't seem to be information on how to run jobs, etc., once you have it set up. Thanks.

Hadoop Profiling

2011-12-05, Bai Shen
I turned on profiling in Hadoop, and the MapReduce Tutorial at http://hadoop.apache.org/common/docs/current/mapred_tutorial.html says that the profile files should go to the user log directory. However, they're currently going to the working directory I start the Hadoop job from. I've se...

Re: Could not find taskTracker/jobcache

2011-10-28, Bai Shen
NM. Looks like it's an issue with running multiple jobs at once:
http://mail-archives.apache.org/mod_mbox/nutch-user/201009.mbox/%3c431de8d6-c9b9-47c8-9301-bd0ab0040...@gmail.com%3E
On Fri, Oct 28, 2011 at 11:18 AM, Bai Shen wrote:
> I'm using hadoop through nutch 1.4 I'...

Could not find taskTracker/jobcache

2011-10-28, Bai Shen
I'm using Hadoop through Nutch 1.4. I'm using the native libraries, but this happens without them as well. I'm only using one local node. Sometimes when running a Nutch fetch job, I get the following error. There doesn't seem to be any rhyme or reason to it, AFAIK.
org.apache.hadoop.util.DiskChe...