Re: Control the number of Mappers

2010-11-25 Thread Shrijeet Paliwal
More to your need (I had missed this earlier): >> The number of cores is not something I know in advance, so writing a special >> InputFormat might be tricky, unless I can query Hadoop for the available # of >> cores. You don't have to write a fancy InputFormat. Once you have a (correct) implementation…
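The thread's underlying problem is visible in the split arithmetic itself. The sketch below is a simplified restatement of how FileInputFormat in Hadoop 0.20 sizes splits (not the library code; the input sizes are made-up examples): the requested map count only feeds a goal size, which is then clamped to the block size, so merely asking for 100 maps does not reduce the split count.

```java
// Sketch of the split-size arithmetic used by FileInputFormat in
// Hadoop 0.20 (a simplified restatement, not the library code itself).
public class SplitMath {
    // goalSize is totalSize / requested map count; the effective split
    // size is clamped between minSize and blockSize.
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Rough number of splits (hence mappers) for one large input.
    static long estimateSplits(long totalSize, int requestedMaps,
                               long minSize, long blockSize) {
        long goalSize = totalSize / requestedMaps;
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        return (totalSize + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        // 200 blocks of 64 MB input, 100 requested maps, 64 MB block size:
        long blockSize = 64L * 1024 * 1024;
        long total = 200L * blockSize;
        // goalSize would be 128 MB, but it is clamped to the 64 MB block
        // size, so we still get 200 splits, not 100.
        System.out.println(estimateSplits(total, 100, 1, blockSize)); // → 200
    }
}
```

This is why the thread steers toward a combining input format such as MultiFileInputFormat: to get fewer mappers than blocks, the splits themselves have to span multiple files/blocks.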

Re: Control the number of Mappers

2010-11-25 Thread Shai Erera
Thanks, I'll take a look. On Thu, Nov 25, 2010 at 10:20 PM, Shrijeet Paliwal wrote: > Shai, > > You will have to implement MultiFileInputFormat > and set that as your input format. > You may…

Re: Control the number of Mappers

2010-11-25 Thread Shrijeet Paliwal
Shai, You will have to implement MultiFileInputFormat and set that as your input format. You may find http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/examples/MultiFileWordCoun…

Re: Control the number of Mappers

2010-11-25 Thread Niels Basjes
Ah, in that case this should answer your question: http://wiki.apache.org/hadoop/HowManyMapsAndReduces 2010/11/25 Shai Erera : > I wasn't talking about how to configure the cluster to not invoke more than > a certain # of Mappers simultaneously. Instead, I'd like to configure a > (certain) job t…
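For reference, the wiki page's short answer is that the per-job map count is only a hint: the InputFormat has the final say. A sketch of the two knobs that are easy to confuse (0.20-era property names; the values are illustrative):

```xml
<!-- mapred-site.xml: hints/limits relevant to mapper count (Hadoop 0.20) -->
<property>
  <name>mapred.map.tasks</name>
  <value>100</value> <!-- per-job hint only; the InputFormat decides -->
</property>
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value> <!-- concurrent map slots per TaskTracker, not per job -->
</property>
```

The first is the per-job hint Shai is asking about; the second is the cluster-wide concurrency setting he explicitly ruled out.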

Re: Control the number of Mappers

2010-11-25 Thread Shai Erera
I wasn't talking about how to configure the cluster to not invoke more than a certain # of Mappers simultaneously. Instead, I'd like to configure a (certain) job to invoke exactly N Mappers, where N is the number of cores in the cluster, regardless of the size of the data. This is not critical if…

Re: Control the number of Mappers

2010-11-25 Thread Niels Basjes
Hi, 2010/11/25 Shai Erera : > Is there a way to make MapReduce create exactly N Mappers? More > specifically, if say my data can be split to 200 Mappers, and I have only > 100 cores, how can I ensure only 100 Mappers will be created? The number of > cores is not something I know in advance, so writing…

Control the number of Mappers

2010-11-25 Thread Shai Erera
Hi, Is there a way to make MapReduce create exactly N Mappers? More specifically, if say my data can be split to 200 Mappers, and I have only 100 cores, how can I ensure only 100 Mappers will be created? The number of cores is not something I know in advance, so writing a special InputFormat might…

Implement Writable which de-serializes to disk

2010-11-25 Thread Shai Erera
Hi, I need to implement a Writable which contains a lot of data, and unfortunately I cannot break it down into smaller pieces. The output of a Mapper is potentially a large record, which can be of any size ranging from a few 10s of MBs to a few 100s of MBs. Is there a way for me to de-serialize the Writable…
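One way to approach the question in this message is to have `readFields` stream the payload straight to a temporary file instead of materializing it on the heap. The sketch below illustrates the idea with JDK classes only; the two-method interface mirrors `org.apache.hadoop.io.Writable` and is re-declared locally purely so the example compiles without Hadoop on the classpath (the class name and length-prefixed layout are my own, not from the thread).

```java
import java.io.*;

// Mirrors org.apache.hadoop.io.Writable; declared locally only so this
// sketch compiles without Hadoop on the classpath.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A sketch of a Writable that spills its payload to a temp file while
// de-serializing, so heap usage stays bounded even for a 100s-of-MBs record.
public class DiskBackedWritable implements Writable {
    private File spill;       // where the payload lives after readFields
    private byte[] payload;   // only set on the writing side

    public DiskBackedWritable() {}                      // for reflection
    public DiskBackedWritable(byte[] payload) { this.payload = payload; }

    public void write(DataOutput out) throws IOException {
        out.writeInt(payload.length);                   // length prefix
        out.write(payload);
    }

    // Copy the record to disk in fixed-size chunks instead of one big array.
    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        spill = File.createTempFile("spill", ".bin");
        spill.deleteOnExit();
        byte[] buf = new byte[64 * 1024];
        try (OutputStream os =
                 new BufferedOutputStream(new FileOutputStream(spill))) {
            int remaining = len;
            while (remaining > 0) {
                int n = Math.min(buf.length, remaining);
                in.readFully(buf, 0, n);
                os.write(buf, 0, n);
                remaining -= n;
            }
        }
    }

    public File getSpillFile() { return spill; }
}
```

Downstream code would then open the spill file as a stream rather than asking the object for a byte array; cleanup of the temp files is left to `deleteOnExit` here and would need real lifecycle handling in a job.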

Re: Starting a Hadoop job programmatically

2010-11-25 Thread li ping
I made it work. The configuration file contained some errors: fs.default.name = hdfs://xi-pli:9000, mapred.job.tracker = xi-pli:9001. Here, we'd better use the IP address instead of the hostname, because the hostname may point to a loopback address. When…
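The fix described in this message amounts to a config like the following (the IP is a placeholder for the Hadoop server's real, routable address; property names are the 0.20-era ones used in the thread):

```xml
<!-- core-site.xml / mapred-site.xml: use an address the remote client
     can reach; 192.168.1.10 stands in for server A's real IP. -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://192.168.1.10:9000</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>192.168.1.10:9001</value>
</property>
```

With a hostname here, the daemons (and remote clients) resolve it via /etc/hosts, and a loopback mapping silently breaks remote submission while local runs keep working.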

Re: Starting a Hadoop job programmatically

2010-11-25 Thread li ping
But I can run the application on the Hadoop server (not using the hadoop command to run it). On Thu, Nov 25, 2010 at 4:38 PM, Jeff Zhang wrote: > Please check the status of the job tracker, since you cannot find port > 9001 using netstat > > 2010/11/25 li ping : > > Hi: > > I am trying to run a job…

Re: Starting a Hadoop job programmatically

2010-11-25 Thread jingguo yao
A possible cause is that the name node and job tracker on server A are bound to a local address such as 127.0.0.1, which server B can't see. If you are using Linux, you can check /etc/hosts to make sure that xi-pli is not bound to a local address. 2010/11/25 li ping > Hi: > > I am trying to run a job…
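Concretely, the /etc/hosts pattern being warned about looks like this (addresses are placeholders for illustration):

```
# /etc/hosts on server A — the problematic pattern:
127.0.0.1    localhost xi-pli   # daemons bind to loopback; B can't connect

# The fix: map the hostname to a routable address instead:
127.0.0.1    localhost
192.168.1.10 xi-pli             # placeholder for A's real address
```

After changing this, the Hadoop daemons need a restart so they re-bind to the routable address.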

Re: Starting a Hadoop job programmatically

2010-11-25 Thread Harsh J
Are you running a firewall on Server A? If so, check your firewall settings to allow connections to port 9001, etc. I'd recommend disabling firewalls in development environments. 2010/11/25 li ping : > Hi: > I am trying to run a job in my own application. > So far, I can run the job on the server…

Re: Starting a Hadoop job programmatically

2010-11-25 Thread Jeff Zhang
Please check the status of the job tracker, since you cannot find port 9001 using netstat. 2010/11/25 li ping : > Hi: > I am trying to run a job in my own application. > So far, I can run the job on the server which the hadoop server is running > on. > But what I expect is the hadoop server is running…

Starting a Hadoop job programmatically

2010-11-25 Thread li ping
Hi: I am trying to run a job in my own application. So far, I can run the job on the server on which the Hadoop server is running. But what I expect is: the Hadoop server runs on server A, and the application runs on another server (server B). If I run the application on Server B, it will…