Get Current Block or Split ID, and using it, the Block Path

2012-04-08 Thread Deepak Nettem
Hi, Is it possible to get the 'id' of the currently executing split or block from within the mapper? Using this block ID / split ID, I want to be able to query the namenode to get the names of hosts having that block / split, and the actual path to the data. I need this for some analytics that

Re: Get Current Block or Split ID, and using it, the Block Path

2012-04-08 Thread Mohit Anchlia
I think if you called getInputFormat on JobConf and then called getSplits you would at least get the locations. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/InputSplit.html On Sun, Apr 8, 2012 at 9:16 AM, Deepak Nettem deepaknet...@gmail.com wrote: Hi, Is it
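For reference, a minimal sketch of the approach Mohit describes, using the old org.apache.hadoop.mapred API from Hadoop 0.20/1.x (this is an illustrative sketch, not code from the thread; it needs the Hadoop jars on the classpath):

```java
// Driver-side sketch: enumerate input splits and the hosts that hold
// each split's data, via JobConf.getInputFormat() and getSplits().
import java.util.Arrays;
import org.apache.hadoop.mapred.InputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;

public class SplitLocations {
    public static void printSplitLocations(JobConf conf, int numSplits) throws Exception {
        InputFormat<?, ?> format = conf.getInputFormat();
        InputSplit[] splits = format.getSplits(conf, numSplits);
        for (InputSplit split : splits) {
            // getLocations() returns the hostnames holding this split's data
            System.out.println(split + " -> " + Arrays.toString(split.getLocations()));
        }
    }
}
```

Note this runs on the driver, before the job starts; it tells you where splits live, not which split a given mapper is currently processing.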

Re: Creating and working with temporary file in a map() function

2012-04-08 Thread Ondřej Klimpera
Thanks for your advice, File.createTempFile() works great, at least in pseudo-distributed mode; hope the cluster solution will do the same work. You saved me hours of trying... On 04/07/2012 11:29 PM, Harsh J wrote: MapReduce sets mapred.child.tmp for all tasks to be the Task Attempt's
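The pattern under discussion needs nothing beyond the JDK: because the framework sets mapred.child.tmp (and hence java.io.tmpdir) to the task attempt's own working directory, plain File.createTempFile() lands inside the attempt's sandbox and is cleaned up with it. A stdlib-only sketch:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class TempFileDemo {
    // Create a scratch file, write a payload, read it back, delete it.
    // Inside a map() task, java.io.tmpdir points at the attempt's tmp
    // directory, so this same code works unchanged on a real cluster.
    public static String roundTrip(String payload) throws IOException {
        File tmp = File.createTempFile("map-scratch-", ".txt");
        try {
            FileWriter w = new FileWriter(tmp);
            w.write(payload);
            w.close();
            BufferedReader r = new BufferedReader(new FileReader(tmp));
            String line = r.readLine();
            r.close();
            return line;
        } finally {
            tmp.delete(); // clean up eagerly rather than waiting for task exit
        }
    }
}
```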

Job, JobConf, and Configuration.

2012-04-08 Thread JAX
Hi guys. Just a theoretical question here: I notice in chapter 1 of the Hadoop O'Reilly book that the new API example has *no* Configuration object. Why is that? I thought the new API still uses / needs a Configuration class when running jobs. Jay Vyas MMSB UCHC On Apr 7, 2012, at 4:29

Re: Get Current Block or Split ID, and using it, the Block Path

2012-04-08 Thread Harsh J
Deepak On Sun, Apr 8, 2012 at 9:46 PM, Deepak Nettem deepaknet...@gmail.com wrote: Hi, Is it possible to get the 'id' of the currently executing split or block from within the mapper? Using this block Id / split id, I want to be able to query the namenode to get the names of hosts having
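Harsh's reply is truncated in this digest. For completeness, a sketch of how a mapper can inspect its own split in the new org.apache.hadoop.mapreduce API (an illustrative sketch, not code from the thread; class names are from Hadoop 0.20+ and the Hadoop jars are assumed on the classpath):

```java
// Inside setup()/map(), the Context exposes the current InputSplit.
// For file-based input formats it is a FileSplit carrying the input
// path, the byte offset/length, and the preferred host locations.
import java.io.IOException;
import java.util.Arrays;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class SplitAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        FileSplit split = (FileSplit) context.getInputSplit();
        System.err.println("path=" + split.getPath()
            + " start=" + split.getStart()
            + " length=" + split.getLength()
            + " hosts=" + Arrays.toString(split.getLocations()));
    }
}
```

Block IDs themselves are not exposed through this public API; the split gives the file path and byte range, from which block locations can be looked up via FileSystem.getFileBlockLocations().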

Re: Job, JobConf, and Configuration.

2012-04-08 Thread Harsh J
The Job class encapsulates the Configuration object and manages it for you. You can also get its reference out via Job.getConfiguration() - http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/JobContext.html#getConfiguration() Hence, when you do a: Job job = new Job();,
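A short sketch of the driver pattern Harsh describes (illustrative only; Hadoop jars assumed, API as of the new org.apache.hadoop.mapreduce package):

```java
// Job wraps a Configuration: new Job() creates one internally, and
// Job.getConfiguration() hands back the instance the job will submit.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DriverSketch {
    public static void main(String[] args) throws Exception {
        Job job = new Job();                          // Configuration created for you
        Configuration conf = job.getConfiguration();  // the job-managed instance
        conf.set("my.custom.key", "value");           // visible to all tasks of this job

        // Equivalent, seeding the job from an explicit Configuration;
        // subsequent changes should still go through job2.getConfiguration().
        Configuration seed = new Configuration();
        seed.set("my.custom.key", "value");
        Job job2 = new Job(seed);
    }
}
```

This is why the book's new-API example shows no explicit Configuration: the Job constructor supplies one behind the scenes.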

Re: Creating and working with temporary file in a map() function

2012-04-08 Thread Harsh J
It will work. Pseudo-distributed mode shouldn't be all that different from a fully distributed mode. Do let us know if it does not work as intended. On Sun, Apr 8, 2012 at 11:40 PM, Ondřej Klimpera klimp...@fit.cvut.cz wrote: Thanks for your advise, File.createTempFile() works great, at least in

Re: Creating and working with temporary file in a map() function

2012-04-08 Thread Ondřej Klimpera
I will, but deploying the application on a cluster is now far away. Just finishing the raw implementation. Cluster tuning is planned at the end of this month. Thanks. On 04/08/2012 09:06 PM, Harsh J wrote: It will work. Pseudo-distributed mode shouldn't be all that different from a fully distributed

Re: Get Current Block or Split ID, and using it, the Block Path

2012-04-08 Thread JAX
I have a related question about blocks related to this. Normally, a reduce job outputs several files, all in the same directory. But why? Since we know that Hadoop is abstracting our file for us, shouldn't the part-r- outputs ultimately be thought of as a single file? What is the

Re: Get Current Block or Split ID, and using it, the Block Path

2012-04-08 Thread Harsh J
Hi, The 'part' in the default filename stands for partition. In some cases I agree you would not mind viewing them as a singular file instead of having to read directories - but there are also use cases where you would want each partition file to be unique, because you partitioned and processed them
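The reason there is one part-r-* file per reducer is that each reducer handles exactly one partition of the key space. The assignment formula below mirrors Hadoop's default HashPartitioner logic in plain Java (a stdlib-only illustration, not the Hadoop class itself):

```java
public class PartitionDemo {
    // Same formula as Hadoop's default HashPartitioner: mask off the
    // sign bit of hashCode(), then take it modulo the reducer count.
    public static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            // Every record with a given key lands in the same part-r-* file
            System.out.println(key + " -> partition " + getPartition(key, reducers));
        }
    }
}
```

Because the mapping is deterministic, all values for one key reach one reducer, which is exactly the per-partition uniqueness the reply describes.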

How do I include the newer version of Commons-lang in my jar?

2012-04-08 Thread Sky
Hi. I am new to Hadoop and I am working on project on AWS Elastic MapReduce. The problem I am facing is: * org.apache.commons.lang.time.DateUtils: parseDate() works OK but parseDateStrictly() fails. I think parseDateStrictly might be new in lang 2.5. I thought I included all dependencies.
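Sky's symptom - parseDate() working while parseDateStrictly() fails - typically means an older commons-lang (parseDateStrictly was added in 2.5) shadows the bundled one earlier on the task classpath. A stdlib-only diagnostic sketch for finding which jar actually supplied a class at runtime (WhichJar and where() are illustrative names, not from the thread):

```java
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar/path a class was loaded from, or "(bootstrap)"
    // for core JDK classes, which have no CodeSource.
    public static String where(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return (src == null) ? "(bootstrap)" : src.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // Inside a task, check the class that misbehaves, e.g.:
        // System.out.println(where(Class.forName(
        //     "org.apache.commons.lang.time.DateUtils")));
        System.out.println("String loaded from: " + where(String.class));
    }
}
```

If the reported jar is a platform-provided commons-lang rather than the one bundled in the job jar, the usual fixes are bundling the dependency and arranging for user classes to take classpath precedence.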