Re: How to acquire Input Filename

2012-12-06 Thread Olivier Varene - echo
Have you tried with String fileName = ((org.apache.hadoop.mapreduce.lib.input.FileSplit) context.getInputSplit()).getPath().getName(); ? Hope it helps. Olivier. On 6 Dec 2012, at 00:24, Hans Uhlig wrote: I am currently using multiple inputs to merge quite a few different but related
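The one-liner above is typically placed in a mapper's setup method. A minimal sketch (the mapper class and field names here are illustrative, and this assumes a file-based input format so the split really is a FileSplit):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String fileName;

    @Override
    protected void setup(Context context) {
        // For file-based input formats the split is a FileSplit, which
        // carries the path of the file this map task is reading.
        fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Emit the source file name as the key, the input line as the value.
        context.write(new Text(fileName), value);
    }
}
```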

Re: Java Heap memory error : Limit to 2 Gb of ShuffleRamManager ?

2012-12-06 Thread Olivier Varene - echo
Yes I will, thanks for the answer. Regards, Olivier. On 6 Dec 2012, at 19:41, Arun C Murthy wrote: Olivier, Sorry, missed this. The historical reason, if I remember right, is that we used to have a single byte buffer and hence the limit. We should definitely remove it now since we

Re: Non local mapper .. Is it worth it?

2012-12-06 Thread Bertrand Dechoux
The short answer is yes, it can be worth it, because your job can finish faster if you are not allowing only local mappers. But this is of course a trade-off. The best performance (but not latency) can be obtained when using only local mappers. You should read about delay scheduling, which allows the

Re: Non local mapper .. Is it worth it?

2012-12-06 Thread Jay Vyas
Hmm, but how can the scheduler affect the performance of a mapper if there are no competing jobs? I thought the scheduler only impacted the way separate jobs got resources. In my example, there are 2 mappers, 2+n files, and 1 job. Jay Vyas

Re: Map Reduce jobs taking a long time at the end

2012-12-06 Thread Jay Whittaker
Yeah, it's against a ~95-million-row table in HBase. It takes about 30 mins to get to 90%, then about 3+ hours to get from 90% to 100%. On Wed, 2012-12-05 at 08:46 -0800, in.abdul wrote: Hi Jay.. Are you trying to do M-R on an HBase table? Thanks and regards, Syed Abdul Kather

Re: Map tasks processing some files multiple times

2012-12-06 Thread Hemanth Yamijala
David, you are using FileNameTextInputFormat. This is not in the Hadoop source, as far as I can see. Can you please confirm where this is being used from? It seems like the isSplitable method of this input format may need checking. Another thing: given you are adding the same input format for all
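A common cause of files being processed multiple times with a custom file-based input format is an isSplitable override that disagrees with how the files should actually be read. A minimal sketch of the kind of check being suggested (the class name mirrors the one in the thread, but this is an assumed shape, not the poster's actual code; the custom record reader it would pair with is omitted):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class FileNameTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        // Returning false forces one map task per file, so no file's
        // contents can ever land in more than one split.
        return false;
    }
}
```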

Re: M/R, Strange behavior with multiple Gzip files

2012-12-06 Thread Jean-Marc Spaggiari
Hi, Have you configured mapred-site.xml to tell where the JobTracker is? If not, your job is running on the local job runner, running the tasks one by one. JM PS: I faced the same issue a few weeks ago and got the exact same behaviour. This (above) solved the issue. 2012/12/6, x6i4uybz labs
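For reference, the classic (MRv1) property that points the client at a real JobTracker lives in mapred-site.xml; when it is unset it defaults to "local", which runs the whole job inside a single LocalJobRunner process. The host and port below are placeholders:

```xml
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- Placeholder host:port; "local" (the default) means LocalJobRunner -->
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>
```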

Re: M/R, Strange behavior with multiple Gzip files

2012-12-06 Thread x6i4uybz labs
Hello, The job isn't running in local mode. In fact, I think I just have a problem with the map task progression. The counters of each map task are OK during the job execution, whereas the progress of each map task stays at 0%. On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari

Re: Map tasks processing some files multiple times

2012-12-06 Thread Hemanth Yamijala
Glad it helps. Could you also explain the reason for using MultipleInputs? On Thu, Dec 6, 2012 at 2:59 PM, David Parks davidpark...@yahoo.com wrote: Figured it out; the problem was, as usual, in my code. I had wrapped TextInputFormat to replace the LongWritable key with a key representing the file
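For context, MultipleInputs is the usual way to feed different paths, each with its own input format (and optionally its own mapper), into one job. A minimal sketch, with placeholder paths and the identity Mapper standing in for real per-path mappers:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class MultiInputDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multi-input");
        // Each input path is paired with its own InputFormat and Mapper;
        // Mapper.class (the identity mapper) is only a placeholder here.
        MultipleInputs.addInputPath(job, new Path("/data/setA"),
                TextInputFormat.class, Mapper.class);
        MultipleInputs.addInputPath(job, new Path("/data/setB"),
                TextInputFormat.class, Mapper.class);
        // Remaining setup (output types, output path, waitForCompletion)
        // is omitted from this sketch.
    }
}
```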

Re: M/R, Strange behavior with multiple Gzip files

2012-12-06 Thread Harsh J
I tend to agree with Jean-Marc's observation. If your job client logs a LocalJobRunner at any point, then that is most definitely your problem. Otherwise, if you feel you are facing a scheduling problem, then it may most likely be your scheduler configuration. For example, FairScheduler has a

Query about Speculative Execution

2012-12-06 Thread Ajay Srivastava
Hi, What is the behavior of the JobTracker if speculative execution is off and a task on a data node is running extremely slowly? Will the JobTracker simply wait till the slow-running task finishes, or will it try to heal the situation? Assuming that heartbeats from the node running the slow task are

Re: Query about Speculative Execution

2012-12-06 Thread Harsh J
Given that speculative execution *is* the answer to such scenarios, I'd say the answer to your question without it is *nothing*. If a task does not report status for over 10 minutes (default), it is killed and retried. If it does report status changes (such as counters, task status, etc.) but is
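A task that does long computation between records can stay under that ten-minute limit (mapred.task.timeout) by reporting progress itself. A minimal sketch, where expensiveStep is a hypothetical stand-in for real work:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SlowMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (int i = 0; i < 1000; i++) {
            expensiveStep(value);
            // Tell the framework this attempt is alive, so it is not
            // killed and retried after the task timeout expires.
            context.progress();
        }
    }

    private void expensiveStep(Text value) {
        // placeholder for real long-running work
    }
}
```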

Re: M/R, Strange behavior with multiple Gzip files

2012-12-06 Thread x6i4uybz labs
Thanks for your answers. I don't have the whole solution yet, but I know: - the job is not running on a local TT - the map process is very slow - and the progress bar is not working properly. So, the map tasks are running in parallel (hadoop works :)) but I don't understand why the progression

Re: M/R, Strange behavior with multiple Gzip files

2012-12-06 Thread Harsh J
OK, I can't tell about the performance of your map process, but it is sometimes common to see 0% - 100% jumps in progress bars when working over compressed data, as the progress (in terms of data records processed overall) can't be perfectly determined. It might even be a bug recently fixed. If

Re: Hadoop V/S Cassandra

2012-12-06 Thread Harsh J
Hi Yogesh, Just wanted to correct one point of yours: On Thu, Dec 6, 2012 at 10:25 PM, yogesh dhari yogeshdh...@live.com wrote: Hadoop have single point of Failure but Cassandra doesn't.. In case you aren't aware yet, Hadoop (HDFS) has no single point of failure anymore. The HDFS project

Re: Problem using distributed cache

2012-12-06 Thread Harsh J
What is your conf object there? Is it job.getConfiguration() or an independent instance? On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan peter.co...@gmail.com wrote: Hi , I want to use the distributed cache to allow my mappers to access data. In main, I'm using the command
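The pitfall Harsh's question points at: registering the cache file on a Configuration instance that is not the one the Job actually carries, so the setting never reaches the submitted job. A sketch of the safe pattern, using the old DistributedCache API as in Hadoop 1.x (the cached path is a placeholder):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "cache-example");
        // Important: modify the job's OWN configuration. The Job constructor
        // copies 'conf', so changes made to 'conf' afterwards are not seen
        // by the submitted job.
        DistributedCache.addCacheFile(
                new URI("/user/someone/lookup.dat"), job.getConfiguration());
        // ... set mapper/reducer, input/output paths, then submit ...
    }
}
```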

RE: Hadoop V/S Cassandra

2012-12-06 Thread yogesh dhari
Thanks Harsh :-) Thanks for your reply... Please do shed some light on the other points. Thanks Regards Yogesh Kumar From: ha...@cloudera.com Date: Thu, 6 Dec 2012 22:30:46 +0530 Subject: Re: Hadoop V/S Cassandra To: user@hadoop.apache.org Hi Yogesh, Just wanted to correct one point

Re: Hadoop V/S Cassandra

2012-12-06 Thread anil gupta
Hi Yogesh, As others have said, Hadoop vs Cassandra is not a fair comparison; HBase vs Cassandra, however, is. You can have a look at this comparison: http://bigdatanoob.blogspot.com/2012/11/hbase-vs-cassandra.html HTH, Anil Gupta On Thu, Dec 6, 2012 at 11:27 AM, Colin McCabe

RE: Hadoop V/S Cassandra

2012-12-06 Thread yogesh dhari
Thanks a lot guys :-) Regards Yogesh Kumar From: anilgupt...@gmail.com Date: Thu, 6 Dec 2012 11:31:16 -0800 Subject: Re: Hadoop V/S Cassandra To: user@hadoop.apache.org Hi Yogesh, As others have said Hadoop vs Cassandra is not a fair comparison. Although, HBase vs Cassandra is a fair

Re: DFS and the RecordReader

2012-12-06 Thread Jay Vyas
Hmm... so when a record reader calls fs.open(...), I guess I'm looking for an example of how the input stream is created...?

Re: DFS and the RecordReader

2012-12-06 Thread Harsh J
Ah OK, understood what you seem to be looking for. Let's follow the simple LineReader implementation in that case. TextInputFormat uses LineRecordReader: [1] - Line 52. LineRecordReader has the calls you're looking for and wraps a LineReader implementation to take care of reading lines over block
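The part of LineRecordReader.initialize that answers the question looks roughly like this (a simplified sketch, not the exact Hadoop source; error handling and compression-codec handling are omitted):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class SketchLineRecordReader {
    private LineReader in;

    // Simplified version of what LineRecordReader.initialize does:
    public void initialize(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Path file = split.getPath();
        Configuration conf = context.getConfiguration();
        // The FileSystem is resolved from the path's scheme
        // (hdfs://, file://, ...); open() returns a seekable stream.
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream fileIn = fs.open(file);
        // Seek to this split's start so each map task reads only its region.
        fileIn.seek(split.getStart());
        in = new LineReader(fileIn, conf);
    }
}
```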

Re: Query about Speculative Execution

2012-12-06 Thread Mahesh Balija
To simplify: if you turn off speculative execution, then the system will never bother about slow-running tasks unless they don't report within the specified time (10 minutes). If you have set speculative execution to true, then the system may spawn another instance of the mapper and consider the output of
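Speculative execution can be toggled per job; with the MRv1-era property names it looks like this (a sketch):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculationDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // MRv1 property names. With both set to false, a slow task is
        // simply waited on (until the task timeout kicks in).
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        Job job = new Job(conf, "no-speculation");
        // ... rest of the job setup ...
    }
}
```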

Re: Query about Speculative Execution

2012-12-06 Thread Ajay Srivastava
Thanks Mahesh, Harsh. On 07-Dec-2012, at 7:42 AM, Mahesh Balija wrote: To simplify: if you turn off speculative execution, then the system will never bother about slow-running tasks unless they don't report within the specified time (10 minutes). If you have set speculative execution to true

Re: Reg: No space left on device Exception

2012-12-06 Thread Manoj Babu
Is it that there is not enough space to keep the intermediate files? How do I find the space allocated for HDFS and the normal FS on a particular node? Overall the cluster has plenty of free space. Cheers! Manoj. On Fri, Dec 7, 2012 at 11:22 AM, Marcos Ortiz mlor...@uci.cu wrote: It seems that you

Re: Issue with third party library

2012-12-06 Thread Sampath Herga
Hi Hemanth, Setting the full path worked. Thanks, Sampath. On Thu, Dec 6, 2012 at 9:51 AM, Hemanth Yamijala yhema...@thoughtworks.com wrote: Sampath, You mentioned that the file is present in the tasktracker local dir, could you please tell us the full path? I am wondering if setting