Have you tried with
String fileName = ((org.apache.hadoop.mapreduce.lib.input.FileSplit)
context.getInputSplit()).getPath().getName();
?
hope it helps
Olivier
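For illustration, a minimal sketch of where that snippet typically sits: in a Mapper's setup(). The class and field names below are made up, and note that under MultipleInputs the split is wrapped, so the FileSplit cast can fail there.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class FileNameAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String fileName;

    @Override
    protected void setup(Context context) {
        // getInputSplit() returns a FileSplit for file-based input formats,
        // so the cast is safe there. Under MultipleInputs the split is a
        // TaggedInputSplit wrapper and this cast throws ClassCastException.
        fileName = ((FileSplit) context.getInputSplit()).getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Tag every record with the name of the file it came from.
        context.write(new Text(fileName), value);
    }
}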
On 6 Dec 2012, at 00:24, Hans Uhlig wrote:
I am currently using multiple inputs to merge quite a few different but
related
Yes, I will.
Thanks for the answer.
Regards,
Olivier
On 6 Dec 2012, at 19:41, Arun C Murthy wrote:
Olivier,
Sorry, missed this.
The historical reason, if I remember right, is that we used to have a single
byte buffer and hence the limit.
We should definitely remove it now since we
The short answer is yes, it can be worth it, because your job can finish
faster if you are not allowing only local mappers. But this is of course a
trade-off. The best performance (though not latency) is obtained when
using only local mappers. You should read about delay scheduling, which
allows the
Hmm, but how can the scheduler affect the performance of a mapper if there
are no competing jobs?
I thought the scheduler only impacted the way separate jobs got resources.
In my example, there are 2 mappers, 2+n files, and 1 job.
Jay Vyas
Yeah, it's against a ~95-million-row table in HBase.
It takes about 30 minutes to get to 90%, then about 3+ hours to get from
90% to 100%.
On Wed, 2012-12-05 at 08:46 -0800, in.abdul wrote:
Hi Jay,
Are you trying to run MapReduce on an HBase table?
Thanks and regards
Syed Abdul Kather
David,
You are using FileNameTextInputFormat. This is not in the Hadoop source, as
far as I can see. Can you please confirm where it comes from? It seems like
the isSplitable method of this input format may need checking.
Another thing: given you are adding the same input format for all
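For reference, a hedged sketch of the mechanism being suggested for checking: a file-based input format controls splitting by overriding isSplitable. The class below is illustrative only, not the FileNameTextInputFormat in question.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Illustrative only: an input format that refuses to split files, so each
// mapper reads exactly one whole file. If FileNameTextInputFormat does
// something like this, it would explain getting one mapper per file.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;
    }
}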
Hi,
Have you configured mapred-site.xml to say where the JobTracker is? If
not, your job is running with the local job runner, executing the tasks
one by one.
JM
PS: I faced the same issue a few weeks ago and saw the exact same
behaviour. The fix above solved it.
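A minimal mapred-site.xml sketch for a Hadoop 1.x setup; the host and port below are placeholders. When mapred.job.tracker is left at its default value of "local", jobs run in-process, one task at a time.

<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- placeholder host:port; the default value "local" means the
         LocalJobRunner is used and tasks run sequentially -->
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>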
2012/12/6, x6i4uybz labs
Hello,
The job isn't running in local mode. In fact, I think I just have a problem
with map task progress reporting.
The counters of each map task are OK during job execution, whereas the
progress of each map task stays at 0%.
On Thu, Dec 6, 2012 at 1:34 PM, Jean-Marc Spaggiari
Glad it helps. Could you also explain the reason for using MultipleInputs ?
On Thu, Dec 6, 2012 at 2:59 PM, David Parks davidpark...@yahoo.com wrote:
Figured it out; the problem is, as usual, in my code. I had wrapped
TextInputFormat to replace the LongWritable key with a key representing the
file
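Not David's actual code, but a hedged sketch of that kind of wrapper: a record reader that delegates to LineRecordReader and substitutes a file-name key.

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;

// Illustrative only: emits (file name, line) instead of (byte offset, line).
public class FileNameKeyInputFormat extends FileInputFormat<Text, Text> {
    @Override
    public RecordReader<Text, Text> createRecordReader(InputSplit split,
            TaskAttemptContext context) {
        return new RecordReader<Text, Text>() {
            private final LineRecordReader reader = new LineRecordReader();
            private Text key;  // constant for the whole split

            public void initialize(InputSplit split, TaskAttemptContext ctx)
                    throws IOException, InterruptedException {
                reader.initialize(split, ctx);
                key = new Text(((FileSplit) split).getPath().getName());
            }
            public boolean nextKeyValue()
                    throws IOException, InterruptedException {
                return reader.nextKeyValue();
            }
            public Text getCurrentKey() { return key; }
            public Text getCurrentValue()
                    throws IOException, InterruptedException {
                return reader.getCurrentValue();
            }
            public float getProgress()
                    throws IOException, InterruptedException {
                return reader.getProgress();
            }
            public void close() throws IOException { reader.close(); }
        };
    }
}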
I tend to agree with Jean-Marc's observation. If your job client logs
LocalJobRunner at any point, then that is most definitely your
problem.
Otherwise, if you feel you are facing a scheduling problem, then it is
most likely your scheduler configuration. For example,
FairScheduler has a
Hi,
What is the behavior of the JobTracker if speculative execution is off and a
task on a data node is running extremely slowly?
Will the JobTracker simply wait till the slow-running task finishes, or will
it try to heal the situation? Assuming that heartbeats from the node running
the slow task are
Given that speculative execution *is* the answer to such scenarios,
I'd say the answer to your question without it is *nothing*.
If a task does not report status for over 10 minutes (default), it is
killed and retried. If it does report status changes (such as
counters, task status, etc.) but is
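To make this concrete, a hedged sketch: a long-running map() call can keep itself alive by reporting progress, and the 10-minute default lives in the mapred.task.timeout property (milliseconds). The mapper below is illustrative.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper that does a long computation per record.
public class SlowButChattyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (int step = 0; step < 1000; step++) {
            // ... some expensive per-record work would go here ...
            // Tell the framework the task is alive. Without progress,
            // counter or status updates for mapred.task.timeout ms
            // (default 600000, i.e. 10 minutes), the task is killed
            // and retried.
            context.progress();
        }
        context.write(new Text("done"), value);
    }
}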
Thanks for your answers.
I don't have the whole solution yet, but I know:
- the job is not running on a local TaskTracker,
- the map process is very slow,
- and the progress bar is not working properly.
So the map tasks are running in parallel (Hadoop works :)) but I don't
understand why the progress
OK, I can't tell about the performance of your map process, but it is
fairly common to see 0%-to-100% jumps in progress bars when working
over compressed data, as the progress (in terms of data records
processed overall) can't be perfectly determined. It might even be a
bug that was recently fixed.
If
Hi Yogesh,
Just wanted to correct one point of yours:
On Thu, Dec 6, 2012 at 10:25 PM, yogesh dhari yogeshdh...@live.com wrote:
Hadoop has a single point of failure but Cassandra doesn't.
In case you aren't aware yet, Hadoop (HDFS) has no single point of
failure anymore. The HDFS project
What is your conf object there? Is it job.getConfiguration() or an
independent instance?
On Thu, Dec 6, 2012 at 10:29 PM, Peter Cogan peter.co...@gmail.com wrote:
Hi ,
I want to use the distributed cache to allow my mappers to access data. In
main, I'm using the command
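The distinction Harsh is probing matters because Job takes a copy of the Configuration it is given. A hedged sketch of the pattern that works; the cache file path is a placeholder.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "example");  // Job copies conf here
        // Add cache files via the job's OWN configuration; mutating the
        // original 'conf' after the Job is created has no effect on the
        // submitted job.
        DistributedCache.addCacheFile(
                new URI("/user/me/lookup.dat"),  // placeholder path
                job.getConfiguration());
    }
}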
Thanks Harsh :-)
Thanks for your reply... Please do shed some light on the other points.
Thanks Regards
Yogesh Kumar
From: ha...@cloudera.com
Date: Thu, 6 Dec 2012 22:30:46 +0530
Subject: Re: Hadoop V/S Cassandra
To: user@hadoop.apache.org
Hi Yogesh,
Just wanted to correct one point
Hi Yogesh,
As others have said, Hadoop vs Cassandra is not a fair comparison. However,
HBase vs Cassandra is a fair comparison. You can have a look at this
comparison: http://bigdatanoob.blogspot.com/2012/11/hbase-vs-cassandra.html
HTH,
Anil Gupta
On Thu, Dec 6, 2012 at 11:27 AM, Colin McCabe
Thanks a lot guys :-)
Regards
Yogesh Kumar
From: anilgupt...@gmail.com
Date: Thu, 6 Dec 2012 11:31:16 -0800
Subject: Re: Hadoop V/S Cassandra
To: user@hadoop.apache.org
Hi Yogesh,
As others have said, Hadoop vs Cassandra is not a fair comparison. However,
HBase vs Cassandra is a fair
Hmm... so when a record reader calls fs.open(...), I guess I'm looking for
an example of how the input stream is created?
Ah OK, understood what you seem to be looking for.
Let's follow the simple LineReader implementation in that case.
TextInputFormat uses LineRecordReader: [1] - Line 52
LineRecordReader has the calls you're looking for and wraps over a
LineReader implementation, to take care of reading lines over block
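Condensed into a hedged paraphrase (not the verbatim source), this is roughly how the input stream gets created inside LineRecordReader.initialize(), minus compression codecs and split-boundary handling:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.util.LineReader;

public class StreamCreationSketch {
    public LineReader open(InputSplit genericSplit, TaskAttemptContext context)
            throws IOException {
        FileSplit split = (FileSplit) genericSplit;
        Configuration conf = context.getConfiguration();
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);  // resolve HDFS/local/etc.
        FSDataInputStream in = fs.open(file);      // <-- the stream creation
        in.seek(split.getStart());                 // jump to this split's start
        return new LineReader(in, conf);           // line-oriented wrapper
    }
}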
To simplify: if you turn off speculative execution, then the system will
never bother about slow-running tasks unless they fail to report for longer
than the specified time (10 minutes by default).
If you have set speculative execution to true, then the system may spawn
another instance of the mapper and consider the output of
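For reference, a hedged sketch of the Hadoop 1.x properties involved; names are per the 1.x mapred configuration.

import org.apache.hadoop.conf.Configuration;

public class SpeculationConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Disable backup ("speculative") attempts for maps and reduces:
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        // The "specified time" after which a task that reports nothing
        // (no progress, counters or status) is killed, in milliseconds:
        conf.setLong("mapred.task.timeout", 600000L);
    }
}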
Thanks Mahesh, Harsh.
On 07-Dec-2012, at 7:42 AM, Mahesh Balija wrote:
To simplify: if you turn off speculative execution, then the system will never
bother about slow-running tasks unless they fail to report for longer than the
specified time (10 minutes by default).
If you have set speculative execution to true
Is it that there is not enough space to keep the intermediate files?
How do I find the space allocated to HDFS and to the normal FS for a
particular node? Overall, the cluster has plenty of free space.
Cheers!
Manoj.
On Fri, Dec 7, 2012 at 11:22 AM, Marcos Ortiz mlor...@uci.cu wrote:
It seems that you
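One standard way to check, per DataNode and cluster-wide, how much space HDFS has versus the rest of the filesystem:

hadoop dfsadmin -report   # prints, per node and in total: Configured
                          # Capacity, DFS Used, Non DFS Used, DFS Remaining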
Hi Hemanth,
Setting the full path worked.
Thanks,
Sampath.
On Thu, Dec 6, 2012 at 9:51 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Sampath,
You mentioned that the file is present in the TaskTracker local dir;
could you please tell us the full path? I am wondering if setting