Generally speaking, static fields are not useful in Hadoop.
The issue you are seeing is that the reducer runs in a separate JVM
(possibly on a different node!), so the static value you are reading
inside of Mid belongs to a separate instance of that class and its field.
If you have
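For illustration, here is a rough sketch of the usual workaround: put the value into the job Configuration in the driver and read it back in setup(). The class and property names below are made up for the example, not taken from the original job.

// Driver side (hypothetical property name):
//   job.getConfiguration().setInt("example.threshold", 10);

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class ExampleReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private int threshold;

  @Override
  protected void setup(Context context) {
    // Read the value the driver placed in the Configuration; a static field
    // set in another JVM (driver or mapper) would not be visible here.
    threshold = context.getConfiguration().getInt("example.threshold", 0);
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    if (sum >= threshold) {
      context.write(key, new IntWritable(sum));
    }
  }
}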
Is there an easy way to get the uncompressed size of a sequence file that
is block compressed? I am using the Snappy compressor.
I realize I can obviously just decompress them to temporary files to get
the size, but I would assume there is an easier way. Perhaps an existing
tool that my search
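If it helps, here is a rough sketch of the brute-force approach without temp files: open a SequenceFile.Reader, stream the records, and sum their serialized sizes. It only counts record payload (not headers or sync marks), and the class name is just for the example.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileUncompressedSize {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path path = new Path(args[0]);
    FileSystem fs = path.getFileSystem(conf);
    SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
    try {
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      DataOutputBuffer buf = new DataOutputBuffer();
      long bytes = 0;
      // Stream every record and add up the serialized key/value lengths.
      while (reader.next(key, val)) {
        buf.reset();
        key.write(buf);
        val.write(buf);
        bytes += buf.getLength();
      }
      System.out.println("Approximate uncompressed record bytes: " + bytes);
    } finally {
      reader.close();
    }
  }
}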
Thanks Sandy! These seem helpful!
MapReduce cluster configuration options have been split into YARN
configuration options, which go in yarn-site.xml; and MapReduce
configuration options, which go in mapred-site.xml. Many have been given
new names to reflect the shift. ... We’ll follow up with a
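For anyone following along, a few of the renamed properties, set programmatically just to illustrate (these names are my examples, not a complete list; check the 2.x docs for the full old-to-new mapping):

import org.apache.hadoop.conf.Configuration;

public class NewConfigNamesExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // MapReduce-side options (normally set in mapred-site.xml)
    conf.set("mapreduce.framework.name", "yarn");
    conf.set("mapreduce.map.memory.mb", "1024");
    // YARN-side options (normally set in yarn-site.xml)
    conf.set("yarn.resourcemanager.hostname", "rm.example.com");
    conf.set("yarn.nodemanager.resource.memory-mb", "8192");
    System.out.println("framework = " + conf.get("mapreduce.framework.name"));
  }
}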
So does anyone have any ideas how to track this down?
Is it perhaps an exception somewhere in an output committer that is being
swallowed and not showing up in the logs?
On Tue, Oct 22, 2013 at 2:19 AM, Robert Dyer rd...@iastate.edu wrote:
The logs for the maps and reduces show nothing useful
] org.apache.hadoop.mapred.Task: Task
'attempt_1382415258498_0001_m_14_0' done.
On Tue, Oct 22, 2013 at 12:16 AM, Arun C Murthy a...@hortonworks.com wrote:
If you follow the links on the web-ui to the logs of the map/reduce tasks,
what do you see there?
Arun
On Oct 21, 2013, at 9:55 PM, Robert Dyer psyb
I recently set up a 2.2.0 test cluster. For some reason, all of my MR jobs
are failing. The maps and reduces all run to completion, without any
errors. Yet the app is marked failed and there is no final output. Any
ideas?
Application Type: MAPREDUCE
State: FINISHED
FinalStatus: FAILED
I just noticed the job status for MR jobs tends to show 0's in the Map and
Reduce columns but actually shows the totals correctly.
I am not sure exactly when this started happening, but this cluster was
upgraded from Hadoop 1.0.4 to 1.1.2 and now to 1.2.1. It definitely worked
fine on 1.0.4, but
But it is not fixed by the current release.
Thanks,
Shinichi
(2013/09/03 11:20), Robert Dyer wrote:
I just noticed the job status for MR jobs tends to show 0's in the Map
and Reduce columns but actually shows the totals correctly.
I am not sure exactly when this started happening
Actually, 1.2.1 is out (and marked stable). I see no reason not to upgrade.
http://hadoop.apache.org/docs/r1.2.1/releasenotes.html
As far as performance goes, when I upgraded our cluster from 1.0.4 to
1.1.2, our small jobs (that took about 1 min each) were taking about 20-30s
less time. So
I recently upgraded from 1.0.4 to 1.1.2. Now, however, my HDFS won't start
up. There appears to be something wrong in the edits file.
Obviously I can roll back to a previous checkpoint; however, it appears
checkpointing has been failing for some time and my last checkpoint is
over a month old.
You can, but note that unless you also run a TaskTracker on that node
(a bad idea), any blocks replicated to it cannot be read locally by map
tasks, so you are lowering the odds of having data locality on those
blocks.
On Tue, May 14, 2013 at 2:01 AM, Ramya S
It isn't GPL. OpenJDK[1] is GPLv2 with a Classpath Exception[2] (which is
important).
Read more here:
http://programmers.stackexchange.com/questions/52534/can-we-use-java-for-commercial-use
Also note that Hadoop[3] is licensed under Apache v2[4].
[1] http://openjdk.java.net/legal/
[2]
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobContext.html#getProgressible%28%29
On Sat, Apr 13, 2013 at 2:35 PM, Robert Dyer psyb...@gmail.com wrote:
What does the job cleanup task do? My understanding was it just cleaned up
any intermediate/temporary files and moved the reducer output to the output
directory? Does it do more?
One of my jobs runs, all maps and reduces finish, but then the job cleanup
task never finishes. Instead it gets
the short circuit.
Now I see no network utilization for this job and it runs *much* faster (13
mins instead of 2+ hours)! Problem solved! :-)
Thanks Harsh!
On Mon, Feb 25, 2013 at 1:41 AM, Robert Dyer rd...@iastate.edu wrote:
I am using Ganglia.
Note I have short circuit reads enabled (I think
I have a small 6 node dev cluster. I use a 1GB SequenceFile as input to a
MapReduce job, using a custom split size of 10MB (to increase the number of
maps). Each map call will read random entries out of a shared MapFile
(that is around 50GB).
I set replication to 6 on both of these files, so
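As an aside, here is roughly how a 10MB split cap can be expressed with the new-API FileInputFormat (paths and class names are placeholders, not from the actual job):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SmallSplitsJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "small-splits-example");
    job.setJarByClass(SmallSplitsJob.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Cap each split at 10MB so a ~1GB sequence file yields roughly 100 maps.
    FileInputFormat.setMaxInputSplitSize(job, 10L * 1024 * 1024);
    // ... set mapper/reducer/key/value classes here ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}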
over a local socket as
well, and may appear in network traffic observing tools too (but do
not mean they are over the network).
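For reference, the client-side switch being discussed looks roughly like this in the 1.x line (property names as I remember them; please verify against your version's hdfs-default.xml):

import org.apache.hadoop.conf.Configuration;

public class ShortCircuitReadExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Let the DFS client read local replicas straight off the disk,
    // bypassing the DataNode for blocks stored on the same machine.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // On 1.x the DataNodes must also whitelist the reading user via
    // dfs.block.local-path-access.user in hdfs-site.xml.
    System.out.println(conf.getBoolean("dfs.client.read.shortcircuit", false));
  }
}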
On Mon, Feb 25, 2013 at 2:35 AM, Robert Dyer psyb...@gmail.com wrote:
I have a small 6 node dev cluster. I use a 1GB SequenceFile as input to a
MapReduce job, using
? Is there an easy
way to monitor (other than a script grep'ing the logs) the checkpoints to
see when this happens?
On Sat, Feb 16, 2013 at 2:39 PM, Robert Dyer psyb...@gmail.com wrote:
Forgot to mention: Hadoop 1.0.4
On Sat, Feb 16, 2013 at 2:38 PM, Robert Dyer psyb...@gmail.com wrote:
I am at a bit
Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, Feb 18, 2013 at 3:31 AM, Robert Dyer psyb...@gmail.com wrote:
It just happened again. This was after a fresh format of HDFS/HBase and
I am attempting to re-import the (backed up) data.
http://pastebin.com/3fsWCNQY
So now if I
or
a regular SIGTERM shutdown?
I shut down the NN with 'bin/stop-dfs.sh'.
On Mon, Feb 18, 2013 at 4:31 AM, Robert Dyer rd...@iastate.edu wrote:
On Sun, Feb 17, 2013 at 4:41 PM, Mohammad Tariq donta...@gmail.com
wrote:
You can make use of the offline image viewer to diagnose
the fsimage file
Forgot to mention: Hadoop 1.0.4
On Sat, Feb 16, 2013 at 2:38 PM, Robert Dyer psyb...@gmail.com wrote:
I am a bit at my wits' end here. Every single time I restart the namenode,
I get this crash:
2013-02-16 14:32:42,616 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size
http://jayunit100.blogspot.com
--
Robert Dyer
rd...@iastate.edu
You could create a CustomOutputCommitter and, in its commitJob() method,
simply read in the part-* files and write them out into a single aggregated
file.
This requires making a CustomOutputFormat class that uses the
CustomOutputCommitter and then setting that
via
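A rough sketch of that idea with the new (mapreduce) API; the class and output file names are made up, and error handling is omitted:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MergingTextOutputFormat<K, V> extends TextOutputFormat<K, V> {
  @Override
  public OutputCommitter getOutputCommitter(TaskAttemptContext context) throws IOException {
    final Path output = getOutputPath(context);
    return new FileOutputCommitter(output, context) {
      @Override
      public void commitJob(JobContext jobContext) throws IOException {
        super.commitJob(jobContext);  // normal commit: moves task output into the output dir
        FileSystem fs = output.getFileSystem(jobContext.getConfiguration());
        FSDataOutputStream out = fs.create(new Path(output, "merged.out"));
        FileStatus[] parts = fs.globStatus(new Path(output, "part-*"));
        if (parts != null) {
          for (FileStatus part : parts) {
            // Append each part file into the single aggregated file.
            FSDataInputStream in = fs.open(part.getPath());
            IOUtils.copyBytes(in, out, jobContext.getConfiguration(), false);
            in.close();
            fs.delete(part.getPath(), false);  // optionally drop the merged-in part file
          }
        }
        out.close();
      }
    };
  }
}

With the new API you would then point the job at it with job.setOutputFormatClass(MergingTextOutputFormat.class).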
, 2012 at 4:09 PM, Robert Dyer rd...@iastate.edu wrote:
Has anyone experienced a TaskTracker/DataNode behaving like the attached
image?
This was during a MR job (which runs often). Note the extremely high System
CPU time. Upon investigating I saw that out of 64GB of RAM the system had
parameter that controls the minimum size of the free
chain, might want to increase that a bit.
Also, look into hosting your JVM heap on huge pages; they can't be paged
out and will help the JVM perform better too.
On Dec 8, 2012, at 6:09 PM, Robert Dyer rd...@iastate.edu wrote:
Has anyone
job again.
Can you share your logs in pastebin?
On Sat 08 Dec 2012 07:09:02 PM CST, Robert Dyer wrote:
Has anyone experienced a TaskTracker/DataNode behaving like the
attached image?
This was during a MR job (which runs often). Note the extremely high
System CPU time. Upon investigating I
Hi Manoj,
If the data is the same for both tests and there are fewer mappers, then
each mapper has more (uncompressed) data to process. Thus each mapper
should take longer and overall execution time should increase.
As a simple example: if your data is 128MB uncompressed it may use 2
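To put illustrative numbers on it (these are mine, not from the original message): with a 64MB block size, 128MB of uncompressed input yields 2 splits and 2 maps of 64MB each, but if the same data compresses down to a single block, one map ends up processing all 128MB of logical data, so per-map time roughly doubles even though the total work is unchanged.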
If the file is pre-sorted, why not just make multiple sequence files -
1 for each split?
Then you don't have to compute InputSplits because the physical files
are already split.
On Tue, Sep 11, 2012 at 11:00 PM, Harsh J ha...@cloudera.com wrote:
Hey Jason,
Is the file pre-sorted? You could
I have been reading up on HBase and my understanding is that the
physical files on HDFS are split first by region and then by
column family.
Thus each column family has its own physical file (on a per-region basis).
If I run a MapReduce task that uses HBase as input, wouldn't this
I am currently using a SequenceFile as input to my MR job (on Hadoop
1.0.3). This works great, as my input is just a bunch of binary
blobs.
However it seems SequenceFile is only intended to append new data and
never update existing entries. Is that correct?
If so, would I be better off moving
ID,
and send the link across? They shouldn't hang the way you describe.
On Fri, Jul 13, 2012 at 9:33 AM, Robert Dyer psyb...@gmail.com wrote:
I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2
compute nodes). My input size is a sequence file of around 280mb.
Generally, my jobs run just fine and all finish in 2-5 minutes. However,
quite randomly the jobs refuse to run. They submit and appear when running
'hadoop