Re: Get dynamic values in a user defined class from reducer.

2013-12-18 Thread Robert Dyer
Generally speaking, static fields are not useful in Hadoop. The issue you are seeing is that the reducer is running in a separate VM (possibly on a different node!) and thus the static value you are reading inside of Mid is actually a separate instantiation of that class and field. If you have
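The point above can be illustrated with a sketch against the Hadoop mapreduce API (the key name, class names, and types below are hypothetical, not from the original thread): instead of a static field, the driver stores the value in the job Configuration, which is serialized and shipped to every task JVM, so it survives the process boundary.

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: in the driver, before submitting the job, store the value:
//   conf.set("example.mid.value", someValue);   // hypothetical key name
// Then read it back in the reducer's setup(). This works even though the
// reducer runs in a different JVM (possibly on a different node), because
// the Configuration travels with each task.
public class MidReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private String midValue;

    @Override
    protected void setup(Context context) {
        midValue = context.getConfiguration().get("example.mid.value", "default");
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) {
        // use midValue here instead of reading a static field
    }
}
```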

Re: Uncompressed size of Sequence files

2013-11-27 Thread Robert Dyer
2013 at 3:14 PM, Robert Dyer psyb...@gmail.com wrote: Is there an easy way to get the uncompressed size of a sequence file that is block compressed? I am using the Snappy compressor. I realize I can obviously just decompress them to temporary files to get the size, but I would assume

Uncompressed size of Sequence files

2013-11-23 Thread Robert Dyer
Is there an easy way to get the uncompressed size of a sequence file that is block compressed? I am using the Snappy compressor. I realize I can obviously just decompress them to temporary files to get the size, but I would assume there is an easier way. Perhaps an existing tool that my search
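One way to avoid writing temporary files is to iterate the file with SequenceFile.Reader and sum the re-serialized record lengths. This is a sketch against the Hadoop 1.x API, not something from the thread; it assumes Writable keys/values and ignores header and sync-marker overhead, so the result approximates the uncompressed payload size.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Sketch: decompression happens transparently inside the reader, so we
// never materialize a temporary uncompressed file on disk.
public class SeqFileSize {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);  // path to the block-compressed file
        SequenceFile.Reader reader =
            new SequenceFile.Reader(FileSystem.get(conf), path, conf);
        Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
        Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
        DataOutputBuffer buf = new DataOutputBuffer();
        long total = 0;
        while (reader.next(key, val)) {
            buf.reset();
            key.write(buf);     // re-serialize each record uncompressed
            val.write(buf);
            total += buf.getLength();
        }
        reader.close();
        System.out.println("Approximate uncompressed bytes: " + total);
    }
}
```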

Re: Any reference for upgrade hadoop from 1.x to 2.2

2013-11-22 Thread Robert Dyer
Thanks Sandy! These seem helpful! MapReduce cluster configuration options have been split into YARN configuration options, which go in yarn-site.xml; and MapReduce configuration options, which go in mapred-site.xml. Many have been given new names to reflect the shift. ... *We’ll follow up with a

Re: Hadoop 2.2.0 MR tasks failing

2013-11-01 Thread Robert Dyer
So does anyone have any ideas how to track this down? Is it perhaps an exception somewhere in an output committer that is being swallowed and not showing up in the logs? On Tue, Oct 22, 2013 at 2:19 AM, Robert Dyer rd...@iastate.edu wrote: The logs for the maps and reduces show nothing useful

Re: Hadoop 2.2.0 MR tasks failing

2013-10-22 Thread Robert Dyer
org.apache.hadoop.mapred.Task: Task 'attempt_1382415258498_0001_m_14_0' done. On Tue, Oct 22, 2013 at 12:16 AM, Arun C Murthy a...@hortonworks.com wrote: If you follow the links on the web-ui to the logs of the map/reduce tasks, what do you see there? Arun On Oct 21, 2013, at 9:55 PM, Robert Dyer psyb

Hadoop 2.2.0 MR tasks failing

2013-10-21 Thread Robert Dyer
I recently set up a 2.2.0 test cluster. For some reason, all of my MR jobs are failing. The maps and reduces all run to completion, without any errors. Yet the app is marked failed and there is no final output. Any ideas? Application Type: MAPREDUCE State: FINISHED FinalStatus: FAILED

Job status shows 0's for counters

2013-09-03 Thread Robert Dyer
I just noticed the job status for MR jobs tends to show 0's in the Map and Reduce columns but actually shows the totals correctly. I am not sure exactly when this started happening, but this cluster was upgraded from Hadoop 1.0.4 to 1.1.2 and now to 1.2.1. It definitely worked fine on 1.0.4, but

Re: Job status shows 0's for counters

2013-09-03 Thread Robert Dyer
But it is not fixed by the current release. Thanks, Shinichi (2013/09/03 11:20), Robert Dyer wrote: I just noticed the job status for MR jobs tends to show 0's in the Map and Reduce columns but actually shows the totals correctly. I am not sure exactly when this started happening

Re: Hadoop upgrade

2013-08-09 Thread Robert Dyer
Actually, 1.2.1 is out (and marked stable). I see no reason not to upgrade. http://hadoop.apache.org/docs/r1.2.1/releasenotes.html As far as performance goes, when I upgraded our cluster from 1.0.4 to 1.1.2, our small jobs (that took about 1 min each) were taking about 20-30s less time. So

HDFS edit log NPE

2013-06-04 Thread Robert Dyer
I recently upgraded from 1.0.4 to 1.1.2. Now, however, my HDFS won't start up. There appears to be something wrong in the edits file. Obviously I can roll back to a previous checkpoint, however it appears checkpointing has been failing for some time and my last checkpoint is over a month old.

Re: About configuring cluster setup

2013-05-14 Thread Robert Dyer
You can, however note that unless you also run a TaskTracker on that node (bad idea) then any blocks that are replicated to this node won't be available as input to MapReduces and you are lowering the odds of having data locality on those blocks. On Tue, May 14, 2013 at 2:01 AM, Ramya S

Re: Query on Cost estimates on Hadoop and Java

2013-04-25 Thread Robert Dyer
It isn't GPL. OpenJDK[1] is GPLv2 with a Classpath Exception[2] (which is important). Read more here: http://programmers.stackexchange.com/questions/52534/can-we-use-java-for-commercial-use Also note that Hadoop[3] is licensed under Apache v2[4]. [1] http://openjdk.java.net/legal/ [2]

Re: Job cleanup

2013-04-17 Thread Robert Dyer
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobContext.html#getProgressible%28%29 On Sat, Apr 13, 2013 at 2:35 PM, Robert Dyer psyb...@gmail.com wrote: What does the job cleanup task do? My understanding was it just cleaned up any intermediate/temporary files and moved the reducer output

Job cleanup

2013-04-13 Thread Robert Dyer
What does the job cleanup task do? My understanding was it just cleaned up any intermediate/temporary files and moved the reducer output to the output directory? Does it do more? One of my jobs runs, all maps and reduces finish, but then the job cleanup task never finishes. Instead it gets

Re: Slow MR time and high network utilization with all local data

2013-02-25 Thread Robert Dyer
the short circuit. Now I see no network utilization for this job and it runs *much* faster (13 mins instead of 2+ hours)! Problem solved! :-) Thanks Harsh! On Mon, Feb 25, 2013 at 1:41 AM, Robert Dyer rd...@iastate.edu wrote: I am using Ganglia. Note I have short circuit reads enabled (I think
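For reference, enabling short-circuit reads on a Hadoop 1.x-era cluster is typically done in hdfs-site.xml on the DataNodes. The property names below are from that era and the user name is a placeholder; verify both against your release before relying on them.

```xml
<!-- hdfs-site.xml (Hadoop 1.x-era property names; verify for your release) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.block.local-path-access.user</name>
  <!-- placeholder: the OS user(s) allowed to read block files directly -->
  <value>hadoop</value>
</property>
```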

Slow MR time and high network utilization with all local data

2013-02-24 Thread Robert Dyer
I have a small 6 node dev cluster. I use a 1GB SequenceFile as input to a MapReduce job, using a custom split size of 10MB (to increase the number of maps). Each map call will read random entries out of a shared MapFile (that is around 50GB). I set replication to 6 on both of these files, so

Re: Slow MR time and high network utilization with all local data

2013-02-24 Thread Robert Dyer
over a local socket as well, and may appear in network traffic observing tools too (but do not mean they are over the network). On Mon, Feb 25, 2013 at 2:35 AM, Robert Dyer psyb...@gmail.com wrote: I have a small 6 node dev cluster. I use a 1GB SequenceFile as input to a MapReduce job, using

Re: Namenode failures

2013-02-17 Thread Robert Dyer
Is there an easy way to monitor (other than a script grep'ing the logs) the checkpoints to see when this happens? On Sat, Feb 16, 2013 at 2:39 PM, Robert Dyer psyb...@gmail.com wrote: Forgot to mention: Hadoop 1.0.4 On Sat, Feb 16, 2013 at 2:38 PM, Robert Dyer psyb...@gmail.com wrote: I am at a bit

Re: Namenode failures

2013-02-17 Thread Robert Dyer
https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Feb 18, 2013 at 3:31 AM, Robert Dyer psyb...@gmail.com wrote: It just happened again. This was after a fresh format of HDFS/HBase and I am attempting to re-import the (backed up) data. http://pastebin.com/3fsWCNQY So now if I

Re: Namenode failures

2013-02-17 Thread Robert Dyer
Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, Feb 18, 2013 at 3:31 AM, Robert Dyer psyb...@gmail.com wrote: It just happened again. This was after a fresh format of HDFS/HBase and I am attempting to re-import the (backed up) data. http://pastebin.com/3fsWCNQY

Re: Namenode failures

2013-02-17 Thread Robert Dyer
or a regular SIGTERM shutdown? I shut down the NN with 'bin/stop-dfs.sh'. On Mon, Feb 18, 2013 at 4:31 AM, Robert Dyer rd...@iastate.edu wrote: On Sun, Feb 17, 2013 at 4:41 PM, Mohammad Tariq donta...@gmail.com wrote: You can make use of offine image viewer to diagnose the fsimage file

Re: Namenode failures

2013-02-16 Thread Robert Dyer
Forgot to mention: Hadoop 1.0.4 On Sat, Feb 16, 2013 at 2:38 PM, Robert Dyer psyb...@gmail.com wrote: I am at a bit of a wits' end here. Every single time I restart the namenode, I get this crash: 2013-02-16 14:32:42,616 INFO org.apache.hadoop.hdfs.server.common.Storage: Image file of size

Re: Specific HDFS tasks where is passwordless SSH is necessary

2013-02-05 Thread Robert Dyer
http://jayunit100.blogspot.com -- Robert Dyer rd...@iastate.edu

Re: more reduce tasks

2013-01-03 Thread Robert Dyer
You could create a CustomOutputCommitter and in the commitJob() method simply read in the part-* files and write them out into a single aggregated file. This requires making a CustomOutputFormat class that uses the CustomOutputCommitter and then setting that via
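A sketch of that idea against the Hadoop mapreduce API follows. The committer name and the aggregate file name are made up for illustration, and this assumes a raw byte-level concatenation of the part files is acceptable for your output format.

```java
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;

// Sketch: after the normal commit moves task outputs into the output
// directory, concatenate the part-* files into one aggregate file.
public class AggregatingCommitter extends FileOutputCommitter {
    private final Path outputDir;

    public AggregatingCommitter(Path outputPath, TaskAttemptContext ctx)
            throws IOException {
        super(outputPath, ctx);
        this.outputDir = outputPath;
    }

    @Override
    public void commitJob(JobContext context) throws IOException {
        super.commitJob(context);  // normal commit: promote task outputs
        FileSystem fs = outputDir.getFileSystem(context.getConfiguration());
        FileStatus[] parts = fs.globStatus(new Path(outputDir, "part-*"));
        if (parts == null) return;
        try (FSDataOutputStream out = fs.create(new Path(outputDir, "aggregated"))) {
            for (FileStatus part : parts) {
                try (FSDataInputStream in = fs.open(part.getPath())) {
                    IOUtils.copyBytes(in, out, context.getConfiguration(), false);
                }
                fs.delete(part.getPath(), false);  // drop the merged part file
            }
        }
    }
}
```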

Re: Strange machine behavior

2012-12-10 Thread Robert Dyer
, 2012 at 4:09 PM, Robert Dyer rd...@iastate.edu wrote: Has anyone experienced a TaskTracker/DataNode behaving like the attached image? This was during a MR job (which runs often). Note the extremely high System CPU time. Upon investigating I saw that out of 64GB ram the system had

Re: Strange machine behavior

2012-12-10 Thread Robert Dyer
parameter that controls the minimum size of the free chain, might want to increase that a bit. Also, look into hosting your JVM heap on huge pages, they can't be paged out and will help the JVM perform better too. On Dec 8, 2012, at 6:09 PM, Robert Dyer rd...@iastate.edu wrote: Has anyone

Re: Strange machine behavior

2012-12-08 Thread Robert Dyer
job again. Can you share your logs in pastebin? On Sat 08 Dec 2012 07:09:02 PM CST, Robert Dyer wrote: Has anyone experienced a TaskTracker/DataNode behaving like the attached image? This was during a MR job (which runs often). Note the extremely high System CPU time. Upon investigating I

Re: Reg LZO compression

2012-10-16 Thread Robert Dyer
Hi Manoj, If the data is the same for both tests and the number of mappers is fewer, then each mapper has more (uncompressed) data to process. Thus each mapper should take longer and overall execution time should increase. As a simple example: if your data is 128MB uncompressed it may use 2
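The arithmetic behind the example above can be sketched as follows. The 64 MB block size and the file sizes are illustrative, and this assumes the compressed format is actually splittable (non-indexed LZO is not, which makes the mapper count drop even further, to one).

```java
// Illustrative split arithmetic: a splittable file gets roughly one
// input split (and so one mapper) per HDFS block it occupies.
public class SplitMath {
    static long numSplits(long fileSizeBytes, long blockSizeBytes) {
        // ceiling division: number of blocks the file occupies
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    public static void main(String[] args) {
        long block = 64L << 20; // 64 MB HDFS block size (illustrative)
        // 128 MB uncompressed -> 2 mappers, ~64 MB of input each
        System.out.println(numSplits(128L << 20, block));
        // the same data compressed to ~60 MB -> 1 mapper, which must then
        // decompress and process all 128 MB of the underlying data itself
        System.out.println(numSplits(60L << 20, block));
    }
}
```

So with compression the per-mapper (uncompressed) workload doubles here, which is why overall execution time can go up even though less data is read from disk.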

Re: How to split a sequence file

2012-09-11 Thread Robert Dyer
If the file is pre-sorted, why not just make multiple sequence files - 1 for each split? Then you don't have to compute InputSplits because the physical files are already split. On Tue, Sep 11, 2012 at 11:00 PM, Harsh J ha...@cloudera.com wrote: Hey Jason, Is the file pre-sorted? You could

HBase and MapReduce data locality

2012-08-28 Thread Robert Dyer
I have been reading up on HBase and my understanding is that the physical files on the HDFS are split first by region and then by column families. Thus each column family has its own physical file (on a per-region basis). If I run a MapReduce task that uses the HBase as input, wouldn't this

Updating SequenceFiles?

2012-08-22 Thread Robert Dyer
I am currently using a SequenceFile as input to my MR job (on Hadoop 1.0.3). This works great, as my input is just a bunch of binary blobs. However it seems SequenceFile is only intended to append new data and never update existing entries. Is that correct? If so, would I be better off moving
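That reading of the API is correct: SequenceFile.Writer only supports append(). A minimal sketch of the usual workaround, rewriting the file while substituting records (the class name, key/value types, and the caller-supplied paths are illustrative, against the Hadoop 1.x API):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: "updating" an entry in an append-only SequenceFile means
// streaming the old file into a new one, swapping records as you go,
// then replacing the old file with the new one.
public class RewriteSeqFile {
    public static void rewrite(Configuration conf, Path in, Path out,
                               Text keyToReplace, BytesWritable newValue)
            throws Exception {
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        Text key = new Text();
        BytesWritable val = new BytesWritable();
        while (reader.next(key, val)) {
            writer.append(key, key.equals(keyToReplace) ? newValue : val);
        }
        reader.close();
        writer.close();
    }
}
```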

Re: Jobs randomly not starting

2012-07-17 Thread Robert Dyer
ID, and send the link across? They shouldn't hang the way you describe. On Fri, Jul 13, 2012 at 9:33 AM, Robert Dyer psyb...@gmail.com wrote: I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2 compute nodes). My input size is a sequence file of around 280mb

Jobs randomly not starting

2012-07-12 Thread Robert Dyer
I'm using Hadoop 1.0.3 on a small cluster (1 namenode, 1 jobtracker, 2 compute nodes). My input size is a sequence file of around 280mb. Generally, my jobs run just fine and all finish in 2-5 minutes. However, quite randomly the jobs refuse to run. They submit and appear when running 'hadoop