Re: Hadoop MRUnit

2012-08-06 Thread Jim Donofrio
I think you mean: does MRUnit support the old mapred api? cdh3 includes both the old mapred api and the new mapreduce api. Yes, MRUnit supports both mapred and mapreduce: use the classes in org.apache.hadoop.mrunit.{MapDriver, ReduceDriver, MapReduceDriver} for the old mapred api and the classes in org.apache.hadoop.mrunit.mapreduce for the new mapreduce api.
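A minimal sketch of the new-api driver, assuming MRUnit on the classpath; WordLengthMapper and its input/output values are hypothetical, and the key point is the package: org.apache.hadoop.mrunit.mapreduce for the new api, org.apache.hadoop.mrunit for the old one.

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;

public class WordLengthMapperTest {
  // Hypothetical mapper: emits (line length, 1) for each input line.
  static class WordLengthMapper
      extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(new IntWritable(value.getLength()), new IntWritable(1));
    }
  }

  public static void main(String[] args) throws IOException {
    // Drivers for the new mapreduce api live in
    // org.apache.hadoop.mrunit.mapreduce; the old mapred api uses the
    // same class names directly under org.apache.hadoop.mrunit.
    new MapDriver<LongWritable, Text, IntWritable, IntWritable>()
        .withMapper(new WordLengthMapper())
        .withInput(new LongWritable(0), new Text("hello"))
        .withOutput(new IntWritable(5), new IntWritable(1))
        .runTest();
  }
}
```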

Re: Exec hadoop from Java, reuse JVM (client-side)?

2012-08-01 Thread Jim Donofrio
Why would you call the hadoop script? Why not call the part of the hadoop shell api you are trying to use directly from java? On 08/01/2012 07:37 PM, Keith Wiley wrote: Hmmm, at first glance that does appear to be similar to my situation. I'll have to delve through it in detail to see
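A sketch of the in-JVM route being suggested: FsShell is the class behind `hadoop fs`, so shell subcommands can be run through ToolRunner without forking a process. The `/user` path is illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

public class LsFromJava {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Equivalent of `hadoop fs -ls /user`, executed inside the current
    // JVM instead of exec'ing the hadoop shell script.
    int exitCode =
        ToolRunner.run(conf, new FsShell(conf), new String[] {"-ls", "/user"});
    System.exit(exitCode);
  }
}
```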

how to increment counters inside of InputFormat/RecordReader in mapreduce api?

2012-07-29 Thread Jim Donofrio
In the mapred api, getRecordReader was passed a Reporter, which could then be passed to the RecordReader to allow a RecordReader to increment counters for different types of records, bad records, etc. In the new mapreduce api, createRecordReader only gets the InputSplit and TaskAttemptContext, bo

int read(byte buf[], int off, int len) violates api level contract when length is 0 at the end of a stream

2012-07-23 Thread Jim Donofrio
api contract on java public int read(byte[] buffer, int off, int len): If len is zero, then no bytes are read and 0 is returned; otherwise, there is an attempt to read at least one byte. If no byte is available because the stream is at end of file, the value -1 is returned; otherwise, at least one byte is read and stored into the buffer.
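A self-contained sketch of a contract-conforming read: the zero-length check must come before the end-of-stream check, so -1 is only returned when at least one byte was actually requested.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Demonstrates the ordering the InputStream contract requires:
// len == 0 returns 0 even at end of stream; -1 means a real EOF read.
public class ContractRead {
  public static int contractRead(InputStream in, byte[] buf, int off, int len)
      throws IOException {
    if (len == 0) {
      return 0; // check the length BEFORE probing for end of stream
    }
    return in.read(buf, off, len); // -1 only when a byte was wanted
  }

  public static void main(String[] args) throws IOException {
    InputStream in = new ByteArrayInputStream(new byte[0]); // already at EOF
    byte[] buf = new byte[8];
    System.out.println(contractRead(in, buf, 0, 0)); // 0 per the contract
    System.out.println(contractRead(in, buf, 0, 8)); // -1: genuine EOF
  }
}
```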

mapreduce.job.max.split.locations just a warning in hadoop 1.0.3 but not in 2.0.1-alpha?

2012-06-05 Thread Jim Donofrio
final int max_loc = conf.getInt(MAX_SPLIT_LOCATIONS, 10);
if (locations.length > max_loc) {
  LOG.warn("Max block location exceeded for split: " + split
      + " splitsize: " + locations.length + " maxsize: " + max_loc);
  locations = Arrays.c
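For reference, the cap quoted above is configurable; this mapred-site.xml fragment is a sketch, and 30 is an illustrative value, not a recommendation.

```xml
<!-- Raise the split-location cap; the default in the code above is 10. -->
<property>
  <name>mapreduce.job.max.split.locations</name>
  <value>30</value>
</property>
```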

how to rebalance individual data node?

2012-05-18 Thread Jim Donofrio
Let's say that every node in your cluster has 2 same-sized disks, and one is 50% full and the other is 100% full. According to my understanding of the balancer documentation, all data nodes will be at the average utilization of 75%, so no balancing will occur, yet one hard drive in each node is str

can "HADOOP-6546: BloomMapFile can return false negatives" get backported to branch-1?

2012-05-07 Thread Jim Donofrio
Can someone backport HADOOP-6546: BloomMapFile can return false negatives to branch-1 for the next 1+ release? Without this fix BloomMapFile is somewhat useless because having no false negatives is a core feature of BloomFilters. I am surprised that both hadoop 1.0.2 and cdh3u3 do not have thi

Re: cannot use a map side join to merge the output of multiple map side joins

2012-05-07 Thread Jim Donofrio
Bobby Evans On 5/5/12 10:50 AM, "Jim Donofrio" wrote: I am trying to use a map side join to merge the output of multiple map side joins. This is failing because of the below code in JobClient.writeOldSplits which reorders the splits from largest to smallest. Why is that done? Is it so t

cannot use a map side join to merge the output of multiple map side joins

2012-05-05 Thread Jim Donofrio
I am trying to use a map side join to merge the output of multiple map side joins. This is failing because of the below code in JobClient.writeOldSplits, which reorders the splits from largest to smallest. Why is that done? Is it so that the largest split, which will take the longest, gets proce
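For context, a minimal sketch of the kind of old-api map-side join setup being described, assuming two sorted, identically partitioned inputs; the paths and the "inner" join expression are illustrative.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MapSideJoinSetup {
  public static void configure(JobConf job, Path left, Path right) {
    // Map-side join over the old mapred api: both inputs must be sorted
    // and partitioned the same way for the join to line up.
    job.setInputFormat(CompositeInputFormat.class);
    job.set("mapred.join.expr",
        CompositeInputFormat.compose(
            "inner", KeyValueTextInputFormat.class, left, right));
  }
}
```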

Re: why does Text.setCapacity not double the array size as in most dynamic array implementations?

2012-04-24 Thread Jim Donofrio
Sorry, I just stumbled across HADOOP-6109, which made this change in trunk; I was looking at the Text in 1.0.2. Can this fix get backported to the Hadoop 1 versions? On 04/24/2012 11:01 PM, Jim Donofrio wrote: private void setCapacity(int len, boolean keepData) { if (bytes == null

why does Text.setCapacity not double the array size as in most dynamic array implementations?

2012-04-24 Thread Jim Donofrio
private void setCapacity(int len, boolean keepData) {
  if (bytes == null || bytes.length < len) {
    byte[] newBytes = new byte[len];
    if (bytes != null && keepData) {
      System.arraycopy(bytes, 0, newBytes, 0, length);
    }
    bytes = newBytes;
  }
}
Why does Text.set
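A self-contained sketch of the doubling strategy being asked about (the approach HADOOP-6109 adopted in trunk): growing to at least double the current capacity makes n single-byte appends cost O(log n) reallocations instead of O(n). The class and instrumentation are hypothetical, not Hadoop code.

```java
// Demonstrates why doubling beats exact-size growth for repeated appends.
public class GrowableBuffer {
  private byte[] bytes = new byte[0];
  private int length = 0;
  public int resizeCount = 0; // instrumentation for this demo only

  private void setCapacity(int len, boolean keepData) {
    if (bytes.length < len) {
      // Grow to at least double the current capacity, not the exact
      // request (the exact-size behavior is what pre-HADOOP-6109 Text did).
      int newSize = Math.max(len, bytes.length * 2);
      byte[] newBytes = new byte[newSize];
      if (keepData) {
        System.arraycopy(bytes, 0, newBytes, 0, length);
      }
      bytes = newBytes;
      resizeCount++;
    }
  }

  public void append(byte b) {
    setCapacity(length + 1, true);
    bytes[length++] = b;
  }

  public static void main(String[] args) {
    GrowableBuffer buf = new GrowableBuffer();
    for (int i = 0; i < 1024; i++) {
      buf.append((byte) i);
    }
    // 1024 appends trigger only 11 reallocations under doubling.
    System.out.println(buf.resizeCount);
  }
}
```

With exact-size growth, the same loop would reallocate and copy on every single append.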