Re: [ANN] lzo indexing

2010-08-31 Thread Torsten Curdt
Hey Todd,

> The existing hadoop-lzo project doesn't use the C code in indexing, though I
> think you're right that the classes will fail to initialize if the native
> libraries aren't available.

Well, it relies on the header reading of the codec. But frankly speaking I missed the fact that the re

[ANN] lzo indexing

2010-08-31 Thread Torsten Curdt
For those people using LZO compression: While I know there is

  http://github.com/kevinweil/hadoop-lzo

the native stuff makes it a bit of a hurdle. Especially if you are just running on Amazon Elastic MapReduce, it's way easier to just run this java-only indexer instead.

  http://github.com/tcurd

Re: 0.21?

2010-07-17 Thread Torsten Curdt
l-archive.com/common-...@hadoop.apache.org/msg01793.html.
>
> Thanks,
> Tom
>
> On Thu, Jul 15, 2010 at 9:24 AM, Torsten Curdt wrote:
>> Hey folks,
>>
>> how far along is the 0.21 release? ...I just keep building from the
>> branch myself currently. That's a bit of a pain.
>>
>> cheers
>> --
>> Torsten

0.21?

2010-07-15 Thread Torsten Curdt
Hey folks,

how far along is the 0.21 release? ...I just keep building from the branch myself currently. That's a bit of a pain.

cheers
--
Torsten

Re: limit of values in reduce phase?

2010-06-22 Thread Torsten Curdt
Cool. Great :)

On Tue, Jun 22, 2010 at 07:47, Owen O'Malley wrote:
>
> On Jun 21, 2010, at 5:14 PM, Torsten Curdt wrote:
>
>> I was just wondering the other day:
>>
>> What if the values for a key that get passed into the reducer do
>> not fit into mem

limit of values in reduce phase?

2010-06-21 Thread Torsten Curdt
I was just wondering the other day: What if the values for a key that get passed into the reducer do not fit into memory? After all, a reducer should get all values per key from the whole job. Is the iterator disk backed?

cheers
--
Torsten

Re: custom Configuration values

2010-06-11 Thread Torsten Curdt
> Job job = new Job(conf);
>
> and it will work.

Indeed it does. But that constructor is deprecated.

--
Torsten
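A minimal sketch of the non-deprecated path in 0.21, which routes job creation through the new Cluster API (not compiled here; requires the Hadoop 0.21 jars — the property names and timeFrom/timeUntil variables echo the thread below):

```java
// Custom values set on the Configuration travel with the job and can be
// read back inside tasks via context.getConfiguration().
Configuration conf = new Configuration();
conf.set("job.time.from", timeFrom);    // timeFrom/timeUntil: app-defined strings
conf.set("job.time.until", timeUntil);

Cluster cluster = new Cluster(conf);
Job job = Job.getInstance(cluster, conf);  // replaces the deprecated new Job(conf)

// later, inside a Mapper or Reducer:
// String from = context.getConfiguration().get("job.time.from");
```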

custom Configuration values

2010-06-11 Thread Torsten Curdt
I am setting some custom values on my job configuration:

  Configuration conf = new Configuration();
  conf.set("job.time.from", time_from);
  conf.set("job.time.until", time_until);

  Cluster cluster = new C

Cannot initialize JVM Metrics with processName

2010-06-10 Thread Torsten Curdt
Hadoop 0.21 using the new API. All working. Then I try to use MultipleOutputs in my reducer:

  private MultipleOutputs mos;

  protected void setup(Context context) throws IOException, InterruptedException {
    mos = new MultipleOutputs(context);
  }

  protected String generateFileNa
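For context, a sketch of the usual new-API MultipleOutputs pattern (not compiled here; needs the Hadoop 0.21 jars — the "stats" output name and the Text/LongWritable types are illustrative assumptions): declare named outputs on the job in the driver, write through the instance in the reducer, and close it in cleanup() so the underlying writers get flushed.

```java
public class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

  private MultipleOutputs<Text, LongWritable> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, LongWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) sum += v.get();
    // "stats" must have been registered in the driver with
    // MultipleOutputs.addNamedOutput(job, "stats", ...)
    mos.write("stats", key, new LongWritable(sum));
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();  // flushes and closes the extra outputs
  }
}
```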

Re: multiple outputs

2010-06-08 Thread Torsten Curdt
, Amareshwari Sri Ramadasu wrote:
> MultipleOutputs is ported to use new api through
> http://issues.apache.org/jira/browse/MAPREDUCE-370
> See the discussions on jira and javadoc/testcase as an example on how to use it.
>
> Thanks
> Amareshwari
>
> On 6/7/10 8:08 PM,

multiple outputs

2010-06-07 Thread Torsten Curdt
I need to emit to different output files from a reducer. The old API had MultipleSequenceFileOutputFormat. Am I missing something or is this gone in the new API? Are there any problems porting this over? Or does it just need to be done?

cheers
--
Torsten

Re: number of reducers

2010-06-06 Thread Torsten Curdt
n
>
> On Mon, Jun 7, 2010 at 1:42 AM, Torsten Curdt wrote:
>>
>> I see only one.
>>
>> Could it be that using the LocalJobRunner interferes here?
>>
>> On Mon, Jun 7, 2010 at 01:31, Eric Sammer wrote:
>> > Torsten:
>> >
>> > T

Re: number of reducers

2010-06-06 Thread Torsten Curdt
6, 2010 at 1:33 PM, Torsten Curdt wrote:
>> When I set
>>
>>  job.setPartitionerClass(MyPartitioner.class);
>>  job.setNumReduceTasks(4);
>>
>> I would expect to see my MyPartitioner get called with
>>
>>  getPartition(key, value, 4)
>>

number of reducers

2010-06-06 Thread Torsten Curdt
When I set

  job.setPartitionerClass(MyPartitioner.class);
  job.setNumReduceTasks(4);

I would expect to see my MyPartitioner get called with

  getPartition(key, value, 4)

but still I see it only get called with 1. I've also tried setting

  conf.set("mapred.map.tasks.speculative.exe
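For reference, the partition arithmetic behind Hadoop's default HashPartitioner reduces to `(hash & Integer.MAX_VALUE) % numReduceTasks` — and with a single reducer (which is what the LocalJobRunner mentioned later in this thread runs) every key necessarily lands in partition 0. A standalone demo of that arithmetic (illustration only, not the Hadoop class itself):

```java
public class PartitionDemo {

    // Mirrors HashPartitioner: the mask clears the sign bit so that
    // negative hashCodes still map to a valid partition index.
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String k : new String[]{"A", "B", "C", "D"}) {
            System.out.println(k + " -> partition " + partitionFor(k, 4));
        }
        // With one reducer, getPartition(key, value, 1) can only return 0.
        System.out.println("A with 1 reducer -> " + partitionFor("A", 1));
    }
}
```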

Re: InputSplits in Mapper

2010-06-05 Thread Torsten Curdt
> No, there isn't an api for that.

Bummer.

> The data is actually available in HDFS, but
> it is considered an internal format and in particular has changed
> substantially between 0.20 and 0.21/trunk.

Nah ...I was after an API for this. Since I control the splits from a custom input format, I c

Re: InputSplits in Mapper

2010-06-05 Thread Torsten Curdt
Hey

> I don't know if there is a way to get them, but I believe you shouldn't need
> to do so. Each Mapper is created for a split it is supposed to work on and
> should not be aware of other splits - that is basically why MapReduce is
> such an effective pattern - each map and reduce task can be

InputSplits in Mapper

2010-06-05 Thread Torsten Curdt
I know I can get the current InputSplit inside a mapper with

  InputSplit split = context.getInputSplit();

but is there a way to get a list of all InputSplits?

cheers
--
Torsten

cumulative counts over time

2010-06-04 Thread Torsten Curdt
Hey folks,

I have the following keys/lines as input

  2010-03-01 11:56/A -> 1
  2010-03-01 11:57/A -> 1
  2010-03-01 11:57/A -> 1
  2010-03-01 11:57/B -> 1
  2010-03-01 11:58/B -> 1
  2010-03-01 11:58/A -> 1
  2010-03-01 11:59/A -> 1

for each of these lines I do one emit. Similar to the word count exam
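The thread is cut off, but one plausible reading of "cumulative counts over time" is a running total per key as the timestamps advance. A standalone Java sketch of that reducer-side arithmetic (plain Java, no Hadoop types; assumes events arrive in timestamp order, as they would after the sort phase):

```java
import java.util.*;

public class CumulativeCounts {

    // For each (minute, key) event, record the cumulative count of that key
    // seen so far. Later events at the same minute overwrite earlier ones,
    // so each "minute/key" entry ends up holding the total at that minute.
    public static Map<String, Integer> cumulative(List<String[]> events) {
        Map<String, Integer> totals = new HashMap<>();
        Map<String, Integer> atMinute = new LinkedHashMap<>();
        for (String[] e : events) {                  // e = {minute, key}
            int t = totals.merge(e[1], 1, Integer::sum);
            atMinute.put(e[0] + "/" + e[1], t);
        }
        return atMinute;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
            new String[]{"11:56", "A"}, new String[]{"11:57", "A"},
            new String[]{"11:57", "A"}, new String[]{"11:57", "B"},
            new String[]{"11:58", "B"}, new String[]{"11:58", "A"},
            new String[]{"11:59", "A"});
        cumulative(events).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Running main over the data from the mail prints, e.g., `11:57/A -> 3` and `11:59/A -> 5`.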