Hey Todd,
> The existing hadoop-lzo project doesn't use the C code in indexing, though I
> think you're right that the classes will fail to initialize if the native
> libraries aren't available.
Well, it relies on the codec's header reading.
But frankly speaking I missed the fact that the re
For those people using LZO compression:
While I know there is
http://github.com/kevinweil/hadoop-lzo
The native stuff makes it a bit of a hurdle. Especially if you are
just running on Amazon Elastic MapReduce, it's way easier to just run
this Java-only indexer instead.
http://github.com/tcurd
l-archive.com/common-...@hadoop.apache.org/msg01793.html.
>
> Thanks,
> Tom
>
> On Thu, Jul 15, 2010 at 9:24 AM, Torsten Curdt wrote:
>> Hey folks,
>>
>> how far along is the 0.21 release? ...I just keep building from the
>> branch myself currently. That's a bit of a pain.
>>
>> cheers
>> --
>> Torsten
>>
>
Hey folks,
how far along is the 0.21 release? ...I just keep building from the
branch myself currently. That's a bit of a pain.
cheers
--
Torsten
Cool. Great :)
On Tue, Jun 22, 2010 at 07:47, Owen O'Malley wrote:
>
> On Jun 21, 2010, at 5:14 PM, Torsten Curdt wrote:
>
>> I was just wondering the other day:
>>
>> What if the values for a key that get passed into the reducer do
>> not fit into mem
I was just wondering the other day:
What if the values for a key that get passed into the reducer do
not fit into memory?
After all a reducer should get all values per key from the whole job.
Is the iterator disk backed?
cheers
--
Torsten
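As far as I know the values Iterable in the new API streams from the framework's merged (and partly disk-backed) map output, so a reducer can walk through a group far larger than memory as long as it doesn't collect the values itself. A minimal sketch, assuming a simple sum-per-key job (class and type names are mine):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sketch: sum all values for a key without materializing them.
// The Iterable hands out one value at a time from the merge, so this
// should work even when a key's values do not fit into memory.
public class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private final LongWritable result = new LongWritable();

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) { // streaming iteration, not a List
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```

The thing to avoid is copying the values into a collection; that is what blows up for huge groups.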
> Job job = new Job(conf);
>
> and it will work.
Indeed it does. But that constructor is deprecated.
--
Torsten
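For what it's worth, in later Hadoop releases the non-deprecated way is the static factory rather than the constructor. A sketch, assuming a release where `Job.getInstance(Configuration)` exists (in 0.21 itself I believe the factory variants take a `Cluster`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: create a Job via the static factory instead of the
// deprecated Job(Configuration) constructor.
public class JobSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJobName("example");
        // ... set mapper/reducer/input/output as usual, then
        // System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```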
I am setting some custom values on my job configuration:
Configuration conf = new Configuration();
conf.set("job.time.from", time_from);
conf.set("job.time.until", time_until);
Cluster cluster = new C
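Values set with `conf.set(...)` on the driver travel with the job and can be read back inside a task through its Context. A small sketch; the property names mirror the ones above, the mapper types are my assumption:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: read driver-side custom configuration values in a task.
public class TimeWindowMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private String timeFrom;
    private String timeUntil;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // same keys the driver set with conf.set(...)
        timeFrom = context.getConfiguration().get("job.time.from");
        timeUntil = context.getConfiguration().get("job.time.until");
    }
}
```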
Hadoop 0.21 using the new API. All working.
Then I try to use MultipleOutputs in my reducer:
private MultipleOutputs mos;
protected void setup(Context context) throws IOException,
InterruptedException {
mos = new MultipleOutputs(context);
}
protected String generateFileNa
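A fuller sketch of the MultipleOutputs pattern in the new API, assuming the named output `"side"` was declared on the driver with `MultipleOutputs.addNamedOutput(job, "side", ...)` (the name and types are mine):

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Sketch: write to an extra named output from a reducer.
public class SideOutputReducer extends Reducer<Text, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text v : values) {
            mos.write("side", key, v); // "side" must be declared on the driver
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close(); // important: flushes the extra output files
    }
}
```

Forgetting the `close()` in `cleanup()` is a classic way to end up with empty side files.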
, Amareshwari Sri Ramadasu
wrote:
> MultipleOutputs is ported to use new api through
> http://issues.apache.org/jira/browse/MAPREDUCE-370
> See the discussions on jira and javadoc/testcase as an example on how to use
> it.
>
> Thanks
> Amareshwari
>
> On 6/7/10 8:08 PM,
I need to emit to different output files from a reducer.
The old API had MultipleSequenceFileOutputFormat.
Am I missing something or is this gone in the new API?
Are there any problems porting this over?
Or does it just need to be done?
cheers
--
Torsten
>
> On Mon, Jun 7, 2010 at 1:42 AM, Torsten Curdt wrote:
>>
>> I see only one.
>>
>> Could it be that using the LocalJobRunner interferes here?
>>
>> On Mon, Jun 7, 2010 at 01:31, Eric Sammer wrote:
>> > Torsten:
>> >
>> > T
6, 2010 at 1:33 PM, Torsten Curdt wrote:
>> When I set
>>
>> job.setPartitionerClass(MyPartitioner.class);
>> job.setNumReduceTasks(4);
>>
>> I would expect to see my MyPartitioner get called with
>>
>> getPartition(key, value, 4)
>>
When I set
job.setPartitionerClass(MyPartitioner.class);
job.setNumReduceTasks(4);
I would expect to see my MyPartitioner get called with
getPartition(key, value, 4)
but I still only see it called with 1.
I also tried setting
conf.set("mapred.map.tasks.speculative.exe
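For reference, a minimal partitioner sketch. Note the LocalJobRunner mentioned earlier in the thread runs a single reduce task in older Hadoop versions, so `numPartitions` arrives as 1 regardless of `job.setNumReduceTasks(4)`; a real (or pseudo-distributed) cluster is needed to see all four partitions. Types here are assumptions:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sketch: hash-based custom partitioner for the new API.
public class MyPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // mask off the sign bit so the result is always in [0, numPartitions)
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```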
> No, there isn't an api for that.
Bummer.
> The data is actually available in HDFS, but
> it is considered an internal format and in particular has changed
> substantially between 0.20 and 0.21/trunk.
Nah ...I was after an API for this.
Since I control the splits from a custom input format, I c
Hey
> I don't know if there is a way to get them, but I believe you shouldn't need
> to do so.. Each Mapper is created for a split it is supposed to work on and
> should not be aware of other splits - that is basically why MapReduce is
> such an effective pattern - each map and reduce task can be
I know I can get current InputSplit inside a mapper with
InputSplit split = context.getInputSplit();
but is there a way to get a list of all InputSplits?
cheers
--
Torsten
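One way to get at the full split list, at least from driver-side code: `getSplits(JobContext)` is public on InputFormat, so nothing stops you from asking it directly for the same list the framework will compute. A sketch, assuming TextInputFormat and a later-API `Job.getInstance`:

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch: compute the job's split list on the driver side.
public class ListSplits {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        List<InputSplit> splits = new TextInputFormat().getSplits(job);
        System.out.println(splits.size() + " splits");
    }
}
```

Inside a task this doesn't help, which matches the point above that a mapper is only supposed to know its own split.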
Hey folks,
I have the following keys/lines as input
2010-03-01 11:56/A -> 1
2010-03-01 11:57/A -> 1
2010-03-01 11:57/A -> 1
2010-03-01 11:57/B -> 1
2010-03-01 11:58/B -> 1
2010-03-01 11:58/A -> 1
2010-03-01 11:59/A -> 1
For each of these lines I do one emit. Similar to the word count
exam
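The emit pattern described above could look roughly like this, word-count style: one (minute/source, 1) pair per input line. The "minute/source" key layout follows the sample data; class and field names are my assumptions:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: emit ("2010-03-01 11:57/A", 1) for each input line.
public class MinuteSourceMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text outKey = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        outKey.set(line.toString().trim());
        context.write(outKey, ONE);
    }
}
```

A summing reducer then turns the ones into per-minute, per-source counts.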