From: Ted Dunning tdunn...@maprtech.com
To: common-user@hadoop.apache.org
Sent: Mon, 11 April, 2011 7:38:04 AM
Subject: Re: Architectural question
The original poster said that there was no common key. Your suggestion
presupposes that such a key exists.
On Sun, Apr 10, 2011, oleksiy wrote:
Hi all,
I have an architectural question.
For my app I have 50 GB of persistent data stored in HDFS as a simple CSV file.
The app also has 10 GB of input data, likewise in CSV format, that must be run
over this 50 GB data set.
The persistent data and the input data have no common keys.
There are no subtle ways to deal with quadratic problems like this. They
just don't scale.
Your suggestions are roughly on course. When matching 10 GB against 50 GB,
the choice of which data set to use as input to the mapper depends a lot on how
much you can buffer in memory and how unbalanced the two sizes are. The same
logic goes for the intersection of any number of sets: mark the members with
their sets, then reduce over them to see if they belong to every set.
Good luck.
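The mark-and-reduce intersection idea above can be sketched in miniature. This is plain Python standing in for the map and reduce phases, not actual Hadoop code; the function name and in-memory sets are illustrative assumptions:

```python
from collections import defaultdict

def intersect_by_tagging(*sets_of_records):
    """'Map' phase: tag every member with the index of the set it came
    from. 'Reduce' phase: a record is in the intersection iff its tags
    cover every input set."""
    n = len(sets_of_records)
    tagged = defaultdict(set)          # record -> indices of sets seen in
    for idx, records in enumerate(sets_of_records):
        for rec in records:
            tagged[rec].add(idx)       # map output: (rec, idx)
    # Reduce: keep only records observed in all n sets.
    return {rec for rec, tags in tagged.items() if len(tags) == n}
```

In a real MapReduce job the `(record, set_index)` pairs would be emitted by mappers and grouped by the shuffle, so each reducer sees all tags for one record and applies the same "covers every set" check.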
entry of the 50 GB file.
Daniel.
On Sun, Apr 10, 2011 at 5:10 PM, oleksiy gayduk.a.s...@mail.ru wrote: