Re: Architectural question

2011-04-11 Thread sumit ghosh
From: Ted Dunning tdunn...@maprtech.com; To: common-user@hadoop.apache.org; Sent: Mon, 11 April, 2011 7:38:04 AM; Subject: Re: Architectural question. The original poster said that there was no common key. Your suggestion presupposes that such a key exists. On Sun, Apr 10, 2011 at 4:29

Re: Architectural question

2011-04-11 Thread Mehmet Tepedelenlioglu
From: Ted Dunning tdunn...@maprtech.com; To: common-user@hadoop.apache.org; Sent: Mon, 11 April, 2011 7:38:04 AM; Subject: Re: Architectural question. The original poster said that there was no common key. Your suggestion presupposes that such a key exists.

Architectural question

2011-04-10 Thread oleksiy
Hi all, I have an architectural question. For my app I have 50 GB of persistent data, stored in HDFS as a simple CSV file. My app, which should run over this 50 GB of data, also has 10 GB of input data, likewise in CSV format. The persistent data and the input data don't have common keys

Re: Architectural question

2011-04-10 Thread Mehmet Tepedelenlioglu
oleksiy wrote: Hi all, I have an architectural question. For my app I have 50 GB of persistent data, stored in HDFS as a simple CSV file. My app, which should run over this 50 GB of data, also has 10 GB of input data, likewise in CSV format. The persistent data and the input data don't have

Re: Architectural question

2011-04-10 Thread Ted Dunning
There are no subtle ways to deal with quadratic problems like this. They just don't scale. Your suggestions are roughly on course. When matching 10 GB against 50 GB, the choice of which data set to feed to the mapper depends a lot on how much you can buffer in memory and how long such a
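The buffering trade-off described above can be illustrated with a block nested-loop comparison. This is a hypothetical, single-process Python sketch, not Hadoop code: `block_nested_loop_match`, the `matches` predicate and `chunk_size` are made-up names, and `chunk_size` stands in for however many records of the buffered data set fit in memory at once. With no common key every pair still has to be tested, so the work stays quadratic; buffering more of one side only reduces how many full scans of the other side are needed.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def chunked(records: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield lists of at most chunk_size records (the part held in memory)."""
    chunk: List[str] = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def block_nested_loop_match(
    buffered: Iterable[str],
    streamed: Callable[[], Iterable[str]],  # called once per chunk to re-read the other side
    matches: Callable[[str, str], bool],    # hypothetical match predicate; no shared key needed
    chunk_size: int,
) -> Tuple[List[Tuple[str, str]], int]:
    """Test every buffered record against every streamed record.

    Returns the matching pairs and the number of full scans of the
    streamed side, which equals the number of in-memory chunks.
    """
    results: List[Tuple[str, str]] = []
    passes = 0
    for chunk in chunked(buffered, chunk_size):
        passes += 1  # one complete scan of the streamed data per chunk
        for s in streamed():
            for b in chunk:
                if matches(b, s):
                    results.append((b, s))
    return results, passes


if __name__ == "__main__":
    small = ["10", "20", "30", "40", "50"]
    large = ["11", "35", "49"]
    # Assumed "match" rule for illustration only: values within 1 of each other.
    pairs, passes = block_nested_loop_match(
        small, lambda: iter(large), lambda b, s: abs(int(b) - int(s)) <= 1, chunk_size=2
    )
    print(pairs, passes)  # [('10', '11'), ('50', '49')] 3
```

Raising `chunk_size` to 5 in the example would bring the number of scans of `large` down to one, while the number of pairwise comparisons stays at 15 either way.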

Re: Architectural question

2011-04-10 Thread Ted Dunning
unbalanced. The same logic works for the intersection of any number of sets: mark each member with the set it came from, then reduce over the members to see whether each one belongs to every set. Good luck. On Apr 10, 2011, at 2:10 PM, oleksiy wrote: Hi all, I have an architectural question. For my app I have 50 GB
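The tag-and-reduce intersection described above can be simulated in plain Python to show the data flow. This is a minimal sketch, assuming records can be compared as exact keys (which, as Ted Dunning's reply earlier in this thread points out, the original poster said was not the case for his data); `map_phase`, `shuffle`, `reduce_phase` and `intersect` are hypothetical names standing in for a real mapper, the framework's grouping step and a reducer.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def map_phase(named_sets: Dict[str, Iterable[str]]) -> List[Tuple[str, str]]:
    """Mapper: emit (member, set_name) for every member of every input set."""
    return [(member, name) for name, members in named_sets.items() for member in members]


def shuffle(pairs: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group the set tags by member, as the framework's shuffle/sort would."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for member, tag in pairs:
        grouped[member].add(tag)  # a set ignores duplicates inside one input
    return grouped


def reduce_phase(grouped: Dict[str, Set[str]], num_sets: int) -> List[str]:
    """Reducer: keep only members tagged with every input set."""
    return sorted(m for m, tags in grouped.items() if len(tags) == num_sets)


def intersect(named_sets: Dict[str, Iterable[str]]) -> List[str]:
    """Intersection of any number of named sets via tag, group, reduce."""
    return reduce_phase(shuffle(map_phase(named_sets)), len(named_sets))


if __name__ == "__main__":
    sets = {
        "A": ["k1", "k2", "k3"],
        "B": ["k2", "k3", "k4"],
        "C": ["k3", "k2", "k2"],
    }
    print(intersect(sets))  # ['k2', 'k3']
```

Because each reducer call only sees one member and its tags, this works for any number of input sets, but it depends entirely on every set emitting the same key for the same member.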

Re: Architectural question

2011-04-10 Thread Daniel McEnnis
entry of the 50 GB file. Daniel. On Sun, Apr 10, 2011 at 5:10 PM, oleksiy gayduk.a.s...@mail.ru wrote: Hi all, I have an architectural question. For my app I have 50 GB of persistent data, stored in HDFS as a simple CSV file. My app, which should run over this 50 GB