Re: Dataset comparison and ranking - views

2011-03-07 Thread Sonal Goyal
Hi Marcos, Thanks for replying. I think I was not very clear in my last post. Let me describe my use case in detail. I have two datasets coming from different sources; let's call them dataset1 and dataset2. Both of them contain records for entities, say Person. A single record looks like: First N

how to use hadoop apis with cloudera distribution ?

2011-03-07 Thread Mapred Learn
Hi, I downloaded the CDH3 VM for Hadoop, but if I want to use something like: import org.apache.hadoop.conf.Configuration; in my Java code, what else do I need to do? Do I need to download Hadoop from Apache? If yes, then what does CDH3 do? If not, then where can I find the Hadoop code on the CDH VM?

Re: Sequence File usage queries

2011-03-07 Thread David Rosenstrauch
On 02/23/2011 07:24 PM, Mapred Learn wrote: Thanks! In this case, how can we print the metadata associated with the data (sequence files) if a user accessing this data wants to know it: i) Is there any Hadoop command that can do it? ii) Or will we have to provide some interface to the user to s

Re: Dataset comparison and ranking - views

2011-03-07 Thread Marcos Ortiz
On Tue, 2011-03-08 at 00:36 +0530, Sonal Goyal wrote:
> Hi,
>
> I am working on a problem to compare two different datasets, and rank
> each record of the first with respect to the other, in terms of how
> similar they are. The records are dimensional, but do not have a lot
> of dimensions. Some o

Dataset comparison and ranking - views

2011-03-07 Thread Sonal Goyal
Hi, I am working on a problem to compare two different datasets, and rank each record of the first with respect to the other, in terms of how similar they are. The records are dimensional, but do not have a lot of dimensions. Some of the fields will be compared for exact matches, some for similar
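To make the comparison described above concrete, here is a minimal single-machine sketch in Python. It is not the poster's method: the field names (`first_name`, `last_name`, `country`), the split into exact and fuzzy fields, and the unweighted average are all assumptions for illustration, and `difflib.SequenceMatcher` stands in for whatever similarity measure the real problem calls for.

```python
from difflib import SequenceMatcher

# Hypothetical schema: the original post does not list the actual fields.
EXACT_FIELDS = ("country",)                  # compared for exact equality
FUZZY_FIELDS = ("first_name", "last_name")   # compared for string similarity


def similarity(a, b):
    """Score two records in [0, 1] as the unweighted mean of per-field scores."""
    scores = []
    for field in EXACT_FIELDS:
        scores.append(1.0 if a[field] == b[field] else 0.0)
    for field in FUZZY_FIELDS:
        ratio = SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
        scores.append(ratio)
    return sum(scores) / len(scores)


def rank(record, candidates):
    """Return (score, candidate) pairs, most similar to `record` first."""
    scored = [(similarity(record, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    dataset1 = [{"first_name": "John", "last_name": "Smith", "country": "US"}]
    dataset2 = [
        {"first_name": "Mary", "last_name": "Jones", "country": "UK"},
        {"first_name": "Jon", "last_name": "Smith", "country": "US"},
    ]
    for rec in dataset1:
        for score, match in rank(rec, dataset2):
            print(round(score, 3), match)
```

This compares every record of the first dataset with every record of the second, so the cost grows with the product of the two dataset sizes. At Hadoop scale one would probably first group records by a cheap key and only score pairs within a group, but that is a design choice outside what the post specifies.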