Hi Jay, this use-case seems to be beyond the scope of Sqoop, which is meant to just transfer data between a structured datastore and Hadoop. Including [email protected] to solicit more opinions.
Regards, Kate

On Mon, Apr 22, 2013 at 11:04 PM, jaikumar krishna <[email protected]> wrote:
> Thanks Kate,
>
> My use case is as follows.
> I have two input tables, Table1 and Table2. Table1 (the master) has
> "25 lakhs" records with the fields company name, address, city, state,
> zip, phone number, fax, mail ID, and company website URL.
>
> Table2 has "5 lakhs" records with the same fields as Table1. I want to
> check whether the Table2 records match Table1, to verify whether they
> are correct or not.
>
> Before matching, I have to apply normalizations like the ones below.
>
> Company name => Normalized company name
> Century Tool & Gage => Century Tool and Gage
> News-Gazette Printing Co => News Gazette Printing
> Punch Networks Inc => Punch Networks
> Omni Print Inc => Omni Print
>
> For the Address_1 column:
> Address_1 => Address_1_Normalized
> 15 Sproat St => 15 Sproat Street
> 1 Preble Rd => 1 Preble Road
> 90 Everett Ave => 90 Everett Avenue
>
> Kindly check the attached Excel sheet for the normalization of the
> remaining fields. (Both tables are normalized before verifying.)
>
> Then I have some conditions for result accuracy, based on scoring the
> entities by how well they match:
>
> 1. company name == 100 and (address == 100 or phone number == 100)
> 2. company name >= 75 and address >= 75 and city == 100 and state == 100
>
> If either condition is satisfied, I can mark the record as verified.
>
> In the other case, if the company name and phone number do not match
> Table1, I can add the record as a new entity (which means it is not in
> Table1).
>
> I have attached sample records of Table1 and Table2 and my current output
> (which includes the scores from my current process; without Hadoop it
> takes more and more time).
>
> I hope you understand my use case.
>
> The main problem is: how can I compare each row, which has 6 fields
> (company name, city, street, state, phone, mail ID), with the other table,
> get a score, and finally take the max? I am totally frustrated.
>
> Thanks,
> Jay'
>
>
> On Tue, Apr 23, 2013 at 4:49 AM, Kathleen Ting <[email protected]> wrote:
>
>> Hi Jay, can you share your use-case behind verifying the table in
>> Sqoop rather than in HDFS? Generally speaking, you can verify if the
>> table transferred successfully by inspecting the file's contents via
>> issuing $ hadoop fs -cat <tablename>/part-m-00000
>>
>> You can also verify the return value from the Sqoop command ($ echo
>> $?), which should be 0.
>>
>> Regards, Kate
>>
>>
>> On Monday, April 22, 2013 10:19:20 AM UTC-7, jaikumar krishna wrote:
>>>
>>> Hi,
>>> how can I find out whether the table was moved successfully or not in
>>> Sqoop (not in HDFS)?
>>>
>>> Thanks,
>>> Jay'
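
For anyone following the thread, the normalization and scoring rules Jay describes can be sketched in plain Python before deciding how to distribute the work. This is only a rough illustration under several assumptions: the thread does not say how the 0-100 scores are computed, so `difflib.SequenceMatcher` is used as a stand-in; the suffix and abbreviation lists contain only the examples quoted above, not the attached spreadsheet; the "new entity" rule is read as "neither company name nor phone number matches any Table1 row"; and the function names, the dict keys, and the "unverified" label are all hypothetical.

```python
from difflib import SequenceMatcher

# Taken only from the examples in the thread; the complete lists are in the
# attached spreadsheet, which is not reproduced here.
COMPANY_SUFFIXES = {"inc", "co"}
STREET_ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def _words(text):
    """Split on whitespace and drop stray periods and commas."""
    stripped = (w.strip(".,") for w in text.split())
    return [w for w in stripped if w]


def normalize_company(name):
    """'&' becomes 'and', hyphens become spaces, trailing Inc/Co is dropped."""
    words = _words(name.replace("&", " and ").replace("-", " "))
    while words and words[-1].lower() in COMPANY_SUFFIXES:
        words.pop()
    return " ".join(words)


def normalize_address(address):
    """Expand the street abbreviations listed above."""
    return " ".join(STREET_ABBREVIATIONS.get(w.lower(), w) for w in _words(address))


def score(a, b):
    """Similarity from 0 to 100 (assumed metric: difflib ratio, truncated)."""
    return int(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def classify(candidate, master_rows):
    """Return "verified", "new", or "unverified" for one normalized Table2 row."""
    name_or_phone_seen = False
    for master in master_rows:
        s = {field: score(candidate[field], master[field]) for field in candidate}
        rule1 = s["company"] == 100 and (s["address"] == 100 or s["phone"] == 100)
        rule2 = (s["company"] >= 75 and s["address"] >= 75
                 and s["city"] == 100 and s["state"] == 100)
        if rule1 or rule2:
            return "verified"
        if s["company"] == 100 or s["phone"] == 100:
            name_or_phone_seen = True
    return "unverified" if name_or_phone_seen else "new"


# Illustrative rows: company and address come from the thread's examples,
# while city, state, and phone are made-up placeholder values.
table1 = [{"company": normalize_company("Punch Networks Inc"),
           "address": normalize_address("1 Preble Rd"),
           "city": "Portland", "state": "ME", "phone": "555-0100"}]
row = {"company": normalize_company("Punch Networks"),
       "address": normalize_address("1 Preble Road"),
       "city": "Portland", "state": "ME", "phone": "555-0199"}
print(classify(row, table1))  # prints "verified"
```

Comparing every Table2 row against every Table1 row, as `classify` does here, is quadratic in the table sizes. At the volumes mentioned in the thread, the rows would likely need to be grouped first (for example by state or zip) so that each row is scored only against candidates in its own group.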
