[Including [email protected]]
On Tue, Apr 23, 2013 at 12:21 PM, Kathleen Ting <[email protected]> wrote:
> Hi Jay, this use-case seems to be beyond the scope of Sqoop, which is
> meant to just transfer data between a structured datastore and
> Hadoop. Including [email protected] to solicit more opinions.
>
> Regards, Kate
>
>
> On Mon, Apr 22, 2013 at 11:04 PM, jaikumar krishna
> <[email protected]> wrote:
>
>> Thanks Kate,
>>
>> My use case is as follows.
>> I have two input tables, Table1 and Table2. In Table1 (the master)
>> I have *"25 lakhs"* records of *company name, address, city, state,
>> zip, phone number, fax, mail ID, company website URL*.
>>
>> In Table2 I have *"5 lakhs"* records with the same fields as
>> Table1. I want to check whether the Table2 records match Table1, to
>> verify whether they are correct or not.
>>
>> Before matching I have to apply normalizations like the ones below:
>>
>> *Company name => Normalized company name*
>> Century Tool & Gage => Century Tool and Gage
>> News-Gazette Printing Co => News Gazette Printing
>> Punch Networks Inc => Punch Networks
>> Omni Print Inc => Omni Print
>>
>> For the Address_1 column:
>> *Address_1 => Address_1_Normalized*
>> 15 Sproat St => 15 Sproat Street
>> 1 Preble Rd => 1 Preble Road
>> 90 Everett Ave => 90 Everett Avenue
>>
>> Kindly check the attached Excel sheet for the *normalization of the
>> remaining fields*. (Both tables are normalized before verifying.)
>>
>> Then I have some conditions for result accuracy, scoring the entities
>> by matching:
>>
>> *1. (company name == 100 and (address == 100 or phone number == 100))*
>> *2. (company name >= 75 and address >= 75 and city == 100 and state
>> == 100)*
>>
>> If either one is satisfied, I can mark the record as verified.
>>
>> In the other case:
>> *if the company name and phone number do not match anything in Table1,
>> I can add the record as a new entity (which means it is not in Table1).*
>>
>> I have attached sample records of Table1 and Table2 and my current
>> output (which includes the scores from my current process; without
>> Hadoop it takes more and more time).
>>
>>
>> I hope you understand my use case.
>>
>> The main problem is: how can I compare each row having 6 fields (company
>> name, city, street, state, phone, mail ID) with another table, get a
>> score, and finally get the max... I am totally frustrated. ...
>>
>> Thanks,
>> Jay'
>>
>>
>> On Tue, Apr 23, 2013 at 4:49 AM, Kathleen Ting <[email protected]> wrote:
>>
>>> Hi Jay, can you share your use-case behind verifying the table in
>>> Sqoop rather than in HDFS? Generally speaking, you can verify if the
>>> table transferred successfully by inspecting the file's contents via
>>> issuing $ hadoop fs -cat <tablename>/part-m-00000
>>>
>>> You can also verify the return value from the Sqoop command ($ echo
>>> $?), which should be 0.
>>>
>>> Regards, Kate
>>>
>>>
>>> On Monday, April 22, 2013 10:19:20 AM UTC-7, jaikumar krishna wrote:
>>>>
>>>> Hi,
>>>> how can I find out whether the table was moved successfully or not
>>>> in Sqoop (not in HDFS)?
>>>>
>>>> Thanks,
>>>> Jay'
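
For the normalization and scoring rules Jay describes in the quoted message, a minimal Python sketch might look like the following. The function names, the suffix and abbreviation tables (built only from the examples quoted in the thread), and the use of difflib's SequenceMatcher as the 0-100 similarity score are all assumptions; the thread does not say how the scores are actually computed.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables, taken only from the examples in the thread.
COMPANY_SUFFIXES = {"inc", "co"}
STREET_ABBREVIATIONS = {"st": "Street", "rd": "Road", "ave": "Avenue"}


def normalize_company(name):
    """Replace '&' with 'and', hyphens with spaces, drop a trailing Inc/Co."""
    words = name.replace("&", " and ").replace("-", " ").split()
    if words and words[-1].lower().rstrip(".") in COMPANY_SUFFIXES:
        words = words[:-1]
    return " ".join(words)


def normalize_address(address):
    """Expand a trailing street-type abbreviation (St, Rd, Ave)."""
    words = address.split()
    if words:
        key = words[-1].lower().rstrip(".")
        words[-1] = STREET_ABBREVIATIONS.get(key, words[-1])
    return " ".join(words)


def score(a, b):
    """Case-insensitive similarity on a 0-100 scale (assumed metric)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_verified(s):
    """Apply the two rules from the thread to a dict of per-field scores."""
    rule1 = s["company"] == 100 and (s["address"] == 100 or s["phone"] == 100)
    rule2 = (s["company"] >= 75 and s["address"] >= 75
             and s["city"] == 100 and s["state"] == 100)
    return rule1 or rule2


if __name__ == "__main__":
    print(normalize_company("News-Gazette Printing Co"))
    print(normalize_address("15 Sproat St"))
    print(is_verified({"company": 100, "address": 80, "phone": 100,
                       "city": 0, "state": 0}))
```

Comparing every Table2 row against every Table1 row means 5 lakh x 25 lakh = 1.25 x 10^12 pairs, so it may help to first group candidates by a cheap exact key such as state and zip and score only within each group. That grouping step is a suggestion, not something the thread prescribes.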
