Hello, I have the following problem to solve using Pig.

*Input*

1. Relation A, which has only one field of type chararray. Sample of A:

*abc*
*xyz gh*
*zzz yy*
*red*

Approximate number of rows in A = 10,000

2. Relation B, which has only one field of type chararray. Sample of B:

*red car*
*red ferrari*
*abc*
*abcd*
*xyz ghis*

Approximate number of rows in B = 1 billion

*Problem to be solved*

I need to find all case-insensitive variants of each term in relation A that exist in relation B. For example, the term 'red' from A would have the variants 'red car' and 'red ferrari' in B.

I was able to get the variants of a single term from A in B using the matches operator, i.e. matches '.*red.*'

How should I go about building a complete solution for this problem? Should I use a UDF or go for native MapReduce? I am a bit confused about how to proceed.

I would really appreciate any help on this. Thanks much.

Regards,
Arun A K
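
For reference, the matching rule described above can be pinned down with a small sketch. It is written in plain Python rather than Pig and is only meant to illustrate the intended semantics on the sample rows. It assumes that "variant" means case-insensitive substring containment, as the '.*red.*' pattern suggests, and that the roughly 10,000 terms of A fit in memory while B is streamed row by row. The function name find_variants is hypothetical.

```python
def find_variants(terms, phrases):
    """Map each term to the phrases that contain it, ignoring case.

    Assumption: a "variant" is any phrase containing the term as a
    substring, mirroring the '.*red.*' pattern from the question.
    """
    # Lower-case the small relation once, keeping the original spelling as key.
    lowered = [(term, term.lower()) for term in terms]
    variants = {term: [] for term in terms}

    # Stream over the large relation; each phrase is lower-cased only once.
    for phrase in phrases:
        lowered_phrase = phrase.lower()
        for term, lowered_term in lowered:
            if lowered_term in lowered_phrase:
                variants[term].append(phrase)
    return variants


# Sample rows taken from the question.
A = ["abc", "xyz gh", "zzz yy", "red"]
B = ["red car", "red ferrari", "abc", "abcd", "xyz ghis"]

result = find_variants(A, B)
for term in A:
    print(term, "->", result[term])
```

On the sample rows, 'red' maps to 'red car' and 'red ferrari', 'abc' maps to 'abc' and 'abcd', 'xyz gh' maps to 'xyz ghis', and 'zzz yy' has no variants. The nested loop does about 10,000 comparisons per row of B, so this is a statement of the expected result rather than a proposal for how to run it at the scale of 1 billion rows.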