Hi Nitin,

No offense taken. Thank you for your response. Part of this is also about finding the right tool for the job.
I am running queries to determine the cuts of tweets that I want, then doing some modest normalization (through a Python script), and then I want to create SequenceFiles from the result. So far Hive seems to be the most convenient way to do this, but I can take a look at Pig too. It looked like "STORED AS SEQUENCEFILE" gets me 99% of the way there, so I was wondering if there is a way to get those IDs in there as well. The last piece is always the stumbler :)

Thanks again,
S

On Mon, Sep 30, 2013 at 2:41 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> Are you using Hive just to convert your text files to sequence files?
> If that's the case, then you may want to look at the purpose for which
> Hive was developed: it is not meant for routine data modification or
> processing that does not involve any kind of analytics functions.
>
> If you want to do data manipulation or enrichment and do not want to
> code a lot of MapReduce jobs, you can take a look at Pig scripts.
> Basically, what you want to do is generate a UUID for each of your
> tweets and then feed it to the Mahout algorithms.
>
> Sorry if I understood it wrong or it sounds rude.
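In case it helps the discussion, here is a minimal sketch of what the normalization step could look like, following Nitin's UUID suggestion. It assumes each input line is one raw tweet and emits (UUID, normalized text) pairs as tab-separated records, which a Hive table declared with STORED AS SEQUENCEFILE could then load; the function names, the normalization rules, and the TSV layout are my assumptions, not details from this thread:

```python
import sys
import uuid


def normalize(text):
    # Stand-in for the "modest normalization" mentioned above:
    # collapse runs of whitespace and lowercase the text.
    return " ".join(text.split()).lower()


def tag_tweets(lines):
    """Yield (uuid, normalized_tweet) pairs, one per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # uuid4().hex gives a 32-char hex id to carry along as the key.
        yield uuid.uuid4().hex, normalize(line)


if __name__ == "__main__":
    # Usable as a streaming filter: raw tweets in, "id<TAB>text" out.
    for tweet_id, text in tag_tweets(sys.stdin):
        print(f"{tweet_id}\t{text}")
```

The point of emitting the UUID as the first column is that it survives the round trip through Hive, so each tweet keeps a stable identifier when the table is written out as a SequenceFile.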