Hi Nitin,

No offense taken. Thank you for your response. Part of this is also trying
to find the right tool for the job.

I am doing queries to determine the cuts of tweets that I want, then doing
some modest normalization (through a Python script), and then I want to
create SequenceFiles from that.

So far Hive seems to be the most convenient way to do this, but I can take
a look at Pig too. It looked like "STORED AS SEQUENCEFILE" gets me 99% of
the way there, so I was wondering if there was a way to get those IDs in
there as well. The last piece is always the stumbler :)

Thanks again,

S

On Mon, Sep 30, 2013 at 2:41 PM, Nitin Pawar <nitinpawar...@gmail.com> wrote:

> Are you using hive just to convert your text files to sequence files?
> If that's the case, then you may want to look at the purpose hive was
> developed for.
>
> If you want to modify or enrich data on a routine basis, without any kind
> of analytics functions, and do not want to code a lot of map reduce jobs,
> you can take a look at pig scripts.
> Basically, what you want to do is generate a UUID for each of your tweets
> and then feed them to mahout algorithms.
>
> Sorry if I understood it wrong or if this sounds rude.
>