Hi,

I have a lot of tweets saved as text. I created an external table on top of
them so I can read them as a textfile table. I need to convert these to
sequencefiles with each tweet as its own record. To do this, I created a
second table stored as a sequencefile, like so:

CREATE EXTERNAL TABLE tweetseq (
  tweet STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS SEQUENCEFILE
LOCATION '/user/hdfs/tweetseq';


Now when I insert into this table from my original tweets table, each line
gets its own record as expected. This is great. However, the records are
written without any ids. How can I get it to write an id for each record?
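
For reference, the insert is along these lines. This is only a sketch: the
source table name (tweets) and its column name (tweet) are placeholders, not
the real names.

-- Copy every row of the textfile-backed table into the sequencefile table.
-- "tweets" and its "tweet" column are assumed names for illustration.
INSERT OVERWRITE TABLE tweetseq
SELECT tweet FROM tweets;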

PS: I need the ids because Mahout's seq2sparse expects them.

Regards,
S
