Hi, I have a lot of tweets saved as text. I created an external table on top of them so I can access them as a textfile table. I need to convert these to sequencefiles, with each tweet as its own record. To do this, I created another table stored as a sequencefile, like so:
CREATE EXTERNAL TABLE tweetseq(
  tweet STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
STORED AS SEQUENCEFILE
LOCATION '/user/hdfs/tweetseq'

Now when I insert into this table from my original tweets table, each line gets its own record, as expected. This is great. However, the records don't have any ids. How can I get it to write ids?

PS: I need the ids to be there because mahout seq2sparse expects them.

Regards,
S
