No. Each review can be one record in a much larger SequenceFile.
I don't know whether the current implementation reads input in that
format, but if it doesn't, it shouldn't be hard to modify it to do so.
You would not want a large number of small files on HDFS.
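
To illustrate the "one record per review" idea, here is a minimal sketch in
plain Python. It is not Hadoop's SequenceFile format and does not use any
Hadoop API; it only shows the general approach of packing many tiny key/value
records into a single container file. The function names, the review IDs, and
the length-prefixed layout are all assumptions made up for this example. On a
cluster you would write the records with Hadoop's own SequenceFile writer
instead.

```python
import struct
import tempfile
from pathlib import Path


def write_records(path, records):
    """Write (key, value) string pairs into one file.

    Hypothetical layout: two big-endian 4-byte lengths, then the UTF-8
    key bytes, then the UTF-8 value bytes. This is NOT the SequenceFile
    format, only an illustration of the same idea.
    """
    with open(path, "wb") as out:
        for key, value in records:
            k = key.encode("utf-8")
            v = value.encode("utf-8")
            out.write(struct.pack(">II", len(k), len(v)))
            out.write(k)
            out.write(v)


def read_records(path):
    """Yield the (key, value) pairs back in the order they were written."""
    with open(path, "rb") as src:
        while True:
            header = src.read(8)
            if not header:
                break  # clean end of file
            klen, vlen = struct.unpack(">II", header)
            key = src.read(klen).decode("utf-8")
            value = src.read(vlen).decode("utf-8")
            yield key, value


# Three tiny reviews end up in one file instead of three separate ones.
reviews = [
    ("review-1", "Great product"),
    ("review-2", "Arrived late"),
    ("review-3", "ok"),
]
container = Path(tempfile.mkdtemp()) / "reviews.bin"
write_records(container, reviews)
restored = list(read_records(container))
```

Each review stays an individual record with its own key, so anything that
treats one record as one document still sees separate documents.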

On Mon, Apr 9, 2012 at 9:54 AM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> Thanks! One thing I am not clear on is whether each customer review, which
> might be just a few bytes, needs to be in a separate file. I am planning to
> use Hadoop, so I was thinking of dumping all the raw comments into a
> SequenceFile, but I am not sure whether that would mess up TF-IDF or
> anything like that. Could someone help me clarify?
>
