Dear Mahout users,

I am trying to use the Bayes classifier from the Mahout 0.7 distribution. My training set is a text file in the following format: one document per line, where the first entry on the line is the label (key) and the rest is the evidence (value = document contents). In Mahout 0.5, the trainclassifier command accepted a directory of files in this format as input, but in Mahout 0.7 the seqdirectory command needs an input directory with one file per document. My training set contains millions of small documents, so I want to avoid putting millions of tiny files on HDFS. Is there an easy way to convert files like the above into sequence files that the seq2sparse command can consume afterwards?
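One possible approach, sketched here under assumptions: read each line, split off the label, and emit a key/value pair whose key imitates the `/label/docid` style of keys that seqdirectory appears to produce, so that the label can later be recovered from the key (as I understand it, trainnb's `-el` option extracts labels this way). The helper name `lines_to_pairs`, whitespace as the label separator, and the line number as document id are all hypothetical choices. The sketch only builds the pairs; writing them into an actual Hadoop SequenceFile with Text keys and values would still need Hadoop's `SequenceFile.Writer` or an equivalent tool.

```python
from typing import Iterable, Iterator, Tuple


def lines_to_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Turn 'label<whitespace>document text' lines into (key, value) pairs.

    Hypothetical helper. Assumptions: the label is separated from the text by
    whitespace, and the key format '/<label>/<doc-id>' mirrors seqdirectory's
    '/<label>/<file>' keys, with the line number standing in for the file name.
    """
    for doc_id, line in enumerate(lines):
        line = line.strip()
        if not line:
            continue  # skip blank lines; their line numbers are simply not used
        parts = line.split(None, 1)  # split once: label, then the rest
        label = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        yield f"/{label}/{doc_id}", text
```

Each pair would then be appended to a single (or a few) sequence files rather than to millions of separate HDFS files, which is the point of the exercise.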
Thanks much,
~Sarang
