On Thu, May 5, 2011 at 11:21 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu <guxiaobo1...@gmail.com> wrote:
>
>> On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <wenhao...@gmail.com> wrote:
>> > 1. You could use the command line to add shape as a category feature. It
>> > will hash categoryname=value as the feature and set the value to 1.0. This
>> > is the standard way to convert a category feature into multiple numeric
>> > features (0/1 features).
>>
>> Can we just use the "word" type for category predictor variables?
>>
>
> Yes.
>
>
>> > 2. In production mode, don't use CSV. You will find that most of the time
>> > is spent parsing the CSV data and hashing it into features. You might
>> > encode the features as vectors and serialize them to the file system with
>> > MapReduce to reduce the cost of data parsing.
>>
>> Currently we are not familiar with Vectors. Is there a standard way
>> (command line) to encode CSV files into Vectors and serialize them to the
>> file system?
>>
>
> There isn't a good command line for this, largely because it is difficult to
> describe how to convert each CSV field. There are some beginnings of efforts
> on this, but the results are still limited.
>
>
>> And what do you mean by "file system", local file system or HDFS,
>> because you mentioned MapReduce
How can I specify an HDFS URI for the --input option?
>
> That shouldn't much matter.
>
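
For readers following the thread, the name=value hashing that Stanley describes in point 1 can be sketched roughly as below. This is only an illustration in plain Python, not Mahout's actual encoder: the function name encode_categorical, the vector size of 1000, and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib


def encode_categorical(row, num_features=1000):
    """Hash each "name=value" pair into one slot of a fixed-size vector.

    Hypothetical helper for illustration; the vector size and hash
    function are assumptions, not what Mahout uses internally.
    """
    vector = [0.0] * num_features
    for name, value in row.items():
        key = f"{name}={value}".encode("utf-8")
        # CRC32 is stable across runs, unlike Python's built-in hash() for strings.
        index = zlib.crc32(key) % num_features
        # A category is either present (1.0) or absent (0.0).
        vector[index] = 1.0
    return vector


row = {"shape": "round", "color": "red"}
vec = encode_categorical(row)
active = [i for i, v in enumerate(vec) if v == 1.0]
print(len(vec), active)
```

Each distinct category value gets its own slot, so a single categorical column becomes several 0/1 features. Two different pairs can occasionally hash to the same slot; a larger vector makes such collisions less likely.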