On Thu, May 5, 2011 at 11:21 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu <guxiaobo1...@gmail.com> wrote:
>
>> On Thu, May 5, 2011 at 10:40 PM, Stanley Xu <wenhao...@gmail.com> wrote:
>> > 1. You could use the command line to add shape as a categorical feature.
>> > It will hash categoryname=value as the feature and set the value to 1.0.
>> > This is the standard way to convert a categorical feature into multiple
>> > numeric (0/1) features.
>>
>> Can we just use the "word" type for categorical predictor variables?
>>
>
> Yes.
>
>
>> > 2. In production mode, don't use CSV. You will find that most of the time
>> > is spent parsing the CSV data and hashing it into features. You might
>> > encode the features as vectors and serialize them to the file system with
>> > MapReduce to reduce the cost of data parsing.
>>
>> Currently we are not familiar with Vectors. Is there a standard way
>> (command line) to encode CSV files into Vectors and serialize them to the
>> file system?
>>
>
> There isn't a good command line for this, largely because it is difficult to
> describe how to convert each CSV field.  There are some beginnings of efforts
> on this, but the results are still limited.
>
>
>> And what do you mean by "file system": the local file system or HDFS?
>> I ask because you mentioned MapReduce.

How can I specify an HDFS URI for the --input option?

>
> That shouldn't matter much.
>
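
As a rough illustration of the hashed 0/1 encoding described in point 1 above, here is a minimal Python sketch. It is not Mahout's encoder: the function names, the MD5-based hash, and the vector size of 32 are all assumptions made for the example. It only shows the idea of hashing "name=value" to a slot and setting that slot to 1.0.

```python
import hashlib


def hash_index(token: str, num_features: int) -> int:
    """Map a token to a slot in [0, num_features) with a stable hash.

    MD5 is used only because it is deterministic across runs (unlike
    Python's built-in hash()); it is an assumption, not Mahout's hash.
    """
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_features


def encode_row(row: dict, num_features: int = 32) -> list:
    """Encode categorical fields as a fixed-size 0/1 vector."""
    vec = [0.0] * num_features
    for name, value in row.items():
        # Each "name=value" pair becomes one hashed feature set to 1.0.
        # Two pairs may collide on the same slot; that slot stays 1.0.
        vec[hash_index(f"{name}={value}", num_features)] = 1.0
    return vec


# Hypothetical row with two categorical fields.
vec = encode_row({"shape": "round", "color": "red"})
```

Encoding each row once like this and storing the resulting vectors is what avoids re-parsing the CSV on every training pass, as suggested in point 2.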
