My input is CSV, of the form:
userid, itemid

I set booleanData=true

So my script looks like this:
#!/bin/bash
# --input = HDFS file/dir containing the history to process
# --output = HDFS directory to put output into
# --usersFile = user IDs to produce recommendations for
# This will run a co-occurrence algorithm on it
mahoutdir=/home/sreavely/mahout-0.4
mahoutver=0.4-SNAPSHOT
hadoop jar "$mahoutdir/mahout-core-$mahoutver.job" \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  --input /user/sreavely/mahout-boolean-enduseraction-input.csv \
  --output /user/sreavely/mahout-output \
  --usersFile /user/sreavely/mahout-users-to-recommend-for.txt \
  --booleanData true
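One way to hunt for malformed lines before rerunning the job is to check each line roughly the way the mapper reads it. This is a hypothetical Java sketch (the class CheckRecommenderInput is made up for illustration). It assumes that TasteHadoopUtils.splitPrefTokens splits on a comma or a tab and that each line should be userID,itemID with an optional third preference value; verify both against the source you actually built. It reads lines from standard input and prints any that look unparseable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper: flags input lines that RecommenderJob's mappers
// would likely fail to parse.
public class CheckRecommenderInput {
  // Assumption: comma or tab, the delimiter splitPrefTokens is believed to use.
  private static final Pattern DELIMITER = Pattern.compile("[\t,]");

  // Returns a description of the problem, or null if the line looks like
  // userID,itemID or userID,itemID,preference.
  static String problemWith(String line) {
    String[] tokens = DELIMITER.split(line);
    if (tokens.length < 2) {
      return "fewer than two fields (no comma or tab found)";
    }
    if (tokens.length > 3) {
      return "more than three fields";
    }
    try {
      Long.parseLong(tokens[0]); // user ID
      Long.parseLong(tokens[1]); // item ID; a leading space such as " 456" fails here
      if (tokens.length == 3) {
        Float.parseFloat(tokens[2]); // optional preference value
      }
    } catch (NumberFormatException e) {
      return "non-numeric field (" + e.getMessage() + ")";
    }
    return null;
  }

  public static void main(String[] args) throws IOException {
    BufferedReader in =
        new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
    String line;
    long lineNo = 0;
    while ((line = in.readLine()) != null) {
      lineNo++;
      String problem = problemWith(line);
      if (problem != null) {
        System.out.println(lineNo + ": " + problem + " [" + line + "]");
      }
    }
  }
}
```

Usage would be something like `hadoop fs -cat /user/sreavely/mahout-boolean-enduseraction-input.csv | java CheckRecommenderInput`. Header rows, blank lines, and space-separated lines are reported as having fewer than two fields. Lines written as `123, 456` (space after the comma) are also flagged, since Long.parseLong does not accept a leading space.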


Cheers,
Simon
P.S. I also have a dataset with a preference column that I haven't tested
with yet.

On Tue, Aug 10, 2010 at 1:38 AM, Sean Owen wrote:

> I think your input is malformed. What does it look like?
> (But the error could be better.)
>
> On Mon, Aug 9, 2010 at 3:14 PM, Simon Reavely wrote:
> > I built and hacked together 0.4-SNAPSHOT from source.
> >
> > It now finds the class files - hurrah!
> > However, I now get an ArrayIndexOutOfBoundsException
> >
> >
> > 10/08/09 16:07:14 INFO mapred.JobClient: Task Id : attempt_201005101218_0012_m_000000_2, Status : FAILED
> > java.lang.ArrayIndexOutOfBoundsException: 1
> >        at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:47)
> >        at org.apache.mahout.cf.taste.hadoop.item.ItemIDIndexMapper.map(ItemIDIndexMapper.java:31)
> >        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
> >        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
> >        at org.apache.hadoop.mapred.Child.main(Child.java:170)
> >
> >
> > Looking at the source code, the exception comes from the indexing into
> > tokens below, which suggests that
> > TasteHadoopUtils.splitPrefTokens(value.toString()) returned fewer tokens
> > than expected:
> >
> >  @Override
> >  protected void map(LongWritable key,
> >                     Text value,
> >                     Context context) throws IOException, InterruptedException {
> >    String[] tokens = TasteHadoopUtils.splitPrefTokens(value.toString());
> >    // With transpose == false this reads tokens[1], which throws
> >    // ArrayIndexOutOfBoundsException: 1 if the line yields only one token
> >    long itemID = Long.parseLong(tokens[transpose ? 0 : 1]);
> >    int index = TasteHadoopUtils.idToIndex(itemID);
> >    context.write(new VarIntWritable(index), new VarLongWritable(itemID));
> >  }
> >
> > Any ideas? Please note, I suspect this might be an issue with how I
> > hacked together my package, since I can't figure out how to create a
> > proper binary release from source.
> >
> > If not, I'm off to the debugger!
> >
> > Cheers,
> > Simon
>
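The exception text ("ArrayIndexOutOfBoundsException: 1") is consistent with an input line that split into a single token. This minimal Java sketch reproduces that indexing outside Hadoop. It assumes the delimiter is a comma or a tab (a guess at what splitPrefTokens uses, not confirmed in this thread), and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class SplitPrefTokensDemo {
  // Assumption: comma-or-tab delimiter, mirroring what splitPrefTokens is
  // believed to use; check TasteHadoopUtils in your own build.
  private static final Pattern DELIMITER = Pattern.compile("[\t,]");

  // Same indexing as ItemIDIndexMapper.map() with transpose == false.
  static long itemIdFrom(String line) {
    String[] tokens = DELIMITER.split(line);
    return Long.parseLong(tokens[1]); // throws if the split produced one token
  }

  public static void main(String[] args) {
    System.out.println(itemIdFrom("123,456"));  // prints 456
    System.out.println(itemIdFrom("123\t456")); // prints 456
    try {
      itemIdFrom("123 456"); // space-separated: no comma or tab, so one token
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("no item ID column: " + e.getMessage());
    }
  }
}
```

A blank line or any line without a comma or tab takes the same path, because Pattern.split returns a one-element array when the delimiter never matches.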


-- 
Simon Reavely
