Finding problems is never bad, even if misdiagnosed the first time around.
On Sat, Dec 14, 2013 at 4:05 PM, sam wu <swu5...@gmail.com> wrote:

> Hi Ted,
>
> After some more debugging: my previous statement is not correct, please
> disregard it.
> There is a problem, I am sure. I am using InMemeoryMapper, one of the ways
> to load data, and I found a problem there.
> I am going to compare with the other approaches (partial, Breiman) to see
> what the difference is.
>
> My bad. Well, it's Saturday!
>
> Sam
>
>
> On Sat, Dec 14, 2013 at 1:38 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > Can you file a JIRA at https://issues.apache.org/jira/browse/MAHOUT ?
> >
> > It sounds like you have a test case in mind along with your fix. If you
> > could package that work up as a patch file, then it would be much
> > appreciated.
> >
> >
> > On Sat, Dec 14, 2013 at 9:24 AM, sam wu <swu5...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I am using Mahout's random forest. It works well when the feature
> > > descriptor has no ignored features (no I flag).
> > >
> > > With the Ignore flag, the returned feature value is -1
> > > (in the code, dataset.valueOf(aId, token) returns -1).
> > >
> > > I did some investigation and found that there are some problems in
> > > DataConverter.java.
> > >
> > > source code
> > > ------
> > >
> > > for (int attr = 0; attr < nball; attr++) {            // line 51
> > >   if (ArrayUtils.contains(dataset.getIgnored(), attr)) {
> > >     continue; // IGNORED
> > >   }
> > >
> > >   String token = tokens[attr].trim();
> > >
> > >   if ("?".equals(token)) {
> > >     // missing value
> > >     return null;
> > >   }
> > >
> > >   if (dataset.isNumerical(aId)) {                     // line 63
> > >     vector.set(aId++, Double.parseDouble(token));
> > >   } else { // CATEGORICAL
> > >     vector.set(aId, dataset.valueOf(aId, token));     // line 66
> > >     aId++;
> > >   }
> > > ------
> > > Let the feature descriptor be 9 I N L (Breiman example):
> > > 11 features, 1-9 ignored, the 10th is numeric, the 11th is the label
> > > variable.
> > > (Does the Breiman example really work as described in the web
> > > instructions?)
> > >
> > > line 51: attr is the feature number, 0-10
> > > aId is the filtered feature number, 0-1 (two non-ignored features)
> > >
> > > Problem in line 66:
> > > if attr=10 (the label feature)
> > >    aId=1
> > >    token=true
> > > dataset.valueOf(aId, token) returns -1. In the current code, valueOf()
> > > for a CATEGORICAL feature mixes up the aId and attr concepts.
> > >
> > > Just changing line 66 from
> > >    vector.set(aId, dataset.valueOf(aId, token));
> > > to
> > >    vector.set(aId, dataset.valueOf(attr, token));
> > > does not work, because some validation fails (also an attr/aId
> > > mix-up).
> > >
> > >
> > > There might be things that I have overlooked; these are just some
> > > thoughts.
> > >
> > >
> > > Sam
> > >
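
For readers following the thread, the attr/aId distinction in the quoted message can be sketched in isolation. This is a standalone illustration, not Mahout's code: the class IgnoredAttributeIndexDemo, the helpers filteredIndexOf and valueOf, and the table valuesByAttr are all hypothetical names. The sketch assumes that categorical values are stored per raw column index, which is only one way a -1 like the one reported could arise; the thread does not confirm the actual cause.

```java
import java.util.Arrays;

public class IgnoredAttributeIndexDemo {

    // Hypothetical helper: maps each raw column index (attr) to its filtered
    // index (aId), or to -1 when the column is ignored.
    static int[] filteredIndexOf(boolean[] ignored) {
        int[] map = new int[ignored.length];
        int aId = 0;
        for (int attr = 0; attr < ignored.length; attr++) {
            map[attr] = ignored[attr] ? -1 : aId++;
        }
        return map;
    }

    // Hypothetical lookup: position of token among the values recorded for a
    // column, or -1 if the column has no recorded values or the token is unknown.
    static int valueOf(String[][] values, int column, String token) {
        if (column < 0 || column >= values.length || values[column] == null) {
            return -1;
        }
        return Arrays.asList(values[column]).indexOf(token);
    }

    public static void main(String[] args) {
        // Descriptor "9 I N L": nine ignored columns, one numerical, one label.
        boolean[] ignored = new boolean[11];
        Arrays.fill(ignored, 0, 9, true);
        int[] aIdOf = filteredIndexOf(ignored);

        // ASSUMPTION: the value table is keyed by raw column index, and only
        // the label column (attr 10) is categorical.
        String[][] valuesByAttr = new String[11][];
        valuesByAttr[10] = new String[] {"false", "true"};

        int attr = 10;          // raw index of the label column
        int aId = aIdOf[attr];  // filtered index of the same column

        System.out.println("attr=" + attr + " aId=" + aId);
        // Looking up with the filtered index misses the entry.
        System.out.println(valueOf(valuesByAttr, aId, "true"));
        // Looking up with the raw index finds it.
        System.out.println(valueOf(valuesByAttr, attr, "true"));
    }
}
```

Under these assumptions, the lookup keyed by aId returns -1 while the one keyed by attr returns the position of the token. Whichever indexing convention the real value table uses, every caller has to use the same one, which is probably why changing line 66 alone runs into other checks.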