Hi, I have done classification using Mahout. Suppose a file named Testfile is 20 MB in size and its contents look like this:

category  id     description
sports    xxx    cricket, football, etc.
sports    xxxx   cricket, volleyball, etc.
news      yyyy   political, etc.
news      ppppp  news channel

We want to do text categorization on this file (suppose we have 2 categories, sports and news). Suppose 60% of our data consists of the first 4 lines of Testfile and 40% consists of the last line. If I use the 60% as training data and the 40% as test data, then Mahout will train on the first 4 lines and build a binary model.

During testing, it removes the category from the last line (i.e. the 40% of the file) and feeds that line to the model. The predicted category can then be compared with the actual category in the file, so the accuracy of the algorithm can be evaluated.

I think the same applies to your case too.

On Tue, Nov 11, 2014 at 11:58 AM, Blade Liu <hafzc...@gmail.com> wrote:

> Hi,
>
> I'm new to Mahout and got confused how train and test data are split when
> evaluating recommenders.
>
> I'm not sure whether data is split based on selecting partial item
> preferences, or selecting specific users(together with all their
> preferences). For example, train data accounts for 60%, and test data
> accounts for 40%. Does it indicates 40% total preferences will used for
> testing(regardless associated users)? In classification, all features
> associated with the users will be selected..
>
> If partition criteria is based on preference, would it affect neighborhood
> similarity before computing recommended score?
>
>
> Cheers,
> Blade
>
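
P.S. To make the hold-out procedure from my reply concrete, here is a minimal Python sketch. It is not Mahout code: the word-count classifier is only a toy stand-in, and the function names (split_rows, train, predict, evaluate) are made up for illustration. I am also assuming the header counts as one of the "first 4 lines", so three data rows are used for training and the last row for testing.

```python
from collections import Counter, defaultdict

# Rows from the sample file: (category, id, description).
ROWS = [
    ("sports", "xxx", "cricket, football, etc."),
    ("sports", "xxxx", "cricket, volleyball, etc."),
    ("news", "yyyy", "political, etc."),
    ("news", "ppppp", "news channel"),
]


def split_rows(rows, n_train):
    """Sequential hold-out: the first n_train rows train, the rest test."""
    return rows[:n_train], rows[n_train:]


def tokens(text):
    """Lowercase words with commas and periods stripped."""
    words = (w.strip(".,").lower() for w in text.replace(",", " ").split())
    return [w for w in words if w]


def train(rows):
    """Toy stand-in for a real classifier: per-category word counts."""
    model = defaultdict(Counter)
    for category, _id, description in rows:
        model[category].update(tokens(description))
    return model


def predict(model, description):
    """Pick the category sharing the most words with the description.

    Ties go to the alphabetically first category (an arbitrary choice).
    """
    words = tokens(description)
    return min(model, key=lambda c: (-sum(model[c][w] for w in words), c))


def evaluate(rows, n_train):
    """Train on the first n_train rows and return accuracy on the rest."""
    train_rows, test_rows = split_rows(rows, n_train)
    model = train(train_rows)
    # Strip the category before predicting; keep it aside as ground truth.
    hidden = [(desc, cat) for cat, _id, desc in test_rows]
    correct = sum(predict(model, desc) == cat for desc, cat in hidden)
    return correct / len(hidden)


if __name__ == "__main__":
    print("accuracy on held-out rows:", evaluate(ROWS, n_train=3))
```

With only one held-out row the accuracy figure says very little; the point is just the order of steps: split, train on the labeled part, hide the labels of the rest, predict, compare.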