Re: Confused about train/test data split in recommender evaluation

jyotiranjan panda Mon, 10 Nov 2014 22:49:51 -0800

Hi,
I have done classification using Mahout.
Suppose a file named Testfile is 20 MB in size and its contents are as
below:

category    id       description
sports      xxx      cricket,football etc
sports      xxxx     cricket,volleyball etc
news        yyyy     political,etc
news        ppppp    news channel

Now, with the above file, we want to do text categorization (suppose we have 2
categories, sports and news).
Suppose the 60% portion of the data consists of the first 4 lines of Testfile
and the 40% portion consists of the last line.
Then, if I want to use the 60% as training data and the 40% as test data, Mahout
will train on the first 4 lines and build a binary model.
While testing, it will remove the category from the last line (i.e. the 40%
portion of the file) and feed that line to the model.

That way, the predicted category can be compared with the actual one in the
file, and the accuracy of the algorithm can be evaluated.
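
A minimal sketch of that hold-out procedure in plain Python, only to
illustrate the idea. It does not call Mahout; the helper names (split_rows,
hide_labels, accuracy) are made up for this example, the 60% fraction is
applied to the four data rows, and the keyword rule is a stand-in for a
trained model.

```python
# Hypothetical rows mirroring the example file: (category, id, description).
rows = [
    ("sports", "xxx", "cricket,football etc"),
    ("sports", "xxxx", "cricket,volleyball etc"),
    ("news", "yyyy", "political,etc"),
    ("news", "ppppp", "news channel"),
]

def split_rows(rows, train_fraction):
    """Split rows in file order: the first part trains, the rest tests."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def hide_labels(test_rows):
    """Separate the true categories from the inputs handed to the model."""
    inputs = [(row_id, text) for _, row_id, text in test_rows]
    labels = [category for category, _, _ in test_rows]
    return inputs, labels

def accuracy(predicted, actual):
    """Fraction of held-out rows whose predicted category matches the file."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)

train, test = split_rows(rows, 0.6)
inputs, actual = hide_labels(test)
# Stand-in for a trained model: a trivial keyword rule, NOT Mahout's classifier.
predicted = ["news" if "news" in text else "sports" for _, text in inputs]
print(accuracy(predicted, actual))
```

Splitting in file order, as here, can leave a category out of the training
portion, so in practice the rows would usually be shuffled first.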

I think the same applies to your case too.
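
For the recommender case, the two readings of the question can be sketched
as below. This is only an illustration in plain Python under assumed data:
the (user, item, rating) triples and both function names are hypothetical,
and the sketch does not claim which strategy Mahout's evaluator uses.

```python
import random

# Hypothetical (user, item, rating) preferences; not from any real data set.
prefs = [
    ("u1", "a", 4.0), ("u1", "b", 3.0), ("u1", "c", 5.0),
    ("u2", "a", 2.0), ("u2", "c", 4.0),
    ("u3", "b", 5.0), ("u3", "c", 1.0), ("u3", "d", 3.0),
]

def split_by_preference(prefs, train_fraction, seed=0):
    """Assign each preference to train or test independently of its user."""
    rng = random.Random(seed)
    train, test = [], []
    for pref in prefs:
        (train if rng.random() < train_fraction else test).append(pref)
    return train, test

def split_by_user(prefs, train_fraction, seed=0):
    """Assign whole users, with all their preferences, to train or test."""
    rng = random.Random(seed)
    users = sorted({user for user, _, _ in prefs})
    rng.shuffle(users)
    cut = int(len(users) * train_fraction)
    train_users = set(users[:cut])
    train = [p for p in prefs if p[0] in train_users]
    test = [p for p in prefs if p[0] not in train_users]
    return train, test

pref_train, pref_test = split_by_preference(prefs, 0.6)
user_train, user_test = split_by_user(prefs, 0.6)
```

With the per-preference split, a user can appear on both sides, so any
similarity computed from the training portion sees only part of that user's
preferences. With the per-user split, test users have no training
preferences at all.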

On Tue, Nov 11, 2014 at 11:58 AM, Blade Liu <hafzc...@gmail.com> wrote:

> Hi,
>
> I'm new to Mahout and am confused about how train and test data are split
> when evaluating recommenders.
>
> I'm not sure whether data is split by selecting a portion of the item
> preferences, or by selecting specific users (together with all their
> preferences). For example, say train data accounts for 60% and test data
> accounts for 40%. Does that indicate that 40% of the total preferences will
> be used for testing (regardless of the associated users)? In
> classification, all features associated with the users would be selected.
>
> If the partition criterion is based on preferences, would it affect
> neighborhood similarity before computing the recommended score?
>
>
> Cheers,
> Blade
>
