I don't think the problem is due to the sample size, if you use 2 features,
10 samples might be OK.

The problem is that you didn't include bias(intercept) term, which is
always 1.
If you add bias term(1) to your point class, you'll get 0.9999 for cluster
0 probability.


Sam

On Wed, Jun 27, 2012 at 1:23 AM, Sean Owen <sro...@gmail.com> wrote:

> Those are both true; they may not be the issue here.
>
> The test point definitely belongs in the first of the two groups you
> created. Why is the result surprising?
>
> On Wed, Jun 27, 2012 at 9:15 AM, Lance Norskog <goks...@gmail.com> wrote:
>
> > Not enough samples. Machine learning algorithms in general do well if
> > you have large sample sets (hundreds or thousands) from "real" data
> > sources. The data should have a strong signal but be a little noisy.
> >
> > Also: your Point class needs a hashCode() since it does equals(). The
> > Map class won't work at scale.
> >
> > On Wed, Jun 27, 2012 at 1:00 AM, damodar shetyo <akshay.she...@gmail.com
> >
> > wrote:
> > > I am trying to build a simple model that can group points in 2D
> space.Am
> > > training the model by giving few examples.After that i am using the
> model
> > > to predict the group in which the any other points may fall.But am not
> > > getting answer as expected.Am i missing something in my code or am i
> > doing
> > > something wrong?
> > >
> > >       public class SimpleClassifier {
> > >
> > >    public static class Point{
> > >        public int x;
> > >        public int y;
> > >
> > >        public Point(int x,int y){
> > >            this.x = x;
> > >            this.y = y;
> > >        }
> > >
> > >        @Override
> > >        public boolean equals(Object arg0) {
> > >            Point p = (Point)  arg0;
> > >            return( (this.x == p.x) &&(this.y== p.y));
> > >        }
> > >
> > >        @Override
> > >        public String toString() {
> > >            // TODO Auto-generated method stub
> > >            return  this.x + " , " + this.y ;
> > >        }
> > >    }
> > >    public static void main(String[] args) {
> > >
> > >        Map<Point,Integer> points = new HashMap<SimpleClassifier.Point,
> > > Integer>();
> > >
> > >        points.put(new Point(0,0), 0);
> > >        points.put(new Point(1,1), 0);
> > >        points.put(new Point(1,0), 0);
> > >        points.put(new Point(0,1), 0);
> > >        points.put(new Point(2,2), 0);
> > >
> > >
> > >        points.put(new Point(8,8), 1);
> > >        points.put(new Point(8,9), 1);
> > >        points.put(new Point(9,8), 1);
> > >        points.put(new Point(9,9), 1);
> > >
> > >
> > >        OnlineLogisticRegression learningAlgo = new
> > > OnlineLogisticRegression();
> > >        learningAlgo =  new OnlineLogisticRegression(2, 2, new L1());
> > >        learningAlgo.learningRate(50);
> > >
> > >        //learningAlgo.alpha(1).stepOffset(1000);
> > >
> > >        System.out.println("training model  \n" );
> > >        for(Point point : points.keySet()){
> > >            Vector v = getVector(point);
> > >            System.out.println(point  + " belongs to " +
> > points.get(point));
> > >            learningAlgo.train(points.get(point), v);
> > >        }
> > >
> > >        learningAlgo.close();
> > >
> > >
> > >        //now classify real data
> > >        Vector v = new RandomAccessSparseVector(2);
> > >        v.set(0, 0.5);
> > >        v.set(1, 0.5);
> > >
> > >        Vector r = learningAlgo.classifyFull(v);
> > >        System.out.println(r);
> > >
> > >        System.out.println("ans = " );
> > >        System.out.println("no of categories = " +
> > > learningAlgo.numCategories());
> > >        System.out.println("no of features = " +
> > > learningAlgo.numFeatures());
> > >        System.out.println("Probability of cluster 0 = " + r.get(0));
> > >        System.out.println("Probability of cluster 1 = " + r.get(1));
> > >
> > >    }
> > >
> > >    public static Vector getVector(Point point){
> > >        Vector v = new DenseVector(2);
> > >        v.set(0, point.x);
> > >        v.set(1, point.y);
> > >
> > >        return v;
> > >    }
> > > }
> > >
> > > OP
> > > ans =
> > > no of categories = 2
> > > no of features = 2
> > > Probability of cluster 0 = 3.9580985042775296E-4
> > > Probability of cluster 1 = 0.9996041901495722
> > >
> > > 99 % of times the output show more probability for cluster 1.Why?
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Damodar Shetyo
> >
> >
> >
> > --
> > Lance Norskog
> > goks...@gmail.com
> >
>

Reply via email to