Re: Classification beginner questions

Hector Yee Fri, 10 Jun 2011 08:09:21 -0700

It's the class with the highest score. The score relative to the other classes
matters more than its absolute value, especially when you have as many classes
as you do.
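Picking the class with the highest score is an argmax over the per-class scores. The plain-Java sketch below is not Mahout code, and the class name `PickBestClass` and the score values are made up for illustration. It shows that the pick depends only on how the scores compare with each other. Adding the same constant to every score leaves the winner unchanged, even when every score is tiny in absolute terms.

```java
import java.util.Arrays;

/** Hypothetical helper (not Mahout): chooses the class whose score is highest. */
final class PickBestClass {

    /** Returns the index of the largest score; the first index wins ties. */
    static int argmax(double[] scores) {
        if (scores.length == 0) {
            throw new IllegalArgumentException("need at least one score");
        }
        int best = 0;
        for (int i = 1; i < scores.length; i++) {
            if (scores[i] > scores[best]) {
                best = i;
            }
        }
        return best;
    }

    /** Adds the same constant to every score: a purely absolute change. */
    static double[] shift(double[] scores, double delta) {
        return Arrays.stream(scores).map(s -> s + delta).toArray();
    }

    public static void main(String[] args) {
        // Made-up scores for four classes, all small in absolute value.
        double[] scores = {0.0012, 0.0004, 0.0031, 0.0009};
        System.out.println("predicted class: " + argmax(scores));             // 2
        // Shifting every score by 5.0 changes the absolute values, not the ranking.
        System.out.println("after shift:     " + argmax(shift(scores, 5.0))); // 2
    }
}
```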


Even with logistic regression, my personal preference is to use the noLink
function and work with that raw score.
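A "no link" score is the raw linear score before the logistic (link) function turns it into a probability. In Mahout, this presumably corresponds to the classifier's `classifyNoLink` method, but the sketch below does not call Mahout. Because the logistic function is monotonic, ranking classes by raw score and ranking them by probability agree. The raw score also has a practical advantage, shown here with made-up values: it still distinguishes confident predictions after the probabilities have saturated. In double precision, logistic(40) and logistic(50) both round to exactly 1.0. The class name `NoLinkScores` is hypothetical.

```java
/** Plain-Java illustration (not Mahout): raw scores vs. logistic-link probabilities. */
final class NoLinkScores {

    /** The logistic link; in double precision the result can round to exactly 0.0 or 1.0. */
    static double logistic(double score) {
        return 1.0 / (1.0 + Math.exp(-score));
    }

    public static void main(String[] args) {
        // Made-up raw scores, in increasing order.
        double[] rawScores = {-2.0, 0.5, 40.0, 50.0};
        for (double s : rawScores) {
            System.out.println("raw score " + s + " -> probability " + logistic(s));
        }
        // The raw scores 40 and 50 differ, but their probabilities are identical doubles.
        System.out.println("saturated: " + (logistic(40.0) == logistic(50.0))); // true
    }
}
```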


On Jun 10, 2011, at 12:54 AM, Joscha Feth wrote:

> Hello fellow Mahouts,
> 
> I am trying to grasp Mahout and put together a very simple (but obviously
> wrong) example that I hoped would help me understand how everything works:
> 
> -- 8< --
> import org.apache.mahout.classifier.sgd.L1;
> import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
> import org.apache.mahout.math.RandomAccessSparseVector;
> import org.apache.mahout.math.Vector;
> import org.apache.mahout.vectorizer.encoders.AdaptiveWordValueEncoder;
> import org.apache.mahout.vectorizer.encoders.WordValueEncoder;
> 
> public class OLRTest {
> 
>     private static final int FEATURES = 1;
>     private static final int CATEGORIES = 2;
> 
>     private static final WordValueEncoder ANIMAL_ENCODER =
>             new AdaptiveWordValueEncoder("animal");
> 
>     private static final String[] animals = new String[] {
>             "alligator", "ant", "bear", "bee", "bird", "camel", "cat",
>             "cheetah", "chicken", "chimpanzee", "cow", "crocodile", "deer",
>             "dog", "dolphin", "duck", "eagle", "elephant", "fish", "fly",
>             "fox", "frog", "giraffe", "goat", "goldfish", "hamster",
>             "hippopotamus", "horse", "kangaroo", "kitten", "lion", "lobster",
>             "monkey", "octopus", "owl", "panda", "pig", "puppy", "rabbit",
>             "rat", "scorpion", "seal", "shark", "sheep", "snail", "snake",
>             "spider", "squirrel", "tiger", "turtle", "wolf", "zebra" };
> 
>     public static void main(String[] args) {
>         final OnlineLogisticRegression algorithm =
>                 new OnlineLogisticRegression(CATEGORIES, FEATURES, new L1());
> 
>         // Every animal is trained as category 0.
>         for (String animal : animals) {
>             algorithm.train(0, generateVector(animal));
>         }
> 
>         algorithm.close();
> 
>         testClassify(algorithm, "lion");
>         testClassify(algorithm, "rabbit");
>         testClassify(algorithm, "xyz");
>         testClassify(algorithm, "something");
>     }
> 
>     private static void testClassify(final OnlineLogisticRegression algorithm,
>             final String allegedAnimal) {
>         System.out.println(allegedAnimal
>                 + " is an animal with a probability of "
>                 + algorithm.classifyScalar(generateVector(allegedAnimal)) * 100
>                 + "%");
>     }
> 
>     private static Vector generateVector(String animal) {
>         // With FEATURES = 1 the vector has a single slot, index 0.
>         final Vector v = new RandomAccessSparseVector(FEATURES);
>         ANIMAL_ENCODER.addToVector(animal, v);
>         return v;
>     }
> }
> -- 8< --
> 
> The output of running this sample code is:
> -- 8< --
> lion is an animal with a probability of 0.12008121418417145%
> rabbit is an animal with a probability of 0.11720244687895641%
> xyz is an animal with a probability of 0.04192879358244322%
> something is an animal with a probability of 0.04047790610981663%
> -- 8< --
> 
> Several things surprised me:
> * I would have expected the probabilities for "lion" and "rabbit" to be
>   close to 100%
> * I would have expected the probabilities for "xyz" and "something" to be
>   close to 0%
> * I would have expected the probability for "lion" to be the same as the
>   one for "rabbit"
> * I would have expected the probability for "xyz" to be the same as the
>   one for "something"
> 
> I know that the animals sample provided is extremely small, but even when
> training with multiple passes (100, 1000, 10000) the probabilities changed
> only marginally.
> What am I missing here?
> 
> Thanks very much!
> Joscha Feth
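Two details of the quoted code are worth keeping in mind when reading the reply. First, every animal is trained with `algorithm.train(0, ...)`, so the model never sees an example of the second category. Second, with `FEATURES = 1`, every word the encoder hashes lands in the same single vector slot, so different words can differ only in the value written there.

The plain-Java sketch below is not Mahout's `OnlineLogisticRegression`. The class name `OneClassTraining`, the feature values, and the learning rate are all made up. It trains a one-weight logistic regression (no intercept) by SGD on label-0 examples only. Under those assumptions, the predicted probability of label 1 stays below 0.5 for any positive input and keeps shrinking as passes are added. This matches the pattern of the near-zero percentages above, assuming `classifyScalar` reports the probability of category 1. How quickly the probability falls depends on the learning-rate schedule, which this sketch does not try to match.

```java
/** Hypothetical miniature of the quoted experiment, in plain Java (not Mahout). */
final class OneClassTraining {

    static double logistic(double score) {
        return 1.0 / (1.0 + Math.exp(-score));
    }

    /** Trains a single weight (no intercept) by plain SGD on the log loss. */
    static double train(double[] xs, int label, int passes, double rate) {
        double w = 0.0;
        for (int p = 0; p < passes; p++) {
            for (double x : xs) {
                double predicted = logistic(w * x);
                // Gradient of the log loss with respect to w is (predicted - label) * x.
                w -= rate * (predicted - label) * x;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Made-up stand-ins for whatever the encoder writes into the single slot.
        double[] xs = {1.0, 0.8, 1.2};
        for (int passes : new int[] {1, 10, 100}) {
            double w = train(xs, 0, passes, 0.1);
            System.out.printf("passes=%d  w=%.4f  P(label 1 | x=1.0)=%.4f%n",
                    passes, w, logistic(w * 1.0));
        }
    }
}
```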
