A full sort is not usually feasible or desirable.
Better to just keep a pool of samples and replace random members of the
pool as new samples arrive.
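Roughly, that pool-and-replace idea is classic reservoir sampling. A minimal
sketch (the class name here is invented for illustration):

-- 8< --
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Keeps a uniform random sample of up to `poolSize` items from a stream
// of unknown length (classic reservoir sampling, "Algorithm R").
public class ReservoirSampler<T> {
  private final List<T> pool = new ArrayList<T>();
  private final Random rand = new Random();
  private final int poolSize;
  private long seen = 0;

  public ReservoirSampler(int poolSize) {
    this.poolSize = poolSize;
  }

  public void add(T item) {
    seen++;
    if (pool.size() < poolSize) {
      pool.add(item);                              // fill the pool first
    } else {
      long j = (long) (rand.nextDouble() * seen);  // uniform in [0, seen)
      if (j < poolSize) {
        pool.set((int) j, item);                   // replace a random member
      }
    }
  }

  public List<T> sample() {
    return pool;
  }
}
-- 8< --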
On Thu, Jun 16, 2011 at 2:41 AM, Lance Norskog wrote:
Use a crypto-hash on the base data as the sorting key. The base data
is the value (payload). That should randomly permute things.
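A sketch of that, assuming MD5 as the hash (any crypto-hash works; the class
name is illustrative):

-- 8< --
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShuffleKey {
  // The hash of the record itself becomes the sort key, so sorting by it
  // yields a pseudo-random (but deterministic) permutation of the records.
  public static byte[] keyFor(byte[] record) throws NoSuchAlgorithmException {
    return MessageDigest.getInstance("MD5").digest(record);
  }
}
-- 8< --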
On Wed, Jun 15, 2011 at 2:50 PM, Ted Dunning wrote:
It is already in Mahout, I think.
On Tue, Jun 14, 2011 at 5:48 AM, Lance Norskog wrote:
Coding a permutation like this in Map/Reduce is a good beginner exercise.
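For the record, a sketch of what that exercise might look like with the
Hadoop mapreduce API and an identity reducer (class name is illustrative;
assumes commons-codec on the classpath for the hashing):

-- 8< --
import java.io.IOException;
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Keys each record by a hash of its contents; the framework's sort on the
// map output keys then produces a pseudo-random permutation. An identity
// reducer writes the values back out.
public class PermuteMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable offset, Text record, Context context)
      throws IOException, InterruptedException {
    context.write(new Text(DigestUtils.md5Hex(record.toString())), record);
  }
}
-- 8< --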
On Sun, Jun 12, 2011 at 11:34 PM, Ted Dunning wrote:
But the key is that you have to have both kinds of samples. Moreover,
for all of the stochastic gradient descent work, you need to have them
in a random-ish order. You can't show all of one category and then
all of another. It is even worse if you sort your data.
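When the training set fits in memory, one hedged way to get that random-ish
order is simply to shuffle before training (the LabeledVector pair class here
is hypothetical, not a Mahout class):

-- 8< --
import java.util.Collections;
import java.util.List;
import java.util.Random;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.Vector;

public class ShuffledTraining {
  // Hypothetical label/vector pair; not a Mahout class.
  public static class LabeledVector {
    final int label;
    final Vector vector;
    LabeledVector(int label, Vector vector) {
      this.label = label;
      this.vector = vector;
    }
  }

  // Shuffling positives and negatives together means SGD never sees a
  // long run of a single category.
  public static void trainShuffled(OnlineLogisticRegression learner,
                                   List<LabeledVector> examples) {
    Collections.shuffle(examples, new Random(42));
    for (LabeledVector ex : examples) {
      learner.train(ex.label, ex.vector);
    }
  }
}
-- 8< --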
On Mon, Jun 13, 2011 at 5:35 AM
If you have a much larger background set you can try online passive
aggressive in Mahout 0.6, as it uses hinge loss and does not update the model
if it gets things correct. Log loss will always have a gradient, in
contrast.
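That hinge-loss behaviour is visible directly in the update rule. This is a
sketch of the textbook binary passive-aggressive algorithm, not of Mahout's
class:

-- 8< --
public class PassiveAggressiveSketch {
  // One binary passive-aggressive update, label y in {-1, +1}.
  // If the margin is already >= 1, the hinge loss is zero and the weights
  // stay untouched -- unlike log loss, whose gradient is never exactly zero.
  public static void paUpdate(double[] w, double[] x, int y) {
    double score = 0;
    for (int i = 0; i < w.length; i++) {
      score += w[i] * x[i];
    }
    double loss = Math.max(0, 1 - y * score);
    if (loss == 0) {
      return;                              // correctly classified: no update
    }
    double norm2 = 0;
    for (int i = 0; i < x.length; i++) {
      norm2 += x[i] * x[i];
    }
    if (norm2 == 0) {
      return;                              // empty example: nothing to do
    }
    double tau = loss / norm2;             // step just big enough
    for (int i = 0; i < w.length; i++) {
      w[i] += tau * y * x[i];              // to reach margin 1
    }
  }
}
-- 8< --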
An infinite number of samples is fine.
It is still true that you need to have training samples from all of
the target categories.
On Sun, Jun 12, 2011 at 2:53 PM, Joscha Feth wrote:
Hi Ted,
I see. Only for the OLR or also for any other algorithm? What if my
other category theoretically contains an infinite number of samples?
Cheers,
Joscha
On 12.06.2011 at 15:08, Ted Dunning wrote:
Joscha,
There is no implicit training. You need to give negative examples as
well as positive ones.
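In OnlineLogisticRegression terms, that just means train() has to see both
target values. A hedged sketch (class name and argument shapes invented for
illustration):

-- 8< --
import java.util.List;
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.Vector;

public class TrainBothCategories {
  // Category 1 = the sentences we want to recognize,
  // category 0 = counter-examples. Both must appear in training.
  public static OnlineLogisticRegression train(List<Vector> positives,
                                               List<Vector> negatives,
                                               int numFeatures) {
    OnlineLogisticRegression olr =
        new OnlineLogisticRegression(2, numFeatures, new L1());
    for (Vector v : positives) {
      olr.train(1, v);
    }
    for (Vector v : negatives) {
      olr.train(0, v);
    }
    return olr;
  }
}
-- 8< --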
On Sat, Jun 11, 2011 at 9:08 AM, Joscha Feth wrote:
Hello Ted,
thanks for your response!
What I wanted to accomplish is actually quite simple in theory: I have some
sentences which have things in common (like some similar words, for example).
I want to train my model with these example sentences. Once it is trained I
want to give it an unknown sentence and see whether it matches the category.
Hello Sebastian,
Thanks for the hint, I did get the MEAP edition of the ebook already through
Manning; however, I find myself struggling to translate the newsgroup and
wikipedia examples to my use case. In particular, I can't seem to find any
code examples that help me with the generation of feature vectors.
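For what it's worth, a hedged sketch of one way to turn a sentence into a
hashed feature vector with Mahout's encoder classes, in the style of the book's
examples (package path assumed from recent releases; the tokenization is
deliberately naive):

-- 8< --
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.vectorizer.encoders.StaticWordValueEncoder;

public class SentenceEncoder {
  // Hashes each word of a sentence into a fixed-width feature vector.
  public static Vector encode(String sentence, int numFeatures) {
    StaticWordValueEncoder encoder = new StaticWordValueEncoder("words");
    Vector v = new RandomAccessSparseVector(numFeatures);
    for (String word : sentence.toLowerCase().split("\\s+")) {
      encoder.addToVector(word, v);
    }
    return v;
  }
}
-- 8< --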
Hector, thank you very much for your response; I adapted my example:
-- 8< --
public class OLRTest {
  private static final String[] animals = new String[] { "alligator", "ant",
      "bear", "bee", "bird", "camel", "cat", "cheetah", "chicken",
      "chimpanzee", "cow", "crocodile",
      // … the rest of the example was truncated in the archive …
The target variable here is always zero.
Shouldn't it vary?
On Fri, Jun 10, 2011 at 9:54 AM, Joscha Feth wrote:
> algorithm.train(0, generateVector(animal));
Hi Joscha,
If you have some money left, I'd recommend getting a copy of Mahout in
Action, which features a very readable, detailed introduction to
classification with Mahout, including strategies for feature selection.
--sebastian
On 10.06.2011 17:28, Hector Yee wrote:
Oh, you have a very strange feature: you are using the label as a feature, my
bad, I thought the words were the labels.
Usually it's something like weight, height, something meaningful. If it's just
the label like you have, you might as well use a hash map; there is no feature
to learn! But if you …
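To make the contrast concrete, a toy sketch of features that can actually be
learned (the attributes and their units are invented for illustration):

-- 8< --
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.Vector;

public class AnimalFeatures {
  // A learnable representation: every dimension is a measurable property
  // of the animal, not the animal's name itself.
  public static Vector featuresOf(double weightKg, double heightM, int legs) {
    return new DenseVector(new double[] { weightKg, heightM, legs });
  }
}
-- 8< --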
It's the one with the highest score. The relative score compared to other
classes matters more than the absolute value, especially when you have many
classes like you do.
Even with logistic regression my personal preference is to use the noLink
function and use that score.
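A minimal sketch of taking the best-scoring class with
OnlineLogisticRegression; classifyFull returns one score per category:

-- 8< --
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.Vector;

public class BestCategory {
  // The answer is simply the index of the largest per-category score.
  public static int of(OnlineLogisticRegression olr, Vector instance) {
    Vector scores = olr.classifyFull(instance);
    return scores.maxValueIndex();
  }
}
-- 8< --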
Hello fellow Mahouts,
I am trying to grasp Mahout and generated a very simple (but obviously
wrong) example which I hoped would help me understand how everything works:
-- 8< --
public class OLRTest {
  private static final int FEATURES = 1;
  private static final int CATEGORIES = 2;
  // … the rest of the example was truncated in the archive …
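Pulling the advice from this thread together, a hedged sketch of what a
corrected skeleton might look like (the feature count, sample sentences, and
toy featurizer are all invented for illustration):

-- 8< --
import org.apache.mahout.classifier.sgd.L1;
import org.apache.mahout.classifier.sgd.OnlineLogisticRegression;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

// A corrected skeleton of the example, folding in the advice upthread:
// more than one feature, a target value that varies, and a score-based
// readout at the end.
public class OLRTest {
  private static final int FEATURES = 100;  // was 1: too few to learn anything
  private static final int CATEGORIES = 2;

  public static void main(String[] args) {
    OnlineLogisticRegression olr =
        new OnlineLogisticRegression(CATEGORIES, FEATURES, new L1());
    // The target varies: 1 for the category we want, 0 for counter-examples.
    olr.train(1, vectorFor("the quick brown fox"));
    olr.train(0, vectorFor("lorem ipsum dolor sit amet"));
    Vector scores = olr.classifyFull(vectorFor("a quick brown dog"));
    System.out.println("best category: " + scores.maxValueIndex());
  }

  // Toy featurizer: hash each word into a sparse vector slot.
  private static Vector vectorFor(String sentence) {
    Vector v = new RandomAccessSparseVector(FEATURES);
    for (String w : sentence.toLowerCase().split("\\s+")) {
      int idx = ((w.hashCode() % FEATURES) + FEATURES) % FEATURES;
      v.set(idx, v.get(idx) + 1);
    }
    return v;
  }
}
-- 8< --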