Re: SF Informal meetup on May 23?

2011-05-29 Thread Isabel Drost
On 30.05.2011 Josh Patterson wrote: > Likewise, we need to do it again around Hadoop Summit time. Any interest in having something similar around Berlin Buzzwords next week in Berlin? Isabel

[jira] [Commented] (MAHOUT-709) FP-Growth Redundant patterns

2011-05-29 Thread jinyongbo (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041017#comment-13041017 ] jinyongbo commented on MAHOUT-709: -- Thanks all. > FP-Growth Redundant patterns > ---

[jira] [Updated] (MAHOUT-695) Option to determine number of words for LDADriver from a specified dictionary

2011-05-29 Thread Mat Kelcey (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mat Kelcey updated MAHOUT-695: -- Attachment: mahout-695.patch Have removed NUM_WORDS option completely which will break existing callers

[jira] [Updated] (MAHOUT-695) Option to determine number of words for LDADriver from a specified dictionary

2011-05-29 Thread Mat Kelcey (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mat Kelcey updated MAHOUT-695: -- Description: It bugged me that you needed to specify the number of words directly to the LDADriver eg

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

2011-05-29 Thread Ted Dunning
Do you have a test case that demonstrates this? On Sun, May 29, 2011 at 6:53 PM, Xiaobo Gu wrote: > There is an internal buffer in AdaptiveLogisticRegression; the > NullPointerException is caused when the backend CrossFoldLearners > start training on the examples. > > The default size of the bu

Re: SF Informal meetup on May 23?

2011-05-29 Thread Ted Dunning
Josh neglects to mention that we talked a bit about his time series work before Cloudera. On Sun, May 29, 2011 at 8:51 PM, Josh Patterson wrote: > we talked about > > - MapR's inclusion of Mahout in their distro > - Time series and data mining, Keogh's work, his work with SAX > - Grant prodded e

Re: SF Informal meetup on May 23?

2011-05-29 Thread Josh Patterson
we talked about - MapR's inclusion of Mahout in their distro - Time series and data mining, Keogh's work, his work with SAX - Grant prodded everyone to contribute more code - Ted educated us on a number of topics, I lost count =) - ideas around MR2 and Mahout, the need for a workflow that uses bot

Re: SF Informal meetup on May 23?

2011-05-29 Thread Josh Patterson
Likewise, we need to do it again around Hadoop Summit time. On Sun, May 29, 2011 at 1:32 PM, Dawid Weiss wrote: > A bit belated, but I just wanted to say thanks to those that took part in > the meeting. I really enjoyed all the topics covered (Mahout-related and > otherwise). It was a pleasure.

[jira] [Commented] (MAHOUT-676) Random samplers in a modular library

2011-05-29 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040978#comment-13040978 ] Lance Norskog commented on MAHOUT-676: -- A Poisson join sampler should probably be nex

[jira] [Issue Comment Edited] (MAHOUT-676) Random samplers in a modular library

2011-05-29 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040977#comment-13040977 ] Lance Norskog edited comment on MAHOUT-676 at 5/30/11 2:43 AM: -

[jira] [Updated] (MAHOUT-676) Random samplers in a modular library

2011-05-29 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-676: - Description: This is a modular suite of samplers. It supplies the ability to throw away samples

[jira] [Commented] (MAHOUT-676) Random samplers in a modular library

2011-05-29 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040977#comment-13040977 ] Lance Norskog commented on MAHOUT-676: -- bq. Normally slice samplers are used in the s

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

2011-05-29 Thread Xiaobo Gu
There is an internal buffer in AdaptiveLogisticRegression; the NullPointerException is caused when the backend CrossFoldLearners start training on the examples. The default size of the buffer is 500, and the exception is caused when I feed the 501st example to ALR. On Mon, May 30, 2011 at 2:27

[jira] [Commented] (MAHOUT-668) Adding knn support to Mahout classifiers

2011-05-29 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040873#comment-13040873 ] Ted Dunning commented on MAHOUT-668: So sorry... I didn't look enough at context. I

[jira] [Commented] (MAHOUT-668) Adding knn support to Mahout classifiers

2011-05-29 Thread Daniel McEnnis (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040864#comment-13040864 ] Daniel McEnnis commented on MAHOUT-668: --- Ted, On the contrary. The only method of

Re: SF Informal meetup on May 23?

2011-05-29 Thread Dawid Weiss
A bit belated, but I just wanted to say thanks to those that took part in the meeting. I really enjoyed all the topics covered (Mahout-related and otherwise). It was a pleasure. Dawid

Re: NullPointerException after getBest() during training a AdaptiveLogisticRegression.

2011-05-29 Thread Ted Dunning
This usually means that you have fed the ALR enough data for it to push a batch of learning into the evolutionary algorithm. That means that there isn't any best result yet. Getting that null doesn't impact the model, but you have to watch out for it. On Sun, May 29, 2011 at 1:23 AM, XiaoboGu w
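
The advice to "watch out for" the null can be sketched as a simple guard. This is only an illustration in Python, not Mahout code: `ToyAdaptiveLearner`, `score_with_best`, and the rule that a best model appears once the buffer has been flushed are all hypothetical stand-ins, and the exact point at which the real ALR starts returning a non-null best is an assumption here.

```python
class ToyAdaptiveLearner:
    """Hypothetical stand-in for a buffered learner; NOT Mahout's class."""

    def __init__(self, buffer_size=500):
        self.buffer_size = buffer_size
        self._buffer = []
        self._best = None  # no best model until a batch has been processed

    def train(self, features, label):
        self._buffer.append((features, label))
        if len(self._buffer) >= self.buffer_size:
            # Toy "training": the best model just predicts the mean label
            # of the batch. A real learner would do far more here.
            mean = sum(y for _, y in self._buffer) / len(self._buffer)
            self._best = lambda _features, m=mean: m
            self._buffer.clear()

    def get_best(self):
        # May be None early in training; callers must check.
        return self._best


def score_with_best(learner, features, default=None):
    """Score with the best model if one exists, else return `default`."""
    best = learner.get_best()
    if best is None:
        return default  # nothing to score with yet; keep training
    return best(features)
```

Training can continue after a guarded call, since the guard only reads the current best model and does not touch the buffer.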

Re: Use of Avro

2011-05-29 Thread Ted Dunning
More compact than what? Avro is about as dense as either protobufs or Thrift. I use all three in different settings (Ken K sends me data in Avro which I like because I can poke around in the data using python, MapR uses protobufs all over the place internally, I have written numerous services usi

Re: Use of Avro

2011-05-29 Thread Ted Dunning
That last sentence is the key. How many of us have actually written a good encoder? For instance, our sparse vectors don't use Golomb-delta encoding of indexes. They don't have a special case for binary data. They don't check the stats to see if zig-zag encoding of integers would help with the
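
To make the encoder tricks mentioned above concrete, here is a minimal Python sketch of zig-zag encoding for signed integers and delta encoding for sorted sparse-vector indexes. It is not Mahout's code. The deltas are stored as variable-length integers (varints) rather than with Golomb coding, which is a simpler technique swapped in for illustration, and all function names are assumptions.

```python
def zigzag_encode(n: int) -> int:
    """Map signed to unsigned so small magnitudes stay small:
    0, -1, 1, -2, 2 become 0, 1, 2, 3, 4."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def zigzag_decode(z: int) -> int:
    """Inverse of zigzag_encode."""
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def varint_encode(u: int) -> bytes:
    """Encode a non-negative int in 7-bit groups, low group first.
    The high bit of each byte marks that more bytes follow."""
    out = bytearray()
    while True:
        byte = u & 0x7F
        u >>= 7
        if u:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_indexes(indexes):
    """Delta-encode strictly increasing indexes, one varint per gap."""
    out = bytearray()
    prev = 0
    for i in indexes:
        out += varint_encode(i - prev)
        prev = i
    return bytes(out)


def decode_indexes(data: bytes):
    """Inverse of encode_indexes."""
    indexes, prev, cur, shift = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += cur
            indexes.append(prev)
            cur, shift = 0, 0
    return indexes
```

For the indexes `[3, 1000, 1001, 70000]` the gaps are 3, 997, 1 and 68999, so the encoded form takes 7 bytes instead of the 16 bytes that four fixed 32-bit integers would need.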

Re: Use of Avro

2011-05-29 Thread Ted Dunning
That is much less than the effort of writing the methods required by Writable. And the result is much larger since it gives you data readable from many languages. On Sun, May 29, 2011 at 12:58 AM, Sean Owen wrote: > There is still the mild complexity of declaring a schema. >
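
For a sense of how much work declaring a schema involves, here is a hypothetical Avro record schema for a sparse vector, written as a Python dict and serialized to the JSON form Avro uses for schemas. The record name, namespace, and field names are assumptions made for illustration; this is not a schema from Mahout.

```python
import json

# Hypothetical schema: names and fields are illustrative only.
SPARSE_VECTOR_SCHEMA = {
    "type": "record",
    "name": "SparseVector",
    "namespace": "example.vectors",
    "fields": [
        {"name": "cardinality", "type": "int"},
        {"name": "indexes", "type": {"type": "array", "items": "int"}},
        {"name": "values", "type": {"type": "array", "items": "double"}},
    ],
}

# Avro schemas are plain JSON, so the declaration is just this text.
schema_json = json.dumps(SPARSE_VECTOR_SCHEMA, indent=2)
```

The whole declaration is a few lines of JSON, and any language with an Avro library can read data written against it.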

[jira] [Commented] (MAHOUT-668) Adding knn support to Mahout classifiers

2011-05-29 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040839#comment-13040839 ] Ted Dunning commented on MAHOUT-668: Daniel, What exactly do you mean by model updat

[jira] [Updated] (MAHOUT-668) Adding knn support to Mahout classifiers

2011-05-29 Thread Daniel McEnnis (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel McEnnis updated MAHOUT-668: -- Attachment: Mahout-668-3.patch I've implemented a parallel training option. I've held off on t

[jira] [Updated] (MAHOUT-668) Adding knn support to Mahout classifiers

2011-05-29 Thread Daniel McEnnis (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel McEnnis updated MAHOUT-668: -- Attachment: Mahout-668-3.patch I created the patch from the wrong tree :-(. > Adding knn suppo

[jira] [Updated] (MAHOUT-714) CollocDriver not runnable with ToolRunner due to private Constructor

2011-05-29 Thread Frank Scholten (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Scholten updated MAHOUT-714: -- Status: Patch Available (was: Open) Removed private constructor in order to use constructor fr

[jira] [Updated] (MAHOUT-714) CollocDriver not runnable with ToolRunner due to private Constructor

2011-05-29 Thread Frank Scholten (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Frank Scholten updated MAHOUT-714: -- Attachment: MAHOUT-714.patch > CollocDriver not runnable with ToolRunner due to private Constru

[jira] [Created] (MAHOUT-714) CollocDriver not runnable with ToolRunner due to private Constructor

2011-05-29 Thread Frank Scholten (JIRA)
CollocDriver not runnable with ToolRunner due to private Constructor Key: MAHOUT-714 URL: https://issues.apache.org/jira/browse/MAHOUT-714 Project: Mahout Issue Type: Bug

Re: Use of Avro

2011-05-29 Thread Dhruv Kumar
Good discussion. While I'd like to play around with Avro some other time as it is very interesting, I'll stick with Writables for this project because everything else in Mahout uses them. Here are some benchmarking results for different serialization frameworks, including Avro: https://github.com/

Re: Use of Avro

2011-05-29 Thread Sean Owen
Versus... Writable? no, the receiver has to know the Writable class in advance and therefore knows how to decode. It's not embedded in the serialization. Writable is nothing if not compact -- if you write a good encoder that is. On Sun, May 29, 2011 at 12:42 PM, Grant Ingersoll wrote: > > > - data

Re: Use of Avro

2011-05-29 Thread Grant Ingersoll
On May 28, 2011, at 8:43 PM, Ted Dunning wrote: > Avro is NOT JSON-based. It is one of the most efficient binary encodings > around. > > Avro uses JSON as a concrete syntax for the schema and it supports a JSON > based alternative serialization format, but the primary format is a very > well do

[jira] [Commented] (MAHOUT-612) Simplify configuring and running Mahout MapReduce jobs from Java using Java bean configuration

2011-05-29 Thread Frank Scholten (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040786#comment-13040786 ] Frank Scholten commented on MAHOUT-612: --- Still at KMeans and Canopy. After Berlin Bu

NullPointerException after getBest() during training a AdaptiveLogisticRegression.

2011-05-29 Thread XiaoboGu
Hi, The main process for MAHOUT-696 is as follows, but it always causes a NullPointerException after the first call to getBest. Can we continue training AdaptiveLogisticRegression after using getBest() to score some new lines, just as TrainLogistic does? double logPEstimate = 0

[jira] [Updated] (MAHOUT-695) Option to determine number of words for LDADriver from a specified dictionary

2011-05-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-695: - Fix Version/s: 0.6 Assignee: Jake Mannix > Option to determine number of words for LDADriver fro

[jira] [Updated] (MAHOUT-612) Simplify configuring and running Mahout MapReduce jobs from Java using Java bean configuration

2011-05-29 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated MAHOUT-612: - Affects Version/s: 0.5 Assignee: Sean Owen Frank, how far along are you here? It would be gr

Re: Use of Avro

2011-05-29 Thread Sean Owen
I see, sounds good. I read up on it more and indeed there is a binary encoding. There is still the mild complexity of declaring a schema. Personally I wouldn't mind at all if someone replaced every use of Writable (or reimplemented the Writables) with Avro if it gave a clear advantage in speed or