On 30.05.2011 Josh Patterson wrote:
> Likewise, we need to do it again around Hadoop Summit time.
Any interest in having something similar around Berlin Buzzwords next week in
Berlin?
Isabel
[
https://issues.apache.org/jira/browse/MAHOUT-709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13041017#comment-13041017
]
jinyongbo commented on MAHOUT-709:
--
Thanks all.
> FP-Growth Redundant patterns
> ---
[
https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mat Kelcey updated MAHOUT-695:
--
Attachment: mahout-695.patch
Have removed the NUM_WORDS option completely, which will break existing callers
[
https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mat Kelcey updated MAHOUT-695:
--
Description:
It bugged me that you needed to specify the number of words directly to the
LDADriver
eg
Do you have a test case that demonstrates this?
On Sun, May 29, 2011 at 6:53 PM, Xiaobo Gu wrote:
> There is an internal buffer in AdaptiveLogisticRegression; the
> NullPointerException is thrown when the backend CrossFoldLearners
> start training on the examples.
>
> The default size of the bu
Josh neglects to mention that we talked a bit about his time series work
before Cloudera.
On Sun, May 29, 2011 at 8:51 PM, Josh Patterson wrote:
> we talked about
>
> - MapR's inclusion of Mahout in their distro
> - Time series and data mining, Keogh's work, his work with SAX
> - Grant prodded e
we talked about
- MapR's inclusion of Mahout in their distro
- Time series and data mining, Keogh's work, his work with SAX
- Grant prodded everyone to contribute more code
- Ted educated us on a number of topics, I lost count =)
- ideas around MR2 and Mahout, the need for a workflow that uses bot
Likewise, we need to do it again around Hadoop Summit time.
On Sun, May 29, 2011 at 1:32 PM, Dawid Weiss wrote:
> Belated a bit, but I just wanted to say thanks to those that took part in
> the meeting. I really enjoyed all the topics covered (mahout related and
> otherwise). It was a pleasure.
[
https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040978#comment-13040978
]
Lance Norskog commented on MAHOUT-676:
--
A Poisson join sampler should probably be nex
[
https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040977#comment-13040977
]
Lance Norskog edited comment on MAHOUT-676 at 5/30/11 2:43 AM:
-
[
https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lance Norskog updated MAHOUT-676:
-
Description:
This is a modular suite of samplers. It supplies the ability to throw away
samples
[
https://issues.apache.org/jira/browse/MAHOUT-676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040977#comment-13040977
]
Lance Norskog commented on MAHOUT-676:
--
bq. Normally slice samplers are used in the s
There is an internal buffer in AdaptiveLogisticRegression; the
NullPointerException is thrown when the backend CrossFoldLearners
start training on the examples.
The default size of the buffer is 500, and the exception is thrown
when I pass the 501st example to ALR.
On Mon, May 30, 2011 at 2:27
[
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040873#comment-13040873
]
Ted Dunning commented on MAHOUT-668:
So sorry... I didn't look enough at context.
I
[
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040864#comment-13040864
]
Daniel McEnnis commented on MAHOUT-668:
---
Ted,
On the contrary. The only method of
Belated a bit, but I just wanted to say thanks to those that took part in
the meeting. I really enjoyed all the topics covered (mahout related and
otherwise). It was a pleasure.
Dawid
This usually means that you have fed the ALR enough data for it to push a
batch of learning into the evolutionary algorithm. That means that there
isn't any best result yet.
Getting that null doesn't impact the model, but you have to watch out for
it.
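A minimal sketch of the kind of guard meant by "watch out for it". This does not use the real Mahout classes: the Learner interface and the BestModelGuard class are hypothetical stand-ins, and the only behaviour assumed is the one described above, namely that getBest() can return null until a best result exists.

```java
final class BestModelGuard {
    /** Hypothetical stand-in for a learner whose getBest() may return null. */
    interface Learner {
        double[] getBest();
    }

    /**
     * Scores the features with the current best weights, or returns the
     * fallback value when no best result is available yet.
     */
    static double score(Learner learner, double[] features, double fallback) {
        double[] weights = learner.getBest();
        if (weights == null) {
            // No best result yet: skip scoring instead of dereferencing null.
            return fallback;
        }
        int n = Math.min(weights.length, features.length);
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += weights[i] * features[i];
        }
        return sum;
    }
}
```

Training can continue regardless of which branch is taken; the guard only affects whether a score is produced at that point.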
On Sun, May 29, 2011 at 1:23 AM, XiaoboGu w
More compact than what?
Avro is about as dense as either protobufs or Thrift. I use all three in
different settings (Ken K sends me data in Avro which I like because I can
poke around in the data using python, MapR uses protobufs all over the place
internally, I have written numerous services usi
That last sentence is the key.
How many of us have actually written a good encoder? For instance, our
sparse vectors don't use Golomb-delta encoding of indexes. They don't have
a special case for binary data. They don't check the stats to see if
zig-zag encoding of integers would help with the
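For readers unfamiliar with the zig-zag encoding mentioned above, here is a minimal sketch in Java. The class ZigZagSketch and its method names are illustrative assumptions, not Mahout code. The mapping itself is the standard one used by variable-length integer formats: small negative numbers are mapped to small unsigned numbers so that they need fewer bytes.

```java
final class ZigZagSketch {
    /** Maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
    static int encode(int n) {
        // Arithmetic shift spreads the sign bit; XOR folds negatives onto odd values.
        return (n << 1) ^ (n >> 31);
    }

    /** Inverse of encode. */
    static int decode(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    /** Number of bytes a 7-bits-per-byte varint needs for the value, read as unsigned. */
    static int varintSize(int value) {
        int size = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            size++;
        }
        return size;
    }
}
```

Without zig-zag, -1 written as an unsigned varint takes five bytes; after encode(-1) it takes one. Whether this pays off depends on how often negative values actually occur in the data, which is the "check the stats" point.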
That is much less than the effort of writing the methods required by
Writable. And the payoff is much larger since it gives you data readable
from many languages.
On Sun, May 29, 2011 at 12:58 AM, Sean Owen wrote:
> There is still the mild complexity of declaring a schema.
>
[
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040839#comment-13040839
]
Ted Dunning commented on MAHOUT-668:
Daniel,
What exactly do you mean by model updat
[
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel McEnnis updated MAHOUT-668:
--
Attachment: Mahout-668-3.patch
I've implemented a parallel training option. I've held off on t
[
https://issues.apache.org/jira/browse/MAHOUT-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel McEnnis updated MAHOUT-668:
--
Attachment: Mahout-668-3.patch
I created the patch from the wrong tree :-(.
> Adding knn suppo
[
https://issues.apache.org/jira/browse/MAHOUT-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank Scholten updated MAHOUT-714:
--
Status: Patch Available (was: Open)
Removed private constructor in order to use constructor fr
[
https://issues.apache.org/jira/browse/MAHOUT-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frank Scholten updated MAHOUT-714:
--
Attachment: MAHOUT-714.patch
> CollocDriver not runnable with ToolRunner due to private Constru
CollocDriver not runnable with ToolRunner due to private Constructor
Key: MAHOUT-714
URL: https://issues.apache.org/jira/browse/MAHOUT-714
Project: Mahout
Issue Type: Bug
Good discussion. While I'd like to play around with Avro some other time as
it is very interesting, I'll stick with Writables for this project because
everything else in Mahout uses them.
Here are some benchmarking results for different serialization frameworks,
including Avro:
https://github.com/
Versus... Writable? no, the receiver has to know the Writable class in
advance and therefore knows how to decode. It's not embedded in the
serialization. Writable is nothing if not compact -- if you write a good
encoder that is.
On Sun, May 29, 2011 at 12:42 PM, Grant Ingersoll wrote:
>
> > - data
On May 28, 2011, at 8:43 PM, Ted Dunning wrote:
> Avro is NOT JSON-based. It is one of the most efficient binary encodings
> around.
>
> Avro uses JSON as a concrete syntax for the schema and it supports a JSON
> based alternative serialization format, but the primary format is a very
> well do
[
https://issues.apache.org/jira/browse/MAHOUT-612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13040786#comment-13040786
]
Frank Scholten commented on MAHOUT-612:
---
Still at KMeans and Canopy. After Berlin Bu
Hi,
The main process for MAHOUT-696 is as follows, but it always causes a
NullPointerException after the first call to getBest(). Can we continue
training an AdaptiveLogisticRegression after using getBest() to score some new
lines, just as TrainLogistic does?
double logPEstimate = 0
[
https://issues.apache.org/jira/browse/MAHOUT-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated MAHOUT-695:
-
Fix Version/s: 0.6
Assignee: Jake Mannix
> Option to determine number of words for LDADriver fro
[
https://issues.apache.org/jira/browse/MAHOUT-612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sean Owen updated MAHOUT-612:
-
Affects Version/s: 0.5
Assignee: Sean Owen
Frank, how far along are you here? It would be gr
I see, sounds good. I read up on it more and indeed there is a binary
encoding.
There is still the mild complexity of declaring a schema.
Personally I wouldn't mind at all if someone replaced every use of Writable
(or reimplemented the Writables) with Avro if it gave a clear advantage in
speed or