Glad to hear that there could be a Chicago meetup. I doubt I will be there
at the right time, but it is too cool to have enough interest in more than
one city.
I definitely will not have time next week in the Bay Area. I am lucky
enough to have seen Grant recently elsewhere.
On Fri, Sep 16, 201
Indeed.
I strongly prefer the other two for expressivity.
On Fri, Sep 16, 2011 at 4:37 PM, Jake Mannix wrote:
> On Fri, Sep 16, 2011 at 3:30 PM, Ted Dunning
> wrote:
>
> > I think that Avro and protobufs are the current best options for large
> data
> > assets like this.
> >
>
> (or serialized
On Fri, Sep 16, 2011 at 3:36 PM, Ted Dunning wrote:
> Returning something halves performance or worse since you can't fire and
> forget. In Pregel style, you should expect the message to be processed in
> the next super step and a value returned in the super step after that.
>
I guess it depend
On Fri, Sep 16, 2011 at 3:30 PM, Ted Dunning wrote:
> I think that Avro and protobufs are the current best options for large data
> assets like this.
>
(or serialized Thrift)
Returning something halves performance or worse since you can't fire and
forget. In Pregel style, you should expect the message to be processed in
the next super step and a value returned in the super step after that.
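To make the timing concrete, here is a minimal toy sketch of that superstep behavior. It is not Giraph or Pregel code; the function name run_supersteps, the vertex names "a" and "b", and the message tuples are all made up for illustration. It assumes only that a message sent in one superstep is delivered in the next one.

```python
from collections import defaultdict

def run_supersteps(num_steps):
    """Toy model: a message sent in superstep s is delivered in superstep s + 1."""
    log = []  # (superstep, receiving vertex, message kind), in processing order

    # Superstep 0: hypothetical vertex "a" sends a request to vertex "b".
    outbox = defaultdict(list)
    outbox["b"].append(("request", "a"))

    for step in range(1, num_steps + 1):
        # Everything sent in the previous superstep becomes visible now.
        inbox, outbox = outbox, defaultdict(list)
        for vertex, messages in inbox.items():
            for kind, sender in messages:
                log.append((step, vertex, kind))
                if kind == "request":
                    # The reply is queued now and only seen one superstep later.
                    outbox[sender].append(("reply", vertex))
    return log

log = run_supersteps(3)
for entry in log:
    print(entry)
```

The request is handled in superstep 1 and the reply only reaches the sender in superstep 2, so a call that returns a value costs two supersteps where a fire-and-forget send costs one.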
On Fri, Sep 16, 2011 at 2:31 PM, Jake Mannix wrote:
> On Fri, Sep 16, 2011 at
I think that Avro and protobufs are the current best options for large data
assets like this.
On Fri, Sep 16, 2011 at 2:44 PM, Jake Mannix wrote:
> Can I vote for whichever one isn't based on XML? :)
>
> I really can't imagine encoding a 10-billion node graph in XML. Or rather,
> I can, and I'm
What's your displayer? And what formats does it use?
On Fri, Sep 16, 2011 at 2:29 PM, Grant Ingersoll wrote:
> Yeah, I hear you. I've actually just modeled it like our VectorWriter and
> it will be pluggable. I'm likely just going to do CSV and GML to start (the
> latter being XML) Maybe we ne
Yeah, I hear you. I've actually just modeled it like our VectorWriter and it
will be pluggable. I'm likely just going to do CSV and GML to start (the
latter being XML) Maybe we need YAGF (yet another graph format)?
I used to do a lot of NLP processing and output XML and always felt like what
I have used XML to represent very large graphs (billions of nodes).
It is as bad as you would imagine.
On Fri, Sep 16, 2011 at 1:44 PM, Jake Mannix wrote:
> Can I vote for whichever one isn't based on XML? :)
>
> I really can't imagine encoding a 10-billion node graph in XML. Or rather,
> I can,
[
https://issues.apache.org/jira/browse/MAHOUT-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106811#comment-13106811
]
Hudson commented on MAHOUT-811:
---
Integrated in Mahout-Quality #1043 (See
[https://builds.ap
Can I vote for whichever one isn't based on XML? :)
I really can't imagine encoding a 10-billion node graph in XML. Or rather,
I can, and I'm skeered.
On Fri, Sep 16, 2011 at 1:02 PM, Grant Ingersoll wrote:
> I'm going to write a converter to dump out clusters and their points to a
> graph
On Fri, Sep 16, 2011 at 1:24 PM, Ted Dunning wrote:
> Well, distributed memory to me would have fetch and store operations. Here
> we can send a message, but we can't actually fetch or store data without
> cooperation.
>
Funny you mention that - I've been considering suggesting that Giraph modi
Well, distributed memory to me would have fetch and store operations. Here
we can send a message, but we can't actually fetch or store data without
cooperation.
On Fri, Sep 16, 2011 at 4:45 AM, Grant Ingersoll wrote:
>
> On Sep 16, 2011, at 12:27 AM, Ted Dunning wrote:
>
> > Actually, I don't th
I'm going to write a converter to dump out clusters and their points to a graph
structure so they can be displayed.
Gephi (and others) supports a myriad of formats:
http://gephi.org/users/supported-graph-formats/
* GEXF
* GDF
* GML
* GraphML
* Pajek NET
* GraphViz DOT
* CSV
* UCINET DL
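As a rough sketch of the simplest of these, the CSV route could look something like the following. This is not the Mahout converter; the function names, the cluster and point ids, and the "Source"/"Target" header row are assumptions for illustration, and the header your display tool expects should be checked against its own documentation.

```python
import csv
import io

def clusters_to_edge_rows(clusters):
    """Turn {cluster_id: [point_id, ...]} into (cluster_id, point_id) edges."""
    rows = []
    for cluster_id, points in clusters.items():
        for point_id in points:
            rows.append((cluster_id, point_id))
    return rows

def write_edge_csv(clusters, out):
    """Write one edge per line to a file-like object, after a header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target"])  # assumed header names
    writer.writerows(clusters_to_edge_rows(clusters))

# Hypothetical sample data: two clusters and the points assigned to them.
clusters = {"cluster-0": ["p1", "p2"], "cluster-1": ["p3"]}
buffer = io.StringIO()
write_edge_csv(clusters, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the edge extraction separate from the writer is one way to leave room for other output formats later, since each format would only need its own write function.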
[
https://issues.apache.org/jira/browse/MAHOUT-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106661#comment-13106661
]
Hudson commented on MAHOUT-811:
---
Integrated in Mahout-Quality #1042 (See
[https://builds.ap
[
https://issues.apache.org/jira/browse/MAHOUT-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106637#comment-13106637
]
Andrew Bayer commented on MAHOUT-811:
-
Yeah, I kept the rm -rf for consistency, but ch
[
https://issues.apache.org/jira/browse/MAHOUT-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106617#comment-13106617
]
Drew Farris commented on MAHOUT-811:
{quote}
Should be easy enough to do this without
[
https://issues.apache.org/jira/browse/MAHOUT-811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13106535#comment-13106535
]
Sean Owen commented on MAHOUT-811:
--
Should be easy enough to do this without any cd-ing a
+1 for Chicago.
October 28th?
/Alan
--
***
M.Sc.(Eng.) Alan Said
Competence Center Information Retrieval & Machine Learning Technische
Universität Berlin DAI-Labor Sekr. TEL 14 Ernst-Reuter-Platz 7
10587 Berlin / Germany
Phone: 0049 - 30 - 314 74072
Fax:0
How about one at RecSys in Chicago in October (recsys.acm.org)? There are
definitely other researchers using Mahout, and some industry folks will be
there too. I'll be attending the conference.
On Fri, Sep 16, 2011 at 4:04 PM, Grant Ingersoll wrote:
> We do from time to time, but they are usually ad h
We do from time to time, but they are usually ad hoc at this point (usually
when I am in town, which happens to be next week, although I don't think I have
time to get together)
On Sep 16, 2011, at 10:30 AM, Dhruv Kumar wrote:
> Are there any events regularly scheduled in/near San Francisco for
Are there any events regularly scheduled in/near San Francisco for the users
and devs of Mahout?
I will be moving there next week and was curious to know about the
networking opportunities with similar minded folks in the coming months.
[
https://issues.apache.org/jira/browse/MAHOUT-811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Drew Farris reopened MAHOUT-811:
Assignee: Drew Farris (was: Sean Owen)
This patch introduces another problem, specifically with
You need to specify the Lucene analyzer that will be used to tokenize the text.
That being said, I thought there was a default. What version of Mahout are
you using?
On Sep 16, 2011, at 5:41 AM, Jack He wrote:
> I've tried the command below:
> mahout seqdirectory -i cluster/testdata -o cluster-se
On Sep 16, 2011, at 12:27 AM, Ted Dunning wrote:
> Actually, I don't think that these really provide a distributed memory
> layer.
>
> What they provide is multiple iterations without having to renegotiate JVM launches,
> local memory that persists across iterations and decent message passing.
> (and of
I've tried the command below:
mahout seqdirectory -i cluster/testdata -o cluster-seq -c UTF-8
The input file looks like this:
1 2 3 4 5
6 7 8 9 10
11 12 ...etc
Then I got a file named chunk-0 in the directory cluster-seq. It's almost
the same as the input file.
For the next step, I ran the command below:
mahout