Embedding mahout in a java app

2011-11-02 Thread Tharindu Mathew
Hi, Is there an API that is available to easily embed Mahout in a java app, feed data and get output? PS: Forgive me if this is a noob question. Still trying to figure out Mahout. -- Regards, Tharindu blog: http://mackiemathew.com/

Re: Embedding mahout in a java app

2011-11-02 Thread Sean Owen
Mahout is written in Java, so 'yes' you can put it in any Java program trivially. Why would it have anything to do with an API? I think you need to be clearer about what you are doing, and probably first have a basic look at the project. On Nov 2, 2011 8:49 AM, Tharindu Mathew mcclou...@gmail.com

does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jake Mannix
Doesn't look like they're used anywhere but tests. In the spirit of removing clutter, I suggest we rip that stuff out! It's really not that unreasonable to carry around a BiMap<String, Integer> dictionary to translate between feature labels and featureIds outside a Matrix rather than set it on the
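For reference, the alternative Jake describes (keeping the label dictionary outside the Matrix) might look like the minimal Java sketch below. It assumes labels get consecutive integer ids the first time they are seen, and it uses two plain java.util maps in place of Guava's BiMap so it runs without dependencies. FeatureDictionary is a hypothetical name, not a Mahout class.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical label <-> feature-id dictionary kept outside the matrix.
 * Two plain maps stand in for a Guava BiMap so the sketch has no dependencies.
 */
class FeatureDictionary {
    private final Map<String, Integer> labelToId = new HashMap<>();
    private final Map<Integer, String> idToLabel = new HashMap<>();

    /** Returns the id for a label, assigning the next free id the first time the label is seen. */
    int idFor(String label) {
        Integer id = labelToId.get(label);
        if (id == null) {
            id = labelToId.size();
            labelToId.put(label, id);
            idToLabel.put(id, label);
        }
        return id;
    }

    /** Reverse lookup: the label for a feature id, or null if that id was never assigned. */
    String labelFor(int id) {
        return idToLabel.get(id);
    }

    /** Number of distinct labels seen so far (also the cardinality a feature vector would need). */
    int size() {
        return labelToId.size();
    }
}
```

On a fresh dictionary, idFor("price") returns 0, idFor("color") returns 1, and labelFor(1) then returns "color". A real BiMap offers the same two-way lookup through inverse().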

Re: Embedding mahout in a java app

2011-11-02 Thread Tharindu Mathew
Hi Sean, I guess with a proper API it just makes it easier. I was hoping you'd point me to a code sample or a tutorial. Everything I could find refers to quick starts, which tell how to run a sample, such as

Re: Embedding mahout in a java app

2011-11-02 Thread Sean Owen
The wiki has examples of calling most of the code via Java, and javadoc ought to cover the rest. What are you looking for specifically? Mahout is not one thing. All of it is callable from Java. On Nov 2, 2011 9:21 AM, Tharindu Mathew mcclou...@gmail.com wrote: Hi Sean, I guess with a proper

Re: Embedding mahout in a java app

2011-11-02 Thread JAGANADH G
On Wed, Nov 2, 2011 at 2:51 PM, Tharindu Mathew mcclou...@gmail.com wrote: Hi Sean, I guess with a proper API it just makes it easier. I was hoping you'd point me to a code sample or a tutorial. Hi, for detailed code samples and tutorials, see the book Mahout in Action. You will get a

Re: Embedding mahout in a java app

2011-11-02 Thread Tharindu Mathew
I want to create a java UI tool (based on a web app) that can pick and apply different algorithms available in Mahout to different data sets. Hence the embedding with java. Obviously, I understand that everything is callable from Java since it's written in Java :). For example, I want to do a

Re: Embedding mahout in a java app

2011-11-02 Thread Sean Owen
I see, the Java interfaces vary from area to area since different algos are different things and sometimes take different input. Generally, the classifiers take in Mahout Vector input, and are Hadoop-based, so you'd be writing some code to run Mahout jobs on Hadoop from your GUI app. Not all are

Re: Embedding mahout in a java app

2011-11-02 Thread Tharindu Mathew
Thanks Sean. Looks like I'll have to dig into the code; I will start from MahoutDriver. Is there a mode that will work for all algorithms? For example, all algorithms can run in a single-node mode, or all algorithms run in a Hadoop mode (I know Hadoop has a local mode, but that's not what I'm

Re: Embedding mahout in a java app

2011-11-02 Thread Sean Owen
MahoutDriver is the closest thing to a single point of entry for all the algorithms. It's for command-line use, but you can see what it does after parsing args. Most algorithms use Hadoop, so in general, no, there is not a Hadoop-free mode. Some bits have non-Hadoop parts, though that's
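To illustrate the kind of single entry point Sean describes, here is a minimal Java sketch of name-based dispatch: a short program name maps to a class with a static main(String[]), and the remaining arguments are handed to it via reflection. This is not MahoutDriver's actual code; AlgorithmDispatcher and EchoJob are hypothetical names, and a GUI would register real driver classes instead of the stand-in.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical single entry point: maps a short name to a class with a static
 * main(String[]) and passes the remaining arguments through unchanged.
 */
class AlgorithmDispatcher {
    private final Map<String, Class<?>> programs = new LinkedHashMap<>();

    void register(String shortName, Class<?> mainClass) {
        programs.put(shortName, mainClass);
    }

    /** Treats args[0] as the program name and invokes its main method with args[1..]. */
    void run(String... args) throws Exception {
        if (args.length == 0 || !programs.containsKey(args[0])) {
            throw new IllegalArgumentException("Unknown program; valid names: " + programs.keySet());
        }
        Method main = programs.get(args[0]).getMethod("main", String[].class);
        String[] rest = Arrays.copyOfRange(args, 1, args.length);
        // Cast to Object so the array is passed as the single String[] parameter.
        main.invoke(null, (Object) rest);
    }
}

/** Stand-in for an algorithm driver; it only records the arguments it received. */
class EchoJob {
    static String[] lastArgs;

    public static void main(String[] args) {
        lastArgs = args;
    }
}
```

After register("echo", EchoJob.class), calling run("echo", "--input", "in") leaves EchoJob.lastArgs equal to {"--input", "in"}. Dispatching by name keeps the GUI decoupled from each algorithm's own option parsing, at the cost of passing everything as strings.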

Minhash key groups

2011-11-02 Thread Grant Ingersoll
What's the Minhash key groups value used for in the MinhashDriver? I mean, I see it is used for building up the key out of the hashed values, but what's the significance of different values for it? The default is 2, what does it mean practically speaking if I choose, say, 10? AFAICT, it
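As background to the question, the usual reason for building a key out of several min-hash values is the LSH banding trick. For a min-wise hash, two sets with Jaccard similarity J share a given min-hash value with probability J, so a key made of k values is shared with probability roughly J^k: larger groups make sharing a key, and thus landing in the same cluster, stricter. The Java sketch below assumes keyGroups plays the role of k, and it uses disjoint groups and an (a*x + b) mod p hash family; both are assumptions, and MinhashDriver may group its values differently. MinHashKeys is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.Set;

/**
 * Hypothetical min-hash key builder using standard LSH banding:
 * each key concatenates keyGroups min-hash values.
 */
class MinHashKeys {
    private static final long PRIME = 2147483647L; // 2^31 - 1, a Mersenne prime

    private final long[] a;
    private final long[] b;

    MinHashKeys(int numHashFunctions, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashFunctions];
        b = new long[numHashFunctions];
        for (int i = 0; i < numHashFunctions; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // nonzero multiplier
            b[i] = rnd.nextInt(Integer.MAX_VALUE);
        }
    }

    /** For each hash function, the smallest hash value over the set's elements. */
    long[] signature(Set<Integer> items) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int x : items) {
            for (int i = 0; i < a.length; i++) {
                long h = Math.floorMod(a[i] * x + b[i], PRIME); // fits in a long: |a*x| < 2^62
                sig[i] = Math.min(sig[i], h);
            }
        }
        return sig;
    }

    /**
     * Splits the signature into disjoint groups of keyGroups values and joins each group
     * into one key, prefixed by its group index; leftover values are ignored.
     * Items sharing any key would end up in the same candidate cluster.
     */
    List<String> keys(long[] sig, int keyGroups) {
        List<String> keys = new ArrayList<>();
        for (int start = 0; start + keyGroups <= sig.length; start += keyGroups) {
            StringBuilder key = new StringBuilder().append(start / keyGroups);
            for (int j = start; j < start + keyGroups; j++) {
                key.append('-').append(sig[j]);
            }
            keys.add(key.toString());
        }
        return keys;
    }
}
```

With J = 0.8, a key of 2 values is shared with probability about 0.64, while a key of 10 values is shared with probability about 0.11, so going from 2 to 10 trades recall for precision.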

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Grant Ingersoll
What functionality, specifically, are you proposing to remove? I know we had a lot of discussion around some of this stuff way back when as to how best to do it, but of course, that doesn't mean it has uptake. If it's on the Matrix, then doesn't it more easily get shipped around via the

Re: Embedding mahout in a java app

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 7:17 AM, Tharindu Mathew wrote: I want to create a java UI tool (based on a web app) that can pick and apply different algorithms available in Mahout to different data sets. Very cool! Keep us posted, as this would be immensely useful! Any chance it will be donated back?

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jake Mannix
On Wed, Nov 2, 2011 at 7:34 AM, Grant Ingersoll gsing...@apache.org wrote: What functionality, specifically, are you proposing to remove? I'm suggesting we kill, from Matrix.java and descendants, all of the following methods: Map<String, Integer> getColumnLabelBindings(); Map<String, Integer>

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Grant Ingersoll
On Nov 2, 2011, at 10:58 AM, Jake Mannix wrote: On Wed, Nov 2, 2011 at 7:34 AM, Grant Ingersoll gsing...@apache.org wrote: What functionality, specifically, are you proposing to remove? I'm suggesting we kill, from Matrix.java and descendants, all of the following methods:

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jake Mannix
Ah, ok, I was looking at an older source tree. Then in that case, no *release* we've had touches them, and nowhere in the codebase does anyone currently use the bindings, even if it is the case that if you *did* use them, they would indeed get serialized with the matrix. Which is why I was

Re: We need help about how to install mahout

2011-11-02 Thread Patrick Hunt
On Tue, Nov 1, 2011 at 8:47 PM, Grant Ingersoll gsing...@apache.org wrote: On Nov 1, 2011, at 2:16 PM, Patrick Hunt wrote: On Tue, Nov 1, 2011 at 10:44 AM, Ted Dunning ted.dunn...@gmail.com wrote: On Tue, Nov 1, 2011 at 9:18 AM, Patrick Hunt ph...@apache.org wrote: 2011/10/31 Ted Dunning

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jake Mannix
On Wed, Nov 2, 2011 at 10:15 AM, Grant Ingersoll gsing...@apache.org wrote: On Nov 2, 2011, at 11:50 AM, Jake Mannix wrote: Ah, ok, I was looking at an older source tree. Then in that case, no *release* we've had touches them, and nowhere in the codebase does anyone currently use the

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Ted Dunning
These labels are here by analogy with R data.frames, where having the labels inside the data is really handy. On Wed, Nov 2, 2011 at 10:15 AM, Grant Ingersoll gsing...@apache.org wrote: HDFS all nice and safe, and I've got a pile of numeric serialized (DistributedRow-)Matrix instances which

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Ted Dunning
It seems like a good idea, but it definitely is not impossible to work around the lack. Having the labels should make certain forms of cluster dumping easier, but for all the stuff I do with hashed representations, the hashing destroys any utility of labels. It may be that label utility is
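Ted's point about hashed representations can be seen in a few lines: with the hashing trick, a label's index is computed from a hash, so several labels can share one index and an index no longer maps back to a single label. The minimal Java sketch below uses String.hashCode() purely as a stand-in for whatever hash a real encoder uses; HashedFeatures is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical illustration of the hashing trick: labels map to vector indices by
 * hashing, so no dictionary is kept and collisions are possible.
 */
class HashedFeatures {
    /** Index for a label in a vector of the given cardinality (floorMod keeps it non-negative). */
    static int indexFor(String label, int cardinality) {
        return Math.floorMod(label.hashCode(), cardinality);
    }

    /** Groups labels by the index they hash to; any group with more than one label is a collision. */
    static Map<Integer, List<String>> preimages(List<String> labels, int cardinality) {
        Map<Integer, List<String>> byIndex = new TreeMap<>();
        for (String label : labels) {
            byIndex.computeIfAbsent(indexFor(label, cardinality), k -> new ArrayList<>()).add(label);
        }
        return byIndex;
    }
}
```

"Aa" and "BB" have the same String.hashCode() (2112), so with a cardinality of 1000 both land on index 112, and a label binding for that index could not say which one produced it.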

Fwd: Mahout In Action - Bayes/CBayes Classification returns NaN

2011-11-02 Thread Ted Dunning
Forwarded to mahout list instead of lucene. Let's move the discussion there. -- Forwarded message -- From: Sam Cunningham sam_cun...@yahoo.com Date: Wed, Nov 2, 2011 at 10:33 AM Subject: Mahout In Action - Bayes/CBayes Classification returns NaN To: gene...@lucene.apache.org My

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jake Mannix
On Wed, Nov 2, 2011 at 11:22 AM, Ted Dunning ted.dunn...@gmail.com wrote: It seems like a good idea, but it definitely is not impossible to work around the lack. And more importantly, it may be a good idea in theory, but has anyone actually used it, or does anyone foresee using it soon? It's 9 methods

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Sean Owen
The only thought I have about it is that there's a to-do to make that stuff actually used and integrated into a wrapper class. I think it's fine to kill it. If someone goes to all the trouble of re-implementing it later it will not have been extra work; it probably was to be redone anyway. On Wed,

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Ted Dunning
Let's nuke it. I am the most vocal in favor and I can't get up the enthusiasm to push for keeping it. On Wed, Nov 2, 2011 at 11:31 AM, Sean Owen sro...@gmail.com wrote: The only thought I have about it is that there's a to-do to make that stuff actually used and integrated into a wrapper

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jake Mannix
Ok, we can always resurrect it. I'll leave this thread open until after work tonight (8 hrs or so from now), and if I don't hear any vociferous complaints or reasoned thoughts on why this is crazy, I'll chop 'em. -jake On Wed, Nov 2, 2011 at 11:34 AM, Ted Dunning ted.dunn...@gmail.com wrote:

Re: NaN - classification results (cbayes)

2011-11-02 Thread Sam Cunningham
Below I am providing some documents regarding the issue. The top 4 documents are sample normalized classes (Entertainment, Health, SciTech, and Sports). The last document is the model. http://12.233.16.76/icons/Entertainment.zip http://12.233.16.76/icons/Health.zip

Re: NaN - classification results (cbayes)

2011-11-02 Thread Ted Dunning
Sam, I recommend actually subscribing to the mailing list while you have active questions. There is a long history of Nabble postings not actually making it to the Apache mailing lists. On Wed, Nov 2, 2011 at 12:19 PM, Sam Cunningham sam_cun...@yahoo.com wrote: Below I am providing some

How To Contribute

2011-11-02 Thread Grant Ingersoll
In the vein of users become contributors become committers: It seems there has been some spark of interest in contributing more, so I thought I would pass along a few pointers: 1. https://cwiki.apache.org/MAHOUT/how-to-contribute.html -- Details how to submit patches, etc. IDE codestyles at

Re: NaN - classification results (cbayes)

2011-11-02 Thread Ted Dunning
I can't download these files. The server never responds as far as I can tell. You may have given out a local address. Or turned the machine off. Or whatever. Can you put them onto Dropbox or pastebin or S3 or something so that we can look at these? On Wed, Nov 2, 2011 at 12:19 PM, Sam

RE: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jeff Eastman
+1 from me too. IIRC this all got added when we were annotating Vectors too, and there we ended up with NamedVector as a wrapper. If this Matrix annotation is not being used, then let's clean it up. -----Original Message----- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Wednesday,
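A minimal Java sketch of the wrapper approach Jeff and Sean mention, assuming a plain double[][] in place of a Mahout Matrix: the labels live in a separate wrapper object (as a name lives in NamedVector), so the numeric matrix itself needs no label methods. LabeledMatrix is a hypothetical class, not part of Mahout.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical wrapper (not a Mahout class) that keeps row and column labels next to a
 * plain numeric matrix, the way NamedVector keeps a name next to a Vector.
 */
class LabeledMatrix {
    private final double[][] values;
    private final Map<String, Integer> rowIndex = new HashMap<>();
    private final Map<String, Integer> columnIndex = new HashMap<>();

    LabeledMatrix(double[][] values, String[] rowLabels, String[] columnLabels) {
        int columns = values.length == 0 ? 0 : values[0].length;
        if (rowLabels.length != values.length || columnLabels.length != columns) {
            throw new IllegalArgumentException("label counts must match matrix dimensions");
        }
        for (int i = 0; i < rowLabels.length; i++) {
            rowIndex.put(rowLabels[i], i);
        }
        for (int j = 0; j < columnLabels.length; j++) {
            columnIndex.put(columnLabels[j], j);
        }
        this.values = values;
    }

    /** Looks up a cell by its row and column labels. */
    double get(String rowLabel, String columnLabel) {
        return values[lookup(rowIndex, rowLabel)][lookup(columnIndex, columnLabel)];
    }

    /** The unlabeled matrix, for code that only needs the numbers. */
    double[][] unwrap() {
        return values;
    }

    private static int lookup(Map<String, Integer> index, String label) {
        Integer i = index.get(label);
        if (i == null) {
            throw new IllegalArgumentException("unknown label: " + label);
        }
        return i;
    }
}
```

Code that only needs numbers calls unwrap(); code that wants R data.frame-style access looks cells up by label, without the Matrix interface having to carry the bindings.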

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Isabel Drost
On 02.11.2011 Jake Mannix wrote: I'll leave this thread open until after work tonight (8 hrs or so from now), and if I don't hear any vociferous complaints or reasoned thoughts on why this is crazy, I'll chop 'em. +1 for the cleanup, however if you are leaving the thread open for that

Re: does anyone use the row label bindings stuff in Vector / Matrix?

2011-11-02 Thread Jake Mannix
On Wed, Nov 2, 2011 at 5:24 PM, Isabel Drost isa...@apache.org wrote: On 02.11.2011 Jake Mannix wrote: I'll leave this thread open until after work tonight (8 hrs or so from now), and if I don't hear any vociferous complaints or reasoned thoughts on why this is crazy, I'll chop 'em. +1

Re: NaN - classification results (cbayes)

2011-11-02 Thread Sam Cunningham
It seems that some of us were not able to get to the URLs. So, I am uploading the files here. http://lucene.472066.n3.nabble.com/file/n3475998/Entertainment.zip http://lucene.472066.n3.nabble.com/file/n3475998/Health.zip

Re: Embedding mahout in a java app

2011-11-02 Thread Tharindu Mathew
Thanks everyone for the encouraging replies. If it's possible I will work on and contribute a clean API that will ease the learning curve of applying Mahout. On Wed, Nov 2, 2011 at 9:40 PM, Matteo Moci mox...@gmail.com wrote: I just found this [1] project. It seems a bit old, and I don't know