date:20110506

Random forest questions

2011-05-06 Thread Yang Zhang

I've been playing around with the RF implementation and I had a couple questions:
- Does this RF implementation support weighted examples? (If so how do I specify weights?)
- How do I get the RF score (confidence, probability, etc.) of a prediction?
Thanks!

Re: Is any more detailed documentation about the sgd logistic regression example.

2011-05-06 Thread Xiaobo Gu

On Thu, May 5, 2011 at 11:21 PM, Ted Dunning wrote:
> On Thu, May 5, 2011 at 7:48 AM, Xiaobo Gu wrote:
>
>> On Thu, May 5, 2011 at 10:40 PM, Stanley Xu wrote:
>> > 1. You could use the command line to add shape as category features, it will
>> > hash categoryname=value as the feature and set
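The idea of hashing "categoryname=value" into a feature can be illustrated with a minimal Python sketch. This is not Mahout's encoder: the function name `encode_categorical`, the vector size, the use of CRC32, and the choice to add 1.0 to the hashed slot are all assumptions made for illustration (the quoted message is cut off before it says what gets set).

```python
import zlib


def encode_categorical(features, num_slots=16):
    """Hash each "name=value" token to one slot of a fixed-size vector.

    Hypothetical helper for illustration only; the slot increment of 1.0
    is an assumption, not something the quoted message confirms.
    """
    vector = [0.0] * num_slots
    for name, value in features.items():
        token = f"{name}={value}"
        # crc32 is stable across runs, unlike Python's built-in str hash.
        slot = zlib.crc32(token.encode("utf-8")) % num_slots
        vector[slot] += 1.0
    return vector


vec = encode_categorical({"shape": "round", "color": "red"})
print(vec)
```

Because the vector size is fixed up front, no dictionary of category values is needed, at the cost of occasional collisions between unrelated tokens.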

Re: Which maven command to use to put all the binaries into the distribution layout?

2011-05-06 Thread Xiaobo Gu

On Fri, May 6, 2011 at 11:34 PM, Sean Owen wrote:
> I think you'd have to set up release keys and all that to make the package.
> Does "mvn release:prepare" (without -Prelease) do what you want or am
> I crazy? That's ultimately what makes the artifacts. Here's our
> process: https://cwiki.apache.

Re: Vectorizing arbitrary value types with seq2sparse

2011-05-06 Thread Ted Dunning

Yeah.. that doesn't work at all. You need different analyzers at least and some fields are numeric, some textual. The same words in different fields (usually) need to be considered separately. N-grams raise all kinds of crazy issues. For instance, what does an n-gram of tags mean? Are tags ev

implicit data relative ratings

2011-05-06 Thread Ted Dunning

Here is an interesting paper that claims that implicit rankings based on logging requests for directions are at least as good as explicit ratings and ten times more available. http://www.vldb.org/pvldb/vol4/p290-venetis.pdf My bias in favor of implicit ratings just got stronger.

Re: Vectorizing arbitrary value types with seq2sparse

2011-05-06 Thread Frank Scholten

Hmm, seems more complex than I thought. I thought of a simple approach where you could configure your own class that concatenated the desired fields into one Text value and have the SequenceFileTokenizerMapper process that value. But this can give unexpected results? I guess it may find incorrect

Re: Vectorizing arbitrary value types with seq2sparse

2011-05-06 Thread Ted Dunning

This is definitely desirable but is very different from the current tool. My guess is the big difficulty will be describing the vectorization to be done. The hashed representations would make that easier, but still not trivial. Dictionary based methods add multiple dictionary specifications and

Vectorizing arbitrary value types with seq2sparse

2011-05-06 Thread Frank Scholten

Hi everyone, At the moment seq2sparse can generate vectors from sequence values of type Text. More specifically, SequenceFileTokenizerMapper handles Text values. Would it be useful if seq2sparse could be configured to vectorize value types such as a Blog article with several textual fields like t

Re: Which maven command to use to put all the binaries into the distribution layout?

2011-05-06 Thread Ted Dunning

Which is glued to the package life cycle in Mahout.

On Fri, May 6, 2011 at 9:42 AM, Patrick Angeles wrote:
> You probably want the maven assembly plugin.
>
> On Fri, May 6, 2011 at 12:07 PM, Ted Dunning wrote:
> > Isn't there a mvn package target that is better for this?
> >
> > On Fri, May

Re: Which maven command to use to put all the binaries into the distribution layout?

2011-05-06 Thread Patrick Angeles

You probably want the maven assembly plugin.

On Fri, May 6, 2011 at 12:07 PM, Ted Dunning wrote:
> Isn't there a mvn package target that is better for this?
>
> On Fri, May 6, 2011 at 8:34 AM, Sean Owen wrote:
> > I think you'd have to set up release keys and all that to make the package.

Re: Which maven command to use to put all the binaries into the distribution layout?

2011-05-06 Thread Ted Dunning

Isn't there a mvn package target that is better for this?

On Fri, May 6, 2011 at 8:34 AM, Sean Owen wrote:
> I think you'd have to set up release keys and all that to make the package.
> Does "mvn release:prepare" (without -Prelease) do what you want or am
> I crazy? That's ultimately what makes

Re: Which maven command to use to put all the binaries into the distribution layout?

2011-05-06 Thread Sean Owen

I think you'd have to set up release keys and all that to make the package. Does "mvn release:prepare" (without -Prelease) do what you want or am I crazy? That's ultimately what makes the artifacts. Here's our process: https://cwiki.apache.org/confluence/display/MAHOUT/How+To+Release What are you

Re: Which maven command to use to put all the binaries into the distribution layout?

2011-05-06 Thread Xiaobo Gu

mvn -Prelease prompts me to enter a password ("GPG Passphrase:"), and I can't provide one. How can I build and package the release zip file without running the unit tests? Another question: why does mvn download a lot of files from the Internet while building? Regards, On Mon, Apr 11, 2011 at

Re: MapReduce Stats calculations

2011-05-06 Thread Ted Dunning

yeah... un-re-used re-usable primitives are of little help, but a Mahout big data equivalent of the R summary function would be handy to have. The fact is, we already have the re-usable bits anyway. It is common to want column-wise summaries of big matrices. Useful summaries include: a) moment bas
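A column-wise summary in the spirit of R's summary function can be computed in one pass over the rows. The sketch below is plain Python, not Mahout code; the class name `ColumnSummary` and the particular statistics (min, max, mean, sample variance via Welford's update) are assumptions chosen for illustration, since the list of summaries in the message is cut off.

```python
import math


class ColumnSummary:
    """One-pass, per-column min, max, mean and variance (Welford's update).

    Hypothetical class for illustration; not part of Mahout.
    """

    def __init__(self, num_cols):
        self.n = 0
        self.mean = [0.0] * num_cols
        self.m2 = [0.0] * num_cols  # running sum of squared deviations
        self.min = [math.inf] * num_cols
        self.max = [-math.inf] * num_cols

    def add_row(self, row):
        self.n += 1
        for j, x in enumerate(row):
            delta = x - self.mean[j]
            self.mean[j] += delta / self.n
            self.m2[j] += delta * (x - self.mean[j])
            self.min[j] = min(self.min[j], x)
            self.max[j] = max(self.max[j], x)

    def variance(self):
        # Sample variance; requires at least two rows.
        return [m2 / (self.n - 1) for m2 in self.m2]


summary = ColumnSummary(num_cols=2)
for row in [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]:
    summary.add_row(row)
print(summary.mean, summary.variance(), summary.min, summary.max)
```

Only a few numbers per column are kept in memory, so the row count does not matter; a distributed version would additionally need a way to merge two such summaries.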

Re: Transposing a matrix is limited by how large a node is.

2011-05-06 Thread Ted Dunning

If you have the code and would like to contribute it, file a JIRA and attach a patch. It will be interesting to hear how the SVD proceeds. Such a large dense matrix is an unusual target for SVD. Also, it is possible to adapt the R version of random projection to never keep all of the large matri

Re: Transposing a matrix is limited by how large a node is.

2011-05-06 Thread Vincent Xue

Hi Jake,

As requested the stats from the job are listed below:

Counter                   Map             Reduce  Total
Job Counters
  Launched reduce tasks   0               0       2
  Rack-local map tasks    0               0       69
  Launched map tasks      0               0       194
  Data-local map tasks    0               0       125
FileSystemCounters
  FILE_BYTES_READ         66,655,795,630  0       66,655,795,630
  HDFS_BYTES_READ         12

Re: MapReduce Stats calculations

2011-05-06 Thread Sean Owen

Hadoop has something like this: http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/aggregate/package-summary.html I find there's a very strong and unfortunate tension between reusability and performance in some cases. Having a discrete stage to compute something li

Re: Transposing a matrix is limited by how large a node is.

2011-05-06 Thread Jake Mannix

On Fri, May 6, 2011 at 6:01 AM, Vincent Xue wrote:
> Dear Mahout Users,
>
> I am using Mahout-0.5-SNAPSHOT to transpose a dense matrix of 55000 x 31000.
> My matrix is stored on the HDFS as a SequenceFile, consuming just about 13 GB.
> When I run the transpose function on my matrix, the

MapReduce Stats calculations

2011-05-06 Thread Grant Ingersoll

MAHOUT-688 has a M/R job to calculate std. deviation for document frequencies so that it can prune noisy words. I'm thinking of making it a bit more generic and adding a stats package to org.apache.mahout.math.hadoop that contains this and other basic stats calculations (mean, variance, sum of
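To make the idea concrete, here is a small Python sketch of how a standard deviation can be computed in a map/reduce style and then used to prune very frequent terms. It is not the MAHOUT-688 code: the function names, the sample document frequencies, and the cutoff rule (drop terms whose document frequency exceeds mean + k * sigma) are all assumptions for illustration.

```python
import math


def partial(values):
    """Map/combine side: reduce a chunk to (count, sum, sum of squares)."""
    values = list(values)
    return (len(values), sum(values), sum(v * v for v in values))


def merge(a, b):
    """Reduce side: partial aggregates combine by element-wise addition."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def mean_and_std(aggregate):
    n, total, total_sq = aggregate
    mean = total / n
    # Population variance; clamp tiny negatives caused by rounding.
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def prune(doc_freqs, k=1.0):
    """Keep terms whose document frequency is at most mean + k * sigma (assumed rule)."""
    items = list(doc_freqs.values())
    half = len(items) // 2
    # Two chunks stand in for two map tasks.
    aggregate = merge(partial(items[:half]), partial(items[half:]))
    mean, std = mean_and_std(aggregate)
    return sorted(t for t, df in doc_freqs.items() if df <= mean + k * std)


doc_freqs = {"the": 100, "mahout": 10, "matrix": 12, "hadoop": 14}
kept = prune(doc_freqs)
print(kept)
```

The sum-of-squares form merges trivially across tasks but can lose precision when the mean is large relative to the spread, so a production job might prefer a more numerically stable merge.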

Transposing a matrix is limited by how large a node is.

2011-05-06 Thread Vincent Xue

Dear Mahout Users, I am using Mahout-0.5-SNAPSHOT to transpose a dense matrix of 55000 x 31000. My matrix is stored on the HDFS as a SequenceFile, consuming just about 13 GB. When I run the transpose function on my matrix, the function falls over during the reduce phase. On closer inspection,
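For readers unfamiliar with how a transpose maps onto MapReduce, the following Python sketch simulates the usual pattern in memory: the map step re-keys every entry by its column index, and the reduce step assembles each column into a row of the result. It is an illustration under stated assumptions, not Mahout's transpose job; the function names are hypothetical.

```python
from collections import defaultdict


def map_row(row_index, row):
    """Map: emit (column index, (row index, value)) for every entry in a row."""
    for col_index, value in enumerate(row):
        yield col_index, (row_index, value)


def reduce_column(entries, num_rows):
    """Reduce: build one row of the transpose from all entries of one column."""
    out = [0.0] * num_rows
    for row_index, value in entries:
        out[row_index] = value
    return out


def transpose(matrix):
    # The dict of lists stands in for the shuffle phase.
    groups = defaultdict(list)
    for i, row in enumerate(matrix):
        for key, entry in map_row(i, row):
            groups[key].append(entry)
    return [reduce_column(groups[j], len(matrix)) for j in sorted(groups)]


print(transpose([[1, 2, 3], [4, 5, 6]]))
```

In this pattern every matrix entry passes through the shuffle, so for a dense matrix the data moved between map and reduce is at least as large as the input itself.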
