[jira] [Created] (MAHOUT-765) Upgrade Lucene to latest release or wait for LUCENE-3151

2011-07-18 Thread Grant Ingersoll (JIRA)
Reporter: Grant Ingersoll Lucene is up to 3.3; we should upgrade to that, or wait for LUCENE-3151 and move to that. Either way, we should upgrade Lucene. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: RowSimilarity ?'s

2011-07-18 Thread Grant Ingersoll
On Jul 18, 2011, at 4:32 PM, Sean Owen wrote: > On Mon, Jul 18, 2011 at 8:52 PM, Grant Ingersoll wrote: >> We definitely cannot do this all map side, but I think the big difference is >> the paper emits the partial sum of all the co-occurrences it's seen, whereas >>

Re: RowSimilarity ?'s

2011-07-18 Thread Grant Ingersoll
On Jul 18, 2011, at 3:52 PM, Grant Ingersoll wrote: > > I think the big win is to not construe this implementation to be based on > what is in the paper. I'm starting to think we should have two RowSimilarity > jobs. One for algebraic functions and one for those who are n

Re: RowSimilarity ?'s

2011-07-18 Thread Grant Ingersoll
in for your use case is adding that lever to trim > down big columns, which is quite easy to add. I think the big win is to not construe this implementation to be based on what is in the paper. I'm starting to think we should have two RowSimilarity jobs. One for algebraic functions and one

Re: RowSimilarity ?'s

2011-07-18 Thread Grant Ingersoll
>> wrote: >>>> Rows. >>>> >>>> On Thu, Jul 14, 2011 at 12:24 PM, Sean Owen wrote: >>>> >>>>> Just needs a rule for >>>>> tossing data -- you could simply throw away such columns (ouch), or at >>>>> least >>>>> use only a sampled subset of it. >>>>> >>>> >>> -- Grant Ingersoll
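The "sampled subset" rule Sean mentions for oversized columns can be sketched as below. This is only an illustration in plain Java, not code from RowSimilarityJob: the class name, `sampleColumn`, `maxEntries`, and the seed parameter are all hypothetical, and reservoir sampling is one possible choice of sampling rule.

```java
import java.util.Arrays;
import java.util.Random;

public class ColumnSamplingSketch {

    /**
     * Caps a column (represented here as the row ids of its non-zero entries)
     * at maxEntries entries using reservoir sampling, so every entry has the
     * same chance of being kept. Columns already within the cap are returned
     * unchanged. maxEntries must be non-negative.
     */
    static int[] sampleColumn(int[] rowIds, int maxEntries, long seed) {
        if (rowIds.length <= maxEntries) {
            return rowIds.clone();
        }
        Random rnd = new Random(seed);
        // Start with the first maxEntries entries, then replace them at random.
        int[] reservoir = Arrays.copyOf(rowIds, maxEntries);
        for (int i = maxEntries; i < rowIds.length; i++) {
            int j = rnd.nextInt(i + 1);
            if (j < maxEntries) {
                reservoir[j] = rowIds[i];
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        int[] bigColumn = new int[1000];
        for (int i = 0; i < bigColumn.length; i++) {
            bigColumn[i] = i;
        }
        int[] trimmed = sampleColumn(bigColumn, 10, 42L);
        System.out.println("kept " + trimmed.length + " of " + bigColumn.length + " entries");
    }
}
```

Throwing the column away entirely corresponds to returning an empty array instead; sampling keeps some signal from the column while bounding the number of pairs it contributes.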

Use of JIRA was [jira] [Resolved] (MAHOUT-752) Semantic Vectors: generate and use vectors from User/Item Taste data models

2011-07-18 Thread Grant Ingersoll
e and does preference evaluation based on the >> vectors and a given DistanceMeasure >> This is a large exploration of the Semantic Vectors concept: >> [http://code.google.com/p/semanticvectors/]. And was the inspiration for >> this project. > > -- > This message is automatically generated by JIRA. > For more information on JIRA, see: http://www.atlassian.com/software/jira > > -- Grant Ingersoll

[jira] [Commented] (MAHOUT-763) Map-Side Distance Comparison

2011-07-17 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066632#comment-13066632 ] Grant Ingersoll commented on MAHOUT-763: +1 > Map-Side Distance Com

[jira] [Commented] (MAHOUT-763) Map-Side Distance Comparison

2011-07-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066401#comment-13066401 ] Grant Ingersoll commented on MAHOUT-763: The code is more or less a cop

[jira] [Commented] (MAHOUT-763) Map-Side Distance Comparison

2011-07-16 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066398#comment-13066398 ] Grant Ingersoll commented on MAHOUT-763: Helpful label. Put up a patch and

[jira] [Reopened] (MAHOUT-763) Map-Side Distance Comparison

2011-07-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reopened MAHOUT-763: Going to reopen to provide an alternate output form > Map-Side Distance Compari

[jira] [Resolved] (MAHOUT-763) Map-Side Distance Comparison

2011-07-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-763. Resolution: Fixed Fix Version/s: 0.6 Assignee: Grant Ingersoll Committed

Re: RowSimilarity ?'s

2011-07-15 Thread Grant Ingersoll
> Rows. >>>> >>>> On Thu, Jul 14, 2011 at 12:24 PM, Sean Owen wrote: >>>> >>>>> Just needs a rule for >>>>> tossing data -- you could simply throw away such columns (ouch), or at >>>>> least >>>>> use only a sampled subset of it. >>>>> >>>> >>> -- Grant Ingersoll

[jira] [Updated] (MAHOUT-763) Map-Side Distance Comparison

2011-07-15 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-763: --- Attachment: MAHOUT-763.patch Handles multiple seed files > Map-Side Distance Compari

[jira] [Updated] (MAHOUT-763) Map-Side Distance Comparison

2011-07-14 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-763: --- Attachment: MAHOUT-763.patch Fixed some issues w/ the job configuration > Map-Side Dista

[jira] [Updated] (MAHOUT-763) Map-Side Distance Comparison

2011-07-14 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-763: --- Attachment: MAHOUT-763.patch fix import > Map-Side Distance Compari

[jira] [Updated] (MAHOUT-763) Map-Side Distance Comparison

2011-07-14 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-763: --- Attachment: MAHOUT-763.patch First draft of a patch. Input seeds can be vector, Cluster or

Re: Build failed in Jenkins: Mahout-Quality #939

2011-07-14 Thread Grant Ingersoll
> location: package org.apache.mahout.clustering > /x1/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterMapper.java:[42,68] > cannot find symbol > symbol: class WeightedPropertyVectorWritable >extends > Mapper,VectorWritable,IntWritable,WeightedPropertyVectorWritable> > { > /x1/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansClusterer.java:[139,63] > cannot find symbol > symbol : class WeightedPropertyVectorWritable > location: class org.apache.mahout.clustering.kmeans.KMeansClusterer > /x1/jenkins/jenkins-slave/workspace/Mahout-Quality/trunk/core/src/main/java/org/apache/mahout/clustering/kmeans/KMeansDriver.java:[478,28] > cannot find symbol > symbol : class WeightedPropertyVectorWritable > location: class org.apache.mahout.clustering.kmeans.KMeansDriver > > [INFO] > > [INFO] For more information, run Maven with the -e switch > [INFO] > > [INFO] Total time: 57 seconds > [INFO] Finished at: Thu Jul 14 19:27:52 UTC 2011 > [INFO] Final Memory: 96M/435M > [INFO] > > [FINDBUGS] Skipping publisher since build result is FAILURE > [PMD] Skipping publisher since build result is FAILURE > [TASKS] Skipping publisher since build result is FAILURE > Archiving artifacts > Recording test results > Publishing Javadoc > Updating MAHOUT-761 > Publishing Clover coverage report... > No Clover report will be published due to a Build Failure > -- Grant Ingersoll

Re: RowSimilarity ?'s

2011-07-14 Thread Grant Ingersoll
e are several levers you can pull, including one like Ted mentions -- >> maxSimilaritiesPerRow. >> >> On Thu, Jul 14, 2011 at 6:17 PM, Grant Ingersoll >> wrote: >>> >>> Any thoughts on why not reuse our existing Distance measures? Seems like >>> on

Re: RowSimilarity ?'s

2011-07-14 Thread Grant Ingersoll
On Jul 14, 2011, at 3:24 PM, Sean Owen wrote: > On Thu, Jul 14, 2011 at 8:00 PM, Grant Ingersoll wrote: > >> >>> You need all cooccurrences since some implementations need that value, >> and >>> you're computing all-pairs. >> >> Can

Re: RowSimilarity ?'s

2011-07-14 Thread Grant Ingersoll
per for what appears to be a bigger corpus with more terms on crappier hardware. > (I'm sure you can hack away the cooccurrence > computation if you know your metric doesn't use it.) > > There are several levers you can pull, including one like Ted mentions -- > maxSi

[jira] [Created] (MAHOUT-763) Map-Side Distance Comparison

2011-07-14 Thread Grant Ingersoll (JIRA)
Map-Side Distance Comparison Key: MAHOUT-763 URL: https://issues.apache.org/jira/browse/MAHOUT-763 Project: Mahout Issue Type: New Feature Reporter: Grant Ingersoll Priority: Minor

Re: RowSimilarity ?'s

2011-07-14 Thread Grant Ingersoll
imilarityJob? Last I looked it was around 53B before it was killed. > > --sebastian > > > [1] http://www.slideshare.net/sscdotopen/mahoutcf > [2] > http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.144.9712&rep=rep1&type=pdf > [3] > http://engineering.foursquare.com/2

RowSimilarity ?'s

2011-07-14 Thread Grant Ingersoll
Are there docs on RowSimilarity? Also, has anyone tried it at scale? I'm seeing some long running times for a matrix that I don't think is huge (still waiting to hear from a colleague about the actual size). What does the distributed vector similarity get us over just using our existing distance mea

[jira] [Updated] (MAHOUT-761) Emitting cluster points should have the option of emitting the distance and potentially other related metrics

2011-07-14 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-761: --- Attachment: MAHOUT-761.patch and better formatting > Emitting cluster points should h

[jira] [Updated] (MAHOUT-761) Emitting cluster points should have the option of emitting the distance and potentially other related metrics

2011-07-14 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-761: --- Attachment: MAHOUT-761.patch Hooks it into ClusterDumper > Emitting cluster points sho

[jira] [Updated] (MAHOUT-761) Emitting cluster points should have the option of emitting the distance and potentially other related metrics

2011-07-14 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-761: --- Attachment: MAHOUT-761.patch Tests pass. > Emitting cluster points should have the opt

Cluster seeds was Re: Emitting distance from centroid for K-Means

2011-07-14 Thread Grant Ingersoll
On Jul 13, 2011, at 6:42 PM, Jeff Eastman wrote: > > The assumption about input seeds originally came from using Canopy to prime > KMeans but it has become the prior set of clusters since the algorithms have > converged on common formats & models. Each iteration reads in the set of > clusters-

Re: [jira] [Commented] (MAHOUT-760) "org.apache.mahout.fpm.pfpgrowth.PFPGrowthTest" test fails during install

2011-07-14 Thread Grant Ingersoll
>>> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:42) >>> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:184) >>> at org.junit.runners.ParentRunner.run(ParentRunner.java:236) >>> at >>> org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53) >>> at >>> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:119) >>> at >>> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:101) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) >>> at java.lang.reflect.Method.invoke(Method.java:611) >>> at >>> org.apache.maven.surefire.booter.ProviderFactory$ClassLoaderProxy.invoke(ProviderFactory.java:103) >>> at $Proxy0.invoke(Unknown Source) >>> at >>> org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:150) >>> at >>> org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcess(SurefireStarter.java:91) >>> at >>> org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:69) >>> Every other test in all the components succeed. >> >> -- >> This message is automatically generated by JIRA. >> For more information on JIRA, see: http://www.atlassian.com/software/jira >> >> >> > > > > -- > Lance Norskog > goks...@gmail.com -- Grant Ingersoll

Re: Emitting distance from centroid for K-Means

2011-07-13 Thread Grant Ingersoll
ds on this for unification with classification > interfaces. > > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Wednesday, July 13, 2011 3:08 PM > To: dev@mahout.apache.org > Subject: Re: Emitting distance from centroid for K-Means

Re: Emitting distance from centroid for K-Means

2011-07-13 Thread Grant Ingersoll
o another file > or a different version could output the distance directly instead of the pdf. > I don't know what that would mean for Dirichlet; however, since it only plays > with pdf values. > > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.o

[jira] [Updated] (MAHOUT-761) Emitting cluster points should have the option of emitting the distance and potentially other related metrics

2011-07-13 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-761: --- Attachment: MAHOUT-761.patch Rough draft of a patch, haven't really tested yet, but ge

Re: Emitting distance from centroid for K-Means

2011-07-13 Thread Grant Ingersoll
be useful. It calculates a set of representative > points for each cluster and calculates interCluster and intraCluster > densities from that. > > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Wednesday, July 13, 2011 1:28 PM

[jira] [Created] (MAHOUT-761) Emitting cluster points should have the option of emitting the distance and potentially other related metrics

2011-07-13 Thread Grant Ingersoll (JIRA)
://issues.apache.org/jira/browse/MAHOUT-761 Project: Mahout Issue Type: Improvement Reporter: Grant Ingersoll Priority: Minor See http://www.lucidimagination.com/search/document/c5502e401f59f799/emitting_distance_from_centroid_for_k_means

Re: Emitting distance from centroid for K-Means

2011-07-13 Thread Grant Ingersoll
e methods that could be useful. It calculates a set of representative > points for each cluster and calculates interCluster and intraCluster > densities from that. > > -Original Message----- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Wednesday,

Re: Emitting distance from centroid for K-Means

2011-07-13 Thread Grant Ingersoll
> >> The weight is the probability the vector is a member of the cluster. For >> FuzzyK and Dirichlet it is fractional, for KMeans it is 1 as the algorithm >> is maximum likelihood and each point is only assigned to a single cluster. >> >> -Origina

Re: Updates and status for board report

2011-07-13 Thread Grant Ingersoll
it'll be a pretty bare overview from me. >>>>>> >>>>>> On Fri, Jul 1, 2011 at 8:52 AM, Sean Owen wrote: >>>>>> >>>>>> Hi all, we're going to submit a board report this month. Please send >>>>>>> me a quick blurb from each of you with news, updates, talks, events, >>>>>>> books, articles, status, trivia, jokes for inclusion in the report. It >>>>>>> should cover the last 3 months. I'll take care of the rest. >>>>>>> >>>>>>> Sean >>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> > -- Grant Ingersoll

Emitting distance from centroid for K-Means

2011-07-13 Thread Grant Ingersoll
Does it make sense to output the distance to the cluster as the weight in the KMeansClusterer.outputPointWithClusterInfo method instead of 1? What's the purpose of the 1 as the weight? -Grant
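To make the question concrete, here is a minimal sketch of the two options: emitting a constant weight of 1 for a hard K-Means assignment, or emitting the distance to the chosen centroid in its place. It is plain Java and does not use Mahout's classes; `Assignment`, `assign`, and the `emitDistance` flag are hypothetical names, and Euclidean distance is assumed only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class DistanceAsWeightSketch {

    /** Hypothetical holder: the chosen cluster and the value emitted with the point. */
    static final class Assignment {
        final int clusterId;
        final double weight;

        Assignment(int clusterId, double weight) {
            this.clusterId = clusterId;
            this.weight = weight;
        }
    }

    /** Euclidean distance between two vectors of equal length. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Assigns the point to its nearest centroid. With emitDistance == false the
     * weight is the constant 1.0 (hard assignment); with emitDistance == true
     * the weight slot carries the distance to the chosen centroid instead.
     */
    static Assignment assign(double[] point, List<double[]> centroids, boolean emitDistance) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double d = euclidean(point, centroids.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return new Assignment(best, emitDistance ? bestDistance : 1.0);
    }

    public static void main(String[] args) {
        List<double[]> centroids = Arrays.asList(new double[] {0, 0}, new double[] {10, 0});
        Assignment a = assign(new double[] {3, 4}, centroids, true);
        System.out.println("cluster=" + a.clusterId + " weight=" + a.weight);
    }
}
```

Reusing the weight slot changes its meaning from a membership probability to a distance, so a separate field or an explicit option may be the safer design.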

[jira] [Created] (MAHOUT-757) RowIdJob does not use Mahout's standard CLI parameters

2011-07-11 Thread Grant Ingersoll (JIRA)
orter: Grant Ingersoll Priority: Minor RowIdJob doesn't use --input and --output, and should, for taking in its arguments

[jira] [Commented] (MAHOUT-627) Baum-Welch Algorithm on Map-Reduce for Parallel Hidden Markov Model Training.

2011-06-25 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054871#comment-13054871 ] Grant Ingersoll commented on MAHOUT-627: Hi Dhruv, How goes progress on

[jira] [Commented] (MAHOUT-652) [GSoC Proposal] Parallel Viterbi algorithm for HMM

2011-06-25 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054870#comment-13054870 ] Grant Ingersoll commented on MAHOUT-652: Awesome! How goes the testing? >

Re: Mahout & Solr

2011-06-19 Thread Grant Ingersoll
e projects and integrates well with Solr." >> >> How does Mahout integrate well with Solr? Can someone give a brief >> overview of what's available? I'm guessing one of the features would be the >> replacing of the Carrot2 clustering algorithm with something a little more >> sophisticated? >> >> Thanks >> -- Grant Ingersoll

Re: Problems running examples

2011-06-11 Thread Grant Ingersoll
What do you get when you run on good ol' Hadoop, i.e., the one we actually support and build and test on? On Jun 10, 2011, at 7:38 PM, Jeff Eastman wrote: > Moving to @dev > > Hi Drew, > > Don't know what is happening, but I did a clean unpack of the 0.5 distro, mvn > install and ran build-re

[jira] [Updated] (MAHOUT-458) The LDA output does not include the topic-probability distribution per document (p(z|d)). It outputs only the topics and corresponding words.

2011-06-06 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-458: --- Fix Version/s: (was: 0.6) 0.5 This was fixed on MAHOUT-682 and MAHOUT

[jira] [Commented] (MAHOUT-399) LDA on Mahout 0.3 does not converge to correct solution for overlapping pyramids toy problem.

2011-06-06 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13044806#comment-13044806 ] Grant Ingersoll commented on MAHOUT-399: Michael, any luck on the unit t

Re: Getting LICENSE and NOTICE right

2011-06-05 Thread Grant Ingersoll
Don't some of the tarballs download all the deps? One thing we've done in Lucene is try to fail early in the process if someone doesn't get the License/Notice right when adding a new dependency. It's hooked into Ant, but could likely be done in Maven or donated to RAT. On Jun 4, 2011, at 4:1

Re: [VOTE] Release Mahout 0.5, take 3

2011-06-01 Thread Grant Ingersoll
Thanks, Sean! On Jun 1, 2011, at 11:32 AM, Sean Owen wrote: > Release is propagating to the mirrors now, and I'll update the site > tomorrow. Otherwise, we're clear to resume normal work on HEAD. > > On Wed, Jun 1, 2011 at 4:15 PM, Sean Owen wrote: > >> (It's in the main release package at dis

Re: [VOTE] Release Mahout 0.5, take 3

2011-06-01 Thread Grant Ingersoll
+1. The only thing I noticed is we don't have the KEYS file in the distribution, but I don't think that is a requirement as long as our details on downloading point to where the KEYS file is. On May 28, 2011, at 11:02 AM, Sean Owen wrote: > https://repository.apache.org/content/repositorie

Re: [VOTE] Release Mahout 0.5, take 3

2011-05-31 Thread Grant Ingersoll
Looking now. On May 31, 2011, at 4:31 AM, Sean Owen wrote: > Going once, going twice... going to complete the release later this evening. > I'm guessing anyone who cares to look and check has done so already. > > On Sat, May 28, 2011 at 4:02 PM, Sean Owen wrote: > >> https://repository.apache.

Re: Use of Avro

2011-05-30 Thread Grant Ingersoll
On May 29, 2011, at 11:34 AM, Dhruv Kumar wrote: > Good discussion. While I'd like to play around with Avro some other time as > it is very interesting, I'll stick with Writables for this project because > everything else in Mahout uses them. > > Here are some benchmarking results for different

Re: Use of Avro

2011-05-29 Thread Grant Ingersoll
hanism is provided. > > On Sat, May 28, 2011 at 4:13 PM, Sean Owen wrote: > >> Avro is JSON-based and that just seems far too verbose for these purposes. >> Grant Ingersoll

Re: [VOTE] Release Mahout 0.5, take 2

2011-05-28 Thread Grant Ingersoll
Sorry for the dup, wasn't sure if it went out due to plane connection problems. On May 27, 2011, at 5:38 PM, Grant Ingersoll wrote: > -1. > > There's no LICENSE.txt or NOTICE.txt file in the distribution files > (mahout-distribution-0.5.tar.gz and I assume others in that

Re: [VOTE] Release Mahout 0.5, take 2

2011-05-27 Thread Grant Ingersoll
license, so we should think about doing that. Still need to run the tests, etc. More in a moment, my battery is about dead. -Grant On May 27, 2011, at 8:35 AM, Grant Ingersoll wrote: > Testing now while on the plane! > > On May 27, 2011, at 3:07 AM, Sean Owen wrote: > >

Re: [VOTE] Release Mahout 0.5, take 2

2011-05-27 Thread Grant Ingersoll
license, so we should think about doing that. Still need to run the tests, etc. More in a moment, my battery is about dead. -Grant On May 27, 2011, at 8:35 AM, Grant Ingersoll wrote: > Testing now while on the plane! > > On May 27, 2011, at 3:07 AM, Sean Owen wrote: > >

Re: [VOTE] Release Mahout 0.5, take 2

2011-05-27 Thread Grant Ingersoll
arguments which have worked in the past. I'd >> like to get Shannon's reaction before I decide if this is a show stopper or >> not. I did mark it as fix in 0.6, and I doubt we have a lot of users yet >> with the spectral clustering, so I'm on the fence. >>

Re: [VOTE] Release Mahout 0.5, take 2

2011-05-26 Thread Grant Ingersoll
I'm hoping to take a look tonight, but agree the bug isn't a show stopper. +0 as of now, hopefully a +1 by the end of the day. On May 26, 2011, at 7:27 AM, Benson Margulies wrote: > Me too. +1 > > On Thu, May 26, 2011 at 7:18 AM, Sean Owen wrote: >> (FWIW I vote +1 for the release in spite o

Re: SF Informal meetup on May 23?

2011-05-24 Thread Grant Ingersoll
be depending on my arrival time. >>>>>> >>>>>> On Fri, May 20, 2011 at 4:58 AM, Ted Dunning >>>>> wrote: >>>>>> >>>>>>> Great. As we get a count, I will make reservation. >>>>>>> >

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-23 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037924#comment-13037924 ] Grant Ingersoll commented on MAHOUT-694: Sean/Drew: +1 on commit. Allan: sen

Re: Build failed in Jenkins: Mahout-Examples #9

2011-05-22 Thread Grant Ingersoll
For those playing along at home, I'm working on getting the examples to build and run/validate on a regular basis, so as to prevent the likes of MAHOUT-694 from happening again, or at least reducing the likelihood. -Grant On May 22, 2011, at 11:41 AM, Apache Jenkins Server wrote: > See

Re: [jira] [Commented] (MAHOUT-537) Bring DistributedRowMatrix into compliance with Hadoop 0.20.2

2011-05-22 Thread Grant Ingersoll
The release notes for 0.21 weren't exactly inspirational when it comes to adoption: "It has not undergone testing at scale and should not be considered stable or suitable for production." - -- http://hadoop.apache.org/common/releases.html -G On May 21, 2011, at 2:43 PM, Ted Dunning wrote: > S

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037638#comment-13037638 ] Grant Ingersoll commented on MAHOUT-694: Drew, +1 on commit

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037625#comment-13037625 ] Grant Ingersoll commented on MAHOUT-694: Not sure what happened, ran again a

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037624#comment-13037624 ] Grant Ingersoll commented on MAHOUT-694: bq. Work is done in a directory ca

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037622#comment-13037622 ] Grant Ingersoll commented on MAHOUT-694: But, I ran it a second time and th

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037620#comment-13037620 ] Grant Ingersoll commented on MAHOUT-694: Hmm, Drew, I don't see the Cl

[jira] [Resolved] (MAHOUT-588) Benchmark Mahout's clustering performance on EC2 and publish the results

2011-05-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved MAHOUT-588. Resolution: Fixed Fix Version/s: (was: 0.6) 0.5 > Benchm

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-22 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037619#comment-13037619 ] Grant Ingersoll commented on MAHOUT-694: I'm reviewing at the moment,

Re: [VOTE] Release Mahout 0.5 artifacts

2011-05-21 Thread Grant Ingersoll
Not quite done yet. Drew just posted a patch that I think takes care of everything (haven't tested myself yet) -Grant On May 20, 2011, at 10:25 PM, Sean Owen wrote: > I see a commit for MAHOUT-694. Am I right that: > > 1) That can be resolved, > 2) Any additional issues it raises may be opene

Re: Is LDA Broken?

2011-05-21 Thread Grant Ingersoll
On Sat, May 21, 2011 at 11:46 AM, Grant Ingersoll wrote: > >> I did: >> >> finalize{ >> IOUtils.closeStream(); >> } >> >> The InputStream in this particular case is actually one we opened. I'll >> commit the patch. >> >> Jeff, what's your other exception? >> >>

Re: Is LDA Broken?

2011-05-21 Thread Grant Ingersoll
I did: finalize{ IOUtils.closeStream(); } The InputStream in this particular case is actually one we opened. I'll commit the patch. Jeff, what's your other exception? On May 20, 2011, at 11:56 PM, Sean Owen wrote: > That's a decent pattern. The streams in question here are implemente
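The snippet above presumably refers to closing the stream in a `finally` block. A minimal sketch of that pattern follows, under stated assumptions: `closeQuietly` is a local stand-in written for this example (Hadoop's `org.apache.hadoop.io.IOUtils.closeStream` plays a similar role but is not used here, so the example runs without Hadoop), and `TrackingStream` and `countBytes` are hypothetical names, not code from the MAHOUT-694 patch.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class CloseStreamSketch {

    /** Local stand-in helper: closes the stream, ignoring null and IOException. */
    static void closeQuietly(Closeable stream) {
        if (stream != null) {
            try {
                stream.close();
            } catch (IOException ignored) {
                // Nothing useful can be done if close() itself fails.
            }
        }
    }

    /** Demonstration-only stream that records whether close() was called. */
    static final class TrackingStream extends ByteArrayInputStream {
        boolean closed;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the stream to the end and always closes it, even if reading fails. */
    static int countBytes(InputStream in) {
        int count = 0;
        try {
            while (in.read() != -1) {
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            closeQuietly(in);
        }
        return count;
    }

    public static void main(String[] args) {
        TrackingStream stream = new TrackingStream(new byte[] {1, 2, 3});
        System.out.println(countBytes(stream) + " bytes read, closed=" + stream.closed);
    }
}
```

As the message notes, this is only appropriate when the code that closes the stream is also the code that opened it.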

Mahout PMC Welcomes Sebastian Schelter

2011-05-21 Thread Grant Ingersoll
Hi Mahouts, The Mahout PMC is happy to welcome Sebastian Schelter to the ranks. Sebastian has been a long time committer on Mahout and we are pleased to have him join us! Congratulations! -Grant

Re: Is LDA Broken?

2011-05-20 Thread Grant Ingersoll
d my Linux VM, did a clean mahout > build, zapped bin/work, and got the same result. Will have to debug more > later today... > > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Friday, May 20, 2011 12:54 PM > To: dev@mahout.apach

[jira] [Updated] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-694: --- Attachment: MAHOUT-694.patch Close the Input stream. > IndexOutOfBoundException using bu

Re: Is LDA Broken?

2011-05-20 Thread Grant Ingersoll
zapped bin/work, and got the same result. Will have to debug more > later today... > > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Friday, May 20, 2011 12:54 PM > To: dev@mahout.apache.org > Subject: Re: Is LDA Broken? > > Hmm

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037054#comment-13037054 ] Grant Ingersoll commented on MAHOUT-694: bq. and 0.5 reads from hdfs, not l

Re: Classifier Interface

2011-05-20 Thread Grant Ingersoll
Perhaps we should future proof here a little bit and simply have a classify method that returns a typed object that contains the necessary info depending on the implementation? Something like: ClassifierResult classify() and then ClassifierResult has an enum or something that indicates whether
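A rough sketch of what such a typed result could look like is given below. The message is cut off before it says what the enum would indicate, so the two kinds used here (`SINGLE_LABEL` and `SCORE_PER_LABEL`) are purely illustrative assumptions, as are the `features` argument to `classify`, the `threshold` classifier, and every other name apart from `ClassifierResult` and `classify`.

```java
import java.util.Arrays;

public class ClassifierResultSketch {

    /** Hypothetical: the kinds of answer an implementation might return. */
    enum Kind { SINGLE_LABEL, SCORE_PER_LABEL }

    /** Typed result; callers check kind to see which field is populated. */
    static final class ClassifierResult {
        final Kind kind;
        final String label;     // set only for SINGLE_LABEL
        final double[] scores;  // set only for SCORE_PER_LABEL

        private ClassifierResult(Kind kind, String label, double[] scores) {
            this.kind = kind;
            this.label = label;
            this.scores = scores;
        }

        static ClassifierResult ofLabel(String label) {
            return new ClassifierResult(Kind.SINGLE_LABEL, label, null);
        }

        static ClassifierResult ofScores(double[] scores) {
            return new ClassifierResult(Kind.SCORE_PER_LABEL, null, scores.clone());
        }
    }

    /** One classify method for every implementation; the result carries the variation. */
    interface Classifier {
        ClassifierResult classify(double[] features);
    }

    /** Toy implementation: "positive" when the feature sum exceeds the cutoff. */
    static Classifier threshold(double cutoff) {
        return features -> {
            double sum = 0.0;
            for (double f : features) {
                sum += f;
            }
            return ClassifierResult.ofLabel(sum > cutoff ? "positive" : "negative");
        };
    }

    /** Shows how a caller would branch on the kind of result. */
    static String describe(ClassifierResult result) {
        switch (result.kind) {
            case SINGLE_LABEL:
                return "label=" + result.label;
            case SCORE_PER_LABEL:
                return "scores=" + Arrays.toString(result.scores);
            default:
                throw new IllegalStateException("unknown kind: " + result.kind);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(threshold(1.0).classify(new double[] {0.7, 0.6})));
        System.out.println(describe(ClassifierResult.ofScores(new double[] {0.25, 0.75})));
    }
}
```

The trade-off is that callers must branch on the result kind, in exchange for an interface that does not need a new method for each style of classifier.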

Re: Is LDA Broken?

2011-05-20 Thread Grant Ingersoll
p.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68) >at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139) >at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187) > C > > -Original Message- > From: Grant Ingersoll [mailto:gs

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036981#comment-13036981 ] Grant Ingersoll commented on MAHOUT-694: Here's the 0.4 code: {code} cd

Re: Is LDA Broken?

2011-05-20 Thread Grant Ingersoll
use kmeans Clustering > ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or > directory > ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or > directory > > > -Original Message- > From: Grant Ingersoll [mailto:

Re: Is LDA Broken?

2011-05-20 Thread Grant Ingersoll
> ./examples/bin/build-reuters.sh: line 54: ./bin/mahout: No such file or > directory > ./examples/bin/build-reuters.sh: line 64: ./bin/mahout: No such file or > directory > > > -Original Message- > From: Grant Ingersoll [mailto:gsing...@apache.org] > Sent: Friday,

[jira] [Created] (MAHOUT-707) Setup Jenkins Jobs to validate our Examples/bin Scripts

2011-05-20 Thread Grant Ingersoll (JIRA)
: Grant Ingersoll Fix For: 0.6 We should set up Jenkins to run our example scripts on a regular basis (see MAHOUT-694) and check for breakage.

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036965#comment-13036965 ] Grant Ingersoll commented on MAHOUT-694: bq. but perhaps build-reuters.sh

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036956#comment-13036956 ] Grant Ingersoll commented on MAHOUT-694: Drew, does it even run on a cluster

Re: Is LDA Broken?

2011-05-20 Thread Grant Ingersoll
Likely so, see MAHOUT-694. On May 20, 2011, at 1:39 PM, Sean Owen wrote: > Oh sorry these are the same issue? Great! > On May 20, 2011 5:44 PM, "Jake Mannix" wrote: >> Looks like Grant got a fix posted? Has anyone else tried it? >> >> -jake >> >> On Fri, May 20, 2011 at 9:32 AM, Sean Owen

[jira] [Updated] (MAHOUT-706) reuse lucene tokenstreams

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-706: --- Fix Version/s: 0.6 Assignee: Grant Ingersoll > reuse lucene tokenstre

Re: Is LDA Broken?

2011-05-20 Thread Grant Ingersoll
We should set up a Jenkins job to run the examples on a regular basis and to validate the output. I've been doing some Jenkins work lately, so I will see if I can get to it after Revolution. -Grant On May 20, 2011, at 12:43 PM, Jake Mannix wrote: > Looks like Grant got a fix posted? Has anyone e

[jira] [Updated] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-694: --- Attachment: MAHOUT-694.patch Here's the fix. Allan, please co

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036889#comment-13036889 ] Grant Ingersoll commented on MAHOUT-694: LUCENE-929 broke this. The fix fo

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036888#comment-13036888 ] Grant Ingersoll commented on MAHOUT-694: In fact, that does the t

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036886#comment-13036886 ] Grant Ingersoll commented on MAHOUT-694: might simply be handled by dropping

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036884#comment-13036884 ] Grant Ingersoll commented on MAHOUT-694: Ah, I see the -tmp now, it's u

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036881#comment-13036881 ] Grant Ingersoll commented on MAHOUT-694: When I run it, I get the reuters

[jira] [Commented] (MAHOUT-694) IndexOutOfBoundException using build-reuters.sh

2011-05-20 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13036878#comment-13036878 ] Grant Ingersoll commented on MAHOUT-694: Hmm, the Lucene ExtractReuters code

Re: While you're waiting on the 0.5 release...

2011-05-20 Thread Grant Ingersoll
> MAHOUT-698 <https://issues.apache.org/jira/browse/MAHOUT-698>: Hook up Automated Patch Checking for Mahout (Major, Open; 16/May/11, 16/May/11)
> MAHOUT-700 <https://issues.apache.org/jira/browse/MAHOUT-700>: [Bug] When running 'seqdirectory' on Amazon Elastic MapReduce, FileSystem.get() fails due to use of FileSystem.get(conf) instead of FileSystem.get(uri, conf) (Minor, Open; 18/May/11, 18/May/11)
> MAHOUT-701 <https://issues.apache.org/jira/browse/MAHOUT-701>: [Bug] ClusterDumper writes to System.out or local filesystem only (I would like to write to s3 when running on Elastic MapReduce) (Minor, Open; 18/May/11, 18/May/11)
> MAHOUT-702 <https://issues.apache.org/jira/browse/MAHOUT-702>: [New Feature] Implement Online Passive Aggressive learner (Minor, Patch Available; 19/May/11, 18/May/11)
> MAHOUT-703 <https://issues.apache.org/jira/browse/MAHOUT-703>: [New Feature] Implement Gradient machine (Minor, Open; 20/May/11, 19/May/11)
> MAHOUT-704 <https://issues.apache.org/jira/browse/MAHOUT-704>: [Improvement] Refactor PredictionJob to use MultipleInputs for reduce side joins (Major, Open; 19/May/11, 19/May/11)
-- Grant Ingersoll Lucene Revolution -- Lucene and Solr User Conference May 25-26 in San Francisco www.lucenerevolution.org

Re: SF Informal meetup on May 23?

2011-05-19 Thread Grant Ingersoll
On May 19, 2011, at 3:52 PM, Ted Dunning wrote: > Is consensus forming here? The Empire Tap Room (http://www.etrpa.com/) on Thursday? I still don't fully know my conf. schedule yet, but I'd say we target 7-9 pm or thereabouts at the Tap Room. > > On Thu, May 19, 2011 at 12:51 PM, Dawid We

Re: [VOTE] Release Mahout 0.5 artifacts

2011-05-19 Thread Grant Ingersoll
I think the vote should actually be on https://repository.apache.org/content/repositories/orgapachemahout-024/org/apache/mahout/, as that contains the full list of artifacts that are being released. On May 19, 2011, at 2:44 AM, Sean Owen wrote: > https://repository.apache.org/content/repositor

Re: SF Informal meetup on May 23?

2011-05-19 Thread Grant Ingersoll
On May 19, 2011, at 2:06 AM, Ted Dunning wrote: > That is nearly 2 miles from the train. Going north in traffic will be > fairly heinous. And here, based on my numerous observations, I thought sitting in traffic was officially the California State Pastime! ;-) > > I would suggest a place

Re: SF Informal meetup on May 23?

2011-05-18 Thread Grant Ingersoll
>>>>> Works for me. >>>>> >>>>> On Sat, May 14, 2011 at 3:30 PM, Jake Mannix >>> wrote: >>>>> >>>>>> I'm not there Thursday, most likely. :( >>>>>> >>>>>> On Sat, May 14, 201

Re: MongoDataModel

2011-05-18 Thread Grant Ingersoll
On May 18, 2011, at 9:32 AM, Sean Owen wrote: > On Wed, May 18, 2011 at 12:58 PM, Grant Ingersoll wrote: > >> >> Actually, I think it is core at this point, since we moved the >> Vectorization stuff to core. Unfortunately, we need Lucene core in order to >> g

Re: MongoDataModel

2011-05-18 Thread Grant Ingersoll
On May 18, 2011, at 6:58 AM, Sean Owen wrote: > The reasoning that led to 'taste-webapp' is what leads to create an expanded > 'mahout-integration'. > > When I contributed my code, some folks asked, hmm, could we toss your EJB > and web services integration, because it seems unfortunate to make

Re: MongoDataModel

2011-05-18 Thread Grant Ingersoll
(Fernando, I would say just go ahead and put your patch up against core for now and we'll work this out. This discussion shouldn't derail you from putting up a reasonable first patch based on the current structure -- i.e. put the model where all the other impl. models are for now) On May 18, 2
