Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Dawid Weiss
I agree with Ted: screwing up your repository (I assume a local clone of something remote) in svn is much easier than in git, for example by moving a folder from one place to another. If I can recommend something, this book is quite nice, especially for beginners ("Basic Usage" chapter): http://bo

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-18 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107640#comment-13107640 ] Lance Norskog commented on MAHOUT-524: -- As for 5-d points vs. 2-d points, SVD does a

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Lance Norskog
I have the ability to bollix svn in ways that nobody else fathoms. Some fans promote Mercurial as "Git without pain". On Sun, Sep 18, 2011 at 8:09 PM, Ted Dunning wrote: > On Sun, Sep 18, 2011 at 5:21 PM, Lance Norskog wrote: > > > One important caveat: git is a rope factory for hanging yoursel

[jira] [Updated] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-18 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lance Norskog updated MAHOUT-524: - Attachment: EclipseLog_20110918.txt > DisplaySpectralKMeans example fails > -

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-18 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107635#comment-13107635 ] Lance Norskog commented on MAHOUT-524: -- For completeness, the log when running under

[jira] [Commented] (MAHOUT-524) DisplaySpectralKMeans example fails

2011-09-18 Thread Lance Norskog (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107633#comment-13107633 ] Lance Norskog commented on MAHOUT-524: -- Possibly a little help. When run from the com

[jira] [Commented] (MAHOUT-814) SSVD local tests should use their own tmp space to avoid collisions

2011-09-18 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107609#comment-13107609 ] Dmitriy Lyubimov commented on MAHOUT-814: - it actually will only manifest in tes

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Ted Dunning
On Sun, Sep 18, 2011 at 5:21 PM, Lance Norskog wrote: > One important caveat: git is a rope factory for hanging yourself. It badly > needs a Chef/Puppet-style "describe the end result" executor. Don't be > surprised when you have to re-build your whole checkout when something > unfathomable blows

[jira] [Commented] (MAHOUT-814) SSVD local tests should use their own tmp space to avoid collisions

2011-09-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107567#comment-13107567 ] Grant Ingersoll commented on MAHOUT-814: Yeah, it is different. It's not actually

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Lance Norskog
It's gitk on windows. Also there's a Tortoise git manager for the windows desktop. And Github has a mac-only local management app. One important caveat: git is a rope factory for hanging yourself. It badly needs a Chef/Puppet-style "describe the end result" executor. Don't be surprised when you ha

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Ted Dunning
As part of that learning curve, make sure you check out gitx (on the mac, gitg on linux, I don't care what is on windows). It makes it easier to understand what the branching structure is. I recommend invoking as gitx --all to show all of the branches right away. This will highlight the interest

Re: Graph Output formats

2011-09-18 Thread Ted Dunning
On Sun, Sep 18, 2011 at 2:15 PM, Grant Ingersoll wrote: > Cool, I've pushed my changes to ClusterDumper to Lucid's github account > (lucidimagination) and am planning on pushing all of it to Mahout this week. > It is now possible to output CSV, Text (the current option) and GraphML. > Easy enoug

[jira] [Commented] (MAHOUT-814) SSVD local tests should use their own tmp space to avoid collisions

2011-09-18 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107534#comment-13107534 ] Dmitriy Lyubimov commented on MAHOUT-814: - oh I think this particular thing is dif

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Dawid Weiss
I didn't mean to criticize github -- I use it myself for a number of projects and I've been extremely happy with their service. I merely suggested that in terms of the learning curve one may wish to start with local branches and then slowly progress to adding more remote sources. I think throwing m

[jira] [Updated] (MAHOUT-814) SSVD local tests should use their own tmp space to avoid collisions

2011-09-18 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy Lyubimov updated MAHOUT-814: Summary: SSVD local tests should use their own tmp space to avoid collisions (was: QRFirst

[jira] [Commented] (MAHOUT-814) QRFirstStep should use their own tmp space to avoid collisions

2011-09-18 Thread Dmitriy Lyubimov (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107529#comment-13107529 ] Dmitriy Lyubimov commented on MAHOUT-814: - they used to use their own space. But I

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Dmitriy Lyubimov
Yes, one doesn't have to use github of course. I do it just to share, collaborate and let people try and preview what I do with a more timely detailed history and in a more convenient way than an issue patch allows. Besides, it allows me to have a backup in case my desktop disk goes cuckoo, and wor

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Dawid Weiss
> That is, once you are over the learning curve and have a good workflow!  I've > been doing an SVN patch workflow for a long time now and it has served me > well.  Oh well, time to move on! I'll put it this way: moving to git is well worth the time spent on learning. I was a skeptic myself... f

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Grant Ingersoll
On Sep 18, 2011, at 3:20 PM, Ted Dunning wrote: > Actually, this is important to say. Speed is one of the huge advantages of > git over other options. That is, once you are over the learning curve and have a good workflow! I've been doing an SVN patch workflow for a long time now and it has s

Re: Graph Output formats

2011-09-18 Thread Grant Ingersoll
Cool, I've pushed my changes to ClusterDumper to Lucid's github account (lucidimagination) and am planning on pushing all of it to Mahout this week. It is now possible to output CSV, Text (the current option) and GraphML. Easy enough to extend to output JSON or whatever. I would imagine it wo

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Ted Dunning
Actually, this is important to say. Speed is one of the huge advantages of git over other options. On Sun, Sep 18, 2011 at 1:13 PM, Dawid Weiss wrote: > In case of Lucene you can also work on multiple svn branches and do > the switching using git... needless to say this is way faster than > usin

[jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-09-18 Thread Ted Dunning (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107508#comment-13107508 ] Ted Dunning commented on MAHOUT-542: Go for it. The Hadoop API is very confused at th

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Dawid Weiss
I looked at it -- yes, this is the way to follow. You can save some complexity by not keeping a github remote (if you work from one place, a local feature branch is enough, no need to push/pull to github). In case of Lucene you can also work on multiple svn branches and do the switching using git.

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Ted Dunning
Dmitriy documented his workflow, which is very similar to this: http://weatheringthrutechdays.blogspot.com/2011/04/git-github-and-committing-to-asf-svn.html I use his process almost exactly. On Sun, Sep 18, 2011 at 5:58 AM, Dawid Weiss wrote: > Yes, these instructions worked for me: > go to htt

Re: Graph Output formats

2011-09-18 Thread Ted Dunning
You have to make one hack to make sure that the JS downloads from your local server, but that is easy. On Sun, Sep 18, 2011 at 12:17 PM, Ted Dunning wrote: > Yes. The old stuff from google used to require their servers and was very > limited on size of data. > > This newer stuff is not. > > > O

Re: Graph Output formats

2011-09-18 Thread Ted Dunning
Yes. The old stuff from google used to require their servers and was very limited on size of data. This newer stuff is not. On Sun, Sep 18, 2011 at 4:46 AM, Grant Ingersoll wrote: > > On Sep 17, 2011, at 9:22 PM, Ted Dunning wrote: > > > I strongly recommend Google's visualization API. > > Cool

Re: issue while running lucene.vector driver in mahout 0.5

2011-09-18 Thread Grant Ingersoll
The LuceneIterator has a built-in circuit breaker if it gets too many errors. If you are using lucene.vector, you can pass in --maxPercentErrorDocs X, where X is some percentage of docs you are willing to allow errors in. The default is no errors. On Sep 18, 2011, at 10:48 AM, Philippe Adji
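The circuit-breaker behaviour described above can be sketched roughly as follows. This is a minimal illustration in Python, not the actual LuceneIterator code; the function name `iterate_with_error_budget`, the `has_term_vector` key, and the treatment of the threshold as a fraction between 0 and 1 are all assumptions made for the example.

```python
def iterate_with_error_budget(docs, max_error_fraction=0.0):
    """Yield docs that have a term vector; abort once too many lack one.

    Hypothetical sketch: `docs` is a list of dicts with a `has_term_vector`
    flag, and `max_error_fraction` stands in for --maxPercentErrorDocs
    (assumed here to be a fraction of the total; default 0.0 = no errors).
    """
    # Number of bad documents we are willing to skip before giving up.
    max_errors = int(max_error_fraction * len(docs))
    errors = 0
    for doc in docs:
        if not doc["has_term_vector"]:
            errors += 1
            if errors > max_errors:
                # The "circuit breaker": stop instead of silently skipping.
                raise RuntimeError("too many documents without a term vector")
            continue  # within budget: skip this document
        yield doc
```

With the default of 0.0 the first document lacking a term vector stops the run, which would match the "default is no errors" behaviour mentioned in the message.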

[jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-09-18 Thread Fabian Alenius (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107458#comment-13107458 ] Fabian Alenius commented on MAHOUT-542: --- Okay. I'll wait a bit to see if anyone obje

Re: ParallelArraysSGDFactorizer is in the KDD Cup examples

2011-09-18 Thread Sebastian Schelter
I'd say no because it's only a copy of ExpectationMaximizationSVDFactorizer that uses a hacky/quirky DataModel implementation to save a lot of RAM. On 18.09.2011 10:59, Lance Norskog wrote: > Should ParallelArraysSGDFactorizer be promoted to the svdrecommender package > in core/src? >

[jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-09-18 Thread Sebastian Schelter (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107454#comment-13107454 ] Sebastian Schelter commented on MAHOUT-542: --- Current hadoop version is 0.20.204.

[jira] [Commented] (MAHOUT-542) MapReduce implementation of ALS-WR

2011-09-18 Thread Fabian Alenius (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107453#comment-13107453 ] Fabian Alenius commented on MAHOUT-542: --- Hi, I was thinking of rewriting the itemRat

issue while running lucene.vector driver in mahout 0.5

2011-09-18 Thread Philippe Adjiman
Hi, I was trying to generate vectors from a Lucene index using the lucene.vector driver. It worked fine using Mahout 0.4, but in Mahout 0.5 I get the following exception: SEVERE: There are too many documents that do not have a term vector for description Exception in thread "main" java.lang.Illega

[jira] [Updated] (MAHOUT-814) QRFirstStep should use their own tmp space to avoid collisions

2011-09-18 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated MAHOUT-814: --- Summary: QRFirstStep should use their own tmp space to avoid collisions (was: LocalSSDSolver

[jira] [Created] (MAHOUT-814) LocalSSDSolver tests should use their own tmp space to avoid collisions

2011-09-18 Thread Grant Ingersoll (JIRA)
LocalSSDSolver tests should use their own tmp space to avoid collisions --- Key: MAHOUT-814 URL: https://issues.apache.org/jira/browse/MAHOUT-814 Project: Mahout Issue Type:

[jira] [Resolved] (MAHOUT-813) RecommenderJob incorrectly sets io.sort.mb

2011-09-18 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/MAHOUT-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved MAHOUT-813. -- Resolution: Fixed Assignee: Sean Owen (was: Grant Ingersoll) > RecommenderJob incorrectly sets i

[jira] [Created] (MAHOUT-813) RecommenderJob incorrectly sets io.sort.mb

2011-09-18 Thread Grant Ingersoll (JIRA)
RecommenderJob incorrectly sets io.sort.mb -- Key: MAHOUT-813 URL: https://issues.apache.org/jira/browse/MAHOUT-813 Project: Mahout Issue Type: Bug Affects Versions: 0.6 Reporter: Grant

Re: RecommenderJob and io.sort.mb

2011-09-18 Thread Grant Ingersoll
I opened MAHOUT-813. Agreed, a cap would be good. And agreed on reasoning to set it. On Sep 18, 2011, at 9:30 AM, Sean Owen wrote: > I can just cap it at, say, 1024MB. > > This isn't in the config because that would change it for all jobs, and it > is probably not a good idea in general to us

Re: RecommenderJob and io.sort.mb

2011-09-18 Thread Sean Owen
I can just cap it at, say, 1024MB. This isn't in the config because that would change it for all jobs, and it is probably not a good idea in general to use so much memory for the combiner. Here it's the right thing to do. On Sun, Sep 18, 2011 at 2:26 PM, Grant Ingersoll wrote: > I'm trying to r
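The cap proposed here amounts to clamping the computed value before it is handed to Hadoop. A minimal sketch in Python, assuming a hypothetical helper name and taking the 1024 MB limit from the message above; it is not the code that was committed for MAHOUT-813.

```python
MAX_IO_SORT_MB = 1024  # cap suggested in the thread ("say, 1024MB")


def capped_io_sort_mb(requested_mb):
    """Return the io.sort.mb value to use, never above the cap.

    Hypothetical helper: `requested_mb` is whatever the job computed
    from the memory it believes is available.
    """
    if requested_mb < 1:
        raise ValueError("io.sort.mb must be positive")
    return min(requested_mb, MAX_IO_SORT_MB)
```

Under this sketch, a computed value of 2048 would be reduced to 1024, while smaller values pass through unchanged.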

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Grant Ingersoll
Awesome! Thanks. On Sep 18, 2011, at 7:58 AM, Dawid Weiss wrote: > Yes, these instructions worked for me: > go to http://wiki.apache.org/general/GitAtApache, then: "Git for > Apache committers". The URL for git svn init needs to be: > > git svn init --prefix=origin/ --tags=tags --trunk=trunk >

RecommenderJob and io.sort.mb

2011-09-18 Thread Grant Ingersoll
I'm trying to run the RecommenderJob (trunk as of this morning) and am getting: java.io.IOException: Invalid "io.sort.mb": 2048 at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:939) at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:673)

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Dawid Weiss
Yes, these instructions worked for me: go to http://wiki.apache.org/general/GitAtApache, then: "Git for Apache committers". The URL for git svn init needs to be: git svn init --prefix=origin/ --tags=tags --trunk=trunk --branches=branches https://svn.apache.org/repos/asf/lucene/dev Should work out

Re: anybody is using git-svn for apache svn commits?

2011-09-18 Thread Grant Ingersoll
Resurrecting old thread... I originally just cloned from the ASF Git mirrors. Is there a way to then associate it with an SVN repos so that I can then push a branch to SVN? I've got a rather large set of changes across several commits (and don't remember when I started). My thinking was I wo

Re: Graph Output formats

2011-09-18 Thread Grant Ingersoll
On Sep 17, 2011, at 9:22 PM, Ted Dunning wrote: > I strongly recommend Google's visualization API. Cool. Here I thought it required using Goog's servers, but I guess not. So you can run the server and hit it locally? > > This is divided into two parts, the reporting half and the data source

ParallelArraysSGDFactorizer is in the KDD Cup examples

2011-09-18 Thread Lance Norskog
Should ParallelArraysSGDFactorizer be promoted to the svdrecommender package in core/src? -- Lance Norskog goks...@gmail.com