Re: IndexFormatTooOldException with Solr4.0 ?

2012-11-19 Thread tomw
On Mo, 2012-11-19 at 15:54 -0500, Grant Ingersoll wrote: > Correct. The 4.0 work is not committed yet. I'm hoping to consolidate some > of the redundant code around Lucene as part of this upgrade. Also, some of > the constructors, etc. appear to have changed. In general, I'd like to make > i

Re: IndexFormatTooOldException with Solr4.0 ?

2012-11-19 Thread Grant Ingersoll
Correct. The 4.0 work is not committed yet. I'm hoping to consolidate some of the redundant code around Lucene as part of this upgrade. Also, some of the constructors, etc. appear to have changed. In general, I'd like to make it a little easier to leverage the variety of options some of the

Re: Conversion of point numbers to key strings

2012-11-19 Thread Grant Ingersoll
On Nov 19, 2012, at 12:16 PM, Ted Dunning wrote: > This looks like it may be an artifact of switching to Lucene 4.0. > > Grant? I don't believe we have updated to 4 yet, unless I missed something. Christopher, can you provide details on: 1. What version you are running? Is this 0.7 or build

Re: IndexFormatTooOldException with Solr4.0 ?

2012-11-19 Thread tomw
> I'm using 0.80 SNAPSHOT and indeed the root cause may be that mahout > does not support solr 4.0 yet. Just tested with 3.6.1 and it works fine. > The error however is a bit misleading... > Digging a bit deeper, I realized that the dependencies point to Lucene 3.6.0 which is the reason that Luc
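If the build is Maven-based, one common fix for this kind of mismatch is to pin a single lucene-core version so that all modules on the classpath agree. A sketch only (the coordinates are the standard Lucene ones; 3.6.1 is the version tomw reports working, and this would typically go under `<dependencyManagement>`):

```xml
<!-- Force one lucene-core version across the dependency tree -->
<dependency>
  <groupId>org.apache.lucene</groupId>
  <artifactId>lucene-core</artifactId>
  <version>3.6.1</version>
</dependency>
```

Whether this is sufficient depends on whether the index itself was written by a newer Lucene than the pinned version can read.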

Re: Conversion of point numbers to key strings

2012-11-19 Thread Ted Dunning
This looks like it may be an artifact of switching to Lucene 4.0. Grant? On Mon, Nov 19, 2012 at 9:12 AM, Christopher Laux wrote: > Caused by: java.lang.NoSuchFieldError: LUCENE_36 > at org.apache.mahout.vectorizer.DefaultAnalyzer.&lt;init&gt;(DefaultAnalyzer.java:34) > ... 11 more > > Any idea

Re: Conversion of point numbers to key strings

2012-11-19 Thread Christopher Laux
Thanks for the hint. Now I get this exception: $ mahout seq2sparse -i ~/run/posts2.seq -o ~/run/posts2-vec -seq -nv Nov 19, 2012 6:09:22 PM org.apache.hadoop.mapred.LocalJobRunner$Job run WARNING: job_local_0001 java.lang.IllegalStateException: java.lang.reflect.InvocationTargetException at o

Re: Command line : Error using clusterdump after cvb (0.7)

2012-11-19 Thread Jérémie Gomez
Hi Jake, It's a great idea indeed. However, I'm new to Mahout; could you give me some pointers as to where to publish this guide, and maybe point me to a well-formed existing guide that I could use as an example? Thank you! Jeremie 2012/11/16 Jake Mannix > I'm glad to hear it's

Re: SSVD fails on seq2sparse output.

2012-11-19 Thread Sean Owen
(Yes, it is a Java binary requiring Java 6+. It runs against Hadoop 0.20.x - 2.0.x or work-alikes, or Amazon EMR. The work is in the reducer in this implementation, so you would need to hand the reducers extra memory instead of mappers. I think that you can run the whole 20M rows of input in Myrrix

Re: SSVD fails on seq2sparse output.

2012-11-19 Thread Abramov Pavel
Just checked top on a worker node during the 20% job (4,000,000 users in my case): the java process uses 2800 MB (resident memory). Good news for me: both the U and M iterations passed on the 20% sample. Can I use the current M (computed on 20% of users, 15 iterations) to process the remainder (80% of users)? Fix M, recompute
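What Pavel is proposing is essentially the ALS "fold-in" step: with the item-feature matrix M held fixed, each remaining user's feature vector is an independent regularized least-squares solve. A sketch of that update (the notation is assumed, not from the thread: r_u is the user's rating row and λ the regularization weight):

```latex
% With M fixed, solve for one user vector u:
u = \left( M^{\top} M + \lambda I \right)^{-1} M^{\top} r_u
```

The algebra does not require retraining M, but whether an M learned from a 20% sample generalizes to the other 80% depends on how representative that sample is.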

Re: SSVD fails on seq2sparse output.

2012-11-19 Thread Sebastian Schelter
That's huge. It means you need to fit a dense 20M x 20 matrix into the RAM of the mappers that recompute U. This will require a few gigabytes... If that doesn't work for you, you could try to rewrite the job to use reduce-side joins to recompute the factors; this would however be a much slower implementat
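Sebastian's "few gigabytes" can be checked with quick back-of-the-envelope arithmetic, using only the 20M x 20 dimensions from the thread (a sketch; JVM object and array overhead is ignored, so the real footprint is higher):

```java
public class AlsMemoryEstimate {
    public static void main(String[] args) {
        long users = 20_000_000L;   // rows of the dense user-feature matrix U
        int features = 20;          // latent features requested in the thread
        long bytesPerDouble = 8L;   // one primitive double per entry

        long bytes = users * features * bytesPerDouble; // 3,200,000,000 bytes
        double gib = bytes / (1024.0 * 1024.0 * 1024.0);

        // Raw double storage alone, before any JVM overhead (~2.98 GiB):
        System.out.printf("Dense U needs at least %.2f GiB%n", gib);
    }
}
```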

Re: SSVD fails on seq2sparse output.

2012-11-19 Thread Abramov Pavel
About 20,000,000 users and 150,000 items. 0.03% non-zeros. 20 features required. Pavel On 19.11.12 12:31, "Sebastian Schelter" wrote: >You need to give much more memory than 200 MB to your mappers. What are >the dimensions of your input in terms of users and items? > >--sebastian > >
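For a sense of the input scale, the stated dimensions and density imply roughly 900 million observed preferences (a sketch using only the numbers quoted above):

```java
public class RatingCountEstimate {
    public static void main(String[] args) {
        long users = 20_000_000L;
        long items = 150_000L;
        double density = 0.0003;    // 0.03% of the user-item cells are non-zero

        // 3e12 possible cells, of which ~9e8 are observed:
        double nonZeros = users * items * density;
        System.out.printf("~%.0f million observed preferences%n", nonZeros / 1e6);
    }
}
```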

Re: SSVD fails on seq2sparse output.

2012-11-19 Thread Sebastian Schelter
You need to give much more memory than 200 MB to your mappers. What are the dimensions of your input in terms of users and items? --sebastian On 19.11.2012 09:28, Abramov Pavel wrote: > Thanks for your replies. > > 1) >> Can you describe your failure or give us a stack trace? > > > Here is j
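On the Hadoop 0.20/1.x-era clusters discussed in this thread, mapper heap is usually raised via the child JVM options. A sketch (the property name is valid for that generation of Hadoop; the value is illustrative and should be sized to fit the dense factor matrix):

```xml
<!-- mapred-site.xml, or pass as -Dmapred.child.java.opts=... on the command line -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```

Note that this raises the heap for all child tasks; the cluster's per-node task slots must leave room for the larger JVMs.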

Re: SSVD fails on seq2sparse output.

2012-11-19 Thread Abramov Pavel
Hi Sean, > PS I think I mentioned off-list, but this is more or less exactly the >basis > of Myrrix (http://myrrix.com). It should be able to handle this scale, > maybe slightly more easily since it can load only the subset of these > matrices needed by each worker -- more reducers means less RAM

Re: SSVD fails on seq2sparse output.

2012-11-19 Thread Abramov Pavel
Thanks for your replies. 1) > Can you describe your failure or give us a stack trace? Here is the job log: 12/11/19 09:54:07 INFO als.ParallelALSFactorizationJob: Recomputing U (iteration 0/15) … 12/11/19 10:03:31 INFO mapred.JobClient: Job complete: job_201211150152_1671 12/11/19 10:03:31 INFO a