RE: .txt to vector

2012-07-26 Thread Videnova, Svetlana
illet 2012 11:05 To: user@mahout.apache.org Subject: RE: .txt to vector OK thank you. All good. 31 docs, 3 fields: content (term count 16), filename (term count 17), indexDate (term count 1). There are "bananas" in at least 3 files ... Can't understand why 12/07/25 10:03:02 ERROR l

RE: .txt to vector

2012-07-25 Thread Videnova, Svetlana
r@mahout.apache.org Subject: Re: .txt to vector It is a jar file, so just java -jar luke.jar But there's a problem: Luke releases are keyed to different Lucene releases. You need the right Luke download for your version of Lucene. http://code.google.com/p/luke/downloads/list On Wed,

Re: .txt to vector

2012-07-25 Thread Lance Norskog
wrote: > Sorry, but what is the command line for running Luke? > > -Original Message- > From: Lance Norskog [mailto:goks...@gmail.com] > Sent: Wednesday, 25 July 2012 09:24 > To: user@mahout.apache.org > Subject: Re: .txt to vector > > The Luke program lets you

RE: .txt to vector

2012-07-25 Thread Videnova, Svetlana
Sorry, but what is the command line for running Luke? -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Wednesday, 25 July 2012 09:24 To: user@mahout.apache.org Subject: Re: .txt to vector The Luke program lets you examine a Lucene index. Try that and check for

Re: .txt to vector

2012-07-25 Thread Lance Norskog
oo > many documents that do not have a term vector for bananas > > > > -Original Message- > From: Lance Norskog [mailto:goks...@gmail.com] > Sent: Wednesday, 25 July 2012 08:59 > To: user@mahout.apache.org > Subject: Re: .txt to vector > > You'

RE: .txt to vector

2012-07-25 Thread Videnova, Svetlana
"main" java.lang.IllegalStateException: There are too many documents that do not have a term vector for bananas -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Wednesday, 25 July 2012 08:59 To: user@mahout.apache.org Subject: Re: .txt to vector You're making progr

Re: .txt to vector

2012-07-25 Thread Lance Norskog
> 12/07/24 09:25:22 INFO compress.CodecPool: Got brand-new compressor > Exception in thread "main" java.lang.IllegalArgumentException > > -Original Message- > From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] > Sent: Tuesday, 24 July 2012 09:16

RE: .txt to vector

2012-07-24 Thread Videnova, Svetlana
"main" java.lang.IllegalArgumentException -Original Message- From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Tuesday, 24 July 2012 09:16 To: user@mahout.apache.org Subject: RE: .txt to vector Hi Lance, My dir now contains: _0.tvf and the others. With the command: apache

RE: .txt to vector

2012-07-24 Thread Videnova, Svetlana
... Thank you -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, 24 July 2012 04:28 To: user@mahout.apache.org Subject: Re: .txt to vector You have to add termvectors to the field type you want to use. Then, you have to reindex all of the data. You will n

Re: .txt to vector

2012-07-23 Thread Lance Norskog
ld = "PA" which is used in a lot of files, so I don’t > understand why the exception tells me "too many documents that do not have a > term vector for PA". > > Can somebody explain to me how I have to use the lucene.vector command, because > apparently I'm missing som

RE: .txt to vector

2012-07-23 Thread Videnova, Svetlana
ank you all! -Original Message- From: Videnova, Svetlana [mailto:svetlana.viden...@logica.com] Sent: Monday, 23 July 2012 10:18 To: user@mahout.apache.org Subject: RE: .txt to vector I'm using Mahout on Ubuntu and Solr on Windows. I guess with a web service I can get the indexed

RE: .txt to vector

2012-07-23 Thread Videnova, Svetlana
tiveFSLockFactory@157aa53: files: [] Thank you -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Saturday, 21 July 2012 05:55 To: user@mahout.apache.org Subject: Re: .txt to vector Solr creates Lucene index files. You can query it for content in several forma

Re: .txt to vector

2012-07-20 Thread Lance Norskog
, because my goal is to > take the output of Solr (which is .xml, json or php)? > > > > Regards > > > > -Original Message- > From: Lance Norskog [mailto:goks...@gmail.com] > Sent: Friday, 20 July 2012 03:16 > To: user@mahout.apache.org > O

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
: Lance Norskog [mailto:goks...@gmail.com] Sent: Friday, 20 July 2012 03:16 To: user@mahout.apache.org Subject: Re: .txt to vector There are two books out for Mahout and text processing. "Mahout in Action" covers all of the apps in Mahout. "Taming Text" gives a good detailed

Re: .txt to vector

2012-07-19 Thread Lance Norskog
; tf-vectors ; wordcount > > > What is the chunk-0 file exactly? > > > What does the clusters-dump created at the end by the command > clusterdump represent? > > > Thank you all! > > > -Original Message- > From: Videnova, Svetlana [

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
mailto:svetlana.viden...@logica.com] Sent: Thursday, 19 July 2012 14:26 To: user@mahout.apache.org Subject: RE: .txt to vector Yes, that is what I was saying. But I have no idea where in the code Mahout calls/creates the data that I don't have. And the clusters that I have (especially clusters-8) are

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
@logica.com] Sent: Thursday, 19 July 2012 14:26 To: user@mahout.apache.org Subject: RE: .txt to vector Yes, that is what I was saying. But I have no idea where in the code Mahout calls/creates the data that I don't have. And the clusters that I have (especially clusters-8) are old and not ge

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
parse step? Thank you -Original Message- From: Alexander Aristov [mailto:alexander.aris...@gmail.com] Sent: Thursday, 19 July 2012 12:05 To: user@mahout.apache.org Subject: Re: .txt to vector But you've got another problem now Exception in thread "main" java.io.Fi

Re: .txt to vector

2012-07-19 Thread Alexander Aristov
houtDriver.java:195) > > csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8$ > ls > _logs part-r-0 _policy _SUCCESS > > There is no > /usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data here! > > > Thank you > > -Messa

RE: .txt to vector

2012-07-19 Thread Videnova, Svetlana
09:33 To: user@mahout.apache.org Subject: Re: .txt to vector Yes, the Mahout analyzer would have to be updated for Lucene 4.0. I suggest using an earlier one. Mahout uses Lucene in a very simple way, and it is OK to use any earlier Lucene from 3.1 to 3.6. On Wed, Jul 18, 2012 at 11:50 PM, Videnova, Svetlan

Re: .txt to vector

2012-07-19 Thread Lance Norskog
-Original Message- > From: Sean Owen [mailto:sro...@gmail.com] > Sent: Wednesday, 18 July 2012 22:52 > To: user@mahout.apache.org > Subject: Re: .txt to vector > > This means you're using it with an incompatible version of Lucene. I think > we're on 3.1. Check

RE: .txt to vector

2012-07-18 Thread Videnova, Svetlana
gmail.com] Sent: Wednesday, 18 July 2012 22:52 To: user@mahout.apache.org Subject: Re: .txt to vector This means you're using it with an incompatible version of Lucene. I think we're on 3.1. Check the version that Mahout depends upon and use at least that version or later. On Wed, Jul 1

Re: .txt to vector

2012-07-18 Thread Sean Owen
> Do you have any idea how to do it, please? I was following > https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html > BUT: it doesn't work very well (at all...). > > I'm trying to find out how to transform .txt to vector for Mahout in order to > clusterise and categori

.txt to vector

2012-07-18 Thread Videnova, Svetlana
ve any idea how to do it, please? I was following https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html BUT: it doesn't work very well (at all...). I'm trying to find out how to transform .txt to vector for Mahout in order to clusterise and categorise my information. Is it possible?