RE: .txt to vector

Videnova, Svetlana Thu, 19 Jul 2012 01:31:24 -0700

Hi Lance,

Thank you for your quick answer.
I changed my CLASSPATH to:
CLASSPATH=/opt/lucene-3.6.0/lucene-core-3.6.0.jar:/opt/lucene-3.6.0/lucene-core-3.6.0-javadoc.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0.jar:/opt/lucene-3.6.0/lucene-test-framework-3.6.0-javadoc.jar:.

and set the Lucene version to 3.6.0 in the pom.xml.

But now seq2sparse fails with a different error:

csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8$ ./bin/mahout seq2sparse --input ./examples/output/ --output ./toto/output/
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
12/07/19 09:03:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
12/07/19 09:03:56 INFO input.FileInputFormat: Total input paths to process : 15
12/07/19 09:03:56 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-csi/mapred/staging/csi-379951768/.staging/job_local_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
        at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:63)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:919)
        at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:936)
        at org.apache.hadoop.mapred.JobClient.access$500(JobClient.java:170)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:854)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:807)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:807)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:495)
        at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:93)
        at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:255)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)

csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8$ ls
_logs  part-r-00000  _policy  _SUCCESS

There is no /usr/local/apache-mahout-d6d6ee8/examples/output/clusters-8/data here!


Thank you
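
One possible reading of this trace: the failure occurs in SequenceFileInputFormat.listStatus, which (as far as I understand Hadoop's behaviour) treats every subdirectory of the input path as a MapFile and expects a file named data inside it. ./examples/output/ appears to hold clustering output such as clusters-8 rather than the result of seqdirectory. The shell sketch below lists the subdirectories that would trip this check. dirs_without_data is a hypothetical helper name, not part of Mahout or Hadoop, and skipping names that start with _ or . is an assumption modelled on Hadoop's hidden-file filter.

```shell
# Print each immediate subdirectory of an input directory that has no "data"
# file, i.e. the directories a MapFile-style lookup would fail on.
# dirs_without_data is a hypothetical helper, not a Mahout or Hadoop command.
dirs_without_data() {
  input_dir=${1%/}                      # drop one trailing slash, if any
  for d in "$input_dir"/*/; do
    [ -d "$d" ] || continue             # the glob matched nothing
    name=$(basename "$d")
    case "$name" in
      _*|.*) continue ;;                # assumed hidden-file rule (_logs etc.)
    esac
    [ -f "${d}data" ] || printf '%s\n' "${d%/}"
  done
}

# Usage: dirs_without_data ./examples/output/
```

If this prints anything for the seq2sparse input directory, pointing --input at the directory that seqdirectory actually wrote is probably the simpler fix.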

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: Thursday, 19 July 2012 09:33
To: user@mahout.apache.org
Subject: Re: .txt to vector

Yes, the Mahout analyzer would have to be updated for Lucene 4.0. I suggest 
using an earlier release. Mahout uses Lucene in a very simple way, and it is 
OK to use any earlier Lucene from 3.1 to 3.6.
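
To see which Lucene release a CLASSPATH string actually pulls in, a small shell sketch like the following may help. lucene_core_versions is a hypothetical helper, and it assumes the jars follow the lucene-core-<version>.jar naming seen in this thread (javadoc jars and other Lucene modules are ignored).

```shell
# Print the distinct lucene-core versions named in a colon-separated classpath.
# lucene_core_versions is a hypothetical helper; it only inspects jar file
# names and does not open the jars themselves.
lucene_core_versions() {
  printf '%s\n' "$1" \
    | tr ':' '\n' \
    | sed -n 's|.*lucene-core-\([0-9][0-9.]*\(-[[:upper:]][[:upper:][:digit:]]*\)\{0,1\}\)\.jar$|\1|p' \
    | sort -u
}

# Usage: lucene_core_versions "$CLASSPATH"
```

Any version printed outside the 3.1 to 3.6 range, or more than one version at once, would be worth fixing before rerunning seq2sparse.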

On Wed, Jul 18, 2012 at 11:50 PM, Videnova, Svetlana 
<svetlana.viden...@logica.com> wrote:
> Hi Sean,
>
> In fact I was using Lucene version 3.6.0 (I saw that in the pom.xml), but 
> in my classpath I was using Lucene version 4.0.0.
>
> I changed pom.xml to 4.0.0 => <lucene.version>4.0.0</lucene.version>
>
> But I still get the same error:
> ###
> Exception in thread "main" java.lang.VerifyError: class org.apache.mahout.vectorizer.DefaultAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
> ###
>
> Should I change something else? Or maybe Lucene 4.0 is too recent for 
> Mahout?
>
>
>
> Thank you
>
> -----Original Message-----
> From: Sean Owen [mailto:sro...@gmail.com]
> Sent: Wednesday, 18 July 2012 22:52
> To: user@mahout.apache.org
> Subject: Re: .txt to vector
>
> This means you're using it with an incompatible version of Lucene. I think 
> we're on 3.1. Check the version that Mahout depends upon and use at least 
> that version or later.
>
> On Wed, Jul 18, 2012 at 6:04 PM, Videnova, Svetlana < 
> svetlana.viden...@logica.com> wrote:
>
>> I'm working with Mahout. I'm trying to write a web service in Java 
>> that will take the output of Solr and give this file to Mahout.
>> For the moment I have successfully done the recommendation part.
>> Now I'm trying to cluster. For this I have to vectorize the output 
>> of Solr.
>> Do you have any idea how to do it, please? I was following 
>> https://cwiki.apache.org/MAHOUT/creating-vectors-from-text.html
>> but it doesn't work very well (at all...).
>>
>> I'm trying to find out how to transform .txt files into vectors for 
>> Mahout in order to cluster and categorize my information. Is it 
>> possible? I saw that I have to use seqdirectory and seq2sparse.
>>
>> seqdirectory creates a file (with some numbers and everything...); this 
>> step is OK. But then when I use seq2sparse, it gives me this
>> error:
>>
>> csi@csi-SCENIC-W:/usr/local/apache-mahout-d6d6ee8$ ./bin/mahout seq2sparse --input ./examples/output/ --output ./toto/output/
>> hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/mahout-examples-0.8-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-jcl-1.6.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: Found binding in [jar:file:/usr/local/apache-mahout-d6d6ee8/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
>> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 1.0
>> 12/07/18 15:53:33 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
>> Exception in thread "main" java.lang.VerifyError: class org.apache.mahout.vectorizer.DefaultAnalyzer overrides final method tokenStream.(Ljava/lang/String;Ljava/io/Reader;)Lorg/apache/lucene/analysis/TokenStream;
>>         at java.lang.ClassLoader.defineClass1(Native Method)
>>         at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
>>         at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
>>         at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
>>         at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
>>         at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
>>         at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
>>         at java.security.AccessController.doPrivileged(Native Method)
>>         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>         at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:199)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>         at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>
>> I'm using only Lucene 4.0!
>>
>> CLASSPATH=/opt/lucene-4.0.0-ALPHA/demo/lucene-demo-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/core/lucene-core-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/analysis/common/lucene-analyzers-common-4.0.0-ALPHA.jar:/opt/lucene-4.0.0-ALPHA/queryparser/lucene-queryparser-4.0.0-ALPHA.jar:.
>>
>> Please, where am I going wrong?
>>
>>
>> Thank you all
>> Regards
>>
>>
>>
>>
>>
>>
>> Think green - keep it on the screen.
>>
>> This e-mail and any attachment is for authorised use by the intended
>> recipient(s) only. It may contain proprietary material, confidential 
>> information and/or be subject to legal privilege. It should not be 
>> copied, disclosed to, retained or used by, any other party. If you 
>> are not an intended recipient then please promptly delete this e-mail 
>> and any attachment and all copies and inform the sender. Thank you.
>>
>>
>
>



--
Lance Norskog
goks...@gmail.com

