[jira] [Updated] (NUTCH-1075) Delegate language identification to Tika

2011-08-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1075: Attachment: NUTCH-1075.patch Passes the tests but requires some testing > Delegate language iden

[jira] [Created] (NUTCH-1075) Delegate language identification to Tika

2011-08-01 Thread Julien Nioche (JIRA)
Delegate language identification to Tika Key: NUTCH-1075 URL: https://issues.apache.org/jira/browse/NUTCH-1075 Project: Nutch Issue Type: Improvement Components: parser Affects Versions:

[jira] [Commented] (NUTCH-1044) Redirected URLs and possibly all of their outlinked URLs have invalid scores.

2011-08-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13076043#comment-13076043 ] Julien Nioche commented on NUTCH-1044: Will commit soon if there aren't any objectio

Build failed in Jenkins: Nutch-trunk #1564

2011-08-01 Thread Apache Jenkins Server
See -- [...truncated 925 lines...] A src/plugin/parse-tika/plugin.xml A src/plugin/parse-tika/build.xml A src/plugin/lib-regex-filter A src/plugin/lib-regex-filter/ivy.xml A

RE: Nutch 2 and Cassandra

2011-08-01 Thread Tom Davidson
OK... Are you running with a clustered version of Hadoop? I think you have to have your HADOOP_HOME env variable set. Otherwise it runs in local mode. I have been able to run in local mode, but not in deployed mode. -Original Message- From: Alexis [mailto:alexis.detregl...@gmail.com] S

Re: Nutch 2 and Cassandra

2011-08-01 Thread Alexis
Ok this version of hector was properly resolved. Thanks! These are the logs: ~/java/workspace/Nutch/trunk/runtime/deploy$ bin/nutch inject ~/java/workspace/Nutch/seeds 11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: starting 11/08/01 15:17:45 INFO crawl.InjectorJob: InjectorJob: urlDir: /ho

RE: Nutch 2 and Cassandra

2011-08-01 Thread Tom Davidson
I did something similar to below to add the Cassandra dependencies. Note that I am getting NoSuchMethodErrors not ClassNotFoundExceptions. Can you add the hector jars to your nutch job jar and see what you get? I think I am one step ahead of you. BTW, I just added this line to get the hector dep

Re: Nutch 2 and Cassandra

2011-08-01 Thread Alexis
Hi, libthrift is a dependency of cassandra-thrift, as listed here: http://mvnrepository.com/artifact/org.apache.cassandra/cassandra-thrift/0.8.1 During Nutch build, you have to manually tweak the Ivy configuration depending on your choice of the Gora store, in this case Cassandra. Basically you ne

Nutch 2 and Cassandra

2011-08-01 Thread Tom Davidson
Hi All, I am kind of at my wit's end here, so I am hoping someone here can help. I am trying to use Nutch2 and Cassandra and I have been successful using the runtime/local build. I am using the Cloudera CDH3 on CentOS 5 and I do not want to contaminate my hadoop install by dropping in a bunch

[jira] [Created] (NUTCH-1074) topN is ignored with maxNumSegments

2011-08-01 Thread Markus Jelsma (JIRA)
topN is ignored with maxNumSegments --- Key: NUTCH-1074 URL: https://issues.apache.org/jira/browse/NUTCH-1074 Project: Nutch Issue Type: Bug Components: generator Affects Versions: 1.3