Build failed in Jenkins: Nutch-trunk #1456

2011-04-13 Thread Apache Hudson Server
See -- Started by timer Building remotely on ubuntu1 FATAL: cannot assign instance of hudson.model.StreamBuildListener to field hudson.scm.subversion.WorkspaceUpdater$UpdateTask.listener of type huds

[jira] [Closed] (NUTCH-778) Running Nutch On linux having whoami exception?

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-778. --- Closing all resolved issues with a non-fixed status. > Running Nutch On linux having whoami exception? >

[jira] [Closed] (NUTCH-736) how long it takes nutch 1.0 to fetch

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-736. --- Closing all resolved issues with a non-fixed status. > how long it takes nutch 1.0 to fetch > ---

[jira] [Closed] (NUTCH-733) plain text view of cached files ignores HTML encoding

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-733. --- Closing all resolved issues with a non-fixed status. > plain text view of cached files ignores HTML encod

[jira] [Closed] (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-692. --- Closing all resolved issues with a non-fixed status. > AlreadyBeingCreatedException with Hadoop 0.19 > --

[jira] [Closed] (NUTCH-934) Upgrade to Tika 0.8

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-934. --- Closing all resolved issues with a non-fixed status. > Upgrade to Tika 0.8 > --- > >

[jira] [Closed] (NUTCH-454) Review Debug Level Log Guards

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-454. --- Closing all resolved issues with a non-fixed status. > Review Debug Level Log Guards > --

[jira] [Closed] (NUTCH-791) External links for published javadocs are partially broken

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-791. --- Closing all resolved issues with a non-fixed status. > External links for published javadocs are partiall

[jira] [Closed] (NUTCH-852) parser not found for contentType=application/xhtml+xml

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-852. --- Closing all resolved issues with a non-fixed status. > parser not found for contentType=application/xhtml

[jira] [Closed] (NUTCH-805) Unable to resolve the url-blah-blah, skipping

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-805. --- Closing all resolved issues with a non-fixed status. > Unable to resolve the url-blah-blah, skipping > --

[jira] [Closed] (NUTCH-558) Need tool to retrieve domain statistics

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-558. --- Resolution: Won't Fix Since search has been delegated to Solr, retrieving statistics can be done with

[jira] [Closed] (NUTCH-959) use of "ROWS" destroys result-lists: first hit appears also als last hit on each "page" (search via search?query... -> xml )

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-959. --- Resolution: Won't Fix This smells like a question which should be addressed on the mailing list. Anyw

[jira] [Closed] (NUTCH-576) Different Analyzers Support

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-576. --- Resolution: Won't Fix > Different Analyzers Support > --- > >

[jira] [Closed] (NUTCH-947) text.jsp does not compile on Apache Tomcat, and charset is not specified

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-947. --- Resolution: Won't Fix > text.jsp does not compile on Apache Tomcat, and charset is not specified > ---

[jira] [Closed] (NUTCH-313) moreFrom property in search.properties cannot be translated into Japanese. Compound text issue.

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-313. --- Resolution: Won't Fix > moreFrom property in search.properties cannot be translated into Japanese. >

[jira] [Closed] (NUTCH-376) Add methods to control runtime behaviour of NutchBean

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-376. --- Resolution: Won't Fix > Add methods to control runtime behaviour of NutchBean > --

[jira] [Closed] (NUTCH-297) sandbox svn folder

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-297. --- Resolution: Invalid If there is a need to do a lot of work without disrupting the main trunk develope

[jira] [Closed] (NUTCH-435) Synonym-Editor that creates OWL for the ontology plugin

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-435. --- Resolution: Won't Fix > Synonym-Editor that creates OWL for the ontology plugin >

[jira] [Commented] (NUTCH-672) allow unit tests to be run from bin/nutch

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019602#comment-13019602 ] Markus Jelsma commented on NUTCH-672: - Well, it's a convenient improvement for develope

[jira] [Commented] (NUTCH-657) Estonian N-gram profile has wrong name

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019599#comment-13019599 ] Markus Jelsma commented on NUTCH-657: - We could still fix this easily for 1.3. But why

[jira] [Updated] (NUTCH-657) Estonian N-gram profile has wrong name

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-657: Assignee: (was: Markus Jelsma) > Estonian N-gram profile has wrong name > --

[jira] [Assigned] (NUTCH-657) Estonian N-gram profile has wrong name

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-657: --- Assignee: Markus Jelsma > Estonian N-gram profile has wrong name > ---

[jira] [Closed] (NUTCH-942) Add user uid from drupal or other cms to the author field of Nutch

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-942. --- Resolution: Invalid What's this? Inquiries must go to the mailing list. > Add user uid from drupal or ot

[jira] [Commented] (NUTCH-648) debian style autocomplete

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019595#comment-13019595 ] Markus Jelsma commented on NUTCH-648: - Although I would be +1, the problem is that in 2

[jira] [Closed] (NUTCH-311) Page with tens of thousands of links OOME'd.

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-311. --- Resolution: Won't Fix There is already a db.max.outlinks.per.page parameter. > Page with tens of thou

[jira] [Closed] (NUTCH-84) Fetcher for constrained crawls

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-84. -- Resolution: Won't Fix This is a very old issue and documented externally. It handles issues that can eas

[jira] [Closed] (NUTCH-478) Add function for stopping FetherThread gracefully

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-478. --- Resolution: Won't Fix The fetcher can be interrupted and resumed in 2.0, making this work in 1.x would

[jira] [Closed] (NUTCH-673) Upgrade the Carrot2 plug-in to release 3.0

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-673. --- Resolution: Won't Fix Closing a legacy issue: http://www.lucidimagination.com/search/document/2738eeb0

Nutch 1.3 release

2011-04-13 Thread Markus Jelsma
Hi, There are 4 open issues for 1.3. Two are already fixed in 1.3: one is ready to commit for trunk, the other fixes license headers for trunk. Two very small issues remain, which I can fix within the next few days. Beyond that, I'll at least do a clean build and do a complete crawl with

[jira] [Commented] (NUTCH-944) Increase the number of elements to look for URLs and add the ability to specify multiple attributes by elements

2011-04-13 Thread Ken Krugler (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019565#comment-13019565 ] Ken Krugler commented on NUTCH-944: --- I'm curious how this relates to [TIKA-463]. Is it th

[jira] [Resolved] (NUTCH-982) Remove copying of ID and URL field in solrmapping

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-982. - Resolution: Fixed Assignee: Markus Jelsma Committed for trunk in rev 1091895. > Remove copy

[jira] [Created] (NUTCH-983) Upgrade SolrJ

2011-04-13 Thread Markus Jelsma (JIRA)
Upgrade SolrJ - Key: NUTCH-983 URL: https://issues.apache.org/jira/browse/NUTCH-983 Project: Nutch Issue Type: Improvement Components: indexer Affects Versions: 2.0 Reporter: Markus Jelsma Pr

[jira] [Commented] (NUTCH-980) Fix IllegalAccessError with slf4j used in Solrj.

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019307#comment-13019307 ] Markus Jelsma commented on NUTCH-980: - Both 1.5.11 and 1.5.5 work fine on trunk. If the

[jira] [Commented] (NUTCH-982) Remove copying of ID and URL field in solrmapping

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019299#comment-13019299 ] Markus Jelsma commented on NUTCH-982: - If there are no objections I'll commit this today

Re: chinese token overlap bug in org.apache.nutch.summary.basic.BasicSummarizer.getSummary

2011-04-13 Thread Bupo Jung
Thank you for your response. I have to update my code ^_^ On April 13, 2011 at 7:19 PM, Julien Nioche wrote: > Hi, > > Nutch has moved away from handling the indexing and search itself and now > delegates that to SOLR as of versions 1.3 and 2.0 (both forthcoming). The > issue you described won't be fixed

Re: chinese token overlap bug in org.apache.nutch.summary.basic.BasicSummarizer.getSummary

2011-04-13 Thread Julien Nioche
Hi, Nutch has moved away from handling the indexing and search itself and now delegates that to SOLR as of versions 1.3 and 2.0 (both forthcoming). The issue you described won't be fixed as this part of the code has been removed. Users are encouraged to start using 1.3 and use SOLR for the indexin

chinese token overlap bug in org.apache.nutch.summary.basic.BasicSummarizer.getSummary

2011-04-13 Thread Bupo Jung
I use Nutch for Chinese search. I input a query string like "可爱的小女生" (a lovely little girl), and the Chinese analyzer turns it into three query tokens: 可爱, 小女, 女生. When using the tokens to get the summary of the result page, a StringIndexOutOfBoundsException is thrown. Here is the error log: 2010-12-15 12:18

Re: ActiveThreads=0

2011-04-13 Thread Sebastian Nagel | exorbyte
Dear Amed, some time ago I stumbled on a similar problem and started a thread on the Nutch Users list: http://www.mail-archive.com/nutch-user@lucene.apache.org/msg14560.html (a fix for parse-pdf as well as PDFBox is included). Maybe that's related, maybe not. It depends on the version of Nut