[jira] [Commented] (NUTCH-1414) Date extraction parse filter

2012-07-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408055#comment-13408055 ] Julien Nioche commented on NUTCH-1414: -- I'm concerned about the prolife

[jira] [Commented] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409578#comment-13409578 ] Julien Nioche commented on NUTCH-1360: -- Guys, unless a change is trivial pleas

[jira] [Updated] (NUTCH-1360) Suport the storing of IP address connected to when web crawling

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1360: - Fix Version/s: (was: nutchgora) 2.1 > Suport the storing of

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: NUTCH-1087.patch First version of the nutch crawl script. Please test and review

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: (was: crawl) > Deprecate crawl command and replace with example scr

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410344#comment-13410344 ] Julien Nioche commented on NUTCH-1087: -- Good catch Markus. Ideally we'd ne

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: NUTCH-1087-1.6-3.patch The script now determines where the nutch script is located

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Attachment: NUTCH-1087-2.1.patch Similar patch for 2.x - NOT TESTED YET

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13410423#comment-13410423 ] Julien Nioche commented on NUTCH-1087: -- Trunk : committed revision 1359720.

[jira] [Updated] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-07-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1087: - Fix Version/s: 2.1 > Deprecate crawl command and replace with example scr

[jira] [Created] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-19 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1433: Summary: Upgrade to Tika 1.2 Key: NUTCH-1433 URL: https://issues.apache.org/jira/browse/NUTCH-1433 Project: Nutch Issue Type: Improvement

[jira] [Updated] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1433: - Attachment: NUTCH-1433-trunk.patch patch for trunk - please test > Upgrade

[jira] [Updated] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1433: - Attachment: NUTCH-1433-trunk-2.patch Dependency to juniversalchardet needed in root ivy.xml

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419014#comment-13419014 ] Julien Nioche commented on NUTCH-1433: -- Markus : I can't reproduce this i

[jira] [Commented] (NUTCH-1341) NotModified time set to now but page not modified

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419084#comment-13419084 ] Julien Nioche commented on NUTCH-1341: -- Looks like a reasonable thing t

[jira] [Commented] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419083#comment-13419083 ] Julien Nioche commented on NUTCH-1388: -- don't really like

[jira] [Commented] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419108#comment-13419108 ] Julien Nioche commented on NUTCH-1388: -- can't you define the default value

[jira] [Commented] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419135#comment-13419135 ] Julien Nioche commented on NUTCH-1388: -- OK got it, thanks bq. We hav

[jira] [Commented] (NUTCH-1388) Optionally maintain custom fetch interval despite AdaptiveFetchSchedule

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419170#comment-13419170 ] Julien Nioche commented on NUTCH-1388: -- Looks fine +1 > Opt

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419175#comment-13419175 ] Julien Nioche commented on NUTCH-1433: -- Committed in trunk : revision 136

[jira] [Updated] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1433: - Attachment: NUTCH-1433.branch-2.patch Patch for 2.x - strangely the version of the dependencies

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419260#comment-13419260 ] Julien Nioche commented on NUTCH-1433: -- Anyone to test the patch for

[jira] [Commented] (NUTCH-1433) Upgrade to Tika 1.2

2012-07-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419258#comment-13419258 ] Julien Nioche commented on NUTCH-1433: -- Hmm, probably had a problem with the

[jira] [Commented] (NUTCH-1445) Add ElasticIndexerJob that indexes to elasticsearch

2012-08-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429058#comment-13429058 ] Julien Nioche commented on NUTCH-1445: -- Ferdy - just to reiterate what was said

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2012-08-06 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429101#comment-13429101 ] Julien Nioche commented on NUTCH-1047: -- Thanks for your comments Ferdy bq.

[jira] [Commented] (NUTCH-1434) Indexer to delete robots noIndex

2012-08-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434893#comment-13434893 ] Julien Nioche commented on NUTCH-1434: -- bq. I haven't added the conf

[jira] [Commented] (NUTCH-1434) Indexer to delete robots noIndex

2012-08-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434927#comment-13434927 ] Julien Nioche commented on NUTCH-1434: -- Well, let's do configuration

[jira] [Commented] (NUTCH-1233) Rely on Tika for outlink extraction

2012-08-23 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13440271#comment-13440271 ] Julien Nioche commented on NUTCH-1233: -- Would be good to add some test

[jira] [Commented] (NUTCH-1459) Remove dead code (phase2) from InjectorJob

2012-09-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450481#comment-13450481 ] Julien Nioche commented on NUTCH-1459: -- commit ref please Ferdy, th

[jira] [Commented] (NUTCH-1459) Remove dead code (phase2) from InjectorJob

2012-09-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450492#comment-13450492 ] Julien Nioche commented on NUTCH-1459: -- the branch reference but even more so

[jira] [Commented] (NUTCH-1459) Remove dead code (phase2) from InjectorJob

2012-09-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450515#comment-13450515 ] Julien Nioche commented on NUTCH-1459: -- Nah, that'

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-13 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454790#comment-13454790 ] Julien Nioche commented on NUTCH-1467: -- bq. I will work on it soon but i am thin

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-09-13 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454856#comment-13454856 ] Julien Nioche commented on NUTCH-1467: -- Hi Kiran Thank you for your comments

[jira] [Commented] (NUTCH-1467) nutch 1.5.1 not able to parse mutliValued metatags

2012-10-03 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468436#comment-13468436 ] Julien Nioche commented on NUTCH-1467: -- Thanks Kiran. See http://wiki.apache

[jira] [Updated] (NUTCH-1475) Nutch 2.1 Index-More Plugin -- A better fall back value for date field

2012-10-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1475: - Affects Version/s: (was: nutchgora) 1.5.1 This is an issue for the 1

[jira] [Commented] (NUTCH-1344) BasicURLNormalizer to normalize https same as http

2012-10-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473066#comment-13473066 ] Julien Nioche commented on NUTCH-1344: -- Good catch Sebastian. Please commit to

[jira] [Commented] (NUTCH-1475) Nutch 2.1 Index-More Plugin -- A better fall back value for date field

2012-10-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13474198#comment-13474198 ] Julien Nioche commented on NUTCH-1475: -- Nope, looks like a reasonable thing t

[jira] [Commented] (NUTCH-710) Support for rel="canonical" attribute

2012-10-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477716#comment-13477716 ] Julien Nioche commented on NUTCH-710: - Iwan : sure, feel free to send a patch if

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-10-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479919#comment-13479919 ] Julien Nioche commented on NUTCH-1477: -- Thanks Mike. I confirm the issue. Did

[jira] [Resolved] (NUTCH-1087) Deprecate crawl command and replace with example script

2012-10-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1087. -- Resolution: Fixed Nutch 2.x : Committed revision 1400390. Can open a new issue if there are

[jira] [Resolved] (NUTCH-1433) Upgrade to Tika 1.2

2012-10-20 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1433. -- Resolution: Fixed Committed revision 1400397. > Upgrade to Tika

[jira] [Updated] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-10-25 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1477: - Fix Version/s: 2.2 Assignee: Julien Nioche > NPE when injecting w

[jira] [Updated] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-10-25 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1477: - Attachment: webpage.avsc Modified avro schema which allows fields to be null

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-10-25 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484148#comment-13484148 ] Julien Nioche commented on NUTCH-1477: -- I found in http://mail-archives.apache

[jira] [Updated] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-10-25 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1477: - Priority: Critical (was: Major) > NPE when injecting with DataFileAvroSt

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-10-25 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484169#comment-13484169 ] Julien Nioche commented on NUTCH-1477: -- Found a clue in https://issues.apache

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-10-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485172#comment-13485172 ] Julien Nioche commented on NUTCH-1477: -- Hi Lewis bq. Do you suggest we update

[jira] [Created] (NUTCH-1482) Rename HTMLParseFilter

2012-10-29 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1482: Summary: Rename HTMLParseFilter Key: NUTCH-1482 URL: https://issues.apache.org/jira/browse/NUTCH-1482 Project: Nutch Issue Type: Task Components

[jira] [Commented] (NUTCH-1482) Rename HTMLParseFilter

2012-10-31 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487624#comment-13487624 ] Julien Nioche commented on NUTCH-1482: -- Having 2 extension points would be a bi

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488728#comment-13488728 ] Julien Nioche commented on NUTCH-1480: -- Hi Lewis bq. Can I run multiple

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488738#comment-13488738 ] Julien Nioche commented on NUTCH-1480: -- OK thanks. What about having a mechanism

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488786#comment-13488786 ] Julien Nioche commented on NUTCH-1480: -- nope. I meant implementing the distribu

[jira] [Updated] (NUTCH-1487) Nutch parse fails first time for PDF files and works on reparse

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1487: - Component/s: storage parser > Nutch parse fails first time for PDF fi

[jira] [Updated] (NUTCH-1487) Nutch parse fails first time for PDF files and works on reparse

2012-11-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1487: - Labels: mysql (was: ) > Nutch parse fails first time for PDF files and works on repa

[jira] [Resolved] (NUTCH-747) inject&Index metadatas and inherit these metadatas to all matching suburls

2012-11-05 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-747. - Resolution: Implemented This has been made possible since thanks to : - Metadata injection

[jira] [Commented] (NUTCH-1477) NPE when injecting with DataFileAvroStore

2012-12-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526256#comment-13526256 ] Julien Nioche commented on NUTCH-1477: -- Hi Alfonso. That's right. I

[jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-840: Attachment: NUTCH-840-trunk.patch Modified version of the patch to fix the tests post NUTCH-797

[jira] [Commented] (NUTCH-840) Port tests from parse-html to parse-tika

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527362#comment-13527362 ] Julien Nioche commented on NUTCH-840: - The tests now run OK with the patch I

[jira] [Updated] (NUTCH-840) Port tests from parse-html to parse-tika

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-840: Affects Version/s: 1.6 Fix Version/s: 1.7 > Port tests from parse-html to parse-t

[jira] [Updated] (NUTCH-891) Nutch build should not depend on unversioned local deps

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-891: Affects Version/s: 2.1 Probably not an issue anymore. Marking it as 2.x to triage unversioned

[jira] [Closed] (NUTCH-807) JSParseFilter produces malformed URL

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-807. --- Resolution: Won't Fix Closing old issues. The JSParseFilter is known to generate noisy URLs a

[jira] [Resolved] (NUTCH-62) Add html META tag information into metaData in index-more plugin

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-62?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-62. Resolution: Implemented This can be done in a more flexible way using index-metadata https

[jira] [Updated] (NUTCH-1267) urlmeta to delegate indexing to index-metadata

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1267: - Assignee: Julien Nioche > urlmeta to delegate indexing to index-metad

[jira] [Updated] (NUTCH-1267) urlmeta to delegate indexing to index-metadata

2012-12-08 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1267: - Description: Ideally we should get rid of urlmeta altogether and add the transmission of

[jira] [Closed] (NUTCH-412) plugin to parse the feed-url (rss/atom) of a blog

2012-12-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-412. --- Resolution: Implemented 6 years later ;-) the feed and parse-tika plugins can handle feeds

[jira] [Resolved] (NUTCH-648) debian style autocomplete

2012-12-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-648. - Resolution: Won't Fix see comments above > debian style auto

[jira] [Updated] (NUTCH-1314) Impose a limit on the length of outlink target urls

2012-12-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1314: - Fix Version/s: 2.2 1.7 > Impose a limit on the length of outlink tar

[jira] [Resolved] (NUTCH-1347) fetcher politeness related to map-reduce

2012-12-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1347. -- Resolution: Not A Problem > fetcher politeness related to map-red

[jira] [Commented] (NUTCH-1331) limit crawler to defined depth

2012-12-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535970#comment-13535970 ] Julien Nioche commented on NUTCH-1331: -- Any objections or shall I commit this

[jira] [Updated] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1508: - Summary: Port limit crawler to defined depth to 2.x (was: Port limit crawler to defined depth

[jira] [Created] (NUTCH-1508) Port limit crawler to defined depth to 23

2012-12-21 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1508: Summary: Port limit crawler to defined depth to 23 Key: NUTCH-1508 URL: https://issues.apache.org/jira/browse/NUTCH-1508 Project: Nutch Issue Type

[jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537804#comment-13537804 ] Julien Nioche commented on NUTCH-1508: -- Need to port the scoring-depth plugi

[jira] [Updated] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1508: - Affects Version/s: 2.2 > Port limit crawler to defined depth to

[jira] [Resolved] (NUTCH-1331) limit crawler to defined depth

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-1331. -- Resolution: Fixed Fix Version/s: 1.7 Thanks Markus Committed in revision 1424875 for

[jira] [Comment Edited] (NUTCH-1331) limit crawler to defined depth

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13537811#comment-13537811 ] Julien Nioche edited comment on NUTCH-1331 at 12/21/12 11:3

[jira] [Commented] (NUTCH-1510) Upgrade to Hadoop 1.1.1

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538134#comment-13538134 ] Julien Nioche commented on NUTCH-1510: -- can you test for 2.x as well? should

[jira] [Commented] (NUTCH-1507) Remove FetcherOutput

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538137#comment-13538137 ] Julien Nioche commented on NUTCH-1507: -- Wouldn't that break the compatibi

[jira] [Commented] (NUTCH-1507) Remove FetcherOutput

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538151#comment-13538151 ] Julien Nioche commented on NUTCH-1507: -- bq. This code is used nowhere and only

[jira] [Commented] (NUTCH-1507) Remove FetcherOutput

2012-12-21 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538158#comment-13538158 ] Julien Nioche commented on NUTCH-1507: -- Ok. Not entirely clear to me how this s

[jira] [Commented] (NUTCH-1508) Port limit crawler to defined depth to 2.x

2013-01-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545757#comment-13545757 ] Julien Nioche commented on NUTCH-1508: -- Hi Ferdy I did not see NUTCH-1431 at

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-01-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13545958#comment-13545958 ] Julien Nioche commented on NUTCH-1031: -- well we have 2 separate pa

[jira] [Commented] (NUTCH-840) Port tests from parse-html to parse-tika

2013-01-09 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13547783#comment-13547783 ] Julien Nioche commented on NUTCH-840: - Thanks Lewis. Will commit shortly un

[jira] [Updated] (NUTCH-1047) Pluggable indexing backends

2013-01-10 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1047: - Attachment: NUTCH-1047-1.x-v1.patch This is work in progress. This patch creates a new endpoint

[jira] [Updated] (NUTCH-1047) Pluggable indexing backends

2013-01-11 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1047: - Attachment: NUTCH-1047-1.x-v2.patch new version of the patch which removes all SOLR related

[jira] [Created] (NUTCH-1517) CloudSearch indexer

2013-01-11 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1517: Summary: CloudSearch indexer Key: NUTCH-1517 URL: https://issues.apache.org/jira/browse/NUTCH-1517 Project: Nutch Issue Type: New Feature

[jira] [Commented] (NUTCH-1371) Replace Ivy with Maven Ant tasks

2013-01-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552513#comment-13552513 ] Julien Nioche commented on NUTCH-1371: -- Hi Lewis. Yep the plugins need to be man

[jira] [Updated] (NUTCH-1047) Pluggable indexing backends

2013-01-14 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1047: - Attachment: NUTCH-1047-1.x-v3.patch Cleaner version of the patch which removes the content from

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2013-01-15 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554369#comment-13554369 ] Julien Nioche commented on NUTCH-1087: -- Hi Sebastian bq. SEGMENT=`ls $CRAWL_

[jira] [Commented] (NUTCH-1087) Deprecate crawl command and replace with example script

2013-01-16 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554862#comment-13554862 ] Julien Nioche commented on NUTCH-1087: -- Apologies Seb, I should (a) not read em

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556026#comment-13556026 ] Julien Nioche commented on NUTCH-1047: -- Good point Markus, thanks. The main iss

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556041#comment-13556041 ] Julien Nioche commented on NUTCH-1047: -- We definitely need a better mechanism

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556054#comment-13556054 ] Julien Nioche commented on NUTCH-1047: -- Tried, failed. Re- other issues : woul

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556079#comment-13556079 ] Julien Nioche commented on NUTCH-1047: -- Should not be a big deal as the cla

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2013-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556090#comment-13556090 ] Julien Nioche commented on NUTCH-1480: -- I'd rather it was implemen

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556091#comment-13556091 ] Julien Nioche commented on NUTCH-1047: -- my suggestion was that you give NUTCH-10

[jira] [Commented] (NUTCH-1480) SolrIndexer to write to multiple servers.

2013-01-17 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556100#comment-13556100 ] Julien Nioche commented on NUTCH-1480: -- probably depends on whether we wan

[jira] [Commented] (NUTCH-840) Port tests from parse-html to parse-tika

2013-01-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13557163#comment-13557163 ] Julien Nioche commented on NUTCH-840: - Trunk => Committed revision 1435101. An

[jira] [Updated] (NUTCH-1047) Pluggable indexing backends

2013-01-18 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche updated NUTCH-1047: - Attachment: NUTCH-1047-1.x-v4.patch First working patch! Added the SOLRDedup back into the core

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-01-19 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558195#comment-13558195 ] Julien Nioche commented on NUTCH-1031: -- bq. 1. Continue to have the legacy code

[jira] [Created] (NUTCH-1522) Upgrade to Tika 1.3

2013-01-23 Thread Julien Nioche (JIRA)
Julien Nioche created NUTCH-1522: Summary: Upgrade to Tika 1.3 Key: NUTCH-1522 URL: https://issues.apache.org/jira/browse/NUTCH-1522 Project: Nutch Issue Type: Task Components
