[jira] [Comment Edited] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-23 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639131#comment-13639131 ] lufeng edited comment on NUTCH-1555 at 4/23/13 2:58 PM: already mo

[jira] [Updated] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-23 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1555: -- Attachment: NUTCH-1555.patch already moved the command line parsing to commons-cli, because they are used in bi

[jira] [Assigned] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-04-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1563: - Assignee: lufeng > FetchSchedule#getFields is never used by GeneraterJob > --

[jira] [Commented] (NUTCH-1562) Order of execution for scoring filters

2013-04-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637247#comment-13637247 ] lufeng commented on NUTCH-1562: --- Hi Julien, if someone define the scoring.filter.order like

[jira] [Updated] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-04-18 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1563: -- Attachment: NUTCH-1563.patch get the fields from FetchSchedule in GeneratorJob > FetchSchedule

[jira] [Created] (NUTCH-1563) FetchSchedule#getFields is never used by GeneraterJob

2013-04-18 Thread lufeng (JIRA)
lufeng created NUTCH-1563: - Summary: FetchSchedule#getFields is never used by GeneraterJob Key: NUTCH-1563 URL: https://issues.apache.org/jira/browse/NUTCH-1563 Project: Nutch Issue Type: Bug

[jira] [Assigned] (NUTCH-1555) Move to commons-cli for command line parsing

2013-04-16 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1555: - Assignee: lufeng > Move to commons-cli for command line parsing > --

[jira] [Commented] (NUTCH-1555) bug in 2.x ParserJob command line parsing

2013-04-10 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13627917#comment-13627917 ] lufeng commented on NUTCH-1555: --- Hi Lewis, yes, like you said that we can choose an establis

[jira] [Commented] (NUTCH-1555) bug in 2.x ParserJob command line parsing

2013-04-08 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625432#comment-13625432 ] lufeng commented on NUTCH-1555: --- Hi Lewis, as you said that FetchJob also has this bug too.

[jira] [Updated] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-04-06 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1545: -- Attachment: NUTCH-1545-v2.patch 1. remove any concept of crawldb and segments in bin/crawl script 2. fix the ca

[jira] [Commented] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-31 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618536#comment-13618536 ] lufeng commented on NUTCH-1538: --- Hi Roland, yes, i mean that may be 3rd part plugin will use

[jira] [Commented] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616250#comment-13616250 ] lufeng commented on NUTCH-1538: --- yes, However, we can not guarantee that other plugin that e

[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616227#comment-13616227 ] lufeng commented on NUTCH-1547: --- Feng Committed revision 1462078 to trunk and 2.x revision 1

[jira] [Resolved] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1547. --- Resolution: Fixed > BasicIndexingFilter - Problem to index full title > -

[jira] [Assigned] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-03-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1545: - Assignee: lufeng > capture batchId and remove references to segments in 2.x crawl script. > -

[jira] [Commented] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-03-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615422#comment-13615422 ] lufeng commented on NUTCH-1545: --- yes, the concept of crawldb is not used in 2.x, and grab th

[jira] [Commented] (NUTCH-1389) parsechecker and indexchecker to report truncated content

2013-03-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13615360#comment-13615360 ] lufeng commented on NUTCH-1389: --- +1 Sebstian > parsechecker and indexchecke

[jira] [Updated] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1547: -- Attachment: NUTCH-1547-2x.patch add patch to Nutch 2.x > BasicIndexingFilter - Problem to inde

[jira] [Updated] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1547: -- Attachment: NUTCH-1547.patch fixed the problem to index full title > BasicIndexingFilter - Pro

[jira] [Assigned] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1547: - Assignee: lufeng > BasicIndexingFilter - Problem to index full title > --

[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614098#comment-13614098 ] lufeng commented on NUTCH-1547: --- Hi Gustavo I will add this patch tomorrow. Thanks Gustavo

[jira] [Commented] (NUTCH-1532) Replace 'segment' mapping field with batchId

2013-03-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614035#comment-13614035 ] lufeng commented on NUTCH-1532: --- Feng Committed @revision 1461140 in 2.x HEAD. Thank you Lew

[jira] [Commented] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614002#comment-13614002 ] lufeng commented on NUTCH-1533: --- yes, i think this patch is ok. Feng Committed @revision 14

[jira] [Updated] (NUTCH-1545) capture batchId and remove references to segments in 2.x crawl script.

2013-03-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1545: -- Attachment: NUTCH-1545.patch remove references to segments in 2.x crawl script. > capture batc

[jira] [Updated] (NUTCH-1532) Replace 'segment' mapping field with batchId

2013-03-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1532: -- Attachment: NUTCH-1532-v2.patch add small replaces in TestbedProxy and Benchmark class > Repla

[jira] [Resolved] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng resolved NUTCH-1533. --- Resolution: Fixed > Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and > setBatchI

[jira] [Commented] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612583#comment-13612583 ] lufeng commented on NUTCH-1533: --- Hi Lewis, I also found a problem when i committed this pat

[jira] [Commented] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-19 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13606354#comment-13606354 ] lufeng commented on NUTCH-1533: --- Hi Lewis yes, i can commit this issue as soon as possible

[jira] [Updated] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1533: -- Attachment: NUTCH-1533-v3.patch add prevModifiedTime to FetchSchedule both methods when crawl status is equal

[jira] [Commented] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604614#comment-13604614 ] lufeng commented on NUTCH-1533: --- Hi Lewis I'm sorry, I did not make it clear, perhaps in my

[jira] [Commented] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-16 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13604309#comment-13604309 ] lufeng commented on NUTCH-1533: --- Hi Lewis Thanks for your reviews. Issues: * i see that p

[jira] [Updated] (NUTCH-1533) Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatchId() accessors in o.a.n.storage.WebPage

2013-03-14 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1533: -- Attachment: NUTCH-1533.patch Implement getPrevModifiedTime(), setPrevModifiedTime(), getBatchId() and setBatch

[jira] [Updated] (NUTCH-1543) Display consistent usage of DBUpdaterJob with 1.X

2013-03-12 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1543: -- Attachment: NUTCH-1543.patch > Display consistent usage of DBUpdaterJob with 1.X >

[jira] [Created] (NUTCH-1543) Display consistent usage of DBUpdaterJob with 1.X

2013-03-12 Thread lufeng (JIRA)
lufeng created NUTCH-1543: - Summary: Display consistent usage of DBUpdaterJob with 1.X Key: NUTCH-1543 URL: https://issues.apache.org/jira/browse/NUTCH-1543 Project: Nutch Issue Type: Bug Affects

[jira] [Updated] (NUTCH-1393) Display consistent usage of GeneratorJob with 1.X

2013-03-07 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1393: -- Attachment: NUTCH-1393-v2.patch Hi Lewis, i improve the log message with regards to the usage message of Parse

[jira] [Updated] (NUTCH-1393) Display consistent usage of GeneratorJob with 1.X

2013-03-06 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1393: -- Attachment: NUTCH-1393.patch add help information when no params input. > Display consistent u

[jira] [Issue Comment Deleted] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-05 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1538: -- Comment: was deleted (was: Hi Roland, Maybe we can add a QueryFieldFilter to remove some field that never used

[jira] [Commented] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-05 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593223#comment-13593223 ] lufeng commented on NUTCH-1538: --- Hi Roland, Maybe we can add a QueryFieldFilter to remove s

[jira] [Commented] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-05 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593224#comment-13593224 ] lufeng commented on NUTCH-1538: --- Hi Roland, Maybe we can add a QueryFieldFilter to remove s

[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-28 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1529: -- Attachment: NUTCH-1529-trunk-v3.patch @Lewis add the mongodb dependency in ivy.xml @Tejas It will write the url

[jira] [Comment Edited] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13587938#comment-13587938 ] lufeng edited comment on NUTCH-1529 at 2/27/13 2:49 AM: Hi Lewis,

[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-26 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1529: -- Attachment: NUTCH-1529-trunk-v2.patch Hi Lewis, I have corrected the issues that you pointed out. Thank you f

[jira] [Updated] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1529: -- Attachment: NUTCH-1529-trunk.patch Utility that converts mongodb collection record into a flat file of URLs to

[jira] [Commented] (NUTCH-1031) Delegate parsing of robots.txt to crawler-commons

2013-02-25 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585730#comment-13585730 ] lufeng commented on NUTCH-1031: --- Hi Tejas 1. The EmptyRobotRules class is not delete in pat

[jira] [Commented] (NUTCH-1373) Implement consistent execution of normalising and filtering in Generator

2013-02-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584062#comment-13584062 ] lufeng commented on NUTCH-1373: --- Hi Lewis Do you mean we can put URLNormalizers in Generato

[jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2013-02-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584043#comment-13584043 ] lufeng commented on NUTCH-1521: --- Hi Tejas Yes, you are right. It seems that DbUpdateMapper

[jira] [Assigned] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1529: - Assignee: lufeng > Port nutch-mongdb-parser to trunk > - > >

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-02-19 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581910#comment-13581910 ] lufeng commented on NUTCH-1047: --- Patch v5 works correctly in Nutch 1.6 with Solr 3.6.

[jira] [Commented] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-18 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13581030#comment-13581030 ] lufeng commented on NUTCH-1529: --- Yes, your are right. This is not nutch core functionality.

[jira] [Commented] (NUTCH-1529) Port nutch-mongdb-parser to trunk

2013-02-18 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13580991#comment-13580991 ] lufeng commented on NUTCH-1529: --- Hi Lewis, do you mean we should add a tool to easy seeding

[jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2013-02-03 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570019#comment-13570019 ] lufeng commented on NUTCH-1521: --- Hi Lewis, I found the CrawlDbFilter class is only used in N

[jira] [Commented] (NUTCH-1525) Generator to record external links even when db.ignore.external.links set to true

2013-02-03 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13570005#comment-13570005 ] lufeng commented on NUTCH-1525: --- Hi Lewis, Yes, The redirection of current URL will be igno

[jira] [Commented] (NUTCH-1525) Generator to record external links even when db.ignore.external.links set to true

2013-01-31 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13567488#comment-13567488 ] lufeng commented on NUTCH-1525: --- Hi Lewis, Do you mean whether or not set the db.ignore.exte

[jira] [Commented] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2013-01-29 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13566118#comment-13566118 ] lufeng commented on NUTCH-1521: --- Hi Lewis, thanks for your brief class description and trunk

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564089#comment-13564089 ] lufeng commented on NUTCH-1047: --- Hi Julien, I found in bin/nutch there is a line like this

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564076#comment-13564076 ] lufeng commented on NUTCH-1047: --- Hi Tejas Maybe you don't add -D option with bin/nutch craw

[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1250: -- Attachment: TestDomContentUitls_v1.patch TestDomContenxtUtils patch add no anchor test case. >

[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1250: -- Attachment: DOMContentUtils_v2.patch > parse-html does not parse links with empty anchor >

[jira] [Commented] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-27 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13564015#comment-13564015 ] lufeng commented on NUTCH-1250: --- Hi Julien, yes, it will generate a lot of noise added by Ne

[jira] [Commented] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562466#comment-13562466 ] lufeng commented on NUTCH-1250: --- with no children of tag "a", it will be ignored.

[jira] [Updated] (NUTCH-1250) parse-html does not parse links with empty anchor

2013-01-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1250: -- Attachment: DOMContentUtils_v1.patch > parse-html does not parse links with empty anchor >

[jira] [Commented] (NUTCH-1047) Pluggable indexing backends

2013-01-24 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13562369#comment-13562369 ] lufeng commented on NUTCH-1047: --- Hi, i put the patch , but i do not found how to set solrURI

[jira] [Updated] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2013-01-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1521: -- Attachment: TestCrawlDbFilter.java CrawlDbFilter test case > CrawlDbFilter pass null url to ur

[jira] [Updated] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2013-01-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1521: -- Attachment: CrawlDbFilter_v1.patch > CrawlDbFilter pass null url to urlNormailzers > --

[jira] [Created] (NUTCH-1521) CrawlDbFilter pass null url to urlNormailzers

2013-01-21 Thread lufeng (JIRA)
lufeng created NUTCH-1521: - Summary: CrawlDbFilter pass null url to urlNormailzers Key: NUTCH-1521 URL: https://issues.apache.org/jira/browse/NUTCH-1521 Project: Nutch Issue Type: Bug Affects Ver

[jira] [Commented] (NUTCH-1223) Migrate WebGraph to MapReduce API

2013-01-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558620#comment-13558620 ] lufeng commented on NUTCH-1223: --- Hi Tejas, thanks for your reminding. but there still some n

[jira] [Updated] (NUTCH-1223) Migrate WebGraph to MapReduce API

2013-01-21 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1223: -- Attachment: WebGraph_new_MR_API_v2.patch migrate WebGraph to new MR api > Migrate WebGraph to

[jira] [Updated] (NUTCH-1223) Migrate WebGraph to MapReduce API

2013-01-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1223: -- Patch Info: Patch Available > Migrate WebGraph to MapReduce API > - > >

[jira] [Updated] (NUTCH-1223) Migrate WebGraph to MapReduce API

2013-01-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1223: -- Attachment: WebGraph_new_MR_API.patch migrate WebGraph to new MR API patch > Migrate WebGraph

[jira] [Assigned] (NUTCH-1223) Migrate WebGraph to MapReduce API

2013-01-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng reassigned NUTCH-1223: - Assignee: lufeng > Migrate WebGraph to MapReduce API > - > >

[jira] [Commented] (NUTCH-1219) Upgrade all jobs to new MapReduce API

2013-01-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558476#comment-13558476 ] lufeng commented on NUTCH-1219: --- Hi Markus, i see that Injector, Generator and fetchor are s

[jira] [Commented] (NUTCH-1519) Configuration Overrides not in sync between WebTableReader and nutch-default.xml

2013-01-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556985#comment-13556985 ] lufeng commented on NUTCH-1519: --- yes,but maybe i think put them into the default.xml like ot

[jira] [Updated] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1453: -- Patch Info: Patch Available > Substantiate tests for IndexingFilters >

[jira] [Work stopped] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1453 stopped by lufeng. > Substantiate tests for IndexingFilters > -- > > Key: NUTCH-1453 >

[jira] [Commented] (NUTCH-1449) Optionally delete documents skipped by IndexingFilters

2013-01-17 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13556915#comment-13556915 ] lufeng commented on NUTCH-1449: --- Hi Markus, do you mean the indexing filter can delete the d

[jira] [Commented] (NUTCH-1519) Configuration Overrides not in sync between WebTableReader and nutch-default.xml

2013-01-16 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13555927#comment-13555927 ] lufeng commented on NUTCH-1519: --- Hi Lewis, do you mean that properties shoud be defined in n

[jira] [Updated] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-15 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1453: -- Attachment: TestIndexingFilters-1.7.patch TestIndexingFilters-2x.patch add two test case of Ind

[jira] [Work started] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-15 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1453 started by lufeng. > Substantiate tests for IndexingFilters > -- > > Key: NUTCH-1453 >

[jira] [Updated] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-15 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1453: -- Attachment: (was: TestIndexingFilters_patch.patch) > Substantiate tests for IndexingFilters > -

[jira] [Commented] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-15 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554654#comment-13554654 ] lufeng commented on NUTCH-1453: --- Thanks Lewis, maybe you can assign this issues to me. i wil

[jira] [Updated] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-14 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] lufeng updated NUTCH-1453: -- Attachment: TestIndexingFilters_patch.patch TestIndexingFilters patch > Substantiate tests for

[jira] [Commented] (NUTCH-1453) Substantiate tests for IndexingFilters

2013-01-14 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553558#comment-13553558 ] lufeng commented on NUTCH-1453: --- hi Lewis, do you think the patch has any problems?

[jira] [Commented] (NUTCH-1100) SolrDedup broken

2012-08-20 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13437843#comment-13437843 ] lufeng commented on NUTCH-1100: --- Maybe it is a setting problem, do you change the mapping fi

[jira] [Commented] (NUTCH-1405) Allow to overwrite CrawlDatum's with injected entries

2012-08-03 Thread lufeng (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427891#comment-13427891 ] lufeng commented on NUTCH-1405: --- in Injector.java (1363793) this is a problem. if injectedS
