[
https://issues.apache.org/jira/browse/NUTCH-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1686:
Attachment: NUTCH-1686.patch
Optimize UpdateDb to load less field from Store
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1687:
Attachment: NUTCH-1687.patch
Pick queue in Round Robin
-
Nguyen Manh Tien created NUTCH-1688:
---
Summary: Port DeleteDuplicate based on crawlDB only to 2.x
Key: NUTCH-1688
URL: https://issues.apache.org/jira/browse/NUTCH-1688
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1688:
Component/s: indexer
Port DeleteDuplicate based on crawlDB only to 2.x
[
https://issues.apache.org/jira/browse/NUTCH-1688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1688:
Attachment: NUTCH-1688.patch
Port DeleteDuplicate based on crawlDB only to 2.x
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Attachment: NUTCH-1689.patch
Improve CrawlDb stats
-
Nguyen Manh Tien created NUTCH-1689:
---
Summary: Improve CrawlDb stats
Key: NUTCH-1689
URL: https://issues.apache.org/jira/browse/NUTCH-1689
Project: Nutch
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Fix Version/s: 2.3
Improve CrawlDb stats
-
Key:
Nguyen Manh Tien created NUTCH-1690:
---
Summary: IndexClean: mark url as unindexed after clean to not
delete again
Key: NUTCH-1690
URL: https://issues.apache.org/jira/browse/NUTCH-1690
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1690:
Fix Version/s: 2.3
IndexClean: mark url as unindexed after clean to not delete again
[
https://issues.apache.org/jira/browse/NUTCH-1686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855385#comment-13855385
]
Nguyen Manh Tien commented on NUTCH-1686:
-
no backwards compatibility, because i
[
https://issues.apache.org/jira/browse/NUTCH-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13855386#comment-13855386
]
Nguyen Manh Tien commented on NUTCH-1687:
-
I found one in double linked list
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Attachment: (was: NUTCH-1690.patch)
Improve CrawlDb stats
-
[
https://issues.apache.org/jira/browse/NUTCH-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1689:
Attachment: NUTCH-1690.patch
Thanks Tejas for reviewing
1) 2) I think my change doesn't
[
https://issues.apache.org/jira/browse/NUTCH-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13854780#comment-13854780
]
Nguyen Manh Tien commented on NUTCH-1314:
-
[~lewismc] We are using
Nguyen Manh Tien created NUTCH-1682:
---
Summary: Port optionally maintain custom fetch interval despite
AdaptiveFetchSchedule to 2.x
Key: NUTCH-1682
URL: https://issues.apache.org/jira/browse/NUTCH-1682
[
https://issues.apache.org/jira/browse/NUTCH-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1682:
Attachment: NUTCH-1682.patch
Port optionally maintain custom fetch interval despite
[
https://issues.apache.org/jira/browse/NUTCH-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1682:
Affects Version/s: (was: 2.3)
Port optionally maintain custom fetch interval despite
[
https://issues.apache.org/jira/browse/NUTCH-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1682:
Fix Version/s: 2.3
Port optionally maintain custom fetch interval despite
Nguyen Manh Tien created NUTCH-1683:
---
Summary: Optionally maintain custom fetch interval despite
AbstractFetchSchedule
Key: NUTCH-1683
URL: https://issues.apache.org/jira/browse/NUTCH-1683
Project:
[
https://issues.apache.org/jira/browse/NUTCH-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1683:
Description: DefaultFetchSchedule also changes the fetch interval, so we should
also maintain
[
https://issues.apache.org/jira/browse/NUTCH-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1683:
Attachment: (was: NUTCH-1683.patch)
Optionally maintain custom fetch interval despite
[
https://issues.apache.org/jira/browse/NUTCH-1683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1683:
Attachment: NUTCH-1683.patch
Optionally maintain custom fetch interval despite
Nguyen Manh Tien created NUTCH-1679:
---
Summary: UpdateDb using batchId, link may override crawled page.
Key: NUTCH-1679
URL: https://issues.apache.org/jira/browse/NUTCH-1679
Project: Nutch
[
https://issues.apache.org/jira/browse/NUTCH-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1672:
Attachment: NUTCH-1672.patch
Inlinks are added twice in DbUpdateReducer
Nguyen Manh Tien created NUTCH-1672:
---
Summary: Inlinks are added twice in DbUpdateReducer
Key: NUTCH-1672
URL: https://issues.apache.org/jira/browse/NUTCH-1672
Project: Nutch
Issue Type:
Nguyen Manh Tien created NUTCH-1673:
---
Summary: Title isn't reset in MoreIndexingFilter
Key: NUTCH-1673
URL: https://issues.apache.org/jira/browse/NUTCH-1673
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1673:
Attachment: NUTCH-1673.patch
Title isn't reset in MoreIndexingFilter
Nguyen Manh Tien created NUTCH-1674:
---
Summary: Use batchId filter enable scan (GORA-119) for
Fetch,Parse,Update,Index
Key: NUTCH-1674
URL: https://issues.apache.org/jira/browse/NUTCH-1674
Project:
Nguyen Manh Tien created NUTCH-1667:
---
Summary: Updatedb always ignore batchId
Key: NUTCH-1667
URL: https://issues.apache.org/jira/browse/NUTCH-1667
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1667:
Attachment: NUTCH-1556-batchId.patch
Updatedb always ignore batchId
[
https://issues.apache.org/jira/browse/NUTCH-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Nguyen Manh Tien updated NUTCH-1556:
Attachment: NUTCH-1556-batchId.patch
batchId is not set in currentJob because we set
[
https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13788911#comment-13788911
]
Nguyen Manh Tien commented on NUTCH-961:
I used patch NUTCH-961-2.1-v2.patch for