[jira] [Commented] (NUTCH-1699) Tika Parser - Image Parse Bug

2014-01-13 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870479#comment-13870479 ] Mehmet Zahid Yüzügüldü commented on NUTCH-1699: --- During content parse of mim

[jira] [Updated] (NUTCH-1699) Tika Parser - Image Parse Bug

2014-01-13 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehmet Zahid Yüzügüldü updated NUTCH-1699: -- Attachment: NUTCH_1699.patch

[jira] [Created] (NUTCH-1699) Tika Parser - Image Parse Bug

2014-01-13 Thread JIRA
Mehmet Zahid Yüzügüldü created NUTCH-1699: Summary: Tika Parser - Image Parse Bug; Key: NUTCH-1699; URL: https://issues.apache.org/jira/browse/NUTCH-1699; Project: Nutch; Issue Type: Bug

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13870141#comment-13870141 ] Sebastian Nagel commented on NUTCH-1113: Hi [~markus17], you are right: my patch f

[jira] [Updated] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1568: Attachment: NUTCH-1568-v4.patch Patch for 2.x HEAD. [~talat], changes from last pat

[jira] [Comment Edited] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869818#comment-13869818 ] Lewis John McGibbney edited comment on NUTCH-1568 at 1/13/14 6:46 PM: --

[jira] [Created] (NUTCH-1698) crawl script should not specify solrUrl to accommodate pluggable indexing architecture

2014-01-13 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1698: --- Summary: crawl script should not specify solrUrl to accommodate pluggable indexing architecture Key: NUTCH-1698 URL: https://issues.apache.org/jira/browse/NUTCH-16

[jira] [Assigned] (NUTCH-1698) crawl script should not specify solrUrl to accommodate pluggable indexing architecture

2014-01-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney reassigned NUTCH-1698: --- Assignee: Lewis John McGibbney > crawl script should not specify solrUrl to a

[jira] [Commented] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-13 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869550#comment-13869550 ] Talat UYARER commented on NUTCH-1568: - This patch contains pluggable IndexJob and Solr

[jira] [Commented] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869545#comment-13869545 ] Lewis John McGibbney commented on NUTCH-1568: - The most recent patch seems to

[jira] [Commented] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer

2014-01-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869543#comment-13869543 ] Hudson commented on NUTCH-1672: --- SUCCESS: Integrated in Nutch-nutchgora #884 (See [https://

[jira] [Commented] (NUTCH-1667) Updatedb always ignore batchId

2014-01-13 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869542#comment-13869542 ] Hudson commented on NUTCH-1667: --- SUCCESS: Integrated in Nutch-nutchgora #884 (See [https://

[jira] [Resolved] (NUTCH-1672) Inlinks are added twice in DbUpdateReducer

2014-01-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1672. - Resolution: Fixed Committed @revision 1557707 in 2.x HEAD. Thank you [~tiennm] fo

[jira] [Updated] (NUTCH-1655) Indexer Plugin for Elastic Search

2014-01-13 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat UYARER updated NUTCH-1655: Attachment: NUTCH-1655-v2.path Updated for no-prefix.

[jira] [Updated] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-13 Thread Talat UYARER (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Talat UYARER updated NUTCH-1568: Attachment: NUTCH-1568-v3.path Hi [~lewismc], I created it with no prefix. Sorry for the duplicate work.

[jira] [Resolved] (NUTCH-1667) Updatedb always ignore batchId

2014-01-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1667. - Resolution: Fixed Committed @revision 1557705 in 2.x HEAD. Thank you [~tiennm] >

[jira] [Commented] (NUTCH-1568) port pluggable indexing architecture to 2.x

2014-01-13 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869515#comment-13869515 ] Lewis John McGibbney commented on NUTCH-1568: - Hi Talat, can you please genera

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869441#comment-13869441 ] Markus Jelsma commented on NUTCH-1113: -- Another record is also missing {code} Segmen

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869439#comment-13869439 ] Markus Jelsma commented on NUTCH-1113: -- Sebastian's patch does solve a few problems i

[jira] [Comment Edited] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869439#comment-13869439 ] Markus Jelsma edited comment on NUTCH-1113 at 1/13/14 11:54 AM:

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869399#comment-13869399 ] Markus Jelsma commented on NUTCH-1113: -- Ignoring LINKED completely means around line