[jira] [Updated] (NUTCH-1707) DummyIndexingWriter

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1707. Attachment: NUTCH-1707-trunk.patch. Updated patch to use URL field. DummyIndexingWriter

[jira] [Commented] (NUTCH-1697) SegmentMerger to implement Tool

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876278#comment-13876278 ] Markus Jelsma commented on NUTCH-1697: Hi Tejas, I think it does not matter, most of

[jira] [Resolved] (NUTCH-1680) CrawldbReader to dump minRetry value

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-1680. Resolution: Fixed. Thanks! Committed to trunk in revision 1559657. CrawldbReader to dump

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876283#comment-13876283 ] Markus Jelsma commented on NUTCH-1113: Yes, I reindexed them segment by segment.

[jira] [Commented] (NUTCH-1708) use same id when indexing and deleting redirects

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876288#comment-13876288 ] Markus Jelsma commented on NUTCH-1708: Hi Sebastian - we've had issues with that

[jira] [Commented] (NUTCH-1680) CrawldbReader to dump minRetry value

2014-01-20 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876293#comment-13876293 ] Hudson commented on NUTCH-1680: SUCCESS: Integrated in Nutch-trunk #2498 (See

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876304#comment-13876304 ] Markus Jelsma commented on NUTCH-1113: Ok, I got something! A record that wasn't

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876384#comment-13876384 ] Markus Jelsma commented on NUTCH-1113: I got fewer documents indexed when ignoring

[jira] [Commented] (NUTCH-1113) Merging segments causes URLs to vanish from crawldb/index?

2014-01-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876426#comment-13876426 ] Markus Jelsma commented on NUTCH-1113: I have to reindex my control cluster segment

[jira] [Commented] (NUTCH-1674) Use batchId filter to enable scan (GORA-119) for Fetch,Parse,Update,Index

2014-01-20 Thread Alexander Uretsky (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876432#comment-13876432 ] Alexander Uretsky commented on NUTCH-1674: Tried this out and it works great!

What is the correct way to serialize a MapWritable to WebPage's metadata?

2014-01-20 Thread d_k
I'm working on porting NUTCH-1622 to Nutch 2, and the approach I took was to add a MapWritable field to the Outlink class to hold the metadata. To store the metadata in the WebPage so that it is passed along to the mappers and reducers, I used the metadata field of the WebPage class. Because the