[ https://issues.apache.org/jira/browse/NUTCH-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1707:
Attachment: NUTCH-1707-trunk.patch
Updated patch to use URL field.
DummyIndexingWriter
[ https://issues.apache.org/jira/browse/NUTCH-1697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876278#comment-13876278 ]
Markus Jelsma commented on NUTCH-1697:
Hi Tejas, I think it does not matter, most of
[ https://issues.apache.org/jira/browse/NUTCH-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1680.
Resolution: Fixed
Thanks! Committed to trunk in revision 1559657.
CrawldbReader to dump
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876283#comment-13876283 ]
Markus Jelsma commented on NUTCH-1113:
Yes, I reindexed them segment by segment.
[ https://issues.apache.org/jira/browse/NUTCH-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876288#comment-13876288 ]
Markus Jelsma commented on NUTCH-1708:
Hi Sebastian - we've had issues with that
[ https://issues.apache.org/jira/browse/NUTCH-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876293#comment-13876293 ]
Hudson commented on NUTCH-1680:
SUCCESS: Integrated in Nutch-trunk #2498 (See
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876304#comment-13876304 ]
Markus Jelsma commented on NUTCH-1113:
OK, I got something! A record that wasn't
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876384#comment-13876384 ]
Markus Jelsma commented on NUTCH-1113:
I got fewer documents indexed when ignoring
[ https://issues.apache.org/jira/browse/NUTCH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876426#comment-13876426 ]
Markus Jelsma commented on NUTCH-1113:
I have to reindex my control cluster segment
[ https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13876432#comment-13876432 ]
Alexander Uretsky commented on NUTCH-1674:
Tried this out and it works great!
I'm working on porting NUTCH-1622 to Nutch 2, and the path I took was to add
a MapWritable field to the Outlink class to hold the metadata.
To store the metadata in the WebPage so that it is passed along through
the mappers and reducers, I used the metadata field of the WebPage class.
Because the