[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047130#comment-13047130 ]
Gabriele Kahlout commented on NUTCH-961:
{quote}it needs to use a different
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044456#comment-13044456 ]
Gabriele Kahlout commented on NUTCH-995:
Sorry to stick in the gullet, but this
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044456#comment-13044456 ]
Gabriele Kahlout edited comment on NUTCH-995 at 6/5/11 7:07 AM:
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044456#comment-13044456 ]
Gabriele Kahlout edited comment on NUTCH-995 at 6/5/11 7:11 AM:
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044554#comment-13044554 ]
Gabriele Kahlout commented on NUTCH-995:
I'm not sure the excluded dependencies you
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044554#comment-13044554 ]
Gabriele Kahlout edited comment on NUTCH-995 at 6/5/11 3:52 PM:
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044554#comment-13044554 ]
Gabriele Kahlout edited comment on NUTCH-995 at 6/5/11 3:51 PM:
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044574#comment-13044574 ]
Gabriele Kahlout commented on NUTCH-995:
BTW, as Julien remarked earlier, adding a
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044580#comment-13044580 ]
Gabriele Kahlout commented on NUTCH-995:
{quote}Nutch is not Solr. It doesn't have
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044596#comment-13044596 ]
Gabriele Kahlout commented on NUTCH-995:
{quote} What's stopping you from doing
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044349#comment-13044349 ]
Gabriele Kahlout commented on NUTCH-995:
Oops... I'm having issues getting proper diffs, I
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044360#comment-13044360 ]
Gabriele Kahlout commented on NUTCH-995:
I'm actually still trying to build it w/o
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-961:
---
Attachment: (was: NUTCH-961-1.3-tikaparser1.patch)
Expose Tika's boilerpipe support
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-961:
---
Attachment: NUTCH-961v2.patch
Tested the patch against a checkout of 1.3 branch at revision
[ https://issues.apache.org/jira/browse/NUTCH-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-1001:
Attachment: (was: multipleSegs-fetch-parse.patch)
bin/nutch fetch/parse handle
[ https://issues.apache.org/jira/browse/NUTCH-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-1001:
Attachment: NUTCH-1001.patch
I'm having formatting issues with svn diff in NetBeans. This patch
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042063#comment-13042063 ]
Gabriele Kahlout commented on NUTCH-995:
@Julien: for the second patch:
{quote}
$
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042063#comment-13042063 ]
Gabriele Kahlout edited comment on NUTCH-995 at 6/1/11 9:47 AM:
[ https://issues.apache.org/jira/browse/NUTCH-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-1001:
Attachment: multipleSegs-fetch-parse.patch
This patch modifies Fetcher.java and
[ https://issues.apache.org/jira/browse/NUTCH-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042395#comment-13042395 ]
Gabriele Kahlout edited comment on NUTCH-1001 at 6/1/11 8:10 PM:
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038463#comment-13038463 ]
Gabriele Kahlout commented on NUTCH-995:
{code}
BUILD FAILED
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038463#comment-13038463 ]
Gabriele Kahlout edited comment on NUTCH-995 at 5/24/11 9:04 AM:
[ https://issues.apache.org/jira/browse/NUTCH-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13038488#comment-13038488 ]
Gabriele Kahlout commented on NUTCH-995:
The first patch worked for me.
Generate
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-961:
---
Attachment: NUTCH-961-1.3-tikaparser1.patch
Same as NUTCH-961-1.3-tikaparser.patch by Markus
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-961:
---
Attachment: NUTCH-961-1.3-tikaparser1.patch
Modified to include necessary changes to
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027051#comment-13027051 ]
Gabriele Kahlout commented on NUTCH-990:
@Julien - can we mark this as related to the
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027293#comment-13027293 ]
Gabriele Kahlout commented on NUTCH-990:
@Julien - my bad with the PDFs.
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-990:
---
Description:
Using protocol-http with HTML pages of only a few words works fine. But with
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025946#comment-13025946 ]
Gabriele Kahlout edited comment on NUTCH-990 at 4/27/11 6:46 PM:
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025946#comment-13025946 ]
Gabriele Kahlout edited comment on NUTCH-990 at 4/27/11 6:51 PM:
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-990:
---
Description:
Using protocol-http with HTML pages of only a few words works fine. But with
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025977#comment-13025977 ]
Gabriele Kahlout commented on NUTCH-990:
The logs do look full of INFO noise. I
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13025286#comment-13025286 ]
Gabriele Kahlout commented on NUTCH-961:
@Markus - Thank you.
Watch out for [1] in
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13016465#comment-13016465 ]
Gabriele Kahlout commented on NUTCH-967:
Julien, why doesn't your patch modify
Mergedb doesn't merge with empty directory, as is the case with merge (for indexes)
---
Key: NUTCH-972
URL: https://issues.apache.org/jira/browse/NUTCH-972
Project:
[ https://issues.apache.org/jira/browse/NUTCH-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-972:
---
Attachment: check_empty.diff
Mergedb doesn't merge with empty directory, as is the case with
IndexMerger produces indexes itself cannot merge anymore
Key: NUTCH-971
URL: https://issues.apache.org/jira/browse/NUTCH-971
Project: Nutch
Issue Type: Bug
Components:
[ https://issues.apache.org/jira/browse/NUTCH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-971:
---
Attachment: IndexMerger-part.diff
Checks if the output index path ends with a part directory
[ https://issues.apache.org/jira/browse/NUTCH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-971:
---
Attachment: (was: IndexMerger-part.diff)
IndexMerger produces indexes itself cannot
[ https://issues.apache.org/jira/browse/NUTCH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-971:
---
Comment: was deleted
(was: Checks if the output index path ends with a part directory and if
[ https://issues.apache.org/jira/browse/NUTCH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gabriele Kahlout updated NUTCH-971:
---
Attachment: IndexMerger-part.diff
Checks if output path ends in a part dir and if not adds
[ https://issues.apache.org/jira/browse/NUTCH-971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13011627#comment-13011627 ]
Gabriele Kahlout commented on NUTCH-971:
I expect that installing Solr and then