Hi all,
The parser plugin we currently use, nekohtml, does not parse correctly. When I
tested it on a huge website, it leaves HTML tags behind. IMHO our parser is a
little bit old. After doing some research, I found the Jsoup[1] and Gumbo[2]
parsers. I ran some tests on broken websites. I saw that gumbo and jsoup
parsed very simil
Hi all,
A long time ago, we talked with Julien and Lewis on the mailing list about the
major needs for 2.x.
I know that Giraph uses only map slots as workers. At present, the
architecture of our scoring plugins doesn't permit this. Giraph and Opic have
different work types. IMHO we should create a pluggable Ran
Hi all,
A long time ago, we talked with Julien and Lewis on the mailing list about the
major needs for 2.x.
As far as I know, Giraph uses only map slots as workers. At present, the
architecture of our scoring plugins doesn't permit this. IMHO we should create a
pluggable RankingJob, like IndexingJob. The Pluggable ar
[
https://issues.apache.org/jira/browse/NUTCH-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987861#comment-13987861
]
Rogério Pereira Araújo commented on NUTCH-1768:
---
Yes, sure, set to correct n
[
https://issues.apache.org/jira/browse/NUTCH-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987809#comment-13987809
]
Julien Nioche commented on NUTCH-1768:
--
Do you set the cluster name in the config (el
[
https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987782#comment-13987782
]
Julien Nioche commented on NUTCH-1674:
--
OK, so it *does* filter based on the Mark, wh
[
https://issues.apache.org/jira/browse/NUTCH-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987757#comment-13987757
]
Rogério Pereira Araújo edited comment on NUTCH-1768 at 5/2/14 3:01 PM:
-
[
https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987758#comment-13987758
]
Alparslan Avcı commented on NUTCH-1674:
---
Hi [~jnioche],
You are right; the filter i
[
https://issues.apache.org/jira/browse/NUTCH-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987757#comment-13987757
]
Rogério Pereira Araújo commented on NUTCH-1768:
---
Julien,
When I set the hos
[
https://issues.apache.org/jira/browse/NUTCH-1741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1741:
-
Fix Version/s: (was: 2.3)
2.4
> Support of Sitemaps in Nutch 2.x
> ---
[
https://issues.apache.org/jira/browse/NUTCH-1674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987605#comment-13987605
]
Julien Nioche commented on NUTCH-1674:
--
Hi,
I haven't played with the filtering in G
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987602#comment-13987602
]
Julien Nioche commented on NUTCH-1714:
--
bq. I do not know if you have tested the patc
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987551#comment-13987551
]
Alparslan Avcı commented on NUTCH-1714:
---
Hi [~jnioche],
I do not know if you have t
[
https://issues.apache.org/jira/browse/NUTCH-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Talat UYARER resolved NUTCH-1728.
-
Resolution: Fixed
Committed revision 1591849.
> indexer-solr plugin is not delete docs from solr
[
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987520#comment-13987520
]
Julien Nioche commented on NUTCH-1622:
--
Hi Daniel
Sorry for not commenting on your p
[
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-1622:
Assignee: Julien Nioche
> Create Outlinks with metadata
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1622:
-
Fix Version/s: 2.4
> Create Outlinks with metadata
> -
>
>
[
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1622:
-
Fix Version/s: (was: 1.9)
1.8
> Create Outlinks with metadata
> --
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987496#comment-13987496
]
Julien Nioche commented on NUTCH-1714:
--
Hi [~alparslan.avci]
It does not fix the is
[
https://issues.apache.org/jira/browse/NUTCH-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Talat UYARER resolved NUTCH-1753.
-
Resolution: Fixed
Fix Version/s: (was: 2.4)
2.3
Committed revision
Ivan Vershinin created NUTCH-1769:
-
Summary: API refactoring
Key: NUTCH-1769
URL: https://issues.apache.org/jira/browse/NUTCH-1769
Project: Nutch
Issue Type: Improvement
Components:
[
https://issues.apache.org/jira/browse/NUTCH-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987476#comment-13987476
]
Julien Nioche commented on NUTCH-1753:
--
then please mark it as resolved and while doi
[
https://issues.apache.org/jira/browse/NUTCH-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987475#comment-13987475
]
Julien Nioche commented on NUTCH-1768:
--
Hi Rogerio
Patches are applied against the t
[
https://issues.apache.org/jira/browse/NUTCH-1753?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13987473#comment-13987473
]
Talat UYARER commented on NUTCH-1753:
-
I committed this issue.
> Eclipse dependecy p
[
https://issues.apache.org/jira/browse/NUTCH-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Talat UYARER updated NUTCH-1662:
Patch Info: Patch Available
> Indexer Plugin for Solr Cloud
> -
>
>