Why URLNormalizer doesn't implement the Pluggable?

2011-08-26 Thread Kaiwii Ho
I'm a newcomer learning Nutch, and I have several questions: 1. URLNormalizer is a kind of ExtensionPoint, but why doesn't it implement Pluggable as the other extension points do? Furthermore, is there any difference between URLNormalizer and the other extension points leading the

Are there any tutorial for writing regex-normalize.xml?

2011-08-26 Thread Kaiwii Ho
I'm going to write my own regex-normalize.xml. Is there a tutorial for writing regex-normalize.xml? Waiting for your help, and thank you.

[jira] [Commented] (NUTCH-990) protocol-httpclient fails with short pages

2011-08-26 Thread Stephan Grotz (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091682#comment-13091682 ] Stephan Grotz commented on NUTCH-990: - Same here - been trying to fetch https pages

[jira] [Reopened] (NUTCH-990) protocol-httpclient fails with short pages

2011-08-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche reopened NUTCH-990: - protocol-httpclient fails with short pages --

Re: Why URLNormalizer doesn't implement the Pluggable?

2011-08-26 Thread Julien Nioche
Resending your messages every hour won't get you more answers; quite the opposite. On 26 August 2011 09:28, Kaiwii Ho kaiwi...@gmail.com wrote: I'm a newcomer learning Nutch, and I have several questions: 1. URLNormalizer is a kind of ExtensionPoint, but why doesn't it implement the

[jira] [Resolved] (NUTCH-990) protocol-httpclient fails with short pages

2011-08-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche resolved NUTCH-990. - Resolution: Fixed Fix Version/s: (was: 1.3) 1.4 A patch has been

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-26 Thread Radim Kolar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091740#comment-13091740 ] Radim Kolar commented on NUTCH-937: --- we should stick with hadoop 0.20.203.0 not CDH and

[jira] [Commented] (NUTCH-937) When nutch is run on hadoop 0.20.2 (or cdh) it will not find plugins because MapReduce will not unpack plugin/ directory from the job's pack (due to MAPREDUCE-967)

2011-08-26 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091747#comment-13091747 ] Julien Nioche commented on NUTCH-937: - @Radim : Nutch is based on the Apache

[no subject]

2011-08-26 Thread gaurav bagga

[Nutch Wiki] Trivial Update of FrontPage by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FrontPage page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=222&rev2=223 * StrategicGoals * IndexStructure *

[Nutch Wiki] Trivial Update of Archive and Legacy by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The Archive and Legacy page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/Archive%20and%20Legacy?action=diff&rev1=15&rev2=16 === Development and Old Nutch 2.0 ===

[Nutch Wiki] Trivial Update of MapReduce by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The MapReduce page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/MapReduce?action=diff&rev1=7&rev2=8 + = How Map and Reduce operations are actually carried out = + ==

[Nutch Wiki] Trivial Update of IndexStructure by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The IndexStructure page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=3&rev2=4 ||type|| NO || UnTokenized

[jira] [Commented] (NUTCH-386) Plugin to index categories by url rules

2011-08-26 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091846#comment-13091846 ] Lewis John McGibbney commented on NUTCH-386: What is the position with this

[Nutch Wiki] Trivial Update of IndexStructure by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The IndexStructure page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=4&rev2=5 ||lang|| YES || UnTokenized

[Nutch Wiki] Trivial Update of IndexStructure by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The IndexStructure page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=5&rev2=6 ||segment || YES ||

[Nutch Wiki] Trivial Update of IndexStructure by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The IndexStructure page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=6&rev2=7 ||lang|| YES || UnTokenized

[Nutch Wiki] Trivial Update of IndexStructure by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The IndexStructure page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/IndexStructure?action=diff&rev1=7&rev2=8 The index structure formed after indexing is shown

[Nutch Wiki] Trivial Update of FrontPage by LewisJohnMcgibbney

2011-08-26 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The FrontPage page has been changed by LewisJohnMcgibbney: http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=224&rev2=225 * MultiLingualSupport - ''In development''. *

Build failed in Jenkins: Nutch-trunk #1586

2011-08-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/Nutch-trunk/1586/ -- [...truncated 986 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java A