+1
-----Original message-----
From: Talat Uyarer ta...@uyarer.com
Sent: Wednesday 11th March 2015 13:45
To: ment...@community.apache.org; dev@nutch.apache.org
Subject: Google Summer of Code 2015 Mentor Registration
Nutch PMC,
Please acknowledge my request to become a mentor for Google
Hi,
My name is Mohit Bagde. I am currently doing my Master's in CS at USC. I
have taken CS572 Information Retrieval and Search Engines under Prof.
Mattmann and have worked on Nutch 1.X as part of the first assignment
which involved crawling with Nutch and integrating with Tika and
subsequently
Markus Jelsma created NUTCH-1958:
Summary: Remove scoring-opic from nutch-default.xml
Key: NUTCH-1958
URL: https://issues.apache.org/jira/browse/NUTCH-1958
Project: Nutch
Issue Type:
[
https://issues.apache.org/jira/browse/NUTCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358349#comment-14358349
]
Sebastian Nagel commented on NUTCH-1956:
+1
Members to be public in
GitHub user renxiawang opened a pull request:
https://github.com/apache/nutch/pull/12
NUTCH-1957 using MD5 as part of file path to solve filename collision issue
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/renxiawang/nutch
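The idea in the pull request title, making the MD5 digest part of the output path so that two documents with the same file name do not overwrite each other, can be sketched roughly as below. This is only an illustration and not the code from the patch: the function name dump_path, the digest_basename layout, and the choice to hash the URL (rather than, say, the content) are all assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def dump_path(out_dir: str, url: str) -> str:
    """Hypothetical helper: build an output path that embeds the MD5 of the URL."""
    # Assumption: the digest is computed over the URL string itself.
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    # Keep the last URL segment for readability; use "index" if it is empty.
    base = url.split("?", 1)[0].rstrip("/").rsplit("/", 1)[-1] or "index"
    # Assumed layout: <out_dir>/<md5>_<basename>
    return str(PurePosixPath(out_dir) / f"{digest}_{base}")


# Two different URLs that share the base name "logo.png".
first = dump_path("out", "http://a.example/img/logo.png")
second = dump_path("out", "http://b.example/img/logo.png")
print(first != second)
```

Because the digest depends on the full URL, files that share a base name land at different paths, while dumping the same URL again yields the same path.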
[
https://issues.apache.org/jira/browse/NUTCH-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358439#comment-14358439
]
Renxia Wang commented on NUTCH-1957:
Hi Sebastian,
Thank you for your suggestions.
Hi Tizy,
this should help:
https://wiki.apache.org/nutch/HttpPostAuthentication
http://svn.apache.org/repos/asf/nutch/trunk/conf/httpclient-auth.xml.template
For more details you could also check
https://issues.apache.org/jira/browse/NUTCH-827
https://issues.apache.org/jira/browse/NUTCH-1943
[
https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Luis Betancourt Gonzalez updated NUTCH-1962:
--
Attachment: NUTCH-1962.patch
Need to have mimetype-filter.txt
[
https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359894#comment-14359894
]
Lewis John McGibbney commented on NUTCH-1962:
-
+1 commit thanks Jorge
On
[
https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359931#comment-14359931
]
Jorge Luis Betancourt Gonzalez commented on NUTCH-1962:
---
Committed
[
https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359935#comment-14359935
]
Hudson commented on NUTCH-1962:
---
SUCCESS: Integrated in Nutch-trunk #3012 (See
Hi Lewis,
Thank you for the reply.
I tried by providing the parameters specified in the httpclient-auth.xml
template file. But while crawling I am getting the following warnings.
WARN httpclient.Http: Bad auth conf file: root element credentials found
in httpclient-auth.xml - must be
Lewis John McGibbney created NUTCH-1963:
---
Summary: CommonsCrawlDataDumper is too long ( 100 bytes) when
-gzip option invoked
Key: NUTCH-1963
URL: https://issues.apache.org/jira/browse/NUTCH-1963
[
https://issues.apache.org/jira/browse/NUTCH-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney updated NUTCH-1959:
Attachment: NUTCH-1959.v02.patch
Giuseppe's patch
Improving CommonCrawlFormat
[
https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359127#comment-14359127
]
Giuseppe Totaro commented on NUTCH-1963:
Thanks a lot [~lewismc]. We can solve
[
https://issues.apache.org/jira/browse/NUTCH-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Renxia Wang updated NUTCH-1957:
---
Attachment: NUTCH-1957.patch
FileDumper output file name collisions
[
https://issues.apache.org/jira/browse/NUTCH-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Renxia Wang updated NUTCH-1957:
---
Patch Info: Patch Available
FileDumper output file name collisions
Dear Wiki user,
You have subscribed to a wiki page or wiki category on Nutch Wiki for change
notification.
The Nutch_1.X_RESTAPI page has been changed by SujenShah:
https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI?action=diff&rev1=2&rev2=3
Ok
+ === Configuration ===
+
Hi,
Is there any detailed step-by-step explanation of how to implement
HTTPPostAuthentication on Nutch 1.10?
Thanks and Regards,
Tizy
[
https://issues.apache.org/jira/browse/NUTCH-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris A. Mattmann reassigned NUTCH-1960:
Assignee: Chris A. Mattmann
JUnit test for dump method of CommonCrawlDataDumper
[
https://issues.apache.org/jira/browse/NUTCH-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-1960 started by Chris A. Mattmann.
JUnit test for dump method of CommonCrawlDataDumper
[
https://issues.apache.org/jira/browse/NUTCH-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Work on NUTCH-1959 started by Chris A. Mattmann.
Improving CommonCrawlFormat implementations