RE: Google Summer of Code 2015 Mentor Registration

2015-03-12 Thread Markus Jelsma
+1 -----Original message----- From: Talat Uyarer ta...@uyarer.com Sent: Wednesday 11th March 2015 13:45 To: ment...@community.apache.org; dev@nutch.apache.org Subject: Google Summer of Code 2015 Mentor Registration Nutch PMC, Please acknowledge my request to become a mentor for Google

Re: [jira] [Issue Comment Deleted] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X

2015-03-12 Thread Mohit Bagde
Hi, My name is Mohit Bagde. I am currently doing my Master's in CS at USC. I have taken CS572 Information Retrieval and Search Engines under Prof. Mattmann and have worked on Nutch 1.X as part of the first assignment, which involved crawling with Nutch and integrating with Tika and subsequently

[jira] [Created] (NUTCH-1958) Remove scoring-opic from nutch-default.xml

2015-03-12 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1958: Summary: Remove scoring-opic from nutch-default.xml Key: NUTCH-1958 URL: https://issues.apache.org/jira/browse/NUTCH-1958 Project: Nutch Issue Type:

[jira] [Commented] (NUTCH-1956) Members to be public in URLCrawlDatum

2015-03-12 Thread Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358349#comment-14358349 ] Sebastian Nagel commented on NUTCH-1956: +1 Members to be public in

[GitHub] nutch pull request: NUTCH-1957 using MD5 as part of file path to s...

2015-03-12 Thread renxiawang
GitHub user renxiawang opened a pull request: https://github.com/apache/nutch/pull/12 NUTCH-1957 using MD5 as part of file path to solve filename collision issue You can merge this pull request into a Git repository by running: $ git pull https://github.com/renxiawang/nutch

[jira] [Commented] (NUTCH-1957) FileDumper output file name collisions

2015-03-12 Thread Renxia Wang (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358439#comment-14358439 ] Renxia Wang commented on NUTCH-1957: Hi Sebastian, Thank you for your suggestions.

Re: HTTP Post Authentication

2015-03-12 Thread Sebastian Nagel
Hi Tizy, this should help: https://wiki.apache.org/nutch/HttpPostAuthentication http://svn.apache.org/repos/asf/nutch/trunk/conf/httpclient-auth.xml.template For more details you could also check https://issues.apache.org/jira/browse/NUTCH-827 https://issues.apache.org/jira/browse/NUTCH-1943

[jira] [Updated] (NUTCH-1962) Need to have mimetype-filter.txt file available by default

2015-03-12 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Luis Betancourt Gonzalez updated NUTCH-1962: Attachment: NUTCH-1962.patch Need to have mimetype-filter.txt

[jira] [Commented] (NUTCH-1962) Need to have mimetype-filter.txt file available by default

2015-03-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359894#comment-14359894 ] Lewis John McGibbney commented on NUTCH-1962: +1 commit thanks Jorge On

[jira] [Commented] (NUTCH-1962) Need to have mimetype-filter.txt file available by default

2015-03-12 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359931#comment-14359931 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1962: Committed

[jira] [Commented] (NUTCH-1962) Need to have mimetype-filter.txt file available by default

2015-03-12 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359935#comment-14359935 ] Hudson commented on NUTCH-1962: SUCCESS: Integrated in Nutch-trunk #3012 (See

Re: HTTP Post Authentication

2015-03-12 Thread Tizy Ninan
Hi Lewis, Thank you for the reply. I tried providing the parameters specified in the httpclient-auth.xml template file, but while crawling I am getting the following warnings: WARN httpclient.Http: Bad auth conf file: root element credentials found in httpclient-auth.xml - must be

[jira] [Created] (NUTCH-1963) CommonsCrawlDataDumper is too long ( 100 bytes) when -gzip option invoked

2015-03-12 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1963: Summary: CommonsCrawlDataDumper is too long ( 100 bytes) when -gzip option invoked Key: NUTCH-1963 URL: https://issues.apache.org/jira/browse/NUTCH-1963

[jira] [Updated] (NUTCH-1959) Improving CommonCrawlFormat implementations

2015-03-12 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1959: Attachment: NUTCH-1959.v02.patch Giuseppe's patch Improving CommonCrawlFormat

[jira] [Commented] (NUTCH-1963) CommonsCrawlDataDumper is too long ( 100 bytes) when -gzip option invoked

2015-03-12 Thread Giuseppe Totaro (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359127#comment-14359127 ] Giuseppe Totaro commented on NUTCH-1963: Thanks a lot [~lewismc]. We can solve

[jira] [Updated] (NUTCH-1957) FileDumper output file name collisions

2015-03-12 Thread Renxia Wang (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renxia Wang updated NUTCH-1957: Attachment: NUTCH-1957.patch FileDumper output file name collisions

[jira] [Updated] (NUTCH-1957) FileDumper output file name collisions

2015-03-12 Thread Renxia Wang (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Renxia Wang updated NUTCH-1957: Patch Info: Patch Available FileDumper output file name collisions

[Nutch Wiki] Update of Nutch_1.X_RESTAPI by SujenShah

2015-03-12 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The Nutch_1.X_RESTAPI page has been changed by SujenShah: https://wiki.apache.org/nutch/Nutch_1.X_RESTAPI?action=diff&rev1=2&rev2=3 Ok + === Configuration === +

HTTP Post Authentication

2015-03-12 Thread Tizy Ninan
Hi, Is there any detailed step-by-step explanation of how to implement HTTPPostAuthentication on Nutch 1.10? Thanks and Regards, Tizy

[jira] [Assigned] (NUTCH-1960) JUnit test for dump method of CommonCrawlDataDumper

2015-03-12 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris A. Mattmann reassigned NUTCH-1960: Assignee: Chris A. Mattmann JUnit test for dump method of CommonCrawlDataDumper

[jira] [Work started] (NUTCH-1960) JUnit test for dump method of CommonCrawlDataDumper

2015-03-12 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1960 started by Chris A. Mattmann. JUnit test for dump method of CommonCrawlDataDumper

[jira] [Work started] (NUTCH-1959) Improving CommonCrawlFormat implementations

2015-03-12 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on NUTCH-1959 started by Chris A. Mattmann. Improving CommonCrawlFormat implementations