[jira] [Created] (NUTCH-1955) ByteWritable missing in NutchWritable

2015-03-10 Thread Markus Jelsma (JIRA)
Markus Jelsma created NUTCH-1955: Summary: ByteWritable missing in NutchWritable Key: NUTCH-1955 URL: https://issues.apache.org/jira/browse/NUTCH-1955 Project: Nutch Issue Type: Task

[jira] [Updated] (NUTCH-1955) ByteWritable missing in NutchWritable

2015-03-10 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1955: - Attachment: NUTCH-1955.patch Patch for trunk. ByteWritable missing in NutchWritable

Re: GSOC 2015, Introduction and Project of Interest

2015-03-10 Thread Lewis John Mcgibbney
Good Afternoon Ashwini, You can find out information about the project at the Nutch project wiki, which is here - https://wiki.apache.org/nutch/GoogleSummerOfCode#NUTCH-1936_GSoC_2015_-_Move_Nutch_to_Hadoop_2.X We are looking for students to provide input to their project proposals based on the

[jira] [Issue Comment Deleted] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X

2015-03-10 Thread Ashwini Tokekar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashwini Tokekar updated NUTCH-1936: --- Comment: was deleted (was: Thanks Lewis) GSoC 2015 - Move Nutch to Hadoop 2.X

[jira] [Commented] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X

2015-03-10 Thread Ashwini Tokekar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356294#comment-14356294 ] Ashwini Tokekar commented on NUTCH-1936: Thanks Lewis GSoC 2015 - Move Nutch to

[jira] [Commented] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X

2015-03-10 Thread Ashwini Tokekar (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356295#comment-14356295 ] Ashwini Tokekar commented on NUTCH-1936: Thanks Lewis GSoC 2015 - Move Nutch to

Re: GSOC 2015, Introduction and Project of Interest

2015-03-10 Thread Lewis John Mcgibbney
Great thanks. I'll add you to the wiki tomorrow. Best Lewis On Tuesday, March 10, 2015, ASHWINI TOKEKAR tokekar.ashw...@gmail.com wrote: Thanks, Lewis for your prompt reply. My wiki username is : ashwinitokekar. I will send you a project proposal in the format mentioned by you by 15th March.

[jira] [Updated] (NUTCH-1956) Members to be public in URLCrawlDatum

2015-03-10 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-1956: - Attachment: NUTCH-1956.patch Patch for trunk. Members to be public in URLCrawlDatum

[Nutch Wiki] New attachment added to page GiuseppeTotaro

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page GiuseppeTotaro for change notification. An attachment has been added to that page by GiuseppeTotaro. Following detailed information is available: Attachment name: CommonCrawlDataDumper_v02.pdf Attachment size: 99670 Attachment link:

[Nutch Wiki] New attachment added to page CommonCrawlDataDumper

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page CommonCrawlDataDumper for change notification. An attachment has been added to that page by GiuseppeTotaro. Following detailed information is available: Attachment name: CommonCrawlDataDumper_v02.png Attachment size: 771605 Attachment link:

[Nutch Wiki] New attachment added to page CommonCrawlDataDumper

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page CommonCrawlDataDumper for change notification. An attachment has been added to that page by GiuseppeTotaro. Following detailed information is available: Attachment name: CommonCrawlDataDumper_v02.png Attachment size: 325140 Attachment link:

[Nutch Wiki] New attachment added to page GiuseppeTotaro

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page GiuseppeTotaro for change notification. An attachment has been added to that page by GiuseppeTotaro. Following detailed information is available: Attachment name: CommonCrawlDataDumper_v02.png Attachment size: 771605 Attachment link:

[Nutch Wiki] New attachment added to page CommonCrawlDataDumper

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page CommonCrawlDataDumper for change notification. An attachment has been added to that page by GiuseppeTotaro. Following detailed information is available: Attachment name: CommonCrawlDataDumper_v02.png Attachment size: 234312 Attachment link:

[Nutch Wiki] Update of CommonCrawlDataDumper by GiuseppeTotaro

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommonCrawlDataDumper page has been changed by GiuseppeTotaro: https://wiki.apache.org/nutch/CommonCrawlDataDumper?action=diff&rev1=1&rev2=2 - The CommonCrawlDataDumper is a Nutch tool

[Nutch Wiki] Trivial Update of ContributorsGroup by LewisJohnMcgibbney

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The ContributorsGroup page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/ContributorsGroup?action=diff&rev1=23&rev2=24 * JayavanthShenoy * GiuseppeTotaro *

[jira] [Commented] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X

2015-03-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355939#comment-14355939 ] Lewis John McGibbney commented on NUTCH-1936: - Hi [~ashwini.tokekar] bq. I

Re: title inside body problem

2015-03-10 Thread Lewis John Mcgibbney
Hi Zein, On Mon, Mar 9, 2015 at 4:53 PM, dev-digest-h...@nutch.apache.org wrote: I am using Nutch 2.3 and faced a problem with some Arabic content sites. This URL displays the title by a tag in the body, and the getTitle code will stop after /head and consider that there is no title. I thought

[jira] [Commented] (NUTCH-1948) Make the Selenium remote web driver specification, configuration and selection available via a Factory-type mechanism

2015-03-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355926#comment-14355926 ] Lewis John McGibbney commented on NUTCH-1948: - bq. Would something like

[jira] [Commented] (NUTCH-1936) GSoC 2015 - Move Nutch to Hadoop 2.X

2015-03-10 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355931#comment-14355931 ] Lewis John McGibbney commented on NUTCH-1936: - [~petr.shypila], you know can

[Nutch Wiki] Trivial Update of CommandLineOptions by LewisJohnMcgibbney

2015-03-10 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The CommandLineOptions page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/CommandLineOptions?action=diff&rev1=59&rev2=60 ||[[bin/nutch nutchserver]]||run a (local)