[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6

2015-02-26 Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339609#comment-14339609 ] Lewis John McGibbney commented on NUTCH-1946: - bq. Where can I see which tests

[no subject]

2015-02-26 Jiangang Sun
unsubscribe

[no subject]

2015-02-26 Kan Zhou
unsubscribe

unsubscribe

2015-02-26 Jiangang Sun

[jira] [Comment Edited] (NUTCH-1946) Upgrade to Gora 0.6

2015-02-26 Henry Saputra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339252#comment-14339252 ] Henry Saputra edited comment on NUTCH-1946 at 2/26/15 9:56 PM: -

[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6

2015-02-26 Henry Saputra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339252#comment-14339252 ] Henry Saputra commented on NUTCH-1946: -- Try to replicate the error stack but when I r

[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6

2015-02-26 Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339224#comment-14339224 ] Lewis John McGibbney commented on NUTCH-1946: - Grand > Upgrade to Gora 0.6 >

[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6

2015-02-26 Henry Saputra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339203#comment-14339203 ] Henry Saputra commented on NUTCH-1946: -- Ah thanks, it compiled now =) > Upgrade to G

Re: Unsubscribe

2015-02-26 Julien Nioche
Massimo, http://nutch.apache.org/mailing_lists.html => dev-unsubscr...@nutch.apache.org Thanks On 26 February 2015 at 19:11, Massimo Miccoli wrote: > > > Massimo > > > On 26 Feb 2015, at 19:31, lewi...@apache.org wrote: > > > > Author: lewismc > > Date: Thu Feb 26 18:31:39 2

Re: MetaData for near duplicates

2015-02-26 Ami Akshay Parikh
Ya. I know about that. But I just thought that because Parse_Data already does that for us, I did not want to do the same processing again. I will try to figure something out. Thanks a lot. Regards, Ami Parikh (213)590-0005 On Thu, Feb 26, 2015 at 12:39 PM, Renxia Wang wrote: > Not sure how yo

Re: MetaData for near duplicates

2015-02-26 Renxia Wang
Not sure how you implemented it, so it is hard to tell. You may want to take a look at SegmentReader's get and getMapRecords methods; those may give you ideas. You can also use SegmentReader.get directly to get the segment data, though it is slow because it calls sleep(5000) every time you call it, so slow

[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339119#comment-14339119 ] Lewis John McGibbney commented on NUTCH-1933: - Hi [~jorgelbg] thanks for notic

[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6

2015-02-26 Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339103#comment-14339103 ] Lewis John McGibbney commented on NUTCH-1946: - Hi [~hsaputra], can you try cle

[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339080#comment-14339080 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1933: --- I see a {{t

[jira] [Commented] (NUTCH-1946) Upgrade to Gora 0.6

2015-02-26 Henry Saputra (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339053#comment-14339053 ] Henry Saputra commented on NUTCH-1946: -- Tried to run ant in the 2.0 branch with your

Re: MetaData for near duplicates

2015-02-26 Ami Akshay Parikh
I am using the MapFileReader to iterate through the file, and I read each key into a Text object and the MetaData into a ParseData object. I get the following exception: Exception in thread "main" java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:197) at org.apache.hado

[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Hudson (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338951#comment-14338951 ] Hudson commented on NUTCH-1933: --- SUCCESS: Integrated in Nutch-trunk #2991 (See [https://bui

Unsubscribe

2015-02-26 Massimo Miccoli
Massimo > On 26 Feb 2015, at 19:31, lewi...@apache.org wrote: > > Author: lewismc > Date: Thu Feb 26 18:31:39 2015 > New Revision: 1662530 > > URL: http://svn.apache.org/r1662530 > Log: > NUTCH-1933 nutch-selenium plugin > > Added: >nutch/trunk/src/plugin/lib-selenium/ >

Re: MetaData for near duplicates

2015-02-26 Renxia Wang
Hi Ami, which method of which class do you use to get the metadata? Please provide more info about this (logs, etc.). Zhique On Thu, Feb 26, 2015 at 10:53 AM, Ami Akshay Parikh wrote: > Hello, > > When I try to use the parse_data from the segment directory for getting > the MetaData for finding nea

MetaData for near duplicates

2015-02-26 Ami Akshay Parikh
Hello, When I try to use the parse_data from the segment directory to get the MetaData for finding near duplicates, my code runs into an EOFException. I found something about a bug in Nutch in the archives, but I wanted to know whether anyone else is facing this problem and how I can possibly resol

[jira] [Resolved] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney resolved NUTCH-1933. - Resolution: Fixed Committed @revision 1662530 in trunk > nutch-selenium plugin >

[jira] [Updated] (NUTCH-1933) nutch-selenium plugin

2015-02-26 Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated NUTCH-1933: Assignee: Mohammad Al-Mohsin (was: Lewis John McGibbney) > nutch-selenium plugin >

[jira] [Commented] (NUTCH-1950) File name too long when bin/nutch dump

2015-02-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338684#comment-14338684 ] Sebastian Nagel commented on NUTCH-1950: Great! For an MD5 calculation, see o.a.had
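A minimal sketch of the MD5 idea for NUTCH-1950, assuming the goal is to replace any dump file name that exceeds the filesystem's per-name limit (255 bytes is assumed here) with a fixed-length digest. It uses the JDK's java.security.MessageDigest rather than the Hadoop class referenced in the comment, and the names SafeFileName, safeFileName and md5Hex are hypothetical, not part of Nutch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;

public class SafeFileName {

    // Assumption: a single path component may be at most 255 bytes,
    // which holds for many common filesystems (e.g. ext4); adjust as needed.
    static final int MAX_NAME_BYTES = 255;

    // Returns the lowercase hex encoding of the MD5 digest of s (32 chars).
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: names within the limit are returned unchanged;
    // longer names are replaced by their MD5 digest, keeping a short
    // extension (up to 10 chars including the dot) so the file type survives.
    static String safeFileName(String name) {
        if (name.getBytes(StandardCharsets.UTF_8).length <= MAX_NAME_BYTES) {
            return name;
        }
        String ext = "";
        int dot = name.lastIndexOf('.');
        if (dot > 0 && name.length() - dot <= 10) {
            ext = name.substring(dot);
        }
        return md5Hex(name) + ext;
    }

    public static void main(String[] args) {
        System.out.println(safeFileName("index.html"));
        String longName = String.join("", Collections.nCopies(300, "a")) + ".html";
        String shortened = safeFileName(longName);
        System.out.println(shortened + " (" + shortened.length() + " chars)");
    }
}
```

Keeping the extension lets downstream tools still recognise the file type. The trade-off is that the original URL can no longer be read from a hashed name, so a dump tool taking this approach would likely need to record the name-to-digest mapping separately.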

[jira] [Commented] (NUTCH-1950) File name too long when bin/nutch dump

2015-02-26 Chong Li (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338211#comment-14338211 ] Chong Li commented on NUTCH-1950: - I have thought about that and at first we just wanted e

[jira] [Commented] (NUTCH-1950) File name too long when bin/nutch dump

2015-02-26 Sebastian Nagel (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14338198#comment-14338198 ] Sebastian Nagel commented on NUTCH-1950: Is it really a good idea to take the syst