unsubscribe

2015-02-23 Thread Gioele Zanzico
Sent from my iPhone > On 23 Feb 2015, at 22:03, "Shreya Kamani Shah" wrote:

Re: Nutch-Selenium Plugin Truncates Binary Data

2015-02-23 Thread Mattmann, Chris A (3980)
Thank you Mohammad! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email: chris.a.mattm...@n

[jira] [Created] (NUTCH-1949) Dump out the Nutch data into the Common Crawl format

2015-02-23 Thread Giuseppe Totaro (JIRA)
Giuseppe Totaro created NUTCH-1949: -- Summary: Dump out the Nutch data into the Common Crawl format Key: NUTCH-1949 URL: https://issues.apache.org/jira/browse/NUTCH-1949 Project: Nutch Issue T

How to verify Nutch - Selenium

2015-02-23 Thread nishant jani
Hello all, I was able to build nutch - selenium successfully, through the nutch patch and the instructions provided on github. I attempted to crawl a few URLs and was anticipating nutch to invoke firefox in order to fetch data involved in AJAX communication. However this did not happen. I was cur

[Nutch Wiki] Trivial Update of "Getting_Started" by LewisJohnMcgibbney

2015-02-23 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification. The "Getting_Started" page has been changed by LewisJohnMcgibbney: https://wiki.apache.org/nutch/Getting_Started?action=diff&rev1=5&rev2=6 + <> + To new developers: If you want to beg

Re: [MASSMAIL]Re: How to read metadata/content of an URL in URLFilter?

2015-02-23 Thread Renxia Wang
Thanks Jorge for your useful information. So since there are multiple URLFilter instances being created during crawling, is there any way to share data among them? Like a hashmap, which may be useful for my purpose, duplicate detection. Or use an external in-memory database? I also failed to get

[no subject]

2015-02-23 Thread Shreya Kamani Shah

[jira] [Commented] (NUTCH-1933) nutch-selenium plugin

2015-02-23 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333809#comment-14333809 ] Lewis John McGibbney commented on NUTCH-1933: - Hi [~almohsin] thanks for the p

Re: [MASSMAIL]Re: How to read metadata/content of an URL in URLFilter?

2015-02-23 Thread Jorge Luis Betancourt González
My two cents on the topic: The URLFilter family of plugins is handled by the URLFilters class; this class gets instantiated in several places in the source code, including the Fetcher and the Injector. The URLFilters class uses the PluginRepository.get() method to load the plugins; this method indee

[jira] [Created] (NUTCH-1948) Make the Selenium remote web driver specification, configuration and selection available via a Factory-type mechanism

2015-02-23 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1948: --- Summary: Make the Selenium remote web driver specification, configuration and selection available via a Factory-type mechanism Key: NUTCH-1948 URL: https://issues.ap

Re: Nutch-Selenium Plugin Truncates Binary Data

2015-02-23 Thread Mohammad Al-Mohsin
Sure, I've just uploaded the updated patch. On Sun, Feb 22, 2015 at 4:50 PM, Mattmann, Chris A (3980) < chris.a.mattm...@jpl.nasa.gov> wrote: > I think this is fantastic Mohammad! > > Can you update the patch on NUTCH-1933 with this improvement, > so we can get it into the sources? > > Cheers, >

[jira] [Commented] (NUTCH-1928) Indexing filter of documents by the MIME type

2015-02-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1423#comment-1423 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1928: --- Done! thank

[jira] [Closed] (NUTCH-1928) Indexing filter of documents by the MIME type

2015-02-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jorge Luis Betancourt Gonzalez closed NUTCH-1928. - Resolution: Fixed > Indexing filter of documents by the MIME type

[jira] [Commented] (NUTCH-1928) Indexing filter of documents by the MIME type

2015-02-23 Thread Jorge Luis Betancourt Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1421#comment-1421 ] Jorge Luis Betancourt Gonzalez commented on NUTCH-1928: --- Committed r

[jira] [Comment Edited] (NUTCH-1933) nutch-selenium plugin

2015-02-23 Thread Mohammad Al-Mohsin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333206#comment-14333206 ] Mohammad Al-Mohsin edited comment on NUTCH-1933 at 2/23/15 11:11 AM: ---

[jira] [Updated] (NUTCH-1933) nutch-selenium plugin

2015-02-23 Thread Mohammad Al-Mohsin (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Al-Mohsin updated NUTCH-1933: -- Attachment: NUTCH-selenium-trunk.v2.patch Takes care of Tika 1.7 update and handles only