Sent from my iPhone
> On 23 Feb 2015, at 22:03, "Shreya Kamani Shah" wrote:
Thank you Mohammad!
++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@n
Giuseppe Totaro created NUTCH-1949:
--
Summary: Dump out the Nutch data into the Common Crawl format
Key: NUTCH-1949
URL: https://issues.apache.org/jira/browse/NUTCH-1949
Project: Nutch
Issue T
Hello all,
I was able to build nutch-selenium successfully using the Nutch patch
and the instructions provided on GitHub. I attempted to crawl a few URLs
and expected Nutch to invoke Firefox in order to fetch the data
involved in AJAX communication. However, this did not happen.
I was cur
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change
notification.
The "Getting_Started" page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/Getting_Started?action=diff&rev1=5&rev2=6
+ <>
+
To new developers: If you want to beg
Thanks Jorge for your useful information.
Since multiple URLFilter instances are created during
crawling, is there any way to share data among them, such as a hashmap? That
would be useful for my purpose, duplicate detection. Or should I use an external
in-memory database?
I also failed to get
[
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333809#comment-14333809
]
Lewis John McGibbney commented on NUTCH-1933:
-
Hi [~almohsin] thanks for the p
My two cents on the topic:
The URLFilter family of plugins is handled by the URLFilters class. This class
gets instantiated in several places in the source code, including the Fetcher
and the Injector. The URLFilters class uses the PluginRepository.get() method to
load the plugins; this method indee
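
To illustrate the question of sharing data among several filter instances, here is a minimal sketch rather than Nutch code. SimpleUrlFilter and SeenUrlFilter are hypothetical names; the interface only mimics the URLFilter convention of returning the URL to keep it and null to drop it. The sketch assumes all instances live in the same JVM and are loaded by the same class loader, which may not hold across separate MapReduce tasks.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the URLFilter contract:
// return the URL to keep it, or null to drop it.
interface SimpleUrlFilter {
    String filter(String url);
}

// Hypothetical duplicate-detecting filter (not part of Nutch).
class SeenUrlFilter implements SimpleUrlFilter {
    // Static, so every instance created in this JVM (by the same
    // class loader) reads and writes the same thread-safe set.
    private static final Set<String> SEEN = ConcurrentHashMap.newKeySet();

    @Override
    public String filter(String url) {
        // add() returns false when the URL was already present,
        // in which case the URL is treated as a duplicate.
        return SEEN.add(url) ? url : null;
    }
}
```

With this design, a URL accepted by one instance is rejected by any other instance in the same process. State that must survive across processes or machines would need an external store instead, as the question suggests.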
Lewis John McGibbney created NUTCH-1948:
---
Summary: Make the Selenium remote web driver specification,
configuration and selection available via a Factory-type mechanism
Key: NUTCH-1948
URL: https://issues.ap
Sure, I've just uploaded the updated patch.
On Sun, Feb 22, 2015 at 4:50 PM, Mattmann, Chris A (3980) <
chris.a.mattm...@jpl.nasa.gov> wrote:
> I think this is fantastic Mohammad!
>
> Can you update the patch on NUTCH-1933 with this improvement,
> so we can get it into the sources?
>
> Cheers,
>
[
https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1423#comment-1423
]
Jorge Luis Betancourt Gonzalez commented on NUTCH-1928:
---
Done! thank
[
https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jorge Luis Betancourt Gonzalez closed NUTCH-1928.
-
Resolution: Fixed
> Indexing filter of documents by the MIME type
[
https://issues.apache.org/jira/browse/NUTCH-1928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1421#comment-1421
]
Jorge Luis Betancourt Gonzalez commented on NUTCH-1928:
---
Committed r
[
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333206#comment-14333206
]
Mohammad Al-Mohsin edited comment on NUTCH-1933 at 2/23/15 11:11 AM:
---
[
https://issues.apache.org/jira/browse/NUTCH-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mohammad Al-Mohsin updated NUTCH-1933:
--
Attachment: NUTCH-selenium-trunk.v2.patch
Takes care of Tika 1.7 update and handles only
16 matches