Re: [New Nutch Plugin] Delegate fetching to Selenium/Firefox for those jobs where you neeeeed javascript parsing

2014-07-31 Thread Julien Nioche
Hi Mo, Great to hear about the plugin and the tutorial you are planning to write. Why don't you add a link to your plugin from https://wiki.apache.org/nutch/PluginCentral? IMHO plugins don't necessarily need to live in the Nutch codebase and can happily be maintained at an external location e.g.

Re: [New Nutch Plugin] Delegate fetching to Selenium/Firefox for those jobs where you neeeeed javascript parsing

2014-07-31 Thread Julien Nioche
Hi, Just to add to what Seb said below: * (from https://github.com/momer/nutch-selenium-grid-plugin#nutch-selenium) C) Not have to wait another 2 years for Nutch to patch in either the Ajax crawler hashbang workaround and

Nutch @ApacheCon Europe 2014

2014-07-31 Thread Sebastian Nagel
Hi, we're glad to announce that there will be two events dedicated to Nutch at the upcoming ApacheCon Europe http://events.linuxfoundation.org/events/apachecon-europe in Budapest, November 17 - 21, 2014. 1. an introductory talk about Nutch http://sched.co/1nyYa7b as part of the Lucene/Solr

Re: Nutch @ApacheCon Europe 2014

2014-07-31 Thread Bin Wang
+1. Nutch plugin development, with customized protocol and parser plugins, would be great to have. Will the whole session be streamed or recorded? I really want to see your presentation ASAP, but I remember the YouTube videos for ApacheCon Denver were not available until one month later... :(

Re: Nutch @ApacheCon Europe 2014

2014-07-31 Thread Mattmann, Chris A (3980)
So awesome, great to hear, guys! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527 Email:

Re: [New Nutch Plugin] Delegate fetching to Selenium/Firefox for those jobs where you neeeeed javascript parsing

2014-07-31 Thread Mohammed Omer
Hey Julien, I definitely should have acknowledged all the work that goes into Nutch before that (at least I said that Nutch was an awesome, world-class web crawler, though!). I get that patches are in the hands of the community, but for someone like me or the person who submitted