No problem, perms granted! https://wiki.apache.org/nutch/ContributorsGroup
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 8/1/16, 12:11 PM, "Sebastian Greenholtz" <smgreenho...@gmail.com> wrote:

>I'd be happy to! My username is SebastianGreenholtz
>
>On Mon, Aug 1, 2016, 1:04 PM Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Great work, Sebastian, thank you for this. Would you be willing to
>> update the wiki with this info? Please let me know your username
>> and I will grant you permissions.
>>
>> Cheers,
>> Chris
>>
>> On 8/1/16, 11:01 AM, "Sebastian Greenholtz" <smgreenho...@gmail.com>
>> wrote:
>>
>> >I struggled with the same thing recently. Nutch 1.12 does work with Solr
>> >6.1.0, but you have to do two things differently.
>> >
>> >1. The schema file that comes with Solr is originally named managed_schema
>> >and it's stored in
>> >${SOLR_HOME}/server/solr/configsets/managed_schema
>> >
>> >This file should be renamed to schema.xml.
>> >
>> >2. To index with Solr, first start up Solr from the command line:
>> >
>> >${SOLR_HOME}/bin/solr start -e cloud -noprompt
>> >
>> >Solr should start up at localhost:8983/solr
>> >
>> >To run the indexing:
>> >
>> >${NUTCH_HOME}/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/gettingstarted urls/ segments/ 2
>> >
>> >Some of these parameters can be changed. They are explained here:
>> >https://wiki.apache.org/nutch/bin/crawl
>> >
>> >The thing that isn't explained anywhere is that your solr.server.url value
>> >is the base URL for Solr admin with the core name after the forward slash.
>> >For the example project, the core is called gettingstarted.
>> >
>> >Hope that helps!
>> >
>> >Sebastian
>> >
>> >On Mon, Aug 1, 2016, 11:39 AM Ondřej Sojka <ondrej.so...@gmail.com>
>> >wrote:
>> >
>> >> For the last three days, I've been struggling to make Nutch index one
>> >> website into Solr. The tutorial on your wiki is extremely outdated and
>> >> the command line tool doesn't work as expected. I think I have now
>> >> managed to crawl the site, but not to index it into Solr. I'm trying to
>> >> run bin/nutch solrindex crawl (crawl being the crawldb I previously
>> >> passed to bin/crawl), but it just prints the solrindex help. From that
>> >> help output, I gather that the crawldb is the only mandatory parameter.
>> >>
>> >> Is there another source of documentation for recent versions of Nutch,
>> >> or is the wiki the only one? With what versions of Solr is Nutch 1.12
>> >> compatible?
>> >>
>> >> Ondrej Sojka
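
As a rough illustration of the solr.server.url point in Sebastian's reply, the shell sketch below joins a Solr base URL and a core name and prints the resulting -D property. The helper name solr_core_url and the SOLR_BASE and SOLR_CORE variables are made up for this example; the defaults are simply the values from the thread. It does not start Solr or run Nutch.

```shell
#!/bin/sh
# Sketch only: builds the solr.server.url value (Solr base URL + "/" + core name)
# and prints it as a -D property. Nothing here talks to Solr or Nutch.

# Hypothetical helper (not part of Nutch or Solr): join the base URL and the
# core name, tolerating a single trailing slash on the base URL.
solr_core_url() {
    base="${1%/}"
    printf '%s/%s\n' "$base" "$2"
}

# Assumed defaults, taken from the example setup in the thread.
SOLR_BASE="${SOLR_BASE:-http://localhost:8983/solr}"
SOLR_CORE="${SOLR_CORE:-gettingstarted}"

SOLR_URL="$(solr_core_url "$SOLR_BASE" "$SOLR_CORE")"

# Print the property for review; paste it into the bin/crawl call by hand.
echo "-D solr.server.url=$SOLR_URL"
```

Setting SOLR_CORE to a different core name before running the script changes only the last path segment, which is the part the wiki page does not spell out.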