No problem, permissions granted!

https://wiki.apache.org/nutch/ContributorsGroup

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

On 8/1/16, 12:11 PM, "Sebastian Greenholtz" <smgreenho...@gmail.com> wrote:

>I'd be happy to! My username is SebastianGreenholtz
>
>On Mon, Aug 1, 2016, 1:04 PM Mattmann, Chris A (3980) <
>chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Great work, Sebastian, thank you for this. Would you be willing to
>> update the wiki with this info? Please let me know your username
>> and I will grant you permissions.
>>
>> Cheers,
>> Chris
>>
>> On 8/1/16, 11:01 AM, "Sebastian Greenholtz" <smgreenho...@gmail.com>
>> wrote:
>>
>> >I struggled with the same thing recently. Nutch 1.12 does work with Solr
>> >6.1.0, but you have to do two things differently.
>> >
>> >1. The schema file that comes with Solr is named managed-schema (with a
>> >hyphen). It is stored in the conf directory of each configset under
>> >${SOLR_HOME}/server/solr/configsets/
>> >
>> >This file should be renamed to schema.xml.
>> >
>> >2. To index with Solr, first start Solr from the command line:
>> >
>> >${SOLR_HOME}/bin/solr start -e cloud -noprompt
>> >
>> >Solr should start up at localhost:8983/solr
>> >
>> >To run the indexing:
>> >
>> >${NUTCH_HOME}/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/gettingstarted urls/ segments/ 2
>> >
>> >Some of these parameters can be changed. They are explained here:
>> >https://wiki.apache.org/nutch/bin/crawl
>> >
>> >What isn't explained anywhere is that the solr.server.url value is the
>> >base URL of the Solr admin, followed by a forward slash and the core name.
>> >For the example project, the core is called gettingstarted.
>> >
>> >Hope that helps!
>> >
>> >Sebastian
>> >
>> >On Mon, Aug 1, 2016, 11:39 AM Ondřej Sojka <ondrej.so...@gmail.com> wrote:
>> >
>> >> For the last three days, I've been struggling to make Nutch index one
>> >> website into Solr. The tutorial on your wiki is extremely outdated and
>> >> the command line tool doesn't work as expected. I think I may have
>> >> managed to crawl the site, but not to index it into Solr. I'm trying to
>> >> run bin/nutch solrindex crawl (crawl being the crawldb I previously
>> >> passed to bin/crawl), but it just prints the solrindex help. Judging by
>> >> that help output, I think the crawldb is the only mandatory parameter.
>> >>
>> >> I think there must be another source of documentation besides the wiki
>> >> for recent versions of Nutch, or is the wiki the only source of
>> >> documentation? With which versions of Solr is Nutch 1.12 compatible?
>> >>
>> >> Ondrej Sojka
>> >>
>>
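
The solr.server.url value described in Sebastian's reply (the Solr base URL, a forward slash, then the core name) can be put together roughly as follows. This is only a sketch, assuming Solr runs at localhost:8983 and the core is gettingstarted as in the thread. The names SOLR_BASE_URL, CORE_NAME and solr_core_url are made up for illustration and are not part of Nutch or Solr. The script prints the -D argument for review and does not start a crawl.

```shell
#!/bin/sh
# Sketch only: SOLR_BASE_URL, CORE_NAME and solr_core_url are illustrative
# names, not Nutch or Solr settings.

SOLR_BASE_URL="http://localhost:8983/solr"   # base URL mentioned in the thread
CORE_NAME="gettingstarted"                   # core name of the example project

# Join a base URL and a core name with exactly one slash between them.
solr_core_url() {
    base="${1%/}"                # drop one trailing slash, if present
    printf '%s/%s\n' "$base" "$2"
}

SOLR_URL="$(solr_core_url "$SOLR_BASE_URL" "$CORE_NAME")"

# Print the property argument to pass to bin/crawl instead of running it.
printf '%s\n' "-D solr.server.url=$SOLR_URL"
```

With the values above, the script prints -D solr.server.url=http://localhost:8983/solr/gettingstarted, which matches the URL used in the crawl command in the thread. Stripping a trailing slash first means a base URL entered with or without one gives the same result.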
