artodeto created NUTCH-2507:
-------------------------------

             Summary: NutchTutorial wiki page has a lot of outdated command 
line calls from the point where the Solr interaction starts
                 Key: NUTCH-2507
                 URL: https://issues.apache.org/jira/browse/NUTCH-2507
             Project: Nutch
          Issue Type: Bug
          Components: documentation
    Affects Versions: 1.14
            Reporter: artodeto


h2. Section "Step-by-Step: Indexing into Apache Solr"

replace:
{code:java}
Example: bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone
{code}
with:
{code:java}
Example: bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch ${NUTCH_RUNTIME_HOME}/crawl/crawldb/ -linkdb ${NUTCH_RUNTIME_HOME}/crawl/linkdb/ ${NUTCH_RUNTIME_HOME}/crawl/segments/20131108063838/ -filter -normalize -deleteGone
{code}
 
h2. Section "Step-by-Step: Deleting Duplicates"

replace:
{code:java}
     Usage: bin/nutch dedup <solr url>
     Example: /bin/nutch dedup http://localhost:8983/solr
{code}
with:
{code:java}
     Usage: bin/nutch dedup <path to the crawldb> <solr url>
     Example: /bin/nutch dedup ${NUTCH_RUNTIME_HOME}/crawl/crawldb/ http://localhost:8983/sol
{code}

h2. Section "Step-by-Step: Cleaning Solr"

replace:
{code:java}
     Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
     Example: /bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch crawl/crawldb/
{code}
with:
{code}
     Usage: bin/nutch clean -Dsolr.server.url=<solr index url> <crawldb>
     Example: /bin/nutch clean -Dsolr.server.url=http://localhost:8983/solr/nutch ${NUTCH_RUNTIME_HOME}/crawl/crawldb/
{code}
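
To check how the proposed index and clean examples expand once the variables are set, a minimal dry-run sketch may help. It only prints the commands and does not call Nutch. The default value for NUTCH_RUNTIME_HOME and the SOLR_URL, CRAWL_DIR and SEGMENT variable names are assumptions made for illustration; the segment timestamp is the one from the tutorial example. The dedup command is left out here.

```shell
#!/bin/sh
# Dry-run sketch: prints the proposed commands instead of executing them.
# Assumption: the default install path below is hypothetical; adjust it locally.
NUTCH_RUNTIME_HOME="${NUTCH_RUNTIME_HOME:-/opt/nutch/runtime/local}"
# Assumption: a Solr core named "nutch", as in the examples above.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/nutch}"
CRAWL_DIR="${NUTCH_RUNTIME_HOME}/crawl"
SEGMENT="${CRAWL_DIR}/segments/20131108063838"

print_commands() {
  # Index the segment into Solr, passing the server URL as a property.
  echo "bin/nutch index -Dsolr.server.url=${SOLR_URL} ${CRAWL_DIR}/crawldb/ -linkdb ${CRAWL_DIR}/linkdb/ ${SEGMENT}/ -filter -normalize -deleteGone"
  # Clean the Solr index of documents marked as gone in the crawldb.
  echo "bin/nutch clean -Dsolr.server.url=${SOLR_URL} ${CRAWL_DIR}/crawldb/"
}

print_commands
```

Printing the lines first makes it easy to compare them with the wiki text before running anything against a real Solr instance.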



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
