Amrit Sarkar wrote:
>> Strange,
>>
>> Can you add: "text/html;charset=utf-8". This is wiki.apache.org page's
>> Content-Type. Let's see what it says now.
Same thing. Verified Content-Type:

quadra[git:master]$ wget -S -O /dev/null http://quadra:9091/index.md |& grep Content-Type
  Content-Type: text/html;charset=utf-8
quadra[git:master]$ ]

quadra[git:master]$ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
/docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
SimplePostTool version 5.0.0
Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
Entering auto mode. Indexing pages with content-types corresponding to file endings md
SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
Entering recursive mode, depth=10, delay=0s
Entering crawl at level 0 (1 links total, 1 new)
SimplePostTool: WARNING: Skipping URL with unsupported type text/html
SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
0 web pages indexed.
COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
Time spent: 0:00:00.531
quadra[git:master]$

Kevin

>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>>
>> On Fri, Oct 13, 2017 at 6:44 PM, Kevin Layer <la...@franz.com> wrote:
>>
>> > OK, so I hacked markserv to add Content-Type text/html, but now I get
>> >
>> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> >
>> > What is it expecting?
>> >
>> > $ docker exec -it --user=solr solr bin/post -c handbook http://quadra:9091/index.md -recursive 10 -delay 0 -filetypes md
>> > /docker-java-home/jre/bin/java -classpath /opt/solr/dist/solr-core-7.0.1.jar -Dauto=yes -Drecursive=10 -Ddelay=0 -Dfiletypes=md -Dc=handbook -Ddata=web org.apache.solr.util.SimplePostTool http://quadra:9091/index.md
>> > SimplePostTool version 5.0.0
>> > Posting web pages to Solr url http://localhost:8983/solr/handbook/update/extract
>> > Entering auto mode. Indexing pages with content-types corresponding to file endings md
>> > SimplePostTool: WARNING: Never crawl an external web site faster than every 10 seconds, your IP will probably be blocked
>> > Entering recursive mode, depth=10, delay=0s
>> > Entering crawl at level 0 (1 links total, 1 new)
>> > SimplePostTool: WARNING: Skipping URL with unsupported type text/html
>> > SimplePostTool: WARNING: The URL http://quadra:9091/index.md returned a HTTP result status of 415
>> > 0 web pages indexed.
>> > COMMITting Solr index changes to http://localhost:8983/solr/handbook/update/extract...
>> > Time spent: 0:00:03.882
>> > $
>> >
>> > Thanks.
>> >
>> > Kevin
>> >