Re: Solr web crawler with recursive option

2019-04-12 Thread Andrew MacKay
You should look at Apache Nutch, which has Solr client support. It has all the indexing options you need and includes a schema for building a Solr collection with all the fields required for indexing. We have used it and it works well; it also supports sitemap.xml to simplify indexing. On Fri, Apr 12, 2019 at 6:43 AM Ja

Re: Solr web crawler with recursive option

2019-04-12 Thread Jan Høydahl
I think there may actually be a bug. I was not able to crawl some other web site either. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > On 11 Apr 2019, at 18:55, Erick Erickson wrote: > > You are sending malformed XML to Solr. This can be something as silly as > ha

Re: Solr web crawler with recursive option

2019-04-11 Thread Erick Erickson
You are sending malformed XML to Solr. This can be something as silly as having extra spaces at the beginning. I’d capture the page being sent to Solr and put it in a formatter to check it. Best, Erick > On Apr 11, 2019, at 3:49 AM, Shivprasad Shetty > wrote: > > Hello Team, > > >

Re: Solr web crawler with recursive option

2019-04-11 Thread Alexandre Rafalovitch
One of the files that the post tool identified as XML is not XML. Possibly a 404 error page or some such. So it is trying to parse the file and sees non-XML content right at the start. Or, if you are sure it is an XML file, maybe there is a BOM (byte order mark). Either way, try to isolate the specific file. On a bigger picture
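
The checks suggested in the two replies above (leading whitespace, a BOM, or non-XML content such as an error page) could be sketched roughly as follows. This is only an illustrative diagnostic, not part of Solr or its post tool: the function name `diagnose_xml` is hypothetical, and it assumes you have already saved the suspect payload as bytes.

```python
import codecs
import xml.etree.ElementTree as ET


def diagnose_xml(data: bytes) -> list:
    """Return human-readable problems found in a captured payload.

    Hypothetical helper for isolating a bad file; it does not talk to Solr.
    """
    problems = []

    # A UTF-8 byte order mark sits in front of the first '<'.
    if data.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 byte order mark")
        data = data[len(codecs.BOM_UTF8):]

    # Extra spaces or newlines before the first tag.
    if data[:1].isspace():
        problems.append("leading whitespace before the first tag")
    stripped = data.lstrip()

    # Plain text (for example an error message) instead of markup.
    if not stripped.startswith(b"<"):
        problems.append("content does not start with '<', probably not XML")

    # Well-formedness check with the standard library parser.
    try:
        root = ET.fromstring(stripped)
    except ET.ParseError as exc:
        problems.append(f"not well-formed: {exc}")
    else:
        # An HTML root suggests a web page (such as a 404 page), not a Solr update.
        if root.tag.lower().endswith("html"):
            problems.append("root element is <html>, looks like a web page")

    return problems
```

Running it over each fetched file, for example `diagnose_xml(open(path, "rb").read())`, and printing any non-empty result is one way to isolate the specific file that triggers the error.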

Solr web crawler with recursive option

2019-04-11 Thread Shivprasad Shetty
Hello Team, I am working on Solr for the first time and got the setup done. Now I have created a core using the command line and want to perform a web crawl of a third-party site. If I try it with individual links, I am able to do the crawl and index it to the core. This was done using

Solr web crawler with recursive option

2019-04-11 Thread Shivprasad Shetty
I am working on Solr for the first time and got the setup done. Now I have created a core using the command line and want to perform a web crawl of a third-party site. If I try it with individual links, I am able to do the crawl and index it to the core. This was done using > java -Dda
