Re: Not all EML files are indexing during indexing

Charlie Hull Wed, 03 Jun 2020 02:34:45 -0700

I think the OP is indexing flat files, not web pages (but otherwise, I agree with you that Scrapy is great - I know some of the people behind it too and they're a good bunch).


Charlie


On 02/06/2020 16:41, Walter Underwood wrote:

On Jun 2, 2020, at 7:40 AM, Charlie Hull <char...@flax.co.uk> wrote:

If it was me I'd probably build a standalone indexer script in Python that did
the file handling, called out to a separate Tika service for extraction, and
posted the results to Solr.
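
A minimal sketch of what such a standalone indexer might look like, using only the Python standard library. It is not anyone's actual script from this thread. The endpoints assume a Tika server and a Solr instance on their default local ports; the core name "mail", the field name "content_txt", and the helper names are hypothetical and would need to match the real setup.

```python
import json
import urllib.request
from pathlib import Path

# Assumptions: Tika server and Solr run locally on their default ports.
# The core name "mail" and the field "content_txt" are hypothetical.
TIKA_URL = "http://localhost:9998/tika"
SOLR_UPDATE_URL = "http://localhost:8983/solr/mail/update?commitWithin=5000"


def http_request(url, data, headers, method):
    """Send one HTTP request and return the response body as bytes."""
    req = urllib.request.Request(url, data=data, headers=headers, method=method)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


def extract_text(path, send=http_request):
    """Ask the Tika server for the plain-text content of one file."""
    body = send(TIKA_URL, path.read_bytes(), {"Accept": "text/plain"}, "PUT")
    return body.decode("utf-8", errors="replace")


def index_directory(root, send=http_request):
    """Walk root for .eml files, extract each via Tika, post each to Solr.

    Returns (indexed, failed) so that files which did not make it into
    the index are reported instead of being dropped silently.
    """
    indexed, failed = [], []
    for path in sorted(Path(root).rglob("*.eml")):
        try:
            doc = {"id": str(path), "content_txt": extract_text(path, send)}
            send(SOLR_UPDATE_URL, json.dumps([doc]).encode("utf-8"),
                 {"Content-Type": "application/json"}, "POST")
            indexed.append(path)
        except OSError as exc:  # urllib's URLError and HTTPError are OSErrors
            failed.append((path, exc))
    return indexed, failed
```

Calling `index_directory("/path/to/eml")` and printing the `failed` list afterwards should show exactly which files were skipped and why, which is the question the original poster was trying to answer.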

I would do the same thing, and I would base that script on Scrapy
(https://scrapy.org/). I worked on a Python-based web spider for about ten
years.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com
