Hello All,
Is there a way to save the raw HTML pages from the crawl? Or are they
already stored in the segments dir?
Best Regards,
-C.B.
Hi C.B.,
Can you please expand on this description?
On Sun, Jul 10, 2011 at 11:52 AM, Cam Bazz camb...@gmail.com wrote:
Hello All,
Is there a way to save the raw HTML pages from the crawl? Or are they
already stored in the segments dir?
Best Regards,
-C.B.
--
*Lewis*
I would like to access it and run my own parser/analyzer if
necessary. Can I read this segment data?
Best
On Sun, Jul 10, 2011 at 9:08 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
Well, the raw data is stored inside the segment. Without it there would be
nothing to parse. What do
Yes. You can build a plugin that implements a parser. Check the wiki [1] to
get started. If you intend to write a parser for an exotic MIME type, consider
contributing to Apache Tika.
What exactly are you trying to accomplish? There may be an easier method.
[1]:
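
For reference, since the raw data lives inside the segment, one possible way to get at it without writing a plugin is Nutch's segment reader (`bin/nutch readseg -dump`). Below is a minimal sketch, not a definitive recipe: it only builds and runs that command from Python. The install path, segment name, and output directory are hypothetical placeholders, and the `-no*` switches are used on the assumption that you only want the fetched content in the dump.

```python
import subprocess
from pathlib import Path


def build_readseg_dump_command(nutch_home, segment_dir, output_dir):
    """Build the argument list for `bin/nutch readseg -dump`.

    All parts of the segment except the raw fetched content are
    switched off, so the dump should contain only the stored pages.
    """
    nutch = Path(nutch_home) / "bin" / "nutch"
    return [
        str(nutch), "readseg", "-dump", str(segment_dir), str(output_dir),
        "-nofetch", "-nogenerate", "-noparse", "-noparsedata", "-noparsetext",
    ]


def dump_segment(nutch_home, segment_dir, output_dir):
    """Run the dump; raises CalledProcessError if Nutch exits non-zero."""
    cmd = build_readseg_dump_command(nutch_home, segment_dir, output_dir)
    subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    # Hypothetical paths: adjust to your own install and crawl directory.
    print(" ".join(build_readseg_dump_command(
        "/opt/nutch", "crawl/segments/20110710115200", "segdump")))
```

The dump is written as text under the output directory, so your own parser/analyzer can read it from there. If you need the content per URL in a more structured form, a parser plugin as suggested above is probably the better route.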