If you are asking about the general coding: we wrote a reference implementation that creates a
Lucene index from Tika output by crawling a data source. If you want to give it a try, you can
find the corresponding snippet here:

https://github.com/leechcrawler/leech/blob/master/codeSnippets.md
(under 'Create a Lucene index')

Maybe it is a good starting point.

We also don't use the 'Tika ParsingReader/Lucene Field generation with a Reader' combination,
because Lucene offers no way to set a Field's Store and Index configuration when you pass a
Reader instead of a String (at least in the Field class implementation up to version 3.6.2).
But I share Mike's opinion that this should not make much difference.
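
To make the String-versus-Reader trade-off a bit more concrete, here is a minimal sketch that
uses only the Java standard library. It does not call any Lucene or Tika API; the class and
method names (ReaderVsString, countFromString, countFromReader) and the tiny buffer size are
made up for illustration. The String path needs the whole text in memory at once, while the
Reader path only ever holds a small buffer, which is roughly the difference between handing
extracted text to an indexer as a String and letting it pull characters from a Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only: no Lucene or Tika classes are used here.
class ReaderVsString {

    // String path: the complete text is already materialized in memory.
    static int countFromString(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Reader path: characters are pulled through a small fixed-size buffer,
    // so memory use does not grow with the length of the text.
    static int countFromReader(Reader reader) {
        char[] buffer = new char[8]; // deliberately tiny, to force several reads
        int tokens = 0;
        boolean inToken = false; // carried across reads so split tokens count once
        try {
            int read;
            while ((read = reader.read(buffer)) != -1) {
                for (int i = 0; i < read; i++) {
                    boolean whitespace = Character.isWhitespace(buffer[i]);
                    if (!whitespace && !inToken) {
                        tokens++; // first character of a new token
                    }
                    inToken = !whitespace;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Tika extracts text  and Lucene indexes it";
        System.out.println(countFromString(text));                   // prints 7
        System.out.println(countFromReader(new StringReader(text))); // prints 7
    }
}
```

Both paths give the same result; only the peak memory differs. As Mike points out, that saving
may be small in practice if the parser has to load the whole binary document anyway.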




On 25.02.2013 12:00, Michael McCandless wrote:
> Data storage allocation for what?  The parsed text?
> 
> Unless you have really large documents, it's simplest to just use Tika to parse, then build a
> Lucene Document, then index it.
> 
> With really large documents it's possible to make a Lucene Field using Reader, so Lucene
> incrementally reads the characters, and Tika's ParsingReader to create the Reader, but I
> suspect this won't save that much memory in general (many parsers require loading the full
> binary document in RAM, I believe).
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> On Sun, Feb 24, 2013 at 9:36 PM,  <sdr...@sina.com> wrote:
>> .. but how to streamline the two, lucene and tika, through some internal interface, that
>> avoids the whole piece of data storage allocation ?
>> 
>> ----- Original Message -----
>> From: Michael McCandless <luc...@mikemccandless.com>
>> To: user@tika.apache.org, sdr...@sina.com
>> Subject: Re: hello , how to utilize tika inside lucene ?
>> Date: 2013-02-25 03:55
>> 
> 

-- 
______________________________________________________________________________
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer

Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Trippstadter Straße 122, D-67663 Kaiserslautern, Germany

Phone: +49.631.20575-1250
mailto:reuschl...@dfki.de  http://www.dfki.uni-kl.de/~reuschling/

------------Legal Company Information Required by German Law------------------
Management: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman)
            Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern (district court), HRB 2313
______________________________________________________________________________