If you are asking about the general coding approach: we wrote a reference implementation that creates a Lucene index from Tika output by crawling a data source. If you want to give it a try, you can find the corresponding snippet here:
https://github.com/leechcrawler/leech/blob/master/codeSnippets.md (under 'Create a Lucene index')

Maybe it is a good starting point.

We also don't use the combination of Tika's ParsingReader with a Reader-based Lucene Field, because Lucene does not let you set a Field's Store and Index configuration when you pass a Reader instead of a String (at least in the Field class implementation up to version 3.6.2). But I share Mike's opinion that this should not make much of a difference.

On 25.02.2013 12:00, Michael McCandless wrote:
> Data storage allocation for what? The parsed text?
>
> Unless you have really large documents, it's simplest to just use Tika to
> parse, then build a Lucene Document, then index it.
>
> With really large documents it's possible to make a Lucene Field using
> Reader, so Lucene incrementally reads the characters, and Tika's
> ParsingReader to create the Reader, but I suspect this won't save that much
> memory in general (many parsers require loading the full binary document in
> RAM, I believe).
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> On Sun, Feb 24, 2013 at 9:36 PM, <sdr...@sina.com> wrote:
>> .. but how to streamline the two, lucene and tika, through some internal
>> interface, that avoids the whole piece of data storage allocation ?
>>
>> ----- Original Message -----
>> From: Michael McCandless <luc...@mikemccandless.com>
>> To: user@tika.apache.org, sdr...@sina.com
>> Subject: Re: hello , how to utilize tika inside lucene ?
>> Date: 2013-02-25 03:55

-- 
Christian Reuschling, Dipl.-Ing.(BA)
Software Engineer
Knowledge Management Department
German Research Center for Artificial Intelligence DFKI GmbH
Trippstadter Straße 122, D-67663 Kaiserslautern, Germany
Phone: +49.631.20575-1250
mailto:reuschl...@dfki.de
http://www.dfki.uni-kl.de/~reuschling/
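
To illustrate the Reader-versus-String point from the thread, here is a minimal sketch that uses only java.io. It is not Lucene or Tika code: StreamingTokenCounter and countTokens are hypothetical names standing in for a consumer (such as an indexer) that reads characters incrementally from a Reader, so that only a small buffer is held at a time instead of the full extracted text. Whether this saves memory in practice also depends on the producer; as Mike notes, many parsers load the whole binary document anyway.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a consumer that reads a field's text incrementally.
// This is NOT the Lucene or Tika API; it only illustrates the streaming idea.
class StreamingTokenCounter {

    // Counts whitespace-separated tokens while holding at most
    // bufferSize characters of the input in memory at any time.
    public static long countTokens(Reader reader, int bufferSize) {
        if (bufferSize < 1) {
            throw new IllegalArgumentException("bufferSize must be at least 1");
        }
        char[] buffer = new char[bufferSize];
        long tokens = 0;
        boolean inToken = false; // carried across reads so tokens may span buffers
        try {
            int read;
            while ((read = reader.read(buffer, 0, buffer.length)) != -1) {
                for (int i = 0; i < read; i++) {
                    if (Character.isWhitespace(buffer[i])) {
                        inToken = false;
                    } else if (!inToken) {
                        inToken = true;
                        tokens++;
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A StringReader is used here for brevity; in the scenario discussed,
        // the Reader would come from the parser instead of a complete String.
        String text = "Tika extracts text and Lucene indexes it";
        System.out.println(countTokens(new StringReader(text), 8)); // prints 7
    }
}
```

The consumer never sees the whole text at once, which is the property a Reader-based field relies on. The simpler route from the thread (parse to a String, build the Document, index it) trades that for full control over the field's store and index settings.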