How to add an unextracted field when using Solr Cell

Martin Líška Mon, 20 Jun 2011 05:48:09 -0700

Hello,

I would like to migrate my existing Lucene application to Solr, but I'm
struggling with one thing, and it is the most important one.
I would like to index XHTML files using ExtractingRequestHandler, which is
no problem in itself. However, I have a custom Tokenizer that expects
well-formed XML (preferably the whole XHTML document) and produces tokens
with payloads for Lucene. I've added this tokenizer to Solr as a plugin and
added the required schema.xml entries (a field type that uses this Tokenizer
and a field of that type), and everything works fine in Solr admin analysis.
I am having a hard time working through the Solr Cell API and sources to
find out how to create such a custom field. What I think I need is to
detect the type of the input document (this is already done somewhere) and,
when it is an XHTML file, add a custom field to the SolrDocument that uses
that schema.xml field definition and is fed the whole InputStream of the
input document.
I hope this is clear enough.
Can somebody point me in the right direction?
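
To make the idea concrete, here is a minimal sketch in plain Java, with no
Solr or Tika classes involved. It only illustrates the general approach of
buffering the input once so that the same bytes could be handed to the
extractor and also stored verbatim in an extra field when the document is
XHTML. The class name RawFieldSketch, the field name raw_xhtml, the
content-type check and the UTF-8 decoding are all assumptions made for
illustration; how this would be wired into Solr Cell is exactly the open
question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Solr or Solr Cell.
class RawFieldSketch {

    // Hypothetical field name; it would have to match a schema.xml field
    // whose type uses the custom tokenizer.
    static final String RAW_FIELD = "raw_xhtml";

    // Reads the whole stream into memory so the bytes can be used twice.
    static byte[] readAll(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed check: treats "application/xhtml+xml" (with optional
    // parameters such as charset) as XHTML.
    static boolean isXhtml(String contentType) {
        return contentType != null
            && contentType.trim().toLowerCase(Locale.ROOT)
                          .startsWith("application/xhtml+xml");
    }

    // Builds a plain map of field name -> value for one input document.
    static Map<String, String> buildFields(InputStream in, String contentType) {
        byte[] bytes = readAll(in);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("content_type", contentType);

        // A fresh stream over the buffered bytes is what would be passed
        // on to the normal extraction step (not done in this sketch).
        InputStream forExtractor = new ByteArrayInputStream(bytes);

        if (isXhtml(contentType)) {
            // Assumes the document is UTF-8 encoded.
            fields.put(RAW_FIELD, new String(bytes, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        byte[] doc = "<html xmlns=\"http://www.w3.org/1999/xhtml\"/>"
            .getBytes(StandardCharsets.UTF_8);
        Map<String, String> fields =
            buildFields(new ByteArrayInputStream(doc), "application/xhtml+xml");
        System.out.println(fields.keySet());
    }
}
```

Buffering the whole document in memory keeps the sketch simple, but it
would need rethinking for very large files.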


Thank you,

Martin
