Hello,

I would like to migrate my existing Lucene application to Solr, but I'm struggling with one thing (the most important one, though).

I would like to index XHTML files using ExtractingRequestHandler, and that part works without problems. However, I have a custom Tokenizer that expects well-formed XML (preferably the whole XHTML document) and produces certain tokens with payloads for Lucene. I've added this Tokenizer to Solr as a plugin and added the required schema.xml entries: my own field type that uses this Tokenizer, and a field that uses this type. Everything works fine in the Solr admin analysis page.

What I can't figure out, going through the Solr Cell API and sources, is how to incorporate the creation of such a custom field. What I would like to do, I guess, is recognize the input document type (this is already done somewhere) and, when it is an XHTML file, add a custom field to the SolrDocument that uses a certain schema.xml field definition, and feed that field the whole InputStream of the input document.

I hope this is clear enough. Can somebody point me in the right direction on how to achieve this?
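To make the idea concrete, here is a minimal Java sketch of the logic I have in mind, written without any Solr classes: check the content type and, if it is XHTML, copy the entire raw input into an extra field. The class name, the method names, the field name `xhtml_raw` and the UTF-8 decoding are my own assumptions for illustration; a plain `Map` stands in for the Solr document, and this is not the Solr Cell API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class XhtmlFieldSketch {
    // Hypothetical field name; in Solr it would have to match a schema.xml field
    // whose field type uses the custom XML-aware Tokenizer.
    static final String RAW_XHTML_FIELD = "xhtml_raw";

    // True when the MIME type (ignoring parameters such as charset) is XHTML.
    static boolean isXhtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        String ct = contentType.toLowerCase(Locale.ROOT);
        int semi = ct.indexOf(';');
        if (semi >= 0) {
            ct = ct.substring(0, semi);
        }
        return ct.trim().equals("application/xhtml+xml");
    }

    // Reads the stream once, stores the whole document as a string field when it
    // is XHTML (assumes UTF-8), and returns the bytes so the caller can re-read them.
    static byte[] bufferAndCapture(Map<String, Object> fields, String contentType,
                                   InputStream in) throws IOException {
        byte[] bytes = in.readAllBytes(); // Java 9+
        if (isXhtml(contentType)) {
            fields.put(RAW_XHTML_FIELD, new String(bytes, StandardCharsets.UTF_8));
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        String doc = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>Hi</body></html>";
        Map<String, Object> fields = new HashMap<>();
        byte[] bytes = bufferAndCapture(fields, "application/xhtml+xml; charset=UTF-8",
                new ByteArrayInputStream(doc.getBytes(StandardCharsets.UTF_8)));
        // A fresh stream over the buffered bytes could now go to the normal extractor.
        InputStream forExtractor = new ByteArrayInputStream(bytes);
        System.out.println(fields.containsKey(RAW_XHTML_FIELD) + " " + forExtractor.available());
    }
}
```

Since an InputStream can only be consumed once, the sketch buffers the bytes up front, so the same content could both populate the raw field and still be handed to the normal extraction step.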
Thank you, Martin