Do you want to index the text in the attachments?

If so, you are probably better off creating a separate document for the
mail body and for each attachment. A field in each attachment document
could hold the id of the main email document, and the main email document
could contain a multivalued field listing all of the attachment ids.
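
A rough sketch of that layout, in case it helps. It only builds plain
dicts and does not call any Solr client API; the field names (email_id,
attachment_ids, filename, text) and the id scheme are made up for
illustration, so adjust them to whatever your schema defines. It also
assumes you have already extracted the attachment text somehow.

```python
def build_email_docs(email_id, subject, body, attachments):
    """Return one main document plus one document per attachment.

    `attachments` is a list of (filename, extracted_text) pairs.
    All field names are hypothetical, not taken from a real schema.
    """
    attachment_docs = []
    for n, (filename, text) in enumerate(attachments, start=1):
        attachment_docs.append({
            "id": f"{email_id}-att{n}",  # unique id per attachment
            "email_id": email_id,        # points back to the main email
            "filename": filename,
            "text": text,
        })

    main_doc = {
        "id": email_id,
        "subject": subject,
        "text": body,
        # multivalued field listing every attachment document id
        "attachment_ids": [d["id"] for d in attachment_docs],
    }
    return [main_doc] + attachment_docs


docs = build_email_docs(
    "msg-1",
    "Quarterly report",
    "See attached.",
    [("report.pdf", "text extracted from the PDF"),
     ("notes.txt", "text extracted from the notes")],
)
```

Each dict would then be indexed as its own document, so a hit in an
attachment can be traced back to its email through email_id.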

On Thu, Mar 25, 2010 at 10:14 AM, Chris Hostetter
<hossman_luc...@fucit.org> wrote:
>
> : > I tried calling the addFile() twice (one call for each file) and no
> : > error but nothing getting indexed as well.
>        ...
> : Write your own RequestHandler that uses the existing ExtractingRequestHandler
> : to actually parse the streams, and then you combine the results arbitrarily in
> : your handler, eventually sending an AddUpdateCommand to the update processor.
> : You can obtain both the update processor and SolrCell instance from
> : req.getCore().
>
> The key bit being: yes you can attach multiple files to your request,
> and yes the SolrQueryRequest abstraction can handle that (it appears as
> two "ContentStreams" to the RequestHandler) but the existing
> ExtractingRequestHandler assumes there will only be one ContentStream and
> constructs one document for it -- the API isn't really designed around
> the idea of how to generate a single SolrInputDocument from multiple
> ContentStreams (where would you get the "title" from? etc...)
>
> There was talk about trying to generalize this, but i don't think anyone
> else has looked into it much.  Here's one reference, but i definitely
> remember a more recent thread about this idea...
>
> http://n3.nabble.com/ExtractingRequestHandler-and-XmlUpdateHandler-tt492202.html#a492211
>
>
>
> -Hoss
>
>



-- 
Lance Norskog
goks...@gmail.com
