I would index all attachments separately, but with some sort of reference
back to the mail message. That way, I could use the update handler for the
text and metadata of the mail message, and the update/extract handler
for the binary attachment(s) and a restricted set of metadata (file name,
content type, reference back to email message).
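A rough client-side sketch of that split, in Python using only the standard library's email package. It assumes hypothetical schema fields (id, subject, from and body on the mail document; email_id and file_name on attachment documents) and a made-up id scheme "<mail_id>/att<n>", so adjust both to your schema. The mail document would go to the regular update handler. Each attachment's decoded bytes would be POSTed as the request body to /update/extract, with the literal.* parameters supplying the fixed field values and the returned MIME type used as the request's Content-Type.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def split_email(raw: bytes, mail_id: str):
    """Split a raw MIME message into one mail document plus one
    /update/extract request (params, decoded bytes, MIME type) per attachment.

    Field names and the id scheme are hypothetical; adapt them to your schema.
    """
    msg: EmailMessage = message_from_bytes(raw, policy=policy.default)

    # Text and metadata of the mail itself: sent to the regular update handler.
    body_part = msg.get_body(preferencelist=("plain",))
    mail_doc = {
        "id": mail_id,
        "subject": str(msg.get("Subject", "")),
        "from": str(msg.get("From", "")),
        "body": body_part.get_content() if body_part is not None else "",
    }

    # Each attachment becomes its own Solr document, linked back via email_id.
    extract_requests = []
    for n, part in enumerate(msg.iter_attachments()):
        params = {
            "literal.id": f"{mail_id}/att{n}",  # hypothetical id scheme
            "literal.email_id": mail_id,         # reference back to the mail
            "literal.file_name": part.get_filename() or "",
        }
        data = part.get_payload(decode=True)     # base64 undone: raw bytes
        extract_requests.append((params, data, part.get_content_type()))

    return mail_doc, extract_requests
```

Keeping the attachments as separate documents means a hit inside an attachment can still be traced back to its mail through email_id.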

Note that if you're implementing some sort of connector for indexing your
content, you could handle the binary attachments on the connector side
instead.
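If the connector does the work, it can undo the base64 encoding itself and send Solr only plain text through the regular update handler, so Solr never sees the binary data. A minimal Python sketch under those assumptions: text/* attachments are decoded directly by the standard library's email package, while anything else goes through a pluggable extractor. The extractor hook, the default no_binary_extractor (which simply skips binaries), and the field names are all hypothetical; in practice you would wrap a real extraction library such as Apache Tika there.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage
from typing import Callable, Optional

# Hypothetical hook: receives decoded bytes and a MIME type, returns text or None.
# A real connector might wrap Apache Tika or a similar library here.
Extractor = Callable[[bytes, str], Optional[str]]


def no_binary_extractor(data: bytes, mime_type: str) -> Optional[str]:
    """Placeholder extractor: gives up on every non-text attachment."""
    return None


def attachment_docs(raw: bytes, mail_id: str,
                    extract: Extractor = no_binary_extractor) -> list[dict]:
    """Build plain-text Solr documents for a mail's attachments, client side."""
    msg: EmailMessage = message_from_bytes(raw, policy=policy.default)
    docs = []
    for n, part in enumerate(msg.iter_attachments()):
        mime_type = part.get_content_type()
        if part.get_content_maintype() == "text":
            # Transfer encoding and charset are handled by the email package.
            text = part.get_content()
        else:
            data = part.get_payload(decode=True)  # base64 already undone here
            text = extract(data, mime_type)
        if text is None:
            continue  # nothing indexable; skip this attachment
        docs.append({
            "id": f"{mail_id}/att{n}",  # hypothetical id scheme
            "email_id": mail_id,        # reference back to the mail
            "file_name": part.get_filename() or "",
            "content_type": mime_type,
            "text": text,
        })
    return docs
```

The resulting dicts can then be posted to the regular update handler like any other document.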


On Tue, Dec 10, 2013 at 8:41 AM, neerajp <neeraj_star2...@yahoo.com> wrote:

> Please find my responses inline:
> Assuming that your binary fields are MIME attachments to email messages,
> they will probably already be encoded as base64. Why not just leave
> them that way in Solr too? You can't do much with them other than store
> them, right? Or do you have some kind of image processing going on? You
> can always decode them in your client when you pull them out.
>
> [Neeraj]: Yes, the binary fields are MIME attachments to email messages, but I
> want to index the attachments.
> For that, I need to convert the base64-encoded data to binary on the Solr
> side and then, using some technique, extract text from it so that the text
> can be indexed and I can search inside the attachments.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexing-on-plain-text-and-binary-data-in-a-single-HTTP-POST-request-tp4105661p4105860.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
