Hi,

I'm using Solr 1.4.1.
The scenario involves a user uploading multiple files. Their content is
extracted using SolrCell, then indexed by Solr along with other information
about the user.

ContentStreamUpdateRequest seemed like the right choice for this: use
addFile() to send the file data, and setParam() to add the normal data fields.

However, when I call addFile() multiple times on a ContentStreamUpdateRequest,
I observe that on the server side even the file parts of the multipart POST
are interpreted as regular form fields by the FileUpload component.
I found that FileUpload does so because the "filename" value in the
"Content-Disposition" header of each part is not set.
Digging a bit further, the actual root cause seems to be in the client-side
SolrJ API: the CommonsHttpSolrServer class does not set the "filename" value
in the "Content-Disposition" header when it creates the multipart Part
instances (from the HttpClient framework).
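To illustrate the server-side behaviour described above, here is a minimal
sketch of the rule as I understand it: a part whose "Content-Disposition"
header has no "filename" parameter is treated as a form field. PartClassifier
and its methods are hypothetical names for illustration only; this is not the
FileUpload code itself, and the parsing is deliberately simplified (quoted
values only).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of Commons FileUpload or SolrJ.
final class PartClassifier {
    // Matches a quoted filename parameter, e.g. filename="report.pdf".
    // Assumption: values are always quoted and contain no escaped quotes.
    private static final Pattern FILENAME = Pattern.compile(
        "(?:^|;)\\s*filename\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the filename parameter, or null when the header has none.
    static String fileNameOf(String contentDisposition) {
        Matcher m = FILENAME.matcher(contentDisposition);
        return m.find() ? m.group(1) : null;
    }

    // A part without a filename is classified as a regular form field.
    static boolean isFormField(String contentDisposition) {
        return fileNameOf(contentDisposition) == null;
    }
}
```

With this rule, a header such as form-data; name="file1" is classified as a
form field, while the same header with ; filename="a.pdf" appended is
classified as a file.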

I worked around this with a hack: in the CommonsHttpSolrServer.request()
method, where the PartBase instances are created, I overrode
"sendDispositionHeader()" to add the "filename" value. That solved the
problem.
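For reference, a rough sketch of the header line such an override needs to
emit. DispositionHeader.build() is a hypothetical stand-alone helper, not the
HttpClient or SolrJ API; in the real workaround the same string would be
written to the output stream from inside the overridden method. It assumes
the field name and file name contain no characters that need escaping.

```java
// Hypothetical helper for illustration; not part of HttpClient or SolrJ.
final class DispositionHeader {
    // Builds the Content-Disposition line for one multipart part.
    // fileName may be null, in which case no filename parameter is added
    // (which is what makes the server see the part as a form field).
    static String build(String fieldName, String fileName) {
        StringBuilder sb = new StringBuilder("Content-Disposition: form-data; name=\"");
        sb.append(fieldName).append('"');
        if (fileName != null) {
            sb.append("; filename=\"").append(fileName).append('"');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Without a filename (current behaviour) and with one (after the hack).
        System.out.println(build("file1", null));
        System.out.println(build("file1", "resume.pdf"));
    }
}
```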

However, my questions are:
1. Am I using ContentStreamUpdateRequest wrong, or is this actually a bug?
Should I be using something else?

2. My end goal is to map the contents of each file to *separate* fields, not
a common field. Since the regular ExtractingRequestHandler maps all content to
just one field, I believe I have to create a custom RequestHandler (possibly
reusing existing SolrCell classes).
Is this approach right?

Thanks
Karthik
