Sorry, I did not really parse the question before replying (twice!)


But where does the content-type for the local file specified by the URL go?

/solr/update/csv?stream.url=file://myfile.csv

Do we need a stream.content-type or charset, or am I missing something?


I just investigated a bit and was disappointed with the results. I
hoped that URLConnection would fill in the content type for files,
but it doesn't (at least not reliably):

http://localhost:8983/solr/debug/dump?stream.url=file:///C:/mmm.xls
 <str name="contentType">content/unknown</str>

http://localhost:8983/solr/debug/dump?stream.url=file:///C:/xxx.jpg
 <str name="contentType">image/jpeg</str>

It sometimes gets the contentType correct, but it never adds charset info.
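To repeat the check above outside Solr, here is a rough Java sketch that asks the JDK's file: URL handler for a local file's content type. ContentTypeProbe and probe are made-up names. The exact value depends on the JDK's file-name map and extension, so treat the output as indicative only. The point it illustrates is that no charset parameter comes back.

```java
import java.io.File;
import java.io.IOException;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class ContentTypeProbe {
    // Asks the JDK's file: URL handler for a local file's content type.
    // Depending on the JDK and the extension this may be a MIME type,
    // "content/unknown", or null; it is not expected to carry a charset.
    public static String probe(File file) throws IOException {
        URLConnection conn = file.toURI().toURL().openConnection();
        String type = conn.getContentType();
        // Resolving the headers may have opened the file; release it.
        conn.getInputStream().close();
        return type;
    }

    public static void main(String[] args) throws IOException {
        for (String suffix : new String[] {".csv", ".jpg", ".xls"}) {
            File f = File.createTempFile("probe", suffix);
            f.deleteOnExit();
            Files.write(f.toPath(), "id,name\n1,foo\n".getBytes(StandardCharsets.UTF_8));
            System.out.println(suffix + " -> " + probe(f));
        }
    }
}
```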

So we have to do something. We could:
a. explicitly set the stream.url content type with another param
b. return a FileReader directly (it picks the charset for you, using the platform default)

(a) may be useful to override a remote content type that is incorrect,
but it is kind of a pain for someone specifying a local file (who
probably does not know what content type it is).
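A minimal sketch of (a), assuming the override arrives as a plain content-type string; how it gets passed in (e.g. a stream.url.contentType param) is left open. ExplicitContentType, charsetOf, open, readAll and the UTF-8 fallback are my own assumptions, not existing Solr code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ExplicitContentType {
    // Assumed default when no charset is declared anywhere.
    static final Charset FALLBACK = StandardCharsets.UTF_8;

    // Extracts the charset parameter from e.g. "text/csv; charset=ISO-8859-1".
    public static Charset charsetOf(String contentType) {
        if (contentType == null) return FALLBACK;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Drop optional quotes, as in charset="UTF-8".
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                return Charset.forName(name);
            }
        }
        return FALLBACK;
    }

    // Option (a): a caller-supplied type, if present, replaces the reported one.
    public static Reader open(InputStream in, String reportedType, String overrideType) {
        String effective = (overrideType != null) ? overrideType : reportedType;
        return new InputStreamReader(in, charsetOf(effective));
    }

    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A "remote" body in Latin-1 whose server wrongly claims UTF-8.
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        try (Reader r = open(new ByteArrayInputStream(body),
                "text/plain; charset=UTF-8", "text/plain; charset=ISO-8859-1")) {
            System.out.println(readAll(r).equals("caf\u00e9"));
        }
    }
}
```

This also shows the downside mentioned above: the override only helps if the caller actually knows the right type to spell out.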

(b) requires adding getReader() to ContentStream - this would be
useful for direct posted content (the most common case) and for the
local file case.  Other cases would construct the reader from the
content type and stream.
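Roughly what (b) could look like. This ContentStream is a hypothetical, cut-down version for illustration only, not Solr's actual interface. StringStream, FileStream, BytesStream, ReaderDemo and the UTF-8 default are made-up names and assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical, trimmed-down ContentStream; not Solr's actual interface.
interface ContentStream {
    String getContentType();                     // may be null
    InputStream getStream() throws IOException;

    // Generic case: construct the reader from the content type and the raw stream.
    default Reader getReader() throws IOException {
        return new InputStreamReader(getStream(), charsetOf(getContentType()));
    }

    // Charset named in the content type, or UTF-8 (an assumed default) if none.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    return Charset.forName(p.substring(8).trim().replace("\"", ""));
                }
            }
        }
        return StandardCharsets.UTF_8;
    }
}

// Directly posted content that is already text: hand the characters back as-is.
class StringStream implements ContentStream {
    private final String body;
    StringStream(String body) { this.body = body; }
    public String getContentType() { return "text/plain; charset=UTF-8"; }
    public InputStream getStream() { return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8)); }
    @Override public Reader getReader() { return new StringReader(body); }
}

// Local file: return a FileReader directly; it decodes with the platform default charset.
class FileStream implements ContentStream {
    private final File file;
    FileStream(File file) { this.file = file; }
    public String getContentType() { return null; }  // not reliably known for files
    public InputStream getStream() throws IOException { return new FileInputStream(file); }
    @Override public Reader getReader() throws IOException { return new FileReader(file); }
}

// Any other source (e.g. a remote URL) only supplies bytes plus a content type.
class BytesStream implements ContentStream {
    private final byte[] data;
    private final String contentType;
    BytesStream(byte[] data, String contentType) { this.data = data; this.contentType = contentType; }
    public String getContentType() { return contentType; }
    public InputStream getStream() { return new ByteArrayInputStream(data); }
}

public class ReaderDemo {
    static String readAll(ContentStream cs) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = cs.getReader()) {
            char[] buf = new char[1024];
            int n;
            while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAll(new StringStream("id,name\n1,foo\n")));
        System.out.println(readAll(new BytesStream(
                "1,caf\u00e9\n".getBytes(StandardCharsets.UTF_8), "text/csv; charset=UTF-8")));
    }
}
```

Only the two special cases override getReader(); everything else falls through to the default built from content type plus stream.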

I vote for (b) and perhaps also (a), but if we have (b), (a) is only
really useful for URLs whose content type is incorrect.
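One way the combination could pick a charset, sketched under the assumption that an explicit override always wins when it names a charset (the question of whether it should also apply to local files is left open above). CharsetPolicy, declared, resolve and the UTF-8 fallback are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetPolicy {
    // Assumed fallback when nothing else says what the charset is.
    static final Charset FALLBACK = StandardCharsets.UTF_8;

    // Returns the charset named in a content type, or null if it names none.
    static Charset declared(String contentType) {
        if (contentType == null) return null;
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                return Charset.forName(p.substring(8).trim().replace("\"", ""));
            }
        }
        return null;
    }

    // Precedence sketch for "(b), plus (a) as an override":
    //  1. an explicit content type param that names a charset wins;
    //  2. a local file is decoded like FileReader would, with the platform default;
    //  3. otherwise use the charset the remote source reported;
    //  4. otherwise fall back.
    public static Charset resolve(String overrideType, boolean localFile, String reportedType) {
        Charset c = declared(overrideType);
        if (c != null) return c;
        if (localFile) return Charset.defaultCharset();
        c = declared(reportedType);
        return (c != null) ? c : FALLBACK;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, false, "text/csv; charset=ISO-8859-1"));
        System.out.println(resolve("text/csv; charset=UTF-8", false, "text/csv; charset=ISO-8859-1"));
        System.out.println(resolve(null, true, null));
    }
}
```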
