Hi,
I would like Tika to detect the character encoding of a Shift_JIS text file.
I found this related issue:
https://github.com/dadoonet/fscrawler/issues/400
(there is also a ticket for the problem in Tika's Jira: 
https://issues.apache.org/jira/browse/TIKA-2437)

So I also put the filename in my request to give Tika a hint. With a PUT 
request everything works: I get back "Content-Type": "text/plain; 
charset=Shift_JIS" and the Shift_JIS text I expect. With a POST request, 
however, I cannot add a Content-Disposition header in the POST body without 
also adding a Content-Type header (I use Java and the MultipartEntityBuilder 
to send requests to Tika Server 2.6.0). When I add a Content-Type header, Tika 
uses it for its detection, even when it is set to a wildcard. All I get in 
this situation is "Content-Type": "application/octet-stream", no extracted 
text, and the information that Tika used the EmptyParser.
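
The working PUT variant can be sketched roughly as follows. This is only a minimal sketch under assumptions: it uses the JDK's java.net.http client rather than the MultipartEntityBuilder, and the class name TikaPutSketch, the method name buildPutRequest, the Accept header, and the endpoint URL in the usage note are made up for illustration. The point is that the request carries a Content-Disposition header with the filename and no Content-Type header at all.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, for illustration only.
final class TikaPutSketch {

    // Builds a PUT request that passes the filename as a detection hint
    // and deliberately sets no Content-Type header.
    static HttpRequest buildPutRequest(URI tikaEndpoint, String filename, byte[] content) {
        // Keep the quoted header value well-formed.
        if (filename.contains("\"") || filename.contains("\r") || filename.contains("\n")) {
            throw new IllegalArgumentException("filename must not contain quotes or line breaks");
        }
        return HttpRequest.newBuilder(tikaEndpoint)
                // Filename hint for the server-side detection.
                .header("Content-Disposition", "attachment; filename=\"" + filename + "\"")
                // Assumption: ask for a JSON response that includes the metadata.
                .header("Accept", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }
}
```

Assuming a Tika Server is reachable at a URL such as http://localhost:9998/tika, the request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).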

I don't want to set "Content-Type": "text/plain" in the request (this would 
work) because I don't only have text files, and I don't want to guess the 
Content-Type from the filename myself. I would expect Tika to be able to do 
that.

I want to use Tika with POST requests. Is there any way to do that and still 
have Shift_JIS-encoded text files detected?
Is there perhaps a way to tell Tika to use only MIME magic and the filename, 
and not the Content-Type header, when guessing the MIME type?

