Hi,
I don't see why this is a problem, and you're mentioning the solution
yourself. If you want detection by content, then don't pass the filename.
Tilman
On 20.04.2023 08:19, didon...@126.com wrote:
Hi, Tilman
I have encountered another problem.
t1.xml is a simple plain text file with an .xml extension, not an actual XML file.
When I use Tika Server 2.7.0 to extract file content, the results
are as follows:
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" \
  -H "Content-Disposition: attachment; filename=t1.xml"
Result: failure (empty output)
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" \
  -H "Content-Disposition: attachment; filename=t1.txt"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" \
  -H "Content-Disposition: attachment; filename=t1.docx"
Result: success
So the filename passed in the Content-Disposition header affects the extraction result.
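
For reference, the workaround suggested in the reply at the top of this thread (leave out the filename when detection should rely on the content) could be wrapped roughly as follows. This is only a sketch: the function name tika_text and the TIKA_URL variable are made up for illustration, and the default address is simply the one used in the commands above.

```shell
# Hypothetical helper (not part of Tika): upload a file to the /tika
# endpoint and print the extracted plain text.
#   $1 = file to upload
#   $2 = optional filename hint; when omitted, no Content-Disposition
#        header is sent, so the server gets no filename to go by.
tika_text() {
  file=$1
  name=${2:-}
  # Base curl arguments: quiet but show errors, upload the file (PUT),
  # and ask for plain text back.
  set -- -sS -T "$file" -H "Accept: text/plain"
  if [ -n "$name" ]; then
    # Only add the filename hint when the caller asked for it.
    set -- "$@" -H "Content-Disposition: attachment; filename=$name"
  fi
  # TIKA_URL is an assumed override; the default matches this thread.
  curl "$@" "${TIKA_URL:-http://127.0.0.1:12000}/tika"
}
```

With this, "tika_text t1.xml" sends no filename at all, while "tika_text t1.xml t1.txt" passes t1.txt as the hint, which corresponds to the second and third commands above.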