Hi,

I don't see why this is a problem, and you mention the solution yourself. If you want detection by content, then don't pass the filename.
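
A minimal sketch of the two variants, reusing the host, port and file names from your message. `tika_cmd` is a hypothetical helper written for illustration (it is not part of Tika), and it only prints the curl command instead of running it:

```shell
# Hypothetical helper: builds and prints the curl command, does not execute it.
# $1 = file to upload
# $2 = optional file name hint; leave it out for detection by content only
tika_cmd() {
  cmd="curl -T $1 http://127.0.0.1:12000/tika --header \"Accept: text/plain\""
  if [ -n "${2:-}" ]; then
    # Only add Content-Disposition when a file name hint was given.
    cmd="$cmd -H \"Content-Disposition: attachment; filename=$2\""
  fi
  printf '%s\n' "$cmd"
}

tika_cmd t1.xml          # no Content-Disposition header, no file name hint
tika_cmd t1.xml t1.txt   # hint that matches the actual (plain text) content
```

The first form is the one that worked in your test; the second is only useful if the name you pass matches what the file really contains.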

Tilman

On 20.04.2023 08:19, didon...@126.com wrote:
Hi, Tilman

I have encountered another problem.
    t1.xml is a simple plain text file, not a standard XML file.
    When I use Tika Server 2.7.0 to extract file content, the results are as follows:

curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.xml"
Result: failure (empty output)

curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.txt"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.docx"
Result: success

    The file name information affects the extraction result.

