Many binary formats begin with magic byte sequences composed of ASCII
characters, e.g.
ZIP files begin with PK
PDF files begin with %PDF-
CHM help files begin with ITSF
etc.
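To make the idea concrete, here is a minimal Python sketch of prefix matching against such signatures. The table and the `sniff` function are hypothetical and are not Tika's API. It uses the full four-byte ZIP local file header signature (PK followed by the bytes 0x03 0x04) rather than just the two letters. It also ignores other ZIP variants, such as an empty archive, which starts with PK 0x05 0x06.

```python
from typing import Optional

# Hypothetical signature table: (magic prefix, label). The byte strings are
# the well-known magic numbers; the labels are illustrative only.
MAGIC = [
    (b"PK\x03\x04", "zip"),  # ZIP local file header
    (b"%PDF-", "pdf"),       # PDF header
    (b"ITSF", "chm"),        # Microsoft Compiled HTML Help
]

def sniff(data: bytes) -> Optional[str]:
    """Return the label of the first signature that prefixes data, else None."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return None
```

For example, `sniff(b"%PDF-1.4")` gives `"pdf"`, while `sniff(b"PKWARE made ZIP")` gives `None` because the bytes after "PK" are not 0x03 0x04.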

Does Tika make any attempt to distinguish normal ASCII text documents
that happen to begin with 'PK' from zip files?
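Here is one possible heuristic, sketched in Python as an illustration only. It does not describe what Tika actually does. The sketch requires the full four-byte ZIP signature. Otherwise it treats input as text when its first bytes are all printable ASCII, tab, CR or LF. The names `looks_like_text` and `classify` and the 512-byte sample size are assumptions, and the text check covers plain ASCII only, not other encodings.

```python
def looks_like_text(data: bytes, sample: int = 512) -> bool:
    """Hypothetical check: the first `sample` bytes are printable ASCII or tab/LF/CR."""
    allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}
    return all(b in allowed for b in data[:sample])

def classify(data: bytes) -> str:
    """Hypothetical classifier: full ZIP signature first, then an ASCII-text check."""
    if data.startswith(b"PK\x03\x04"):  # complete ZIP local file header signature
        return "zip"
    if looks_like_text(data):
        return "text"
    return "unknown"
```

Because 0x03 and 0x04 are control characters, a pure ASCII text file can never match the four-byte ZIP signature. The two checks therefore cannot both apply to the same plain ASCII input.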

-- 
Antoni Myłka
