Many binary formats begin with magic byte sequences composed of ASCII characters, e.g. ZIP files begin with PK, PDFs begin with %PDF-, CHM help files begin with ITSF, etc.
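To make the ambiguity concrete, here is a rough sketch of a detector that matches only on the ASCII prefixes listed above, next to a somewhat stricter ZIP check. This is purely illustrative and says nothing about how Tika works internally: the names `ASCII_MAGIC`, `naive_sniff` and `looks_like_zip` are made up for this example, and the stricter check assumes the standard 4-byte ZIP record signatures (PK followed by two non-printable bytes).

```python
from typing import Optional

# Hypothetical table built from the prefixes mentioned in this message.
# It is not taken from Tika.
ASCII_MAGIC = {
    b"PK": "zip",
    b"%PDF-": "pdf",
    b"ITSF": "chm",
}

# Standard 4-byte ZIP signatures: local file header, end of central
# directory (seen first in an empty archive), and the spanned-archive marker.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")


def naive_sniff(data: bytes) -> Optional[str]:
    """Guess a format from an ASCII prefix alone (prone to false positives)."""
    for magic, kind in ASCII_MAGIC.items():
        if data.startswith(magic):
            return kind
    return None


def looks_like_zip(data: bytes) -> bool:
    """Stricter check: require the full 4-byte signature, not just 'PK'."""
    return data[:4] in ZIP_SIGNATURES


if __name__ == "__main__":
    text = b"PK is an abbreviation used in this plain text file.\n"
    print(naive_sniff(text))     # prefix match alone calls this a zip
    print(looks_like_zip(text))  # the 4-byte check rejects it
```

A plain text file that starts with the two letters PK fools the prefix-only check, while the 4-byte check rejects it because the third and fourth bytes are printable characters.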
Does Tika make any attempt to distinguish normal ASCII text documents that happen to begin with 'PK' from ZIP files?

-- Antoni Myłka [EMAIL PROTECTED]
