Hi,

2008/8/2 Antoni Mylka <[EMAIL PROTECTED]>:
> Many binary formats begin with magic byte sequences composed of ASCII
> characters, e.g.
> zipfiles begin with PK
> pdfs begin with %PDF-
> chm help files begin with ITSF
> etc.
>
> Does Tika make any attempt to distinguish normal ASCII text documents
> that happen to begin with 'PK' from zip files?

Not at the moment, but it probably should... I created an improvement
issue for that, TIKA-154.
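
To illustrate the kind of check I have in mind, here is a rough sketch in
Python. It is not Tika code and not necessarily what TIKA-154 will end up
doing. The function names and the fallback type are made up for the example.
The idea is to match the full four-byte zip signature instead of just 'PK',
since the bytes that follow 'PK' in a real zip record are control characters
that plain text is unlikely to contain. It only covers the zip case; formats
like PDF, whose magic is entirely printable ASCII, would need a different test.

```python
# Hypothetical helper, not part of Tika: a sketch of telling a zip file
# apart from an ASCII text file that happens to start with "PK".

# Zip record signatures: local file header, end of central directory
# (an empty archive starts with this one), and the spanned-archive marker.
ZIP_SIGNATURES = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")

# Bytes we accept as plain ASCII text: tab, LF, FF, CR and printable ASCII.
TEXT_BYTES = frozenset({0x09, 0x0A, 0x0C, 0x0D}) | frozenset(range(0x20, 0x7F))


def looks_like_ascii_text(prefix: bytes) -> bool:
    """Return True if every byte in the prefix is a typical text byte."""
    return all(b in TEXT_BYTES for b in prefix)


def guess_type(prefix: bytes) -> str:
    """Guess a media type from the first bytes of a document."""
    if prefix.startswith(ZIP_SIGNATURES):
        return "application/zip"
    if looks_like_ascii_text(prefix):
        return "text/plain"
    # Assumed fallback for anything that is neither zip nor plain ASCII.
    return "application/octet-stream"
```

With this, a text file starting with "PK is ..." is reported as text/plain,
because the two-letter prefix alone no longer counts as a zip match.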

BR,

Jukka Zitting
