Hi,

2008/8/2 Antoni Mylka <[EMAIL PROTECTED]>:
> Many binary formats begin with magic byte sequences composed of ASCII
> characters, e.g.
> zipfiles begin with PK
> pdfs begin with %PDF-
> chms help files begin with ITSF
> etc.
>
> Does tika make any attempt to distinguish normal txt ASCII documents
> that happen to begin with 'PK' from zip files?
Not at the moment, but it probably should... I created an improvement
issue for that, TIKA-154.

BR,

Jukka Zitting
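
For illustration only, here is a rough sketch of one way the ambiguity could be reduced. It is not Tika code and not necessarily what TIKA-154 will end up doing; the class and method names (MagicSniffer, looksLikeZip, looksLikeAsciiText) are made up for this example. The idea is to match the full four-byte ZIP signature rather than just "PK", since the two bytes after "PK" are control characters that plain ASCII text is unlikely to contain, and to check separately whether the prefix looks like printable text.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika.
final class MagicSniffer {

    // ZIP records start with "PK" followed by two non-printable bytes.
    private static final byte[][] ZIP_SIGNATURES = {
        {0x50, 0x4B, 0x03, 0x04}, // local file header
        {0x50, 0x4B, 0x05, 0x06}, // end of central directory (empty archive)
        {0x50, 0x4B, 0x07, 0x08}, // spanned archive marker
    };

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // True only if the prefix carries a complete four-byte ZIP signature.
    static boolean looksLikeZip(byte[] head) {
        for (byte[] signature : ZIP_SIGNATURES) {
            if (startsWith(head, signature)) {
                return true;
            }
        }
        return false;
    }

    // True if every byte is printable ASCII or common whitespace.
    static boolean looksLikeAsciiText(byte[] head) {
        for (byte b : head) {
            boolean printable = b >= 0x20 && b < 0x7F;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            if (!printable && !whitespace) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] text = "PK is short for primary key".getBytes(StandardCharsets.US_ASCII);
        byte[] zip = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
        System.out.println(looksLikeZip(text) + " " + looksLikeAsciiText(text)); // false true
        System.out.println(looksLikeZip(zip) + " " + looksLikeAsciiText(zip));   // true false
    }
}
```

This only helps for formats whose magic continues with non-text bytes. A prefix like %PDF- is entirely printable, so a text file that starts with it would still need some further check.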