Hi Avi,

Just to clarify, are you asking for some way to determine whether a given file 
(format) will never return any text (other than metadata)?

Thanks,

-- Ken

On Aug 7, 2014, at 11:28pm, Avi Hayun <avrah...@gmail.com> wrote:

> Hi,
> 
> I am crawling my site and am using Tika for binary content parsing.
> 
> But how can I know whether a given URL contains binary content or plain text?
> 
> I can get the contentType.
> 
> 
> So for now I am using:
> if (typeStr.contains("image") || typeStr.contains("audio")
>         || typeStr.contains("video") || typeStr.contains("application")) {
>     return true;
> }
> 
> 
> Which is dumb code.
> 
> I will replace the plain strings with Tika's MediaType objects, but I still
> need better code.
> 
> Does anyone have a better idea?
> 
> Thank you for your help,
> Avi
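
For reference, here is a rough sketch of one way to tighten the substring check quoted above. It is only an illustration, not a Tika API: the class name ContentTypeCheck, the method isBinary, and the list of "textual" application subtypes are all assumptions made for this example. It strips parameters such as the charset, treats text/* as plain text, and treats application/* as binary unless the subtype is on the assumed allow-list or ends in +xml or +json.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Tika): classifies a Content-Type value.
final class ContentTypeCheck {

    // Assumed allow-list of application/* subtypes that are really text.
    private static final Set<String> TEXTUAL_APPLICATION_SUBTYPES =
            new HashSet<>(Arrays.asList(
                    "xml", "json", "javascript", "xhtml+xml"));

    private ContentTypeCheck() {}

    // Returns true when the content type is not recognized as plain text.
    static boolean isBinary(String contentType) {
        if (contentType == null || contentType.trim().isEmpty()) {
            return true; // unknown type: treat as binary, let the parser decide
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        int semicolon = contentType.indexOf(';');
        String base = (semicolon >= 0
                ? contentType.substring(0, semicolon)
                : contentType).trim().toLowerCase(Locale.ROOT);

        int slash = base.indexOf('/');
        if (slash <= 0 || slash == base.length() - 1) {
            return true; // malformed value such as "text" or "/html"
        }
        String type = base.substring(0, slash);
        String subtype = base.substring(slash + 1);

        if (type.equals("text")) {
            return false;
        }
        if (type.equals("application")) {
            boolean textual = TEXTUAL_APPLICATION_SUBTYPES.contains(subtype)
                    || subtype.endsWith("+xml")
                    || subtype.endsWith("+json");
            return !textual;
        }
        // image/*, audio/*, video/* and anything else count as binary.
        return true;
    }
}
```

With this sketch, ContentTypeCheck.isBinary("text/html; charset=UTF-8") is false and ContentTypeCheck.isBinary("application/pdf") is true. The string parsing could presumably be swapped for Tika's MediaType objects, as mentioned in the quoted message, and the allow-list would need tuning for the types the site actually serves.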

