Hi Avi,

Just to clarify, are you asking for some way to determine whether a given file (format) will never return any text (other than metadata)?

Thanks,

-- Ken

On Aug 7, 2014, at 11:28pm, Avi Hayun <avrah...@gmail.com> wrote:

> Hi,
>
> I am crawling my site and am using Tika for binary content parsing.
>
> But, how can I know if a certain url contains binary content or plain text ?
>
> I can get the contentType.
>
> So for now I am using:
> if (typeStr.contains("image") || typeStr.contains("audio") ||
>     typeStr.contains("video") || typeStr.contains("application")) {
>     return true;
> }
>
> Which is dumb code.
>
> I will replace the plain strings with Tika's MediaType objects but still I
> need better code
>
> Does anyone have any better idea ?
>
> Thank you for your help,
> Avi
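
For what it's worth, here is one possible refinement of the string check quoted above, as a rough sketch only. It splits the Content-Type into type and subtype, treats text/* as plain text, and carves out the application/* subtypes that are really text (JSON, XML, anything ending in +xml or +json). The class name ContentTypeCheck, the method isLikelyBinary, and the subtype list are all assumptions made up for illustration; none of this is Tika API, and the list is certainly incomplete.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Tika.
final class ContentTypeCheck {

    // Illustrative, incomplete list of application/* subtypes that carry text.
    private static final Set<String> TEXTUAL_APPLICATION_SUBTYPES = Set.of(
            "json", "xml", "javascript", "x-www-form-urlencoded");

    private ContentTypeCheck() {
    }

    // Returns true when the content type suggests the body should go to a binary parser.
    static boolean isLikelyBinary(String contentType) {
        if (contentType == null || contentType.trim().isEmpty()) {
            return true; // unknown type: assume binary and let the parser decide
        }
        // Drop parameters such as "; charset=UTF-8" and normalize case.
        String base = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        int slash = base.indexOf('/');
        if (slash < 0) {
            return true; // malformed value: same conservative default
        }
        String type = base.substring(0, slash);
        String subtype = base.substring(slash + 1);

        if (type.equals("text")) {
            return false;
        }
        if (type.equals("application")) {
            boolean textual = TEXTUAL_APPLICATION_SUBTYPES.contains(subtype)
                    || subtype.endsWith("+xml")
                    || subtype.endsWith("+json");
            return !textual;
        }
        // image/*, audio/*, video/* and anything else are treated as binary.
        return true;
    }
}
```

Unlike contains("application"), this no longer flags application/json or application/atom+xml as binary, while application/pdf still is. The same type/subtype split could presumably be done with Tika's MediaType (MediaType.parse, getType, getSubtype) instead of manual string handling.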