Hi team,

I am working on an application in which I would like to whitelist the file types supported by Tika and discard all other files, to avoid unknown behaviour and memory leaks. I am currently referring to https://cwiki.apache.org/confluence/display/TIKA/File+Types+and+Dependencies. However, when I submit JSON and log files, their content is extracted even though these types are not listed on that page. Is the list of file extensions on that page complete for the standard package, or only partial?
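For context, here is a minimal sketch of the kind of gate I have in mind. It is plain Java with no Tika calls: `AllowlistGate`, its `ALLOWED` entries, and the `detector` parameter are all hypothetical names made up for illustration. In the real application the detector would presumably wrap Tika's detection, and the allowlist would be whatever set of types turns out to be safe to parse.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper, not part of Tika: decides whether a file may be handed to a parser.
final class AllowlistGate {
    // Example allowlist; these entries are assumptions, not a list taken from the Tika docs.
    static final Set<String> ALLOWED = Set.of(
            "application/pdf",
            "text/plain",
            "application/json");

    private AllowlistGate() {}

    // Strips parameters such as "; charset=UTF-8" and lower-cases the base type.
    static String normalize(String mimeType) {
        if (mimeType == null) {
            return "";
        }
        int semicolon = mimeType.indexOf(';');
        String base = semicolon >= 0 ? mimeType.substring(0, semicolon) : mimeType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // 'detector' stands in for whatever maps a file name to a MIME type string.
    static boolean isAllowed(String fileName, Function<String, String> detector) {
        return ALLOWED.contains(normalize(detector.apply(fileName)));
    }
}
```

Anything for which `isAllowed` returns false would be discarded before any parser sees it.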
Also, I came across a function that lists the supported MIME types for a particular parser. How would this approach behave if I submit an untrusted or unsupported file type to Tika for parser lookup and supported MIME type detection? Would it try to load the file contents into memory? Is there a chance of a memory leak when we only detect the MIME type of a file using Tika's detect method?

Thanks,
Neha