Thank you, Richard, for raising this. Looking at these file formats, it appears that crw is based on CIFF, cr2 on TIFF, and cr3 on QuickTime.
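For my own notes, here is a rough sketch of how the three containers might be told apart from their leading bytes. It is plain Java with no Tika dependency, and it is not how Tika detects anything today. The class and method names (CanonRawSniffer, sniff, matches) are made up for illustration, and the byte offsets are my reading of the format descriptions linked further down (CIFF "HEAPCCDR" at offset 6, "CR" at offset 8 of a little-endian TIFF header, an ftyp box with brand "crx "). Treat them as assumptions until checked against real sample files.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of Tika: guesses the Canon raw flavour
// from the first few bytes of a file.
final class CanonRawSniffer {

    // True if the ASCII bytes of `ascii` appear in `data` starting at `offset`.
    static boolean matches(byte[] data, int offset, String ascii) {
        byte[] expected = ascii.getBytes(StandardCharsets.US_ASCII);
        if (data.length < offset + expected.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns "crw", "cr2", "cr3" or "unknown" for the given header bytes.
    static String sniff(byte[] header) {
        // Assumed CR3 layout: ISO/QuickTime-style "ftyp" box with brand "crx ".
        if (matches(header, 4, "ftyp") && matches(header, 8, "crx ")) {
            return "cr3";
        }
        // Assumed CR2 layout: little-endian TIFF header, then "CR" at offset 8.
        if (matches(header, 0, "II*\0") && matches(header, 8, "CR")) {
            return "cr2";
        }
        // Assumed CRW layout: CIFF header with "HEAPCCDR" at offset 6.
        if (matches(header, 6, "HEAPCCDR")) {
            return "crw";
        }
        return "unknown";
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            byte[] header = new byte[16];
            int filled = 0;
            try (InputStream in = Files.newInputStream(Paths.get(path))) {
                int n;
                // Read up to 16 bytes; stop early on end of file.
                while (filled < header.length
                        && (n = in.read(header, filled, header.length - filled)) > 0) {
                    filled += n;
                }
            }
            System.out.println(path + ": " + sniff(Arrays.copyOf(header, filled)));
        }
    }
}
```

Running it as `java CanonRawSniffer some-file.cr3` prints the guessed flavour for each path given. If the offsets hold up against real samples, they could be a starting point for magic entries, but that still needs checking.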
For some file formats we put the version in a parameter: application/x-this-app; version=1.0, application/x-thisapp; version=2.0. For others, we create separate main mimes as you've done:

image/x-raw-canon
image/x-raw-canon2
image/x-raw-canon3

I think we want to keep cr2 as a subtype of TIFF and cr3 as a subtype of mpeg/quicktime so that those parsers automatically pick them up correctly. Another option would be to subtype cr2 to crw and cr3 to crw, but then add cr2 as a supported format in our TIFFParser and cr3 in our mpeg/quicktime parser. Perhaps Nick might chime in on how we want to handle this.

I think we should improve our detection of these at the very least. I found some examples for cr2; if we can get examples for crw and cr3, that'd be helpful. The dropfiles link isn't working for me at the moment. :(

Some useful links (I want to document these for me. You probably already know them!)

[0] https://exiftool.org/canon_raw.html
http://fileformats.archiveteam.org/wiki/CR2
https://github.com/lclevy/canon_cr3

On Wed, Mar 22, 2023 at 5:12 AM Richard Toolan <richard.too...@synchronoss.com> wrote:
>
> Hello,
>
> We’ve noticed that Tika is incorrectly detecting the file .cr3 as
> video/quicktime, other raw files are detected as image/tiff (including the
> .cr3’s predecessor the .cr2). I’ve uploaded a sample file here
> https://dropfiles.org/j8CS4Snr (that was taken from this review
> https://www.photographyblog.com/reviews/canon_eos_r10_review#google_vignette)
>
> When we add a custom-mimetypes.xml file with a mime-type entry like this:
>
> <mime-type type="image/x-raw-canon">
>   <_comment>Canon raw image</_comment>
>   <sub-class-of type="image/tiff"/>
>   <glob pattern="*.crw"/>
>   <glob pattern="*.cr2"/>
>   <glob pattern="*.cr3"/>
> </mime-type>
>
> The .cr3 file is still identified as video/quicktime but when we add the
> below configuration Tika matches it to something close to what we want:
>
> <mime-type type="image/x-raw-canon3">
>   <_comment>Canon raw image</_comment>
>   <sub-class-of type="video/quicktime"/>
>   <glob pattern="*.cr3"/>
> </mime-type>
>
> But this won’t give us our desired output as we’re hoping to group all Canon
> raw images under the same mime-type.
>
> Do you have any ideas how to get this working?
>
> We’re using tika-core 2.7.0 in a Java 8 project.
>
> Thank you,
>
> Richard