Thank you, Richard, for raising this.  Looking at these file
formats, it appears that crw is based on CIFF, cr2 is based on TIFF, and
cr3 is based on QuickTime.

For some file formats we use a single type with a version parameter,
e.g. application/x-this-app; version=1.0 and application/x-this-app;
version=2.0.  For others, we create separate main mimes as you've done:

image/x-raw-canon
image/x-raw-canon2
image/x-raw-canon3

I think we want to keep cr2 as a subtype of TIFF and cr3 as a subtype of
mpeg/quicktime so that those parsers pick them up automatically.
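
To make that concrete, here is a rough, untested sketch of how the
three definitions might look in the mime registry.  The type names are
the ones listed above; the comments and the choice to leave crw without
a parent are my assumptions.  Note that this is not meant as a drop-in
custom-mimetypes.xml: if a glob such as *.cr2 is already registered to
another type in the core registry, it would probably need to be moved
rather than duplicated.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Sketch only: not the definitions Tika currently ships. -->

  <!-- crw is CIFF-based, so it gets no TIFF or QuickTime parent here -->
  <mime-type type="image/x-raw-canon">
    <_comment>Canon raw image (CRW)</_comment>
    <glob pattern="*.crw"/>
  </mime-type>

  <!-- cr2 stays under TIFF so the TIFF parser still picks it up -->
  <mime-type type="image/x-raw-canon2">
    <_comment>Canon raw image (CR2)</_comment>
    <sub-class-of type="image/tiff"/>
    <glob pattern="*.cr2"/>
  </mime-type>

  <!-- cr3 stays under QuickTime so that parser still picks it up -->
  <mime-type type="image/x-raw-canon3">
    <_comment>Canon raw image (CR3)</_comment>
    <sub-class-of type="video/quicktime"/>
    <glob pattern="*.cr3"/>
  </mime-type>
</mime-info>
```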

Another option would be to make cr2 and cr3 subtypes of crw, but then
add cr2 as a supported format in our TIFFParser and cr3 as a supported
format in our mpeg/quicktime parser.
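
For comparison, a sketch of that alternative hierarchy (again untested,
with the same assumed type names).  Here the grouping under one Canon
parent comes from the registry, and the TIFF and mpeg/quicktime parsers
would have to list the cr2 and cr3 types explicitly, because they would
no longer inherit them through sub-class-of.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Sketch only: all Canon raw types grouped under image/x-raw-canon -->
  <mime-type type="image/x-raw-canon">
    <_comment>Canon raw image (CRW)</_comment>
    <glob pattern="*.crw"/>
  </mime-type>

  <mime-type type="image/x-raw-canon2">
    <_comment>Canon raw image (CR2)</_comment>
    <sub-class-of type="image/x-raw-canon"/>
    <glob pattern="*.cr2"/>
  </mime-type>

  <mime-type type="image/x-raw-canon3">
    <_comment>Canon raw image (CR3)</_comment>
    <sub-class-of type="image/x-raw-canon"/>
    <glob pattern="*.cr3"/>
  </mime-type>
</mime-info>
```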

Perhaps Nick might chime in on how we want to handle this.

I think we should improve our detection of these at the very least.  I
found some examples for cr2; if we can get examples for crw and cr3,
that'd be helpful.  The dropfiles link isn't working for me at the
moment. :(
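
On the detection side, magic along these lines might be a starting
point.  This is a sketch under assumptions: the byte signatures are my
reading of the public format notes (a little-endian TIFF header followed
by "CR" for cr2, an ftyp box with brand "crx " for cr3, and "HEAPCCDR"
after the CIFF header for crw), and the priority of 60 is a guess meant
to outrank the generic TIFF and QuickTime magic.  None of it has been
checked against sample files yet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Sketch only: signatures and priorities need verifying on real files -->

  <mime-type type="image/x-raw-canon">
    <magic priority="60">
      <!-- CIFF: "II", header length 0x1a, then "HEAPCCDR" -->
      <match value="II\x1a\x00\x00\x00HEAPCCDR" type="string" offset="0"/>
    </magic>
  </mime-type>

  <mime-type type="image/x-raw-canon2">
    <magic priority="60">
      <!-- little-endian TIFF header, IFD offset 0x10, then "CR" -->
      <match value="II\x2a\x00\x10\x00\x00\x00CR" type="string" offset="0"/>
    </magic>
  </mime-type>

  <mime-type type="image/x-raw-canon3">
    <magic priority="60">
      <!-- ftyp box with major brand "crx " (note the trailing space) -->
      <match value="ftypcrx " type="string" offset="4"/>
    </magic>
  </mime-type>
</mime-info>
```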

Some useful links (I want to document these for myself.  You probably
already know them!):
[0] https://exiftool.org/canon_raw.html
[1] http://fileformats.archiveteam.org/wiki/CR2
[2] https://github.com/lclevy/canon_cr3

On Wed, Mar 22, 2023 at 5:12 AM Richard Toolan
<richard.too...@synchronoss.com> wrote:
>
> Hello,
>
>
>
> We’ve noticed that Tika is incorrectly detecting the file .cr3 as 
> video/quicktime, other raw files are detected as image/tiff (including the 
> .cr3’s predecessor the .cr2). I’ve uploaded a sample file here 
> https://dropfiles.org/j8CS4Snr (that was taken from this review 
> https://www.photographyblog.com/reviews/canon_eos_r10_review#google_vignette)
>
>
>
> When we add a custom-mimetypes.xml file with a mime-type entry like this:
>
> <mime-type type="image/x-raw-canon">
>   <_comment>Canon raw image</_comment>
>   <sub-class-of type="image/tiff"/>
>   <glob pattern="*.crw"/>
>   <glob pattern="*.cr2"/>
>   <glob pattern="*.cr3"/>
> </mime-type>
>
>
>
> The .cr3 file is still identified as video/quicktime but when we add the 
> below configuration Tika matches it to something close to what we want:
>
> <mime-type type="image/x-raw-canon3">
>   <_comment>Canon raw image</_comment>
>   <sub-class-of type="video/quicktime"/>
>   <glob pattern="*.cr3"/>
> </mime-type>
>
>
>
> But this won’t give us our desired output as we’re hoping to group all Canon 
> raw images under the same mime-type.
>
>
>
> Do you have any ideas how to get this working?
>
>
>
> We’re using tika-core 2.7.0 in a Java 8 project.
>
>
>
> Thank you,
>
>
>
> Richard