Thanks Tim for the explanation,

I’ll register for an ASF Jira account and attach the CR3 file there. I’ll also
try to source some CRW files if we have any.

All the best,

Richard

From: Tim Allison <talli...@apache.org>
Date: Wednesday, 22 March 2023 at 14:27
To: user@tika.apache.org <user@tika.apache.org>
Subject: Re: Tika incorrectly detecting Canon raw image file .cr3 as 
video/quicktime


Let's move discussion here:
https://issues.apache.org/jira/browse/TIKA-3991



@Richard, if you'd like access to our JIRA, see:
https://selfserve.apache.org/jira-account.html



On Wed, Mar 22, 2023 at 10:22 AM Tim Allison <talli...@apache.org> wrote:

>
> Thank you, Richard, for raising this.  In looking at these file
> formats, it looks like crw is based on ciff, cr2 is based on tiff and
> cr3 is based on quicktime.
>
> For some file formats we use versioned types, such as
> application/x-this-app; version=1.0 and application/x-this-app;
> version=2.0.  For others, we create separate main mimes as you've done:
>
> image/x-raw-canon
> image/x-raw-canon2
> image/x-raw-canon3
>
> I think we want to keep cr2 as a subtype of TIFF and cr3 as a subtype of
> mpeg/quicktime so that those parsers pick them up automatically.
>
> Another option would be to subtype cr2 to crw and cr3 to crw, but then
> add cr2 as a supported format in our TIFFParser and cr3 in our
> mpeg/quicktime parser.
>
> Perhaps Nick might chime in on how we want to handle this.
>
> I think we should improve our detection of these at the very least.  I
> found some examples for cr2, if we can get examples for crw and cr3,
> that'd be helpful.  The dropfiles link isn't working for me at the
> moment. :(
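
For reference, the container differences described above can be told apart from the first few bytes of a file. The following is a minimal Java sketch, not Tika's implementation: the class name CanonRawSniffer and its sniff method are hypothetical, and the signatures it checks ("ftyp" with major brand "crx " for CR3, "HEAPCCDR" at offset 6 for CRW, a TIFF header followed by "CR" at offset 8 for CR2) are assumptions based on the public format notes linked in this thread and should be verified against real sample files.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tika: classifies Canon raw files by magic bytes.
final class CanonRawSniffer {

    /** Returns "crw", "cr2", "cr3", or "unknown" based on the leading bytes. */
    static String sniff(byte[] head) {
        if (head == null) {
            return "unknown";
        }
        // CR3 (assumed): QuickTime-style container whose first box is "ftyp"
        // with major brand "crx " (bytes 4..11).
        if (matches(head, 4, "ftypcrx ")) {
            return "cr3";
        }
        // CRW (assumed): CIFF container, "II" byte order mark, then
        // "HEAPCCDR" at offset 6.
        if (matches(head, 0, "II") && matches(head, 6, "HEAPCCDR")) {
            return "crw";
        }
        // CR2 (assumed): TIFF header ("II*\0" or "MM\0*") with "CR" at offset 8.
        if (head.length >= 4) {
            boolean tiffLittleEndian =
                    head[0] == 'I' && head[1] == 'I' && head[2] == 0x2A && head[3] == 0;
            boolean tiffBigEndian =
                    head[0] == 'M' && head[1] == 'M' && head[2] == 0 && head[3] == 0x2A;
            if ((tiffLittleEndian || tiffBigEndian) && matches(head, 8, "CR")) {
                return "cr2";
            }
        }
        return "unknown";
    }

    /** True if the ASCII string occurs in data at the given offset. */
    private static boolean matches(byte[] data, int offset, String ascii) {
        byte[] expected = ascii.getBytes(StandardCharsets.US_ASCII);
        if (offset + expected.length > data.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (data[offset + i] != expected[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The first 16 bytes of a file are enough input for all three checks. A plain QuickTime file (for example one whose brand is "qt  ") falls through to "unknown", which is the distinction a content-based rule for cr3 would need to make.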

>
> Some useful links (I want to document these for me.  You probably
> already know them!)
> [0] https://exiftool.org/canon_raw.html
> [1] http://fileformats.archiveteam.org/wiki/CR2
> [2] https://github.com/lclevy/canon_cr3
>
> On Wed, Mar 22, 2023 at 5:12 AM Richard Toolan
> <richard.too...@synchronoss.com> wrote:
> >
> > Hello,
> >
> > We’ve noticed that Tika is incorrectly detecting Canon .cr3 files as
> > video/quicktime, while other raw files are detected as image/tiff
> > (including the .cr3’s predecessor, the .cr2). I’ve uploaded a sample
> > file here:
> > https://dropfiles.org/j8CS4Snr
> > (taken from this review:
> > https://www.photographyblog.com/reviews/canon_eos_r10_review#google_vignette)
> >
> > When we add a custom-mimetypes.xml file with a mime-type entry like this:
> >
> > <mime-type type="image/x-raw-canon">
> >   <_comment>Canon raw image</_comment>
> >   <sub-class-of type="image/tiff"/>
> >   <glob pattern="*.crw"/>
> >   <glob pattern="*.cr2"/>
> >   <glob pattern="*.cr3"/>
> > </mime-type>
> >
> > The .cr3 file is still identified as video/quicktime, but when we add the
> > configuration below, Tika matches it to something close to what we want:
> >
> > <mime-type type="image/x-raw-canon3">
> >   <_comment>Canon raw image</_comment>
> >   <sub-class-of type="video/quicktime"/>
> >   <glob pattern="*.cr3"/>
> > </mime-type>
> >
> > But this won’t give us our desired output as we’re hoping to group all 
> > Canon raw images under the same mime-type.
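
One possible application-side workaround, sketched here in Java and independent of Tika itself: keep the separate types so that the TIFF and QuickTime parsers still apply, and collapse them to a single label after detection. The class CanonRawGrouping and its normalize method are hypothetical; the three type names are the ones listed earlier in this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Tika: maps per-format Canon raw types to one label.
final class CanonRawGrouping {

    /** Label reported for every Canon raw variant. */
    static final String GROUP_TYPE = "image/x-raw-canon";

    private static final Set<String> CANON_RAW_TYPES = new HashSet<>(Arrays.asList(
            "image/x-raw-canon",
            "image/x-raw-canon2",
            "image/x-raw-canon3"));

    /** Returns GROUP_TYPE for any Canon raw type, otherwise the input unchanged. */
    static String normalize(String detectedType) {
        if (detectedType == null) {
            return null;
        }
        // Ignore parameters such as "; version=2.0" and compare case-insensitively.
        String base = detectedType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return CANON_RAW_TYPES.contains(base) ? GROUP_TYPE : detectedType;
    }
}
```

The detected type string would be passed through normalize before it is stored or displayed; anything that is not a Canon raw type is returned as-is.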

> >
> > Do you have any ideas how to get this working?
> >
> > We’re using tika-core 2.7.0 in a Java 8 project.
> >
> > Thank you,
> >
> > Richard
