#INTRO
After digging for a while, I've found where the issue comes from for both 
`.html` and `.py` (bug #1857824) files.

#SHORT
The culprit responsible for the misidentification resides in the `.xml` 
database, which specifies how to match MIME types against input data. It can 
be found here [2].

#LONG
`kmimetypefinder.cpp` [0] uses the `QMimeDatabase db` API, calling 
`db.mimeTypeForFile(...)`, which in turn bootstraps the 
`QMimeDatabasePrivate ...` XML database from an .xml file [1].

If we look carefully at the content of the Perl entry (the one carrying the
`"text/x-perl"` alias), we see the following:

```
    <alias type="text/x-perl"/>
    <magic priority="50">
      ...
      <match value="use strict" type="string" offset="0:256"/> 
      ...
    </magic>
```
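
To make the `<match>` line concrete, here is a minimal Python sketch (not code from Qt or shared-mime-info) that parses an element like the one above and splits its `offset` attribute into a start and an end. The helper name `parse_offset` is hypothetical, and treating a single number as a one-offset range is an assumption about the format.

```python
import xml.etree.ElementTree as ET

# The <match> element copied from the snippet above.
MATCH_XML = '<match value="use strict" type="string" offset="0:256"/>'


def parse_offset(offset):
    """Split an offset attribute such as "0:256" into (first, last)."""
    start, _, end = offset.partition(":")
    first = int(start)
    # Assumption: a bare number like "4" means the single offset 4.
    last = int(end) if end else first
    return first, last


match = ET.fromstring(MATCH_XML)
first, last = parse_offset(match.get("offset"))
print(match.get("value"), first, last)
```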

Did you notice the offset attribute `"0:256"`? If we run the following
two cases, we see that a file in which the `use strict` keyword starts
at a byte offset within 0..256 is identified as a Perl script
(`application/x-perl`, of which `text/x-perl` is an alias), while a file
in which `use strict` starts beyond that range is identified as
`text/html`. Check it out:

💲 tee "index.html" <<eol ; echo -e "\n"; kmimetypefinder5 index.html
`printf "_"%.0s {1..256}`use strict
eol

application/x-perl # <- OUTPUT IS WRONG ⚠️

💲 tee "index.html" <<eol ; echo -e "\n"; kmimetypefinder5 index.html
`printf "_"%.0s {1..257}`use strict
eol

text/html # <- OUTPUT IS CORRECT!!! ✅ - Surprising, huh? 😏
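
The boundary seen in the two runs above can be modelled with a small Python sketch. This is not Qt's actual matching code, only a simplified model that assumes the `0:256` range is inclusive at both ends, which is what the two runs suggest. The names `magic_matches` and `make_file` are hypothetical.

```python
def magic_matches(data, value, first, last):
    """True if `value` starts at some byte offset in [first, last] (inclusive)."""
    # A match starting at `last` ends at last + len(value), so no byte
    # beyond that point can take part in a valid match.
    window = data[first:last + len(value)]
    return value in window


def make_file(padding):
    # Same shape as the heredoc test: N underscores, then "use strict".
    return b"_" * padding + b"use strict\n"


for padding in (0, 256, 257):
    print(padding, magic_matches(make_file(padding), b"use strict", 0, 256))
```

Under this model, the rule fires for 0 and 256 underscores of padding and no longer fires at 257, which is the same boundary `kmimetypefinder5` showed.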


#CONCLUSION
This proves that the bug comes from the QTBase MIME database, which wrongly 
matches an `x-perl` keyword in JS scripts. The latter have a `'use strict'` 
directive that should be placed at the top of the script, so the same keyword 
overlaps in both languages. I think an appropriate bug should be opened in the 
QTBase bug tracker.


[0]: https://github.com/KDE/kde-cli-tools/blob/master/kmimetypefinder/kmimetypefinder.cpp
[1]: https://github.com/qt/qtbase/blob/03dfd4199deb4a0f5123fb1eead42f7e1f85e9e3/src/corelib/mimetypes/qmimedatabase.cpp#L102
[2]: https://github.com/qt/qtbase/tree/03dfd4199deb4a0f5123fb1eead42f7e1f85e9e3/src/corelib/mimetypes/mime/packages

https://bugs.launchpad.net/bugs/1890716

Title:
  misidentifies .html file as Perl script when it contains JavaScript
  "use strict"
