There used to be a long analysis in the draft of this e-mail [*], but
let me cut to the chase.

Even something as simple as overwriting the four bytes of the comment
[**] at the beginning of the file with spaces ("%\xd0\xd4\xc5\xd8" ->
"%    "), which keeps the file fully readable (!), results in the same
behaviour but zero detections:

$ sha256sum d_jss_paper*.pdf
0ae3b229fdd763a0571463dc98e02010752bb0213a672db6826afcd72ccaf291  d_jss_paper1.pdf
9486d99c1c1f2d1b06f0b6c5d27c54d4f6e39d69a91d7fad845f323b0ab88de9  d_jss_paper.pdf
$ diff -u <(hd d_jss_paper.pdf) <(hd d_jss_paper1.pdf)
--- /dev/fd/63  2024-01-28 13:00:43.454419322 +0300
+++ /dev/fd/62  2024-01-28 13:00:43.454419322 +0300
@@ -1,4 +1,4 @@
-00000000  25 50 44 46 2d 31 2e 35  0a 25 d0 d4 c5 d8 0a 37  |%PDF-1.5.%.....7|
+00000000  25 50 44 46 2d 31 2e 35  0a 25 20 20 20 20 0a 37  |%PDF-1.5.%    .7|
 00000010  37 20 30 20 6f 62 6a 0a  3c 3c 0a 2f 4c 65 6e 67  |7 0 obj.<<./Leng|
 00000020  74 68 20 32 36 32 38 20  20 20 20 20 20 0a 2f 46  |th 2628      ./F|
 00000030  69 6c 74 65 72 20 2f 46  6c 61 74 65 44 65 63 6f  |ilter /FlateDeco|

https://www.virustotal.com/gui/file/0ae3b229fdd763a0571463dc98e02010752bb0213a672db6826afcd72ccaf291
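For illustration, a minimal Python sketch of this kind of edit. It is
not necessarily how d_jss_paper1.pdf was produced; the function name
blank_header_comment is hypothetical, and the sample bytes are just the
first few lines visible in the hexdump above. The replacement has the
same length as the original comment, so the byte offsets of everything
after it (which the PDF cross-reference table relies on) stay the same.

```python
import hashlib


def blank_header_comment(data: bytes) -> bytes:
    """Overwrite the body of the second-line comment with ASCII spaces.

    Hypothetical helper: assumes the file starts with a "%PDF-x.y" line
    followed by a comment line, both terminated by "\n".
    """
    first_eol = data.index(b"\n")  # end of the "%PDF-1.5" line
    if data[first_eol + 1:first_eol + 2] != b"%":
        raise ValueError("second line is not a comment")
    start = first_eol + 2          # first byte after the "%"
    end = data.index(b"\n", start)  # end of the comment line
    # Same number of bytes in and out, so later offsets do not move.
    return data[:start] + b" " * (end - start) + data[end:]


# Beginning of the file as shown in the hexdump (not the whole PDF).
original = b"%PDF-1.5\n%\xd0\xd4\xc5\xd8\n7 0 obj\n<<\n/Length 2628      \n"
patched = blank_header_comment(original)

print(patched[:15])
print(hashlib.sha256(original).hexdigest() != hashlib.sha256(patched).hexdigest())
```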

The scary-looking files and hosts being accessed are just Adobe Reader
and Chrome behaving in a manner indistinguishable from spyware. Upload
any PDF file with links in it and you'll see the same picture. Even the
original report for d_jss_paper.pdf from poweRlaw_0.70.6 says "no
sandboxes flagged this file as malicious".

I think that the few non-major antivirus products that "detected" the
original file remembered a low-quality checksum of a different file,
and this whole thread resulted from a checksum collision. 0x043BC33F
(71025471) is what, four bytes? Doesn't seem to be a standard CRC-32 or
the sum of all bytes modulo 2^32, though.
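The two candidates mentioned above can be checked along these lines. This
is only a sketch: the helper names are made up, "standard CRC-32" is
assumed to mean the zlib/PNG variant, and the real input would be the
contents of d_jss_paper.pdf rather than the placeholder bytes used here.

```python
import zlib

TARGET = 0x043BC33F  # the four-byte value from the detection


def candidate_checksums(data: bytes) -> dict:
    """Compute the two simple 32-bit checksums discussed above."""
    return {
        # zlib's CRC-32 (same polynomial as PNG and gzip); masked to 32 bits.
        "crc32": zlib.crc32(data) & 0xFFFFFFFF,
        # Plain sum of all byte values, reduced modulo 2^32.
        "bytesum_mod_2_32": sum(data) % 2**32,
    }


def matching_checksums(data: bytes, target: int = TARGET) -> list:
    """Return the names of the candidate checksums equal to target."""
    return [name for name, value in candidate_checksums(data).items()
            if value == target]


# Placeholder input; for a real check, read the PDF with
# open("d_jss_paper.pdf", "rb").read() instead.
sample = b"123456789"
for name, value in candidate_checksums(sample).items():
    print(f"{name}: 0x{value:08X}")
print("matches:", matching_checksums(sample))
```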

I cannot prove a negative, but I invite infosec people with more PDF
experience to comment further on the issue.

-- 
Best regards,
Ivan

[*] Colin seems to have used the Debian build of TeX Live 2017 to
generate it, which is non-trivial but possible to reproduce by
installing it from Debian Snapshots on top of Stretch. The resulting
file has a different hash (for valid reasons), the same behaviour, but
zero detections:
https://www.virustotal.com/gui/file/f7b0e0400167e06970ac61fcadfda29daec1c2ee685d4c9ff805e375bcffc985/behavior

Trying a "binary search" by removing PDF objects or replacing byte
ranges with ASCII spaces was also a dead end: any change results in no
detections.
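The byte-range variant of that search can be sketched roughly as
follows. The is_detected callback is a hypothetical stand-in for
"upload the variant and see whether anything flags it", and the helper
names are invented for this example. The search stops as soon as
blanking either half clears the detection; if that happens on the very
first step, the detection depends on the file as a whole, which is what
a checksum-based signature would look like.

```python
import hashlib


def blank_range(data: bytes, start: int, end: int) -> bytes:
    """Replace data[start:end] with ASCII spaces, keeping the length."""
    return data[:start] + b" " * (end - start) + data[end:]


def narrow_trigger(data: bytes, is_detected) -> tuple:
    """Shrink [lo, hi) to the smallest range found that detection needs.

    is_detected is a hypothetical oracle: bytes -> bool.
    """
    lo, hi = 0, len(data)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_detected(blank_range(data, lo, mid)):
            lo = mid   # still detected without the left half
        elif is_detected(blank_range(data, mid, hi)):
            hi = mid   # still detected without the right half
        else:
            break      # both halves are needed; cannot narrow further
    return lo, hi


# Toy data: a substring signature can be localised...
data = b"x" * 8 + b"EVIL" + b"y" * 20
print(narrow_trigger(data, lambda d: b"EVIL" in d))

# ...while a whole-file hash match cannot: any change clears it.
digest = hashlib.sha256(data).digest()
print(narrow_trigger(data, lambda d: hashlib.sha256(d).digest() == digest))
```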

[**] PDF 1.5 specification, section 3.1.2:

>> Comments (other than the %PDF-1.4 and %%EOF comments described in
>> Section 3.4, “File Structure”) have no semantics.

https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.5_v6.pdf#G8.1860480

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
