Document encoding

2020-01-10 Thread Two Way Communications via 4D_Tech
Hi All, An important customer of mine has requested that all documents sent to him be UTF-8 encoded. This concerns PDF files, text files, Word, Excel, picture files. I did some tests, but can’t figure out how to do that. If, e.g., I look at a PDF file in BBEdit, it says ‘Mac Roman’. Then I

Re: Document encoding

2020-01-13 Thread Epperlein, Lutz (agendo) via 4D_Tech
sult. Regards Lutz -Original Message- From: 4D_Tech [mailto:4d_tech-boun...@lists.4d.com] On Behalf Of Two Way Communications via 4D_Tech Subject: Document encoding Hi All, An important customer of mine has requested that all documents, sent to him, are UTF-8 encoded. This concerns PDF fil

Re: Document encoding

2020-01-13 Thread Koen Van Hooreweghe via 4D_Tech
Hi Rudy, IMHO UTF-8 encoding only makes sense in the context of plain text files (character-based files like txt, csv, tsv, xml, json, html, ...). But it has no meaning for binary files (PDF, pictures). xlsx and docx files are essentially zip archives containing a bunch of xml files. For xml fil
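Koen's point can be illustrated with a minimal Python sketch (not 4D code, and not from the thread). It assumes only what the message says: that an Office file is a zip archive of XML members. The helper names `is_valid_utf8`, `xml_encodings`, and `make_sample_archive` are hypothetical, and the sample archive is a tiny stand-in built in memory, not a real .docx.

```python
import io
import re
import zipfile


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def xml_encodings(archive_bytes: bytes) -> dict[str, str]:
    """Map each .xml member of a zip archive to its declared encoding.

    A member without an encoding declaration is reported as UTF-8, the
    XML default. This sketch does not check for a UTF-16 byte order mark.
    """
    declared = re.compile(rb'^<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']')
    result = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                match = declared.match(archive.read(name))
                result[name] = match.group(1).decode("ascii") if match else "UTF-8"
    return result


def make_sample_archive() -> bytes:
    """Build a tiny stand-in archive in memory (hypothetical, not a real .docx)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "word/document.xml",
            '<?xml version="1.0" encoding="UTF-8"?><doc>café</doc>'.encode("utf-8"),
        )
    return buffer.getvalue()


if __name__ == "__main__":
    print(xml_encodings(make_sample_archive()))
    print(is_valid_utf8("café".encode("utf-8")))
    print(is_valid_utf8("café".encode("mac_roman")))
```

The encoding question applies to the text members inside the archive, not to the archive as a whole, which is binary data.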

Re: Document encoding

2020-01-13 Thread Keisuke Miyako via 4D_Tech
alternative solutions for guessing plain text encoding: https://opensource.google/projects/ced also https://github.com/miyako/4d-plugin-text-convert
$err:=CP Get good encodings ($euc;$codepages)
$err:=ICU Get good encodings ($euc;$encodings;$languages;$confidences)
but I agree with Koen, to
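To make the idea of guessing an encoding concrete, here is a much simpler technique than the detectors linked above: trial decoding against an ordered list of candidates. It is a Python sketch, not the plugin's API, and the function name `guess_encoding` and the default candidate list are assumptions made for illustration.

```python
def guess_encoding(data: bytes, candidates=("utf-8", "cp1252", "mac_roman")):
    """Return the first candidate encoding that decodes the bytes, or None.

    This is plain trial decoding, not statistical detection: a successful
    decode only means the bytes are legal in that encoding, not that the
    guess is right.
    """
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))
    print(guess_encoding("café".encode("cp1252")))
```

Single-byte encodings accept almost any input, so the order of the candidates largely decides the answer. UTF-8 goes first because non-ASCII text in another encoding rarely happens to be valid UTF-8.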

Re: Document encoding

2020-01-15 Thread Two Way Communications via 4D_Tech
Thanks for the insights, guys. I was already suspicious about the whole requirement, because I had a feeling this was not correct. After discussing again with the customer, it became clear that there was an error in communication. It’s the old Dutch / French thing in Belgium, especially if one