Paul,

I wrote a simple function to guess the encoding of a file, but in Rebol, not
LiveCode. I'm not sure how it compares with your current function in terms of
accuracy. It is being used by a company which does a lot of text processing
(though I don't know if that is a good recommendation or not). The method I
used is explained in the brief documentation:
http://www.rebol.org/documentation.r?script=str-enc-utils.r
The rules could be used to create a LiveCode function.

Peter

PS Sorry for top posting, I'm replying from a mobile app.
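
As a rough illustration of what rule-based guessing can look like, here is a
minimal Python sketch. It is not the rule set from str-enc-utils.r, nor the
LCS guessEncoding handler discussed in this thread. The function name
guess_encoding, the order of the rules, and the MacRoman versus Windows-1252
tie-break are all assumptions made for the example.

```python
import codecs

# Byte-order marks, checked with UTF-32 first because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# These five byte values are unassigned in Windows-1252 but are ordinary
# accented letters in MacRoman, so their presence hints at MacRoman.
_UNDEFINED_IN_CP1252 = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def guess_encoding(data: bytes) -> str:
    """Return a Python codec name guessed from raw bytes (heuristic only)."""
    # Rule 1: an explicit byte-order mark settles the question.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Rule 2: nothing above 0x7F means plain ASCII.
    if all(b < 0x80 for b in data):
        return "ascii"
    # Rule 3: text that decodes cleanly as UTF-8 is very likely UTF-8.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Rule 4 (assumed tie-break): choose between two single-byte encodings.
    if any(b in _UNDEFINED_IN_CP1252 for b in data):
        return "mac_roman"
    return "cp1252"
```

A call such as guess_encoding(open(path, "rb").read()) returns a codec name
that can be passed to bytes.decode(). The last rule is the weak point: a
MacRoman file that happens to avoid those five byte values will be reported
as cp1252, which is the kind of edge case a set of labelled test files is
meant to expose.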
-------- Original message --------
From: Paul Dupuis via use-livecode <use-livecode@lists.runrev.com>
Date: 20/03/2020 23:35 (GMT+08:00)
To: use-livecode@lists.runrev.com
Cc: Paul Dupuis <p...@researchware.com>
Subject: Re: Guessing the encoding of a test file...

To Sean and Bob,

Thank you for your replies. I may not have been clear enough in my original
post:

We make and sell
an App for macOS and Windows. It's used around the world by researchers (not a
lot of them, as it is a niche product) on their computers. The research
application allows input of data from text files. Those text files come from
the various sources those researchers have. It would negatively impact
our competitiveness in our market if we forced the users to convert their data
all to some specific text encoding, so we need to try to "guess" the encoding
of those text files.

There are many published algorithms for doing this, and we had a past
contractor of ours take a "best practice" algorithm and create an
LCS "guessEncoding" function. This replaced a previous guessEncoding function
we had from Richard Gaskin, which, while quite good, did not cover as many
test cases as the newer, more robust one.

My main question to the list was: Has
anyone out there ALSO written a guessEncoding function they might like to share 
or license?

Why did I ask this? Because I am interested in comparing the
accuracy of our current handler to any other that may be available. Users
being users, we recently had a user reveal a bug (a misnamed variable) in our
current function that meant it was missing certain edge cases (and this user
has hundreds of text files that need this edge case to be properly recognized
as MacRoman encoding). So that bug has been fixed, but I am still interested in
comparing any other guessEncoding routines to our current one to see if we can
do better than we currently are.

To Mark,

As always, thanks for reading and
responding, Mark. We're actually doing what you suggest. We had a set of QA test
cases (text files in many different line endings and encodings), some intended
to fail (such as Windows code pages we don't support). We're expanding these
and doing a review on macOS and Windows with our app. For the ones that fail
that we think shouldn't, we will step through the code to see why they fail and
whether our algorithm can be further enhanced. I can't foresee any algorithm
tweaks we can't code ourselves that we'd need LC or USE-LIST assistance for.

Back around
LiveCode 7, Fraiser said, in response to some correspondence I had with him, 
that he would consider creating a "guessEncoding" to go along with the Unicode 
Everywhere work and the new textEncode/textDecode functions. I do understand 
the reluctance, as a business, to do so, as inevitably there will be some 
instances where it guesses wrong. Other than LC adding a guessEncoding function 
using some open source library, I would say the area where LC could be the most 
help would be with this enhancement 
https://quality.livecode.com/show_bug.cgi?id=22391

I am under the, perhaps
false, impression that isoToMac and macToIso are sort of viewed as functions 
that may become deprecated and no longer updated in the future. However, they 
are still essential for us until I can textDecode(someData,"MacRoman") on a 
Windows system and vice versa.

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
