In perl.git, the branch smoke-me/khw-locale has been created <https://perl5.git.perl.org/perl.git/commitdiff/8eb5fb31fe57da1a4b503fdb7ec8d34a2b969c38?hp=0000000000000000000000000000000000000000>
at 8eb5fb31fe57da1a4b503fdb7ec8d34a2b969c38 (commit)

- Log -----------------------------------------------------------------
commit 8eb5fb31fe57da1a4b503fdb7ec8d34a2b969c38
Author: Karl Williamson <k...@cpan.org>
Date:   Thu Jan 18 10:26:43 2018 -0700

    f

commit bc1bbfa1782da50b9f899a741cd1d72effc19dad
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 17:01:00 2018 -0700

    my_atof(): Lock dot radix

    This commit shows some redundant checks. It examines the text and if
    it finds a dot in the middle of the number, and the locale is
    expecting something else, it toggles LC_NUMERIC to be the C locale so
    that the dot is understood. However, during further parsing,
    grok_numeric_radix() gets called and sees that the locale shouldn't
    be C, and toggles it back. That ordinarily would cause the dot to not
    be recognized, but this function always recognizes a dot no matter
    what the locale. So none of our tests fails.

    I'm not sure if this is always the case, and I don't understand this
    area of the code all that well, but there is a simple way to cause
    grok_numeric_radix to not change the locale back, and that is to call
    the macro LOCK_LC_NUMERIC_STANDARD() when changing it the first time
    in my_atof(). The purpose of this macro is precisely this situation,
    so that recursed calls don't try to override the decisions of the
    outer calls

commit a1e91ebadb19a1489cc9c9cff4b5f3af7d5f06c7
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 15:20:44 2018 -0700

    Latch LC_NUMERIC during critical sections

    It is possible for operations on threaded perls which don't 'use
    locale' to still change the locale. This happens when calling
    POSIX::localeconv() and I18N::Langinfo(), and in earlier perls, it
    can happen for other operations when perl has been initialized with
    the environment causing the various locale categories to not have a
    uniform locale.
    This commit causes the areas where the locale for this category
    should predictably be in one or the other state to be a critical
    section where another thread can't interrupt and change it. This is
    a separate mutex, so that only these particular operations will be
    held up.

commit 1ed5e890da2c5945b698aec1a87b032676962367
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 13:54:27 2018 -0700

    locale.c: Do savepv() ASAP

    When this code is called on a threaded perl, it's possible that
    another thread could zap the setlocale return buffer, if it's not
    reentrant. I suspect we would have seen this more often if that was
    the case, but this commit improves things by doing the save
    immediately, reducing the unsafe interval.

commit b7fd31f00e5eb55259137a76c0a7701d3c7981f0
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 13:47:17 2018 -0700

    locale.c: #ifdef'd out code for making thread safe on not equipped platforms

commit 1f0bf6fe539515d315a628b8ae42d660dd644faa
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 12:55:13 2018 -0700

    Add mutex for changing LC_NUMERIC

    But don't use it yet. Changing of LC_NUMERIC is done by the perl
    core, and is a potential race condition on threaded perls. This adds
    a mutex that later commits will use to create critical sections where
    the value of LC_NUMERIC matters.

commit 1bcad8374e6a13b0fa1bd4a9033c98fd9ac566c0
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 13:32:32 2018 -0700

    POSIX::localconv(): Prefer localeconv_l()

    This is a thread-safe version of localeconv(), so use it under
    threads.

commit af230fefbc65f0306d964318133a43ac8e3c0c16
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 12:40:40 2018 -0700

    POSIX::localeconv() Use new fcn; avoid recalcs

    This calls strlen() once, instead of passing 0 to the subsidiary
    functions, which causes them to call it each time. It also uses the
    new function is_utf8_non_invariant_string() instead of doing here
    what that function does.
commit 2697bd076e5e718e291987db4a6aeb2242711000
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 11:22:02 2018 -0700

    XXX pod change so XS code doesn't call localeconv directly

    POSIX.xs: Add mutex around localeconv()

    If another thread calls localeconv(), it can destroy the returned
    buffer. This adds a mutex around this call; the only other place in
    the core that calls it already has this mutex, so they now are
    thread-safe.

commit 1ddcef511311d96ea0e389dafd3ee89ea4cb3144
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 11:16:15 2018 -0700

    POSIX.xs: White space only

    Vertically align for readability

commit b5369162223ab878befbb1a73f458f94452b96ea
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 11:07:42 2018 -0700

    locale.c: Move some mutex ops

    A future commit will add a mutex, and create the convention that
    this mutex if used in combination with the new one always be tried
    after the new one is in effect, in order to prevent the possibility
    of deadlock. Do it now, before the new one gets added.

    This also adds some comments about the reason for this mutex.

commit a350f3f516bb3b7a9491d5b4f3a3203b5e928646
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 10:12:30 2018 -0700

    locale.c: White-space only

    Indent code to account for the previous commit adding two blocks

commit a8839f1da81beab4b099d90c9e22be719876dcb3
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 08:27:04 2018 -0700

    locale.c: Use macro instead of its expansion

    This macro in a future commit will become more complex.

commit b1321dea5fd9b0ab56d42cf69dd683bff0b8571b
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 16 22:09:27 2018 -0700

    locale.c: Do common task in one place

    This function in some cases may need to temporarily switch the
    LC_NUMERIC code. Instead of repeating the logic to determine if this
    is needed, do it once.
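
Editorial sketch for "POSIX.xs: Add mutex around localeconv()" above: localeconv() returns a pointer to storage that a later call may overwrite, so one way to read it safely is to take a lock, copy out what is needed, and release. The following is a minimal illustration using POSIX threads, not the code in POSIX.xs; lconv_mutex and copy_decimal_point() are hypothetical names, and the protection only holds if every caller of localeconv() takes the same mutex.

```c
#include <locale.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical mutex guarding every localeconv() call in the program */
static pthread_mutex_t lconv_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Copy the current decimal point string into buf while holding the
 * mutex, so no cooperating thread can overwrite localeconv()'s buffer
 * between the call and the copy.
 * Returns 0 on success, -1 if buf is too small. */
static int
copy_decimal_point(char *buf, size_t bufsize)
{
    int ret = -1;

    pthread_mutex_lock(&lconv_mutex);
    const char *dp = localeconv()->decimal_point;
    if (strlen(dp) < bufsize) {
        strcpy(buf, dp);        /* copy before releasing the lock */
        ret = 0;
    }
    pthread_mutex_unlock(&lconv_mutex);

    return ret;
}
```

Keeping the locked region down to the call plus the copy follows the same idea as the "Keep locale change to minimum span" commit: the shorter the window, the less other threads are held up.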
commit f2c553c1098043b9058eeaa8613af3e7046f81e6
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 16 18:47:16 2018 -0700

    More debug

commit c6e89e231cf70444c7f650ed7a314fe0ce30b53a
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 16 17:38:45 2018 -0700

    POSIX.xs: Keep locale change to minimum span

    Move the restore to as close to the save as possible so that the
    locale is in an unstable state for as short a time as possible.

commit 1aac846b2ea8e220ebe478991a9edff66ce20007
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 13:24:46 2018 -0700

    POSIX::strftime: Add better fallback about UTF-8

    If the function returns a valid string that isn't completely UTF-8
    invariant, the function assumes it is UTF-8 if we are in a UTF-8
    locale. This works, but in the unlikely event that the system has no
    LC_TIME, we can't tell if it is in a UTF-8 locale. As a better
    fallback position, this commit adds the check that there is just a
    single script of the time string, adding a measure of reassurance
    that our call that it is UTF-8 is correct. This is unlikely to be
    used, but now that there is a function to call that determines if
    this is a script run, it's easy to add, and unlikely to actually get
    compiled.

commit eeedeed9bc8476e863a2b041e1c74212cbf2696f
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 13:18:50 2018 -0700

    grok_numeric_radix(): Avoid recalculating

    This function just determined that we are in the scope of 'use
    locale', hence the underlying radix character should be used. This
    commit changes to use the macro that directly does that; previously
    the macro that redundantly looks at if we are in the scope was used.

commit 4e0473f12a2b221431f5a87d92eca61bb04515fc
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 17 13:00:44 2018 -0700

    sv_vcatpvfn_flags() Balance LC_NUMERIC changes/restores

    Prior to this commit, the restore for LC_NUMERIC was getting called
    even if there were no corresponding store.
    Change so they are balanced; a future commit will require this.

commit 906bec9c1e4909854d5ccd9398389bd0f2cc31db
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 15 16:39:44 2018 -0700

    perl.h: Remove some obsolete macros

    These no longer make sense; were for core internal use only

commit 1f61ccf15d9fa84fe8b8dd3eece5e4cf0d3517ff
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 15 15:56:43 2018 -0700

    vutil.c: White_space only

    Properly indent a block, and add spaces where C++11 deprecates not
    having them

commit d25609a92305ef83d0d08cacc748d31058045863
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 15 15:48:57 2018 -0700

    Simplify some LC_NUMERIC macros

    These macros are marked as subject to change and are not documented
    externally. I don't know what I was thinking when I named some of
    them, but whatever no longer makes sense to me. Simplify them, and
    change so there is only one restore macro to remember.

commit c80ec31c2e4597974fe171d6edf58314d7d01648
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Jan 14 22:21:31 2018 -0700

    for debug Carlos

commit 8ff9861eec9760401e9d1b3e77ec5f3c67436fad
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Jan 14 21:43:43 2018 -0700

    toke.c: Remove unnecessary macro calls

    These macros were to shift the LC_NUMERIC state into using a dot for
    the radix character. When I wrote this code, I assumed that parsing
    should be using just the dot. Since then, I have discovered that
    this wraps other uses where the dot is not correct, so remove it.

commit 0eb4664d43b630d059bbd795dc06dec0f6f9734b
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Jan 14 21:37:16 2018 -0700

    perl.h: Remove unused locale core macro

    This undocumented macro is unused in the core, and all these are
    commented that they are subject to change. And it confuses things,
    so just remove it.
commit 03e682f85792fa9114e1b80eba5f6dbee80521af
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 10 22:35:12 2018 -0700

    POSIX.xs: Prefer mbrtowc() over mbtowc()

    mbrtowc is reentrant, so use it on threaded perls if available when
    POSIX::mbtowc() is called.

commit 2e13541db63472613e26e17f735026797bc3b41b
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 10 22:28:34 2018 -0700

    POSIX.xs: Prefer mbrlen() over mblen()

    mbrlen is reentrant, so use it on threaded perls if available when
    POSIX::mblen() is called.

commit 8b4104f11f540c3619c28d00b9c10344a3f1918d
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 8 18:21:12 2018 -0700

    locale.c: Revamp fallback detection of UTF-8 locales

    This commit continues the process started in the previous few
    commits to improve the detection of whether a locale is UTF-8 or not
    when the platform doesn't have the more modern tools available.

    What was done before was examine various texts, like the days of the
    week, in a locale, and see if they are legal UTF-8 or not. If there
    were any, and all were legal, it assumed that UTF-8 was needed. If
    there weren't any (as in American English), it looked at the
    locale's name. This presents false negatives and false positives.

    Basically, it adds the constraint that all the texts need to be in
    the same script when interpreted as UTF-8, which basically rules out
    any false positives when the script isn't Latin.

    With Latin, it isn't so clear cut, as the text can be intermixed
    with ASCII Latin letters and UTF-8 variant sequences that could be
    some Latin locale, or UTF-8, and they just coincidentally happen to
    be syntactically UTF-8. Because of the structuredness of UTF-8, the
    odds of a coincidence go down with increasing numbers of variants in
    a row. This also isn't likely to happen with ISO 8859-1, as the
    bytes that could be legal continuations in UTF-8 are almost entirely
    controls or punctuation.
    But in other locales in the 8859 series, there are some legal
    continuations that could be part of a month name, say. As an example
    of the issues, in 8859-2, one could have \xC6 (C with acute)
    followed by \xB1 (a with ogonek), which in UTF-8 would be U+01B1:
    LATIN CAPITAL LETTER UPSILON. However, something like \xCD (i acute)
    followed by \xB3 (l with stroke) yields U+0373: GREEK SMALL LETTER
    ARCHAIC SAMPI, and the script check added by this commit would catch
    that.

    In non-Latin texts, the only permissible ASCII characters would be
    punctuation, and you aren't going to have many of those in the
    LC_TIME strings, and certainly not in a row. Instead those will
    consist of at least several variant characters in a row, and the
    odds of those coincidentally being syntactically valid UTF-8 and
    semantically in the same script are exceedingly low.

    To catch Latin UTF-8 locales, this commit adds a list of the
    distinct variants found so far. If there are even just several of
    these, the odds of the syntax being coincidentally UTF-8 greatly
    diminish. The number needed for this to conclude that the locale is
    UTF-8, is easily tweakable at compile time.

    The problem remains for English and other Latin script languages
    that have rare accented characters. The name is still then examined
    for containing "UTF-8".

    Note that previous commits have guaranteed that if the locale has a
    non-ASCII currency symbol that is recognized by Unicode, such as the
    Euro or Pound Sterling, that will correctly be recognized.

commit 8983edc9f9f66e6d9ef463deae59c8ceeb8715c4
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Jan 7 16:22:27 2018 -0700

    locale.c: Improved fallback UTF-8 locale detection

    This adds some more checks for when the platform lacks mbtowc(). We
    can check if things like isprint(), toupper() match what a UTF-8
    locale would do. If not, we can rule out UTF-8.
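
Editorial sketch for the ISO 8859-2 example in commit 8b4104f above: the two byte pairs are valid two-byte UTF-8 by syntax alone, and decoding them shows which code points result. This is a minimal illustration, not code from locale.c; decode_utf8_2() is a hypothetical helper.

```c
#include <stdint.h>

/* Decode a two-byte UTF-8 sequence.  Returns the code point, or -1 if
 * the bytes are not a well-formed two-byte sequence.  Hypothetical
 * helper for illustration only. */
static int32_t
decode_utf8_2(unsigned char lead, unsigned char cont)
{
    /* The lead byte must look like 110xxxxx, the continuation 10xxxxxx */
    if ((lead & 0xE0) != 0xC0 || (cont & 0xC0) != 0x80)
        return -1;

    /* 5 payload bits from the lead byte, 6 from the continuation */
    int32_t cp = ((int32_t)(lead & 0x1F) << 6) | (cont & 0x3F);

    /* Anything below U+0080 would be an overlong encoding */
    return cp < 0x80 ? -1 : cp;
}
```

With this, decode_utf8_2(0xC6, 0xB1) gives 0x01B1 and decode_utf8_2(0xCD, 0xB3) gives 0x0373, matching the code points named in the commit message. Syntax alone accepts both pairs, which is why the commit adds the script check on top.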
commit df7d49db805efc6cacf38ea68c42fba5648c37a8
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 16:00:02 2018 -0700

    Improve fallback UTF-8 locale detection

    If the libc doesn't have modern enough routines, we use a fallback
    mechanism to see if a locale is UTF-8 or not. One component of this
    is to look at the byte sequence for the currency symbol. Obviously,
    if the sequence isn't valid UTF-8, the locale isn't either. But if
    it is valid UTF-8, and hence might be a UTF-8 locale, this commit
    changes the detection mechanism to see if the sequence evaluates,
    when interpreted as UTF-8 to be a known Unicode currency symbol. If
    so, the locale must be UTF-8, as the odds of some other locale
    having a sequence that does this are vanishingly small. If the
    sequence doesn't evaluate to a currency symbol, that doesn't tell us
    anything, as plenty of places have a string of letters be their
    currency symbol. Nor if the symbol is a '$', as that is invariant
    under UTF-8 vs not, so doesn't help us.

    This pretty much guarantees that a UTF-8 locale for the European
    Union or the UK that otherwise looks like plain English (Latin
    script) will be properly determined to be UTF-8, as the symbols for
    their currencies will pass this test.

commit 934bfae7516a4409180b2fd844e492dd726c265a
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 14:24:30 2018 -0700

    locale.c: Avoid localeconv()

    my_langinfo() is a recently added function which presents a better
    API than localeconv, and returns the needed information here, and is
    easier to make thread-safe.

commit bf167ecf102be2b3c218544c2cf91105683da4e4
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 8 17:37:15 2018 -0700

    locale.c: White-space only

    This indents all this code, with no other changes, in preparation
    for a future commit which will add a block around it.
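
Editorial sketch for the currency-symbol test in commit df7d49d above: the idea is that a locale whose currency string is exactly the UTF-8 encoding of a Unicode currency symbol is almost certainly a UTF-8 locale. The example below is a simplification under stated assumptions; the four-entry table and the name is_known_utf8_currency() are hypothetical, whereas the commit speaks of any known Unicode currency symbol.

```c
#include <stdbool.h>
#include <string.h>

/* UTF-8 encodings of a few Unicode currency symbols.  This short table
 * is an assumption made for illustration. */
static const char *const currency_utf8[] = {
    "\xC2\xA3",         /* U+00A3 POUND SIGN */
    "\xC2\xA5",         /* U+00A5 YEN SIGN */
    "\xE2\x82\xAC",     /* U+20AC EURO SIGN */
    "\xE2\x82\xB9",     /* U+20B9 INDIAN RUPEE SIGN */
};

/* Returns true only when sym is byte-for-byte one of the encodings
 * above, which is the conclusive case.  A false result is not
 * conclusive: '$' and letter strings such as "EUR" say nothing either
 * way. */
static bool
is_known_utf8_currency(const char *sym)
{
    size_t n = sizeof currency_utf8 / sizeof currency_utf8[0];

    for (size_t i = 0; i < n; i++)
        if (strcmp(sym, currency_utf8[i]) == 0)
            return true;
    return false;
}
```

A Latin-1 pound sign is the single byte \xA3, which is not valid UTF-8 on its own and does not match any entry, so it is never mistaken for the UTF-8 form.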
commit 404ccb7c287dd0cdec7ede45af9778ca8d2e40d9
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Jan 7 15:58:52 2018 -0700

    locale.c: Remove branch to label

    The code at this label was branched to because it contained common
    cleanup code. But now that code is in a function, so the cleanup
    call is trivial, so just skip this intermediate label.

commit dbcbc06db7ddb602e68dd6fb1176b5db405b10a9
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 12:42:35 2018 -0700

    locale.c: Extract duplicated code into subroutines

    These two paradigms are each repeated in 4 places. Make into two
    subroutines

commit 2a7181469bc701c2e2cb49e405ccfccd69d55d35
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Jan 5 21:41:27 2018 -0700

    locale.c: Prefer mbrtowc(), as its reentrant

    If it's available and this is a threaded build, it's preferred.

commit 8b6ff966773b6545a712124a6a55111bf8bc3805
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Jan 7 15:43:01 2018 -0700

    locale.c: White-space only

    Indent to correspond with new block from previous commit

commit de0718c659d8f82ec116c7731b0d3f964a0bf070
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Jan 5 14:09:40 2018 -0700

    locale.c: Revamp finding if locale is UTF-8

    This changes how this functionality works for the LC_CTYPE locale.
    On systems that have nl_langinfo() one can get a definitive answer
    from just that. Otherwise (or if that doesn't return properly) one
    can use mbtowc() to check if the UTF-8 byte sequence for the Unicode
    REPLACEMENT CHARACTER actually is considered to be that code point.
    This is also definitive. If the maximum byte string length for a
    character is too short to handle all Unicode UTF-8, we know without
    further checking that this isn't a UTF-8 locale, so can avoid the
    mbtowc check.
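
Editorial sketch for the mbtowc() test in commit de0718c above: U+FFFD REPLACEMENT CHARACTER is the three bytes EF BF BD in UTF-8, so the check asks the C library to decode those bytes in the current LC_CTYPE locale and compares the result. This is a minimal illustration, not the code in locale.c; ctype_is_utf8() is a hypothetical name, the MB_CUR_MAX threshold of 4 is an assumption about what "all Unicode UTF-8" requires, and the comparison assumes wchar_t values are Unicode code points on the platform.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <wchar.h>

/* Returns true if the current LC_CTYPE locale decodes the UTF-8 bytes
 * of U+FFFD as that code point.  Hypothetical helper. */
static bool
ctype_is_utf8(void)
{
    static const char repl[] = "\xEF\xBF\xBD";  /* U+FFFD in UTF-8 */
    wchar_t wc = 0;

    /* A locale whose longest character is under 4 bytes cannot encode
     * all of Unicode as UTF-8, so skip the decoding attempt */
    if (MB_CUR_MAX < 4)
        return false;

    (void) mbtowc(NULL, NULL, 0);       /* reset any shift state */
    return mbtowc(&wc, repl, sizeof repl - 1) == 3
        && wc == (wchar_t) 0xFFFD;
}
```

The function reports on whatever locale is in effect, so a caller would typically run setlocale(LC_CTYPE, "") or similar first. mbtowc() keeps internal state, which is why the surrounding commits prefer mbrtowc() on threaded builds.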
commit ee892f1ddb1d370ef14f8fc8a762f781bbeeb868
Author: Karl Williamson <k...@cpan.org>
Date:   Sun Jan 7 15:30:06 2018 -0700

    locale.c: Windows will never be EBCDIC

    This adjusts the conditional compilation so that win32 is a subset
    of non-EBCDIC. This will be useful in the next commit.

commit 5cea40e896c533a2c61eaf8488390339219aca27
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Jan 5 12:57:37 2018 -0700

    locale.c: Simplify expression

    Since this is operating on C strings, we don't have to check the
    lengths, but can rely on the underlying functions to work.

commit 14872e5033da01dc4a27227cd28eb100a8e323b7
Author: Karl Williamson <k...@cpan.org>
Date:   Fri Jan 5 11:35:00 2018 -0700

    Change some "shouldn't happen" failures into panics

    If the system is so broken that these libc calls are failing,
    soldiering on won't lead to sane results. This rewords some existing
    panics, and adds the errno to the output for all of them.

commit ce5b3ef666bf84c7b50846efc5b2b2ea3576269c
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 2 16:54:28 2018 -0700

    Cache locale UTF8-ness lookups

    Some locales are UTF-8, some are not. Knowledge of this is needed in
    various circumstances. This commit saves the results of the last
    several lookups so they don't have to be recalculated each time.

    The full generality of POSIX locales is such that you can have error
    messages be displayed in one locale, say Spanish, while other things
    are in French. To accommodate this generality, the program can loop
    through all the locale categories finding the UTF8ness of the locale
    it points to. However, in almost all instances, people are going to
    be in either French or in Spanish, and not in some combination.
    Suppose it is a French UTF-8 locale for all categories. This new
    cache will know that the French locale is UTF-8, and the queries for
    all but the first category can return that immediately. This simple
    cache avoids the overhead of hashes.
    This also fixes a bug I realized exists in threaded perls, but
    haven't reproduced. We do not support locales in such perls, and the
    user must not change the locale or 'use locale'. But perl itself
    could change the locale behind the scenes, leading to segfaults or
    incorrect results. One such instance is the determination of
    UTF8ness. But this only could happen if the full generality of
    locales is used so that the categories are not all in the same
    locale. This could only happen (if the user doesn't change locales)
    if the environment is such that the perl program is started up so
    that the categories are in such a state. This commit fixes this
    potential bug by caching the UTF8ness of each category at startup,
    before any threads are instantiated, and so checking for it later
    just looks it up in the cache, without perl changing the locale.

commit fb713cc7f5ea86bac6c84fe5f5d15a424290200d
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 2 14:23:24 2018 -0700

    locale.c: Avoid duplicate work

    As the comments say, the needed value is already readily available

commit f3a48a072a7adc10a3d7d23f5b7f492823eee7d9
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 2 13:38:16 2018 -0700

    locale.c: Avoid some work

    We've already worked out whether the decimal point is a dot or not.
    We can pass that information to the called routine so it doesn't
    have to figure it out again.

commit f70ec8b986da969bc92357c1e27442035fe8abb6
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 2 13:19:03 2018 -0700

    locale.c: Use non-control for a format dummy

    We need a plain character here. I used a '\e' before, but it would
    be better to have something that isn't a control, so just change it
    to a blank

commit ab5bcd4ec632fd25fdf4b4421edc017b10254c25
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Jan 2 12:25:35 2018 -0700

    locale.c: Avoid some more locale changes

    In a few places here we can test if we are already in the locale we
    want to be in, and not switch unnecessarily if so.
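
Editorial sketch for "Cache locale UTF8-ness lookups" (commit ce5b3ef above): a cache of "the last several lookups" that avoids hashes can be a small fixed array searched linearly, with the most recent entry kept at the front. The layout below is an assumption made for illustration and is not the data structure in locale.c; CACHE_SLOTS, NAME_MAX_LEN, cache_lookup() and cache_store() are all hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define CACHE_SLOTS  4          /* hypothetical: "the last several" */
#define NAME_MAX_LEN 63         /* hypothetical longest cached name */

struct utf8ness_entry {
    char name[NAME_MAX_LEN + 1];
    bool is_utf8;
    bool used;
};

static struct utf8ness_entry cache[CACHE_SLOTS];

/* Look up a locale name.  On a hit, store the cached answer in
 * *is_utf8, move the entry to the front, and return true. */
static bool
cache_lookup(const char *name, bool *is_utf8)
{
    for (int i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].used && strcmp(cache[i].name, name) == 0) {
            struct utf8ness_entry hit = cache[i];

            /* Shift entries 0..i-1 down one slot, then put the hit
             * first, so recently used names are found soonest */
            memmove(&cache[1], &cache[0], (size_t) i * sizeof cache[0]);
            cache[0] = hit;
            *is_utf8 = hit.is_utf8;
            return true;
        }
    }
    return false;
}

/* Record a freshly computed result at the front, dropping the least
 * recently used entry.  Meant to be called after a lookup miss. */
static void
cache_store(const char *name, bool is_utf8)
{
    if (strlen(name) > NAME_MAX_LEN)
        return;                 /* too long to cache; just recompute */

    memmove(&cache[1], &cache[0], (CACHE_SLOTS - 1) * sizeof cache[0]);
    strcpy(cache[0].name, name);
    cache[0].is_utf8 = is_utf8;
    cache[0].used = true;
}
```

In the common case the commit describes, where every category points at the same locale, the first category misses and stores, and every later query hits slot 0 on the first comparison.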
commit 6c87e30bdef9872b7af2ed95c45a452f1007a6ac
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 23:03:34 2018 -0700

    Avoid some unnecessary changing of locales

    The LC_NUMERIC locale category is kept so that generally the decimal
    point (radix) is a dot. For some (mostly) output purposes, it needs
    to be swapped into the program's current underlying locale so that a
    non-dot can be printed. This commit changes things so that if the
    current underlying locale uses a dot as its decimal point, the swap
    doesn't happen, as it's not needed.

commit ce2f99f71ec4d0bf8270699ede5a19162c18e9f9
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 22:20:25 2018 -0700

    perl.h: White-space only

commit 1dd19ee43a98f066952cb4aa0e06c67eeebb1171
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 20:41:21 2018 -0700

    locale.c: Add compile check for unimplemented behavior

    Instead of silently not working.

commit 003c779c55bb1e9dc77a7b12b649c32638d50c28
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 20:30:39 2018 -0700

    locale.c: White-space only

    Indent because the previous commit created an enclosing block, and
    add a blank line elsewhere

commit 84951d40820b6b62a4e53155c19c531b203ae686
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 20:00:03 2018 -0700

    locale.c: Refactor Ultrix code

    Examination shows that this code does nothing unless LC_ALL is
    defined. So explicitly test at compile time for that. Also, two
    variables don't have to be declared so globally, and by reducing
    their scope, by creating a new block we don't have to have
    PERL_UNUSED_ARG()s for them

commit 4b27bb81ef1b023c608796caaa0c51a74c162d71
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 19:07:19 2018 -0700

    locale.c: Avoid rescanning a string

    We can use a parameter to find out where in the string the portion
    of interest starts. Do that to avoid starting again from scratch.
commit 97382419148b1f4d668b6b900cc668e874f31ba1
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 18:33:59 2018 -0700

    locale.c: Use fcns instead of macros

    Here the macros being used expand into the functions being called,
    without adding any value to using the macros, and making things
    slightly less clear.

commit c9264a99bde131ba559fccaafaeff3af48db18f1
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 18:17:41 2018 -0700

    locale.c: Add const to several variables

commit 1a3bdf0bc0bd683f7f19105ecdd19c6f35bf69fd
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 18:15:27 2018 -0700

    locale.c: Improve, add comments

commit 031262f0df843e034817eac1fe1a0444e0b504e9
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 18:01:45 2018 -0700

    perl.h: Add comment, rephrase another

commit ea78fd5f8570bd013b27c2b9a9efda7ccbef7642
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Nov 18 17:34:25 2017 -0700

    Perl_langinfo: Teach about YESSTR and NOSTR

    These are items that nl_langinfo() used to be required to return,
    but are considered obsolete. Nonetheless, this drop-in replacement
    for that function should know about them for backward compatibility.

commit fd6bf70d947e8a0facc5ec2fcc8a4e874be6986e
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 1 15:07:45 2018 -0700

    APItest/t/locale.t: Add some tests

    This makes sure that the entries for which the expected return value
    may legitimately vary from platform to platform get tested as
    returning something, skipping the test if the item isn't known on
    the platform. A couple of comments are also added.

commit a049c10c147ac6058d60538dab96805fcb4582d5
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Aug 28 18:01:43 2017 -0600

    XXX may include other things after final edits: ExtUtils::ParseXS/lib/perlxs.pod: Nits

    This removes extra blanks following colons that don't mean the
    normal thing for colons that traditionally have two spaces after
    them, and capitalizes Perl.
commit 614fe02873d432b231aa712b6339db2a5cb57248
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jul 26 08:59:33 2017 -0600

    Teach perl about more locale categories

    glibc has various other categories than the ones perl handles, for
    example LC_PAPER. This commit adds knowledge of these to perl, so
    that one can set them, interrogate them, and have libraries work on
    them, even though perl itself does not.

    This is in preparation for future commits, where it becomes more
    important than currently for perl to know about all the locale
    categories on the system.

    I looked through various other systems to try to find other
    categories, but did not see any. If a system does have such a
    category, it is pretty easy to tell perl about it, and recompile.
    Use the changes in this commit as a template, and send an email to
    perl...@perl.org, so that the next Perl release will have it.

commit f52b4f1d351b1ded0290a3c0c45104dfdd7dfae6
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 3 20:41:29 2018 -0700

    Add check that "$!" is correctly interpreted as UTF-8

    We sometimes need to know if an error message is UTF-8 or not.
    Previously we checked that it is syntactically valid UTF-8, and that
    the LC_MESSAGES locale is UTF-8. But some systems, notably Windows,
    do not have LC_MESSAGES.

    For those, this commit adds a different, semantic, check that the
    text of the message when interpreted as UTF-8 is all in the same
    Unicode script. This is not foolproof, unlike the LC_MESSAGES check,
    but it's better than what we have now for such systems. It likely is
    foolproof for non-Latin locales, as any message will have a bunch of
    characters in that locale, and no ASCII Latin ones. For a Latin
    locale, these ASCII letters could be intermixed with the UTF-8 ones,
    causing potential ambiguity.

commit e7d4435f0da5eece0ae9c09192c501bb85a70b1d
Author: Karl Williamson <k...@cpan.org>
Date:   Tue Nov 14 22:27:06 2017 -0700

    Remove uncompilable code

    This code was never compiled because of a misspelling in the #ifdef.
    No problem surfaced, so just remove it. The next commit adds a
    different check.

commit 18467424e5b87440e67d1e881d161da68d733bf8
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 8 19:11:52 2018 -0700

    XXX rethink empty script_run

commit 954f19a5cfe7c89d281d390b437f1ca5a4aa6165
Author: Karl Williamson <k...@cpan.org>
Date:   Mon Jan 8 19:08:54 2018 -0700

    perl.c: Move initialization of inversion lists

    This is now done very early in the file, as it may be needed for
    initializing the locale handling.

commit 4f9614f40cc9be3a51d071493d00c8b4edc8acb1
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 21:16:15 2018 -0700

    Give isSCRIPT_RUN() an extra parameter

    This allows it to return the script of the run.

commit 0fbc31232552952bcad4ef1223c2d50fad22bebe
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 16:15:12 2018 -0700

    charclasslists.h: script enums visible to CORE,EXT

    This exposes the enum definitions for the script extensions property
    to the perl code and extensions, for use in future commits.

commit 0d1902c5bbe71102d409d03ad1d30ead31f9588b
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 16:13:06 2018 -0700

    regen/mk_invlists.pl: Allow override of where enums get defined

    This adds code so that the enums defined by this, which are
    ordinarily only used by regexec.c can be specified to be somewhere
    else instead.

commit a71ebf27502b486d2f0f24c22ce7e0acd23881bd
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 16:09:57 2018 -0700

    regen/mk_invlists.pl: Allow multiple files to access

    This changes the code so that the symbols defined by this program
    can be #define'd in more than one file.

commit 5b6e695a6f72fc77987a965e1de00cdcd136f5b7
Author: Karl Williamson <k...@cpan.org>
Date:   Sat Jan 6 16:18:45 2018 -0700

    Fix bug in script runs that start with Common

    This is a follow on to 8535a06fea02528fe726855a139fcbd360d1fc6e.
    That fixed one case where the first character was in the Common
    script, things did not work properly.
    It did not catch the case where a future character in the string was
    non-Common from a script that has its own set of digits, and this
    commit fixes that. This just entails moving a block of code to
    slightly earlier.

commit cff386cf210927fb111afa5dcadb7e6726cb482e
Author: Karl Williamson <k...@cpan.org>
Date:   Wed Jan 10 17:10:09 2018 -0700

    Make sure variable is always defined

    A future commit assumes this variable is there even on non-DEBUGGING
    builds. #define it to 0 for those.

-----------------------------------------------------------------------

--
Perl5 Master Repository