Branch: refs/heads/blead
Home:   https://github.com/Perl/perl5

Commit: e8f0fec863d7b588d1fe0d52eabc7822542ed0ff
    https://github.com/Perl/perl5/commit/e8f0fec863d7b588d1fe0d52eabc7822542ed0ff
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M op.c

Log Message:
-----------
op.c: Indent some code

This is in preparation for a future commit which will surround this with an 'if'.


Commit: 2f96a1b45550620873a2dd338a4ad004910ce4f8
    https://github.com/Perl/perl5/commit/2f96a1b45550620873a2dd338a4ad004910ce4f8
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c
  M op.c

Log Message:
-----------
doop.c, op.c: White-space only

Remove trailing blanks and outdent a doubly indented block


Commit: 0358acfc73b23d10ba01096f92cd32b3d85d1a12
    https://github.com/Perl/perl5/commit/0358acfc73b23d10ba01096f92cd32b3d85d1a12
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M op.c

Log Message:
-----------
op.c: Comments only

Indent for clarity, and add a comment


Commit: dc8faf6bad9ccc8bd6101ba9aa256e7146845799
    https://github.com/Perl/perl5/commit/dc8faf6bad9ccc8bd6101ba9aa256e7146845799
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M op.c
  M op.h
  M toke.c

Log Message:
-----------
Change macro name in tr/// code

This makes it more mnemonic. Also add an explanation in toke.c


Commit: f534d5462959256391e14ee587e98cbc036c9e4a
    https://github.com/Perl/perl5/commit/f534d5462959256391e14ee587e98cbc036c9e4a
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c
  M embed.fnc
  M embed.h
  M proto.h

Log Message:
-----------
doop.c: Add a parameter to a few fcns

instead of deriving it each time from inside the function. This is in preparation for future commits.

Commit: 482bf6150109082d5b3f93aca376c4ca258a597c
    https://github.com/Perl/perl5/commit/482bf6150109082d5b3f93aca376c4ca258a597c
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c
  M op.c
  M op.h

Log Message:
-----------
op.c, doop.c Use mnemonics instead of numeric values

For legibility and maintainability


Commit: b37fc6e8df40e52106f2828c60e7e12bb707fa13
    https://github.com/Perl/perl5/commit/b37fc6e8df40e52106f2828c60e7e12bb707fa13
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M ebcdic_tables.h
  M regen/ebcdic.pl

Log Message:
-----------
regen/ebcdic.pl: Add tables that partition by UTF-8 length

These will be used in a future commit. This creates equivalence classes of ranges of code points whose UTF-8 representations are the same length


Commit: 1132699c20405cf61bb0094610246144a29ceee7
    https://github.com/Perl/perl5/commit/1132699c20405cf61bb0094610246144a29ceee7
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M op.c

Log Message:
-----------
op.c: Simplify expression.

This also makes sure 'struct_size' has the correct value in it for any future uses.


Commit: b8e8e0fc7ce2b44e6b1ba9f6ea70ac62f044c284
    https://github.com/Perl/perl5/commit/b8e8e0fc7ce2b44e6b1ba9f6ea70ac62f044c284
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c

Log Message:
-----------
doop.c: Add, revise comments


Commit: 472799915f21962871c7c62bd6b48f9e6775e0b7
    https://github.com/Perl/perl5/commit/472799915f21962871c7c62bd6b48f9e6775e0b7
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c
  M op.h

Log Message:
-----------
doop.c: Change out-of-bounds value

This currently uses 0xfeedface as a marker for something that isn't a legal value. But that could in fact become legal at some point. This defines a value TR_OOB that can be guaranteed not to become legal.

Commit: 5e874c42f3fcb001bc30d9d1a2618581b52a412e
    https://github.com/Perl/perl5/commit/5e874c42f3fcb001bc30d9d1a2618581b52a412e
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c

Log Message:
-----------
doop.c: Change name of variable

This helped me understand what was going on in this function


Commit: 79f0ed310ab9459639c1dfac0c0c276669ad500d
    https://github.com/Perl/perl5/commit/79f0ed310ab9459639c1dfac0c0c276669ad500d
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M t/op/tr.t

Log Message:
-----------
t/op/tr.t: Add tests, incl. a TODO

This adds a TODO test which demonstrates that the current tr/// is broken, to be fixed by the next commit. It adds other tests designed to stress the forthcoming revisions in the implementation of tr///.


Commit: 00bd451dcd3aeb82c9155d46e0f39afac4e75a7a
    https://github.com/Perl/perl5/commit/00bd451dcd3aeb82c9155d46e0f39afac4e75a7a
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c
  M t/op/tr.t

Log Message:
-----------
doop.c: Refactor do_trans_complex()

I had trouble understanding how this uncommented routine worked. And it turned out to be broken, squeezing the pre-transliterated characters instead of the post-transliterated ones. This fixes the TODO test added in the previous commit.


Commit: 6507ac8375250fed048a612594c6d6d4f5ac2613
    https://github.com/Perl/perl5/commit/6507ac8375250fed048a612594c6d6d4f5ac2613
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M lib/B/Op_private.pm
  M op.h
  M opcode.h
  M regen/op_private

Log Message:
-----------
Change names of some OPpTRANS flags

These two flags will shortly become obsolete, replaced by ones with different meanings. This commit makes the new names the normal ones, and makes the old names synonyms so that code that refers to them can still compile.
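
The squeeze fix in the do_trans_complex() refactor above (commit 00bd451d) comes down to which character is compared when deciding whether to drop a duplicate. The following is a minimal byte-only sketch of that idea, not the code in doop.c: the function names, the table layout, and the SKETCH_UNMAPPED marker are all assumptions made for illustration. It mimics something like tr/a-zA-Z/ /cs, where different non-letters all become a space and the resulting run of spaces should collapse to one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical marker meaning "leave this byte alone"; not a perl constant. */
#define SKETCH_UNMAPPED (-1)

/* Transliterate s[0..len-1] in place through table[], squeezing runs of
 * identical *results*.  Returns the new length (never more than len). */
static size_t
sketch_tr_squeeze(unsigned char *s, size_t len, const int table[256])
{
    size_t out = 0;
    int prev_mapped = 0;  /* was the last byte written a transliteration result? */

    for (size_t i = 0; i < len; i++) {
        int to = table[s[i]];

        if (to == SKETCH_UNMAPPED) {
            s[out++] = s[i];
            prev_mapped = 0;
        }
        else {
            /* Compare against the previous OUTPUT byte, not the previous
             * input byte: ',' and '.' differ, but both become ' '. */
            if (!(prev_mapped && s[out - 1] == (unsigned char) to))
                s[out++] = (unsigned char) to;
            prev_mapped = 1;
        }
    }
    return out;
}

/* Build a table that maps every byte that is not an ASCII letter to ' '. */
static void
sketch_fill_nonalpha_to_space(int table[256])
{
    for (int c = 0; c < 256; c++) {
        int is_alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        table[c] = is_alpha ? SKETCH_UNMAPPED : ' ';
    }
}

/* Example data; demo_text is modified in place. */
static int           demo_table[256];
static unsigned char demo_text[] = "a,.b";
```

Comparing adjacent input bytes instead would leave two spaces here, because ',' and '.' are different before transliteration.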

Commit: 84ac8fac229faf9c2e1499494772e5cafed92229
    https://github.com/Perl/perl5/commit/84ac8fac229faf9c2e1499494772e5cafed92229
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M op.h

Log Message:
-----------
op.h: Add synonyms for some tr/// values


Commit: 58a0d047aa9b5d14eab60e85a550efa918a92018
    https://github.com/Perl/perl5/commit/58a0d047aa9b5d14eab60e85a550efa918a92018
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M embed.fnc
  M embed.h
  M invlist_inline.h
  M op.c
  M proto.h

Log Message:
-----------
op.c: Add debugging dump function

This function dumps out an inversion map


Commit: 8c90d3a9c79a9471ef12dde584263fc38571cf46
    https://github.com/Perl/perl5/commit/8c90d3a9c79a9471ef12dde584263fc38571cf46
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M embedvar.h
  M intrpvar.h
  M perl.c

Log Message:
-----------
intrpvar.h: Add variable for use in tr///

This is part of this branch of changes.


Commit: f34acfecc286f2eff2450db713da005d888a7317
    https://github.com/Perl/perl5/commit/f34acfecc286f2eff2450db713da005d888a7317
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M doop.c
  M dump.c
  M embed.fnc
  M embed.h
  M invlist_inline.h
  M lib/B/Deparse.pm
  M op.c
  M op.h
  M proto.h
  M toke.c

Log Message:
-----------
Reimplement tr/// without swashes

This large commit removes the last use of swashes from core. It replaces swashes by inversion maps. This data structure is already in use for some Unicode properties, such as case changing.

The inversion map data structure leads to straightforward implementation code, so I collapsed the two doop.c routines do_trans_complex_utf8() and do_trans_simple_utf8() into one.

A few conditionals could be avoided in the loop if this function were split so that one version didn't have to test for, e.g., squashing, but I suspect these are in the noise in the loop, which has to deal with UTF-8 conversions. This should be faster than the previous implementation anyway. I measured the differences some releases back, and inversion maps were faster than the equivalent swash for up to 512 or 1024 different ranges. These numbers are unlikely to be exceeded in tr/// except possibly in machine-generated ones.

Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases, but I left in the existing non-UTF-8 implementation, which uses tables, because I suspect it is faster. This means that there is extra code, purely for runtime performance. An inversion map is always created from the input, and then if the table implementation is to be used, the table is easily derived from the map. Prior to this commit, the table implementation was used in certain edge cases involving code points above 255. Those cases are now handled by the inversion map implementation, because it would have taken extra code to detect them, and I didn't think it was worth it. That could be changed if I am wrong.

Creating an inversion map for all inputs essentially normalizes them, and then the same logic is usable for all. This fixes some false negatives in the previous implementation. It also allows for detecting if the actual transliteration can be done in place. Previously, the code mostly punted on that detection for the UTF-8 case. This also allows for accurate counting of the lengths of the two sides, fixing some longstanding TODO warning tests.

A new flag, OPpTRANS_CAN_FORCE_UTF8, is set when the tr/// maps a code point below 256 to one that requires UTF-8. If this isn't set, the code knows that a non-UTF-8 input won't become UTF-8 in the process, and so can take shortcuts.
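
To make the "table is easily derived from the map" step concrete, here is a rough sketch under simplifying assumptions. It models an inversion map as two parallel arrays: sorted range starts, and the value the first code point of each range maps to. The struct, the function, and the SKETCH_UNMAPPED marker are hypothetical and are not perl's internal representation. The return value also illustrates the kind of check that could decide whether a byte can turn into a code point needing UTF-8, which is the condition the new flag records.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical marker for "code points in this range map to themselves". */
#define SKETCH_UNMAPPED (-1L)

/* Range i covers starts[i] .. starts[i+1]-1 (the last range is open-ended).
 * maps[i] is the target of starts[i]; the rest of the range maps to
 * consecutive values after it. */
typedef struct {
    const unsigned long *starts;
    const long          *maps;
    size_t               len;
} sketch_invmap;

/* Fill table[0..255] from the map.  Returns true if every byte maps to a
 * byte.  Returns false as soon as some byte maps above 255; the table
 * contents are then incomplete and must not be used. */
static bool
sketch_fill_table(const sketch_invmap *im, unsigned char table[256])
{
    assert(im->len > 0 && im->starts[0] == 0);

    for (size_t i = 0; i < im->len && im->starts[i] < 256; i++) {
        /* This range ends where the next one starts, capped at 256. */
        unsigned long end = 256;
        if (i + 1 < im->len && im->starts[i + 1] < 256)
            end = im->starts[i + 1];

        for (unsigned long cp = im->starts[i]; cp < end; cp++) {
            unsigned long to = cp;
            if (im->maps[i] != SKETCH_UNMAPPED)
                to = (unsigned long) im->maps[i] + (cp - im->starts[i]);
            if (to > 255)
                return false;
            table[cp] = (unsigned char) to;
        }
    }
    return true;
}

/* tr/a-c/A-C/ stays within bytes. */
static const unsigned long upper_starts[] = { 0, 'a', 'd' };
static const long          upper_maps[]   = { SKETCH_UNMAPPED, 'A', SKETCH_UNMAPPED };
static const sketch_invmap upper_map      = { upper_starts, upper_maps, 3 };

/* tr/a/\x{100}/ maps a byte to a code point that needs UTF-8. */
static const unsigned long wide_starts[] = { 0, 'a', 'b' };
static const long          wide_maps[]   = { SKETCH_UNMAPPED, 0x100, SKETCH_UNMAPPED };
static const sketch_invmap wide_map      = { wide_starts, wide_maps, 3 };

/* Scratch table for the examples. */
static unsigned char demo_table[256];
```

Only the ranges below 256 are walked, so the cost is bounded by the table size however large the map is.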

The bit representing this flag is the same as OPpTRANS_FROM_UTF, which is no longer used. That name is left in so that the dozen-ish modules on CPAN that refer to it can still compile. AFAICT none of them actually use the flag, as well they shouldn't since it is private to the core.

Inversion maps are ideally suited for tr/// implementations. An issue with them in general is that for some pathological data, they can become fragmented, requiring more space than you would expect to represent the underlying data. However, the typical tr/// would not have this issue, requiring only very short inversion maps to represent; in some cases shorter than the table implementation.

Inversion maps are also easier to deparse than swashes. A deparse TODO was also fixed by this commit, and the code to deparse UTF-8 inputs is simplified.

One could implement specialized data structures for specific types of inputs. For example, a common tr/// form is a single range, like tr/A-Z/a-z/. That could be implemented without a table and be quite fast. An intermediate step would be to use the inversion map implementation always when the transliteration is a single range, and then special case length=1 maps at execution time.
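
For readers unfamiliar with the data structure, the following sketch shows how a lookup in an inversion map might work. It assumes a simplified layout of two parallel arrays, sorted range starts plus the target of each range's first code point, and every name in it, including SKETCH_UNMAPPED, is hypothetical rather than taken from perl's sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker meaning "this range is not transliterated". */
#define SKETCH_UNMAPPED (-1L)

/* Range i covers starts[i] .. starts[i+1]-1 (the last range is open-ended).
 * maps[i] is the target of starts[i]; later code points in the range map
 * to consecutive values after it. */
typedef struct {
    const unsigned long *starts;
    const long          *maps;
    size_t               len;
} sketch_invmap;

/* Binary search for the range containing cp, then apply its offset. */
static unsigned long
sketch_invmap_lookup(const sketch_invmap *im, unsigned long cp)
{
    size_t lo = 0, hi = im->len;

    assert(im->len > 0 && im->starts[0] == 0);

    /* Invariant: starts[lo] <= cp, and hi == len or cp < starts[hi]. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (im->starts[mid] <= cp)
            lo = mid;
        else
            hi = mid;
    }

    if (im->maps[lo] == SKETCH_UNMAPPED)
        return cp;
    return (unsigned long) im->maps[lo] + (cp - im->starts[lo]);
}

/* tr/A-Z/a-z/ needs only three ranges, one of which is mapped. */
static const unsigned long az_starts[] = { 0, 'A', 'Z' + 1 };
static const long          az_maps[]   = { SKETCH_UNMAPPED, 'a', SKETCH_UNMAPPED };
static const sketch_invmap az_map      = { az_starts, az_maps, 3 };
```

In this layout tr/A-Z/a-z/ has exactly one mapped range, so the single-range case mentioned above could skip the search and reduce to a bounds check plus an addition.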

Thanks to Nicholas Rochemagne for his help on B


Commit: 3c4ee8bc4711cf6c1650c141dc86a9931d5634d9
    https://github.com/Perl/perl5/commit/3c4ee8bc4711cf6c1650c141dc86a9931d5634d9
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M lib/B/Deparse.t
  M t/lib/warnings/op

Log Message:
-----------
UnTODO some tests fixed by the previous commit


Commit: bf42a69327483603a53945ca159333221e0368b2
    https://github.com/Perl/perl5/commit/bf42a69327483603a53945ca159333221e0368b2
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M Porting/todo.pod

Log Message:
-----------
Porting/todo.pod: Rmv reference to fixing swashes


Commit: 2366ba44ab34fe60aa193eeae1be4330368dcb49
    https://github.com/Perl/perl5/commit/2366ba44ab34fe60aa193eeae1be4330368dcb49
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M handy.h

Log Message:
-----------
handy.h: Change references to swashes

As these are no longer used.


Commit: bf6e464fb44457ed714ead27ebf49a5a627a99ae
    https://github.com/Perl/perl5/commit/bf6e464fb44457ed714ead27ebf49a5a627a99ae
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M op.c

Log Message:
-----------
op.c: Remove no-longer used function


Commit: 58aa673865ea7e8aaee4b242ff4885eac1ee1334
    https://github.com/Perl/perl5/commit/58aa673865ea7e8aaee4b242ff4885eac1ee1334
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M MANIFEST
  M charclass_invlists.h
  M doop.c
  M embed.fnc
  M embed.h
  M embedvar.h
  M intrpvar.h
  M lib/unicore/mktables
  M lib/unicore/uni_keywords.pl
  M lib/utf8_heavy.pl
  M pod/perldiag.pod
  M proto.h
  M regcharclass.h
  M sv.c
  R t/uni/cache.t
  M toke.c
  M uni_keywords.h
  M utf8.c

Log Message:
-----------
Remove swashes from core

Also references to the term.

Commit: 483a80b4eb1ce75c33945f69455138be14944460
    https://github.com/Perl/perl5/commit/483a80b4eb1ce75c33945f69455138be14944460
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M MANIFEST
  M Porting/Maintainers.pl
  M charclass_invlists.h
  M lib/Unicode/UCD.pm
  M lib/Unicode/UCD.t
  M lib/unicore/mktables
  M lib/unicore/uni_keywords.pl
  R lib/utf8_heavy.pl
  M regcharclass.h
  M regcomp.c
  M t/re/pat.t
  M uni_keywords.h

Log Message:
-----------
Remove utf8_heavy.pl

The only remaining user of this is Unicode::UCD, and so most of the code from utf8_heavy.pl is moved into that UCD.pm. It removes a no-longer relevant test (that had been changed into a skip anyway), and it changes or removes the no-longer relevant references in comments to utf8_heavy.pl

Later commits will do some simplification as not all the previous functionality is needed. This commit removed only the parts that were preventing compilation and tests passing.


Commit: 29791e7dd4d3e5de7d243a10a71109b4ef20189d
    https://github.com/Perl/perl5/commit/29791e7dd4d3e5de7d243a10a71109b4ef20189d
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M lib/Unicode/UCD.pm

Log Message:
-----------
UCD.pm: Remove 'none' from swash

This was only used by tr///, and hence is no longer relevant. I never really understood it.

Commit: 048bdb720dd091aa62a709b37e2e074164fd7cdc
    https://github.com/Perl/perl5/commit/048bdb720dd091aa62a709b37e2e074164fd7cdc
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M .gitignore
  M charclass_invlists.h
  M lib/Unicode/UCD.pm
  M lib/Unicode/UCD.t
  M lib/unicore/mktables
  M lib/unicore/uni_keywords.pl
  M regcharclass.h
  M regen/mk_invlists.pl
  M regen/mph.pl
  M t/op/utftaint.t
  M t/re/regexp.t
  M t/re/rt122747.t
  M t/re/uniprops01.t
  M t/re/uniprops02.t
  M t/re/uniprops03.t
  M t/re/uniprops04.t
  M t/re/uniprops05.t
  M t/re/uniprops06.t
  M t/re/uniprops07.t
  M t/re/uniprops08.t
  M t/re/uniprops09.t
  M t/re/uniprops10.t
  M t/run/fresh_perl.t
  M t/test.pl
  M uni_keywords.h
  M vms/descrip_mms.template
  M win32/GNUmakefile
  M win32/makefile.mk

Log Message:
-----------
Remove lib/unicore/Heavy.pl

This file was for the use of utf8_heavy.pl. But now that that is incorporated into Unicode::UCD, move the definitions from Heavy.pl to lib/unicore/UCD.pl which is used by Unicode::UCD. This allows removing package names.

Commit: 240494d6992696a7a350217c131e1d5dc1444a0c
    https://github.com/Perl/perl5/commit/240494d6992696a7a350217c131e1d5dc1444a0c
Author: Karl Williamson <k...@cpan.org>
Date:   2019-11-06 (Wed, 06 Nov 2019)

Changed paths:
  M .gitignore
  M MANIFEST
  M Porting/Maintainers.pl
  M Porting/todo.pod
  M charclass_invlists.h
  M doop.c
  M dump.c
  M ebcdic_tables.h
  M embed.fnc
  M embed.h
  M embedvar.h
  M handy.h
  M intrpvar.h
  M invlist_inline.h
  M lib/B/Deparse.pm
  M lib/B/Deparse.t
  M lib/B/Op_private.pm
  M lib/Unicode/UCD.pm
  M lib/Unicode/UCD.t
  M lib/unicore/mktables
  M lib/unicore/uni_keywords.pl
  R lib/utf8_heavy.pl
  M op.c
  M op.h
  M opcode.h
  M perl.c
  M pod/perldiag.pod
  M proto.h
  M regcharclass.h
  M regcomp.c
  M regen/ebcdic.pl
  M regen/mk_invlists.pl
  M regen/mph.pl
  M regen/op_private
  M sv.c
  M t/lib/warnings/op
  M t/op/tr.t
  M t/op/utftaint.t
  M t/re/pat.t
  M t/re/regexp.t
  M t/re/rt122747.t
  M t/re/uniprops01.t
  M t/re/uniprops02.t
  M t/re/uniprops03.t
  M t/re/uniprops04.t
  M t/re/uniprops05.t
  M t/re/uniprops06.t
  M t/re/uniprops07.t
  M t/re/uniprops08.t
  M t/re/uniprops09.t
  M t/re/uniprops10.t
  M t/run/fresh_perl.t
  M t/test.pl
  R t/uni/cache.t
  M toke.c
  M uni_keywords.h
  M utf8.c
  M vms/descrip_mms.template
  M win32/GNUmakefile
  M win32/makefile.mk

Log Message:
-----------
Merge branch 'Remove swashes from core' into blead

This branch reimplements the final use of swashes in core, tr///, and then proceeds to remove the swash implementation from core. Swashes are still used in Unicode::UCD, though this can also be changed. But there are higher priority tasks to do at the moment.

I started work on this more than two releases ago, and it finally is ready.


Compare: https://github.com/Perl/perl5/compare/04863ba12958...240494d69926