Branch: refs/heads/blead
  Home:   https://github.com/Perl/perl5
  Commit: e8f0fec863d7b588d1fe0d52eabc7822542ed0ff
      
https://github.com/Perl/perl5/commit/e8f0fec863d7b588d1fe0d52eabc7822542ed0ff
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M op.c

  Log Message:
  -----------
  op.c: Indent some code

This is in preparation for a future commit which will surround this with
an 'if'.


  Commit: 2f96a1b45550620873a2dd338a4ad004910ce4f8
      
https://github.com/Perl/perl5/commit/2f96a1b45550620873a2dd338a4ad004910ce4f8
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c
    M op.c

  Log Message:
  -----------
  doop.c, op.c: White-space only

Remove trailing blanks and outdent a doubly indented block


  Commit: 0358acfc73b23d10ba01096f92cd32b3d85d1a12
      
https://github.com/Perl/perl5/commit/0358acfc73b23d10ba01096f92cd32b3d85d1a12
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M op.c

  Log Message:
  -----------
  op.c: Comments only

Indent for clarity, and add a comment


  Commit: dc8faf6bad9ccc8bd6101ba9aa256e7146845799
      
https://github.com/Perl/perl5/commit/dc8faf6bad9ccc8bd6101ba9aa256e7146845799
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M op.c
    M op.h
    M toke.c

  Log Message:
  -----------
  Change macro name in tr/// code

This makes it more mnemonic.  Also add an explanation in toke.c


  Commit: f534d5462959256391e14ee587e98cbc036c9e4a
      
https://github.com/Perl/perl5/commit/f534d5462959256391e14ee587e98cbc036c9e4a
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c
    M embed.fnc
    M embed.h
    M proto.h

  Log Message:
  -----------
  doop.c: Add a parameter to a few fcns

instead of deriving it each time from inside the function.  This is in
preparation for future commits.


  Commit: 482bf6150109082d5b3f93aca376c4ca258a597c
      
https://github.com/Perl/perl5/commit/482bf6150109082d5b3f93aca376c4ca258a597c
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c
    M op.c
    M op.h

  Log Message:
  -----------
  op.c, doop.c: Use mnemonics instead of numeric values

For legibility and maintainability


  Commit: b37fc6e8df40e52106f2828c60e7e12bb707fa13
      
https://github.com/Perl/perl5/commit/b37fc6e8df40e52106f2828c60e7e12bb707fa13
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M ebcdic_tables.h
    M regen/ebcdic.pl

  Log Message:
  -----------
  regen/ebcdic.pl: Add tables that partition by UTF-8 length

These will be used in a future commit.  This creates equivalence classes
of ranges of code points whose UTF-8 representations are the same length
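
As a rough illustration of such equivalence classes, here is a minimal
sketch for standard UTF-8.  It is an assumption-laden stand-in: the
tables this commit generates are for EBCDIC platforms, whose boundaries
differ, and the names UTF8_LENGTH_STARTS and utf8_length_class are
hypothetical, not taken from ebcdic_tables.h.

```python
from bisect import bisect_right

# First code point of each class in standard UTF-8 (assumption: these are
# NOT the UTF-EBCDIC boundaries that regen/ebcdic.pl actually emits).
UTF8_LENGTH_STARTS = [0x0, 0x80, 0x800, 0x10000]


def utf8_length_class(cp: int) -> int:
    """Return how many bytes standard UTF-8 needs to encode code point cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range for this sketch")
    # The index of the range containing cp, counted from 1, is the length.
    return bisect_right(UTF8_LENGTH_STARTS, cp)
```

Every code point in one range shares a length, so a lookup only needs
to find the range, not encode the character.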


  Commit: 1132699c20405cf61bb0094610246144a29ceee7
      
https://github.com/Perl/perl5/commit/1132699c20405cf61bb0094610246144a29ceee7
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M op.c

  Log Message:
  -----------
  op.c: Simplify expression.

This also makes sure 'struct_size' has the correct value in it for any
future uses.


  Commit: b8e8e0fc7ce2b44e6b1ba9f6ea70ac62f044c284
      
https://github.com/Perl/perl5/commit/b8e8e0fc7ce2b44e6b1ba9f6ea70ac62f044c284
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c

  Log Message:
  -----------
  doop.c: Add, revise comments


  Commit: 472799915f21962871c7c62bd6b48f9e6775e0b7
      
https://github.com/Perl/perl5/commit/472799915f21962871c7c62bd6b48f9e6775e0b7
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c
    M op.h

  Log Message:
  -----------
  doop.c: Change out-of-bounds value

This currently uses 0xfeedface as a marker for something that isn't a
legal value.  But that could in fact become legal at some point.  This
defines a value TR_OOB that can be guaranteed not to become legal.
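
The reasoning can be sketched as follows.  This is only an
illustration: OOB, OLD_MARKER and is_legal are hypothetical names, and
the real definition of TR_OOB in op.h may look nothing like this.

```python
OLD_MARKER = 0xFEEDFACE  # the previous "not a legal value" marker
OOB = -1                 # hypothetical stand-in for TR_OOB


def is_legal(value: int, max_legal: int) -> bool:
    """A value is legal if it lies in the non-negative range 0..max_legal."""
    return 0 <= value <= max_legal


# With a small legal range the old marker happens to be illegal...
small_range = 0x10FFFF
# ...but if the legal range ever grows, the old marker becomes legal.
large_range = 2**63 - 1
```

A marker chosen from outside the domain (here, a negative number) stays
illegal however far the upper bound is raised.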


  Commit: 5e874c42f3fcb001bc30d9d1a2618581b52a412e
      
https://github.com/Perl/perl5/commit/5e874c42f3fcb001bc30d9d1a2618581b52a412e
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c

  Log Message:
  -----------
  doop.c: Change name of variable

This helped me understand what was going on in this function


  Commit: 79f0ed310ab9459639c1dfac0c0c276669ad500d
      
https://github.com/Perl/perl5/commit/79f0ed310ab9459639c1dfac0c0c276669ad500d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M t/op/tr.t

  Log Message:
  -----------
  t/op/tr.t: Add tests, incl. a TODO

This adds a TODO test which demonstrates that the current tr/// is
broken, to be fixed by the next commit.

It adds other tests designed to stress the forthcoming revisions in the
implementation of tr///.


  Commit: 00bd451dcd3aeb82c9155d46e0f39afac4e75a7a
      
https://github.com/Perl/perl5/commit/00bd451dcd3aeb82c9155d46e0f39afac4e75a7a
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c
    M t/op/tr.t

  Log Message:
  -----------
  doop.c: Refactor do_trans_complex()

I had trouble understanding how this uncommented routine worked.  And it
turned out to be broken, squeezing the pre-transliterated characters
instead of the post-transliterated ones.  This fixes the TODO test added
in the previous commit.
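
The difference between the two behaviors can be shown with a small
model.  This is a sketch of the idea only, not the doop.c logic; the
function transliterate_squeeze and its squeeze_on parameter are
hypothetical.

```python
def transliterate_squeeze(text: str, mapping: dict, squeeze_on: str = "post") -> str:
    """Transliterate text, squeezing runs of transliterated characters.

    squeeze_on="post" compares the characters after transliteration;
    squeeze_on="pre" compares them before, modelling the broken behavior.
    """
    out = []
    prev_key = None
    for ch in text:
        translated = ch in mapping
        new = mapping.get(ch, ch)
        key = new if squeeze_on == "post" else ch
        if translated and key == prev_key:
            continue  # same as the previous transliterated char: squeeze it
        out.append(new)
        # Characters that were not transliterated break a run.
        prev_key = key if translated else None
    return "".join(out)
```

With a mapping that sends both "a" and "b" to "x", the input "aabb"
becomes "x" when squeezing on the results, but "xx" when squeezing on
the inputs.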


  Commit: 6507ac8375250fed048a612594c6d6d4f5ac2613
      
https://github.com/Perl/perl5/commit/6507ac8375250fed048a612594c6d6d4f5ac2613
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M lib/B/Op_private.pm
    M op.h
    M opcode.h
    M regen/op_private

  Log Message:
  -----------
  Change names of some OPpTRANS flags

These two flags will shortly become obsolete, replaced by ones with
different meanings.  This commit makes the new names the normal ones,
and keeps the old names as synonyms so that code that refers to them can
still compile.


  Commit: 84ac8fac229faf9c2e1499494772e5cafed92229
      
https://github.com/Perl/perl5/commit/84ac8fac229faf9c2e1499494772e5cafed92229
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M op.h

  Log Message:
  -----------
  op.h: Add synonyms for some tr/// values


  Commit: 58a0d047aa9b5d14eab60e85a550efa918a92018
      
https://github.com/Perl/perl5/commit/58a0d047aa9b5d14eab60e85a550efa918a92018
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M embed.fnc
    M embed.h
    M invlist_inline.h
    M op.c
    M proto.h

  Log Message:
  -----------
  op.c: Add debugging dump function

This function dumps out an inversion map


  Commit: 8c90d3a9c79a9471ef12dde584263fc38571cf46
      
https://github.com/Perl/perl5/commit/8c90d3a9c79a9471ef12dde584263fc38571cf46
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M embedvar.h
    M intrpvar.h
    M perl.c

  Log Message:
  -----------
  intrpvar.h: Add variable for use in tr///

This is part of this branch of changes.


  Commit: f34acfecc286f2eff2450db713da005d888a7317
      
https://github.com/Perl/perl5/commit/f34acfecc286f2eff2450db713da005d888a7317
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M doop.c
    M dump.c
    M embed.fnc
    M embed.h
    M invlist_inline.h
    M lib/B/Deparse.pm
    M op.c
    M op.h
    M proto.h
    M toke.c

  Log Message:
  -----------
  Reimplement tr/// without swashes

This large commit removes the last use of swashes from core.

It replaces swashes by inversion maps.  This data structure is already
in use for some Unicode properties, such as case changing.
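
For readers unfamiliar with the structure, here is a minimal sketch of
an inversion map lookup.  The layout is a simplification chosen for
illustration (a sorted list of range starts plus a parallel list of
mapped bases, with None meaning "unchanged"); the names STARTS, MAPS and
invmap_lookup are hypothetical and the core's actual representation
differs in detail.

```python
from bisect import bisect_right

# Simplified inversion map for tr/a-z/A-Z/ (hypothetical layout).
# Each entry of STARTS begins a range that runs up to the next entry.
STARTS = [0x00, 0x61, 0x7B]
# Parallel list: what the first code point of the range maps to;
# None means code points in that range are left unchanged.
MAPS = [None, 0x41, None]


def invmap_lookup(cp: int) -> int:
    """Return the code point that cp transliterates to."""
    i = bisect_right(STARTS, cp) - 1  # binary search for the containing range
    base = MAPS[i]
    if base is None:
        return cp
    # Within a range, the mapping advances in step with the input.
    return base + (cp - STARTS[i])
```

Three entries cover every code point, and a lookup is a binary search
over the range starts.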

The inversion map data structure leads to straightforward
implementation code, so I collapsed the two doop.c routines
do_trans_complex_utf8() and do_trans_simple_utf8() into one.  A few
conditionals could be avoided in the loop if this function were split so
that one version didn't have to test for, e.g., squashing, but I suspect
these are in the noise in the loop, which has to deal with UTF-8
conversions.  This should be faster than the previous implementation
anyway.  I measured the differences some releases back, and inversion
maps were faster than the equivalent swash for up to 512 or 1024
different ranges.  These numbers are unlikely to be exceeded in tr///
except possibly in machine-generated ones.

Inversion maps are capable of handling both UTF-8 and non-UTF-8 cases,
but I left in the existing non-UTF-8 implementation, which uses tables,
because I suspect it is faster.  This means that there is extra code,
purely for runtime performance.

An inversion map is always created from the input, and then if the table
implementation is to be used, the table is easily derived from the map.
Prior to this commit, the table implementation was used in certain edge
cases involving code points above 255.  Those cases are now handled by
the inversion map implementation, because it would have taken extra code
to detect them, and I didn't think it was worth it.  That could be
changed if I am wrong.
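
Deriving a table from a map can be sketched like this, using the same
simplified layout as an assumption (sorted range starts, parallel list
of mapped bases, None for "unchanged").  The function build_table is
hypothetical and is not the op.c code.

```python
def build_table(starts: list, maps: list, size: int = 256) -> list:
    """Expand a simplified inversion map into a flat lookup table."""
    table = list(range(size))  # start with every entry mapping to itself
    for i, start in enumerate(starts):
        if start >= size:
            break  # remaining ranges lie entirely above the table
        end = starts[i + 1] if i + 1 < len(starts) else size
        end = min(end, size)
        if maps[i] is None:
            continue  # unchanged range
        for cp in range(start, end):
            table[cp] = maps[i] + (cp - start)
    return table


# tr/a-z/A-Z/ as a simplified inversion map
upper_table = build_table([0x00, 0x61, 0x7B], [None, 0x41, None])
```

Once built, the table turns each lookup into a single array index,
which is the appeal of keeping it for the non-UTF-8 case.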

Creating an inversion map for all inputs essentially normalizes them,
and then the same logic is usable for all.  This fixes some false
negatives in the previous implementation.  It also allows for detecting
if the actual transliteration can be done in place.  Previously, the
code mostly punted on that detection for the UTF-8 case.
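
One plausible form of such a check is sketched below.  The criterion
used here (no replacement encodes to more bytes than what it replaces)
is an assumption made for illustration, not a description of the test
the core performs, and can_do_in_place is a hypothetical name.

```python
def utf8_len(cp: int) -> int:
    """Number of bytes in the standard UTF-8 encoding of cp."""
    return len(chr(cp).encode("utf-8"))


def can_do_in_place(pairs) -> bool:
    """pairs is an iterable of (from_cp, to_cp) mappings.

    Assumed criterion: overwriting the buffer is safe if no replacement
    needs more bytes than the character it replaces.
    """
    return all(utf8_len(to) <= utf8_len(frm) for frm, to in pairs)
```

For example, mapping "a" to "A" keeps every character at one byte,
whereas mapping "a" to U+0100 would grow the string.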

This also allows for accurate counting of the lengths of the two sides,
fixing some longstanding TODO warning tests.

A new flag, OPpTRANS_CAN_FORCE_UTF8, is set when the tr/// maps a code
point below 256 to one that requires UTF-8.  If it isn't set, the code
knows that a non-UTF-8 input won't become UTF-8 in the process, and so
can take shortcuts.  The bit representing this flag is the same as
OPpTRANS_FROM_UTF, which is no longer used.  That name is left in so
that the dozen-ish modules on CPAN that refer to it can still compile.
AFAICT none of them actually use the flag, as well they shouldn't since
it is private to the core.
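
The condition behind the flag can be sketched over a simplified
inversion map (sorted range starts, parallel list of mapped bases, None
for "unchanged").  That layout and the function can_force_utf8 are
assumptions for illustration; this is not how op.c computes the flag.

```python
def can_force_utf8(starts: list, maps: list) -> bool:
    """True if some code point below 256 maps to one above 255."""
    for i, start in enumerate(starts):
        if start > 255:
            break  # only inputs below 256 matter here
        if maps[i] is None:
            continue  # unchanged range
        # Exclusive end of the range, clipped to the below-256 portion.
        end = min(starts[i + 1] if i + 1 < len(starts) else 256, 256)
        # Targets rise with the input, so the last one is the largest.
        last_target = maps[i] + (end - 1 - start)
        if last_target > 255:
            return True
    return False
```

For tr/a-z/A-Z/ this returns False, so a non-UTF-8 string is known to
stay non-UTF-8; for a mapping of "a" to U+0100 it returns True.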

Inversion maps are ideally suited for tr/// implementations.  An issue
with them in general is that for some pathological data, they can become
fragmented, requiring more space than you would expect to represent the
underlying data.  However, the typical tr/// would not have this issue,
requiring only a very short inversion map; in some cases shorter than
the table implementation.

Inversion maps are also easier to deparse than swashes.  A deparse TODO
was also fixed by this commit, and the code to deparse UTF-8 inputs is
simplified.

One could implement specialized data structures for specific types of
inputs.  For example, a common tr/// form is a single range, like
tr/A-Z/a-z/.  That could be implemented without a table and be quite
fast.  An intermediate step would be to use the inversion map
implementation always when the transliteration is a single range, and
then special case length=1 maps at execution time.
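
The single-range idea can be sketched as a constant offset applied
inside one interval.  This is an illustration of the suggestion only;
nothing like make_single_range_tr exists in the core, and the name is
hypothetical.

```python
def make_single_range_tr(src_first: int, src_last: int, dst_first: int):
    """Build a transliterator for one range, e.g. tr/A-Z/a-z/."""
    delta = dst_first - src_first  # constant offset between the two ranges

    def tr(text: str) -> str:
        return "".join(
            chr(ord(c) + delta) if src_first <= ord(c) <= src_last else c
            for c in text
        )

    return tr


to_lower = make_single_range_tr(ord("A"), ord("Z"), ord("a"))
```

No table or search is needed: each character costs two comparisons and
at most one addition.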

Thanks to Nicholas Rochemagne for his help on B


  Commit: 3c4ee8bc4711cf6c1650c141dc86a9931d5634d9
      
https://github.com/Perl/perl5/commit/3c4ee8bc4711cf6c1650c141dc86a9931d5634d9
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M lib/B/Deparse.t
    M t/lib/warnings/op

  Log Message:
  -----------
  UnTODO some tests fixed by the previous commit


  Commit: bf42a69327483603a53945ca159333221e0368b2
      
https://github.com/Perl/perl5/commit/bf42a69327483603a53945ca159333221e0368b2
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M Porting/todo.pod

  Log Message:
  -----------
  Porting/todo.pod: Rmv reference to fixing swashes


  Commit: 2366ba44ab34fe60aa193eeae1be4330368dcb49
      
https://github.com/Perl/perl5/commit/2366ba44ab34fe60aa193eeae1be4330368dcb49
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M handy.h

  Log Message:
  -----------
  handy.h: Change references to swashes

As these are no longer used.


  Commit: bf6e464fb44457ed714ead27ebf49a5a627a99ae
      
https://github.com/Perl/perl5/commit/bf6e464fb44457ed714ead27ebf49a5a627a99ae
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M op.c

  Log Message:
  -----------
  op.c: Remove no-longer used function


  Commit: 58aa673865ea7e8aaee4b242ff4885eac1ee1334
      
https://github.com/Perl/perl5/commit/58aa673865ea7e8aaee4b242ff4885eac1ee1334
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M MANIFEST
    M charclass_invlists.h
    M doop.c
    M embed.fnc
    M embed.h
    M embedvar.h
    M intrpvar.h
    M lib/unicore/mktables
    M lib/unicore/uni_keywords.pl
    M lib/utf8_heavy.pl
    M pod/perldiag.pod
    M proto.h
    M regcharclass.h
    M sv.c
    R t/uni/cache.t
    M toke.c
    M uni_keywords.h
    M utf8.c

  Log Message:
  -----------
  Remove swashes from core

Also references to the term.


  Commit: 483a80b4eb1ce75c33945f69455138be14944460
      
https://github.com/Perl/perl5/commit/483a80b4eb1ce75c33945f69455138be14944460
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M MANIFEST
    M Porting/Maintainers.pl
    M charclass_invlists.h
    M lib/Unicode/UCD.pm
    M lib/Unicode/UCD.t
    M lib/unicore/mktables
    M lib/unicore/uni_keywords.pl
    R lib/utf8_heavy.pl
    M regcharclass.h
    M regcomp.c
    M t/re/pat.t
    M uni_keywords.h

  Log Message:
  -----------
  Remove utf8_heavy.pl

The only remaining user of this is Unicode::UCD, and so most of the code
from utf8_heavy.pl is moved into UCD.pm.

This commit also removes a no-longer relevant test (that had been
changed into a skip anyway), and it changes or removes the no-longer
relevant references in comments to utf8_heavy.pl.

Later commits will do some simplification as not all the previous
functionality is needed.  This commit removes only the parts that were
preventing compilation and tests passing.


  Commit: 29791e7dd4d3e5de7d243a10a71109b4ef20189d
      
https://github.com/Perl/perl5/commit/29791e7dd4d3e5de7d243a10a71109b4ef20189d
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M lib/Unicode/UCD.pm

  Log Message:
  -----------
  UCD.pm: Remove 'none' from swash

This was only used by tr///, and hence is no longer relevant.  I never
really understood it.


  Commit: 048bdb720dd091aa62a709b37e2e074164fd7cdc
      
https://github.com/Perl/perl5/commit/048bdb720dd091aa62a709b37e2e074164fd7cdc
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M .gitignore
    M charclass_invlists.h
    M lib/Unicode/UCD.pm
    M lib/Unicode/UCD.t
    M lib/unicore/mktables
    M lib/unicore/uni_keywords.pl
    M regcharclass.h
    M regen/mk_invlists.pl
    M regen/mph.pl
    M t/op/utftaint.t
    M t/re/regexp.t
    M t/re/rt122747.t
    M t/re/uniprops01.t
    M t/re/uniprops02.t
    M t/re/uniprops03.t
    M t/re/uniprops04.t
    M t/re/uniprops05.t
    M t/re/uniprops06.t
    M t/re/uniprops07.t
    M t/re/uniprops08.t
    M t/re/uniprops09.t
    M t/re/uniprops10.t
    M t/run/fresh_perl.t
    M t/test.pl
    M uni_keywords.h
    M vms/descrip_mms.template
    M win32/GNUmakefile
    M win32/makefile.mk

  Log Message:
  -----------
  Remove lib/unicore/Heavy.pl

This file was for the use of utf8_heavy.pl.  But now that that is
incorporated into Unicode::UCD, move the definitions from Heavy.pl to
lib/unicore/UCD.pl which is used by Unicode::UCD.  This allows removing
package names.


  Commit: 240494d6992696a7a350217c131e1d5dc1444a0c
      
https://github.com/Perl/perl5/commit/240494d6992696a7a350217c131e1d5dc1444a0c
  Author: Karl Williamson <k...@cpan.org>
  Date:   2019-11-06 (Wed, 06 Nov 2019)

  Changed paths:
    M .gitignore
    M MANIFEST
    M Porting/Maintainers.pl
    M Porting/todo.pod
    M charclass_invlists.h
    M doop.c
    M dump.c
    M ebcdic_tables.h
    M embed.fnc
    M embed.h
    M embedvar.h
    M handy.h
    M intrpvar.h
    M invlist_inline.h
    M lib/B/Deparse.pm
    M lib/B/Deparse.t
    M lib/B/Op_private.pm
    M lib/Unicode/UCD.pm
    M lib/Unicode/UCD.t
    M lib/unicore/mktables
    M lib/unicore/uni_keywords.pl
    R lib/utf8_heavy.pl
    M op.c
    M op.h
    M opcode.h
    M perl.c
    M pod/perldiag.pod
    M proto.h
    M regcharclass.h
    M regcomp.c
    M regen/ebcdic.pl
    M regen/mk_invlists.pl
    M regen/mph.pl
    M regen/op_private
    M sv.c
    M t/lib/warnings/op
    M t/op/tr.t
    M t/op/utftaint.t
    M t/re/pat.t
    M t/re/regexp.t
    M t/re/rt122747.t
    M t/re/uniprops01.t
    M t/re/uniprops02.t
    M t/re/uniprops03.t
    M t/re/uniprops04.t
    M t/re/uniprops05.t
    M t/re/uniprops06.t
    M t/re/uniprops07.t
    M t/re/uniprops08.t
    M t/re/uniprops09.t
    M t/re/uniprops10.t
    M t/run/fresh_perl.t
    M t/test.pl
    R t/uni/cache.t
    M toke.c
    M uni_keywords.h
    M utf8.c
    M vms/descrip_mms.template
    M win32/GNUmakefile
    M win32/makefile.mk

  Log Message:
  -----------
  Merge branch 'Remove swashes from core' into blead

This branch reimplements the final use of swashes in core, tr///, and
then proceeds to remove the swash implementation from core.

Swashes are still used in Unicode::UCD, though this can also be changed.
But there are higher priority tasks to do at the moment.

I started work on this more than two releases ago, and it finally is
ready.


Compare: https://github.com/Perl/perl5/compare/04863ba12958...240494d69926
