syl-nktaylor added the comment:
The build did run, even though memset was passed fill_char without the
explicit cast, so I assumed the conversion was done implicitly; the original
cast can of course be kept. With this build, my sample tests for 1-byte, 2-byte
and 4-byte chars also ran OK, so
STINNER Victor added the comment:
+Py_UCS4 fill_char = PyUnicode_READ(char_size, PyUnicode_DATA(str), 0);
+memset(to, fill_char, len);
The second parameter of memset() is used as a byte (an 8-bit "octet"). You
cannot pass a Py_UCS4 to memset(); it doesn't work.
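
To make the point above concrete, here is a minimal sketch, not CPython code. It assumes a 32-bit unsigned stand-in type (ucs4_t) for Py_UCS4, and both helper names are hypothetical. memset() converts its fill value to unsigned char, so only the low 8 bits are written, once per byte; a 4-byte character needs a typed loop instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for CPython's Py_UCS4, a 32-bit unsigned integer. */
typedef uint32_t ucs4_t;

/* Hypothetical helper: store `ch` into each of `len` 4-byte slots. */
static void fill_ucs4(ucs4_t *to, ucs4_t ch, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        to[i] = ch;
    }
}

/* Hypothetical helper showing the pitfall: memset() converts `ch` to
 * unsigned char and writes that single byte into every byte of the
 * buffer, so the upper 24 bits of `ch` are lost. */
static void fill_with_memset(ucs4_t *to, ucs4_t ch, size_t len)
{
    memset(to, (int)ch, len * sizeof(ucs4_t));
}
```

With ch = 0x20AC, fill_ucs4 stores 0x20AC in every slot, whereas fill_with_memset leaves every slot holding 0xACACACAC, because only the low byte 0xAC is used.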
--
Serhiy Storchaka added the comment:
BTW, CPython does not use the UTF-8 or UTF-16 encodings in its internal
representation of strings. It uses Latin1, UCS2 and UCS4 (UTF-32).
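
As a rough illustration of the fixed-width representation described above, the sketch below picks the narrowest width that can hold every code point of a string. The helper name and the plain uint32_t input are assumptions made for the example; this is not the CPython implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the number of bytes per character needed
 * to store every code point at a fixed width:
 *   1 for Latin1 (all code points below 0x100),
 *   2 for UCS2   (all code points below 0x10000),
 *   4 for UCS4   (anything larger). */
static int char_width_for(const uint32_t *code_points, size_t len)
{
    uint32_t max = 0;
    for (size_t i = 0; i < len; i++) {
        if (code_points[i] > max) {
            max = code_points[i];
        }
    }
    if (max < 0x100) {
        return 1;
    }
    if (max < 0x10000) {
        return 2;
    }
    return 4;
}
```

Under this scheme every character in a given string occupies the same number of bytes, which is why a byte-wise fill can only be correct for the 1-byte case.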
What do the benchmarks show? Is your code always faster, and by how much? If it
is slower for some data, for which data and by how much?
--
New submission from syl-nktaylor:
In
https://github.com/python/cpython/blob/master/Objects/unicodeobject.c#L12930,
unicode_repeat multiplies a string by an integer in 3 different ways:
1) one memset call, for UTF-8 when the string size is 1
2) linear 'for' loops, for UTF-16 and UTF-32 wh