You forgot
mb_internal_encoding("UTF-8");
Without that, mb_substr is effectively just an alias for substr.
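
A small sketch of what that means in practice. It assumes the default
internal encoding on the test machine was a single-byte one such as
ISO-8859-1 (I have not confirmed that for every setup), and it passes the
encoding explicitly as the last argument instead of calling
mb_internal_encoding, purely so both cases can be shown side by side:

```php
<?php
// "åèö" written as explicit UTF-8 bytes so the file's own encoding does not matter.
$s = "\xC3\xA5\xC3\xA8\xC3\xB6";

// Treated as a single-byte encoding, every byte counts as one character,
// so mb_substr hands back the same single byte that substr does.
$asBytes = mb_substr($s, 0, 1, 'ISO-8859-1');
$plain   = substr($s, 0, 1);

// Treated as UTF-8, the first character is the full two-byte sequence.
$asUtf8 = mb_substr($s, 0, 1, 'UTF-8');

// Byte length, then character count under each encoding.
echo strlen($s), " ", mb_strlen($s, 'ISO-8859-1'), " ", mb_strlen($s, 'UTF-8'), "\n";
```

So a benchmark run under a single-byte encoding is really timing byte
access, not multibyte character access.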
With that line added, my results look like:
normal iteration took 0.64724087715149
mb_substr method took 16.471849918365
mb_substr method with shortening the string took 21.613878965378
preg_split method took 1.927277803421
Dan is the winner. preg_split always runs in linear time. Both of
the mb_substr methods are O(N^2), because the first step in mb_substr
is splitting the string into an array, so each of the N calls costs
O(N) on its own. It is not as intelligent as I initially assumed.
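
For reference, the split-once approach boils down to something like the
sketch below. utf8_chars is a hypothetical helper name of mine, not
something from the script in this thread, and the sample string is only
for illustration:

```php
<?php
// Hypothetical helper: split the string into characters a single time.
function utf8_chars($str)
{
    // The /u modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern matches between characters rather than between bytes.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split returns false on failure (e.g. invalid UTF-8 input).
    return $chars === false ? array() : $chars;
}

// "utf-8 chars åèö" with the accented characters as explicit UTF-8 bytes.
$str = "utf-8 chars \xC3\xA5\xC3\xA8\xC3\xB6";

// One pass over the already-split characters; no per-character mb_substr call.
$newStr = '';
foreach (utf8_chars($str) as $char) {
    $newStr .= $char;
}
```

The string is scanned once up front, and the loop afterwards only touches
array elements, which is why the total work grows with N rather than N^2.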
Regards,
John Campbell
On Wed, Jan 13, 2010 at 11:37 AM, Rob Marscher
<[email protected]> wrote:
> OK. Here are the results of my rough benchmark. Every time I ran it, the
> results were within about .025 seconds of each other so it seems accurate.
> Surprisingly, my original mb_substr method won, with preg_split taking just a
> little bit longer. John's method of grabbing the first character and then
> removing it from the string actually seems to take almost exponentially more
> time based on how long the string is. I set $strSize to 1000 and had to kill
> it because I didn't want to wait so long. There must be something pretty
> inefficient going on in mb_substr to make that the case. I suppose we could
> look at the source to get to the bottom of it... but I think I've already
> spent as much time on this as I'm willing to. Thanks again to you guys.
>
> $ php mbtest.php
> normal iteration took 0.8041729927063
> mb_substr method took 1.7228858470917
> mb_substr method with shortening the string took 7.9840841293335
> preg_split method took 2.1547298431396
>
> $ cat mbtest.php
> <?php
>
> $strSize = 100;
> $repeats = 1000;
>
> // make the string somewhat large
> $str = '';
> for ($i = 0; $i < $strSize; $i++) {
>     $str .= "string with utf-8 chars\n åèö";
> }
>
> // non-multibyte iteration
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>     $length = strlen($str);
>     $newStr = '';
>     for ($j = 0; $j < $length; $j++) {
>         // byte access; square brackets work on all PHP versions
>         $newStr .= $str[$j];
>     }
> }
> $end = microtime(true);
> echo "normal iteration took " . ($end - $start) . "\n";
>
> // mb_substr method
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>     $length = mb_strlen($str);
>     $newStr = '';
>     $rest = $str;
>     for ($j = 0; $j < $length; $j++) {
>         $newStr .= mb_substr($rest, $j, 1);
>     }
> }
> $end = microtime(true);
> echo "mb_substr method took " . ($end - $start) . "\n";
>
> // mb_substr method, shortening string
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>     $length = mb_strlen($str);
>     $newStr = '';
>     $rest = $str;
>     // compare against '' so a remaining "0" does not end the loop early
>     while ($rest != '') {
>         $newStr .= mb_substr($rest, 0, 1);
>         $rest = mb_substr($rest, 1);
>     }
> }
> $end = microtime(true);
> echo "mb_substr method with shortening the string took " . ($end - $start) . "\n";
>
> // preg_split method
> $start = microtime(true);
> for ($i = 0; $i < $repeats; $i++) {
>     $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
>     $length = count($chars);
>     $newStr = '';
>     for ($j = 0; $j < $length; $j++) {
>         // .= concatenates; += would do numeric addition
>         $newStr .= $chars[$j];
>     }
> }
> $end = microtime(true);
> echo "preg_split method took " . ($end - $start) . "\n";
>
>
>
> _______________________________________________
> New York PHP Users Group Community Talk Mailing List
> http://lists.nyphp.org/mailman/listinfo/talk
>
> http://www.nyphp.org/Show-Participation
>