 ID:               50894
 User updated by:  lee at projectmastermind dot com
 Reported By:      lee at projectmastermind dot com
 Status:           Open
 Bug Type:         Performance problem
 Operating System: Linux, OSX
-PHP Version:      6SVN-2010-02-01 (snap)
+PHP Version:      5.*,6

 New Comment:
Updated the version field to indicate that the problem exists in versions 5.x and 6.


Previous Comments:
------------------------------------------------------------------------

[2010-02-01 06:13:33] lee at projectmastermind dot com

Description:
------------
Given a value of a particular type, casting it to that same type should essentially be a no-op: once it is determined that the operand already has the correct type, no further action needs to be taken.

Example:

  $a = array();
  $b = (array)$a;

In this example, $a is already an array, so this should be a simple assignment operation, and $b should get a "lazy" copy of $a via PHP's copy-on-write policy. Instead, the cast operation seems to force an immediate (non-lazy) full copy.

This creates a huge potential for hidden performance problems, because code that *looks* like it should run in constant time [O(1)] actually requires linear time [O(n)], where n is the size of the data being copied.

I have verified that this issue also exists for string types, and I assume that it applies to all PHP types. Of course, it only becomes a significant performance issue for types that can hold large amounts of data, where the data is duplicated whenever the zval is duplicated (AFAIK, this is only string and array).
I have verified this on the following versions of PHP:

  5.2.6
  5.2.8
  6.0.0-dev (php6.0-201001312130)

Reproduce code:
---------------
<?php
for ($z = 1; $z < 5; ++$z) {
    $a = array_fill(0, 100 * $z, '0');
    $t_start = microtime(true);
    for ($i = 0; $i < 100000; ++$i) {
        // O(n) [should be constant time, but isn't]
        // cast triggers non-lazy copy
        //
        $b = (array)$a;

        // O(1) [constant time, as expected]
        // (comment out the line above and uncomment the line below for comparison)
        //
        //$b = $a;
    }
    $t_elapsed = (microtime(true) * 1000) - ($t_start * 1000);
    printf("(%d elements * %d copies): %f ms\n\n", 100 * $z, $i, $t_elapsed);
}

Expected result:
----------------
(100 elements * 100000 loops): 11.264160 ms
(200 elements * 100000 loops): 11.363037 ms
(300 elements * 100000 loops): 11.208984 ms
(400 elements * 100000 loops): 11.809082 ms

NOTE: the time stays roughly constant as the number of elements increases; the assignments are copy-on-write, so no significant performance hit is incurred.

Actual result:
--------------
(100 elements * 100000 copies): 736.453613 ms
(200 elements * 100000 copies): 1448.991211 ms
(300 elements * 100000 copies): 2130.541016 ms
(400 elements * 100000 copies): 2823.362793 ms

NOTE: the time increases roughly in proportion to the size of the array (this also happens with large strings). This is a good indicator that a copy is being made [non-lazily] when the cast is applied.

------------------------------------------------------------------------
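If the cast really does force a full copy, as described above, one possible workaround until the engine is fixed is to skip the cast whenever the value already has the target type. This is a sketch under that assumption, not an official fix: `as_array()` is a hypothetical helper name, and it uses an explicit `if` rather than `?:` because, in PHP 5.x, the ternary operator is reported to copy its operand as well. It sticks to `array()` syntax so it also runs on the 5.2 versions listed above.

```php
<?php
// Hypothetical helper, not part of PHP: cast to array only when needed.
// If $value is already an array it is returned as-is, so the caller gets
// a plain (copy-on-write) assignment instead of going through the cast.
function as_array($value)
{
    if (is_array($value)) {
        // Already an array: no cast is applied.
        return $value;
    }
    // Any other type still goes through the normal (array) cast.
    return (array)$value;
}

// Usage: same shape as the reproduce code, with the cast swapped out.
$a = array_fill(0, 400, '0');
$b = as_array($a);      // takes the is_array() branch
$c = as_array('x');     // scalar, converted by the cast to array(0 => 'x')
```

Swapping `$b = (array)$a;` for `$b = as_array($a);` in the reproduce loop would show whether the timings then follow the plain-assignment case (plus a constant function-call overhead) rather than growing with the array size.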