ID:               49874
Comment by:       jketterl at chipxonio dot de
Reported By:      jketterl at chipxonio dot de
Status:           Feedback
Bug Type:         Filesystem function related
Operating System: linux (ubuntu)
PHP Version:      5.2.11

New Comment:
Thanks for having a look. I tried with and without a BOM. The challenge is to get it working without one, because that's the worst case my app has to deal with, but adding the BOM doesn't seem to solve the problem either.

$ hexdump test-with-bom.csv
0000000 feff 004c 0069 006e 0065 0020 0030 0031
0000010 000a 004c 0069 006e 0065 0020 0030 0032
0000020 000a 004c 0069 006e 0065 0020 0030 0033
0000030 000a 004c 0069 006e 0065 0020 0030 0034
0000040 000a
0000042

$ php test.php
string(8) "Line 01
"
string(8) "Line 02
"
string(8) "Line 01
"
string(5) "e 01
"

I also tried opening the file including the BOM without a stream filter, but that just resulted in PHP reading two extra chars (the BOM converted in some way, I guess) at the beginning of the first line.

I wanted to attach the sample files to this bug, but it seems I can't. I've uploaded them here instead: http://www.djmacgyver.net/tmp/php-ftell/

Previous Comments:
------------------------------------------------------------------------
[2009-10-14 16:40:00] sjo...@php.net

Thank you for your bug report. Does your test.csv file start with a BOM? You can determine this by viewing the file in a hex editor. If it starts with fffe or feff, it has a BOM (byte order mark).

------------------------------------------------------------------------
[2009-10-14 11:39:39] jketterl at chipxonio dot de

Description:
------------
Exact PHP version: PHP 5.2.11-0.dotdeb.1 with Suhosin-Patch 0.9.7 (cli) (built: Sep 20 2009 09:41:43)

This bug is also filter-/stream-related; I just believe it might be easier to fix on the filesystem side, which is why I chose that category.

When using a PHP stream filter to convert input from UTF-16 into ISO-8859 (and most probably from any 2-byte-encoded charset into any single-byte-encoded charset), ftell() and fseek() start to behave inconsistently.
More precisely: fseek() jumps to exact byte offsets in the raw file, ignoring the 2-byte encoding, whereas ftell() seems to return the number of bytes read *after* the filter has been applied. Thus it is not possible to fseek() back to an offset that was previously stored with ftell().

The content of the test file used in the code examples is as follows:

Line 01
Line 02
Line 03
Line 04

Reproduce code:
---------------
<?php
$file = 'test.csv';

$fp = fopen($file, 'r');
stream_filter_append($fp, 'convert.iconv.utf16/iso8859-15');
$line = fgets($fp);
var_dump($line);
$line = fgets($fp);
var_dump($line);
fclose($fp);

$fp = fopen($file, 'r');
stream_filter_append($fp, 'convert.iconv.utf16/iso8859-15');
$line = fgets($fp);
var_dump($line);
fseek($fp, ftell($fp)); // this shouldn't move anything - but it does...
$line = fgets($fp);
var_dump($line);
fclose($fp);

Expected result:
----------------
string(8) "Line 01
"
string(8) "Line 02
"
string(8) "Line 01
"
string(8) "Line 02
"

Actual result:
--------------
string(8) "Line 01
"
string(8) "Line 02
"
string(8) "Line 01
"
string(4) " 01
"

------------------------------------------------------------------------
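The reported outputs can be checked against the reporter's explanation with a small byte-offset model. This is a sketch in Python, not PHP, and it does not reproduce PHP's stream internals. It assumes what the report suggests: ftell() counts converted ISO-8859-15 bytes, while fseek() positions the raw UTF-16 data. It also assumes both test files are UTF-16LE. The hexdump of test-with-bom.csv (bytes ff fe 4c 00 ...) shows that for the BOM file, but the byte order of the BOM-less test.csv is an assumption. That choice does not change the offsets. The helpers `raw_file` and `line_after_seek` are hypothetical names.

```python
"""Offset model for the ftell()/fseek() mismatch reported in PHP bug #49874.

Assumption (a model, not PHP's actual code): ftell() reports the number of
bytes produced *after* the UTF-16 -> ISO-8859-15 filter, while fseek()
positions the underlying raw UTF-16 stream.
"""

TEXT = "Line 01\nLine 02\nLine 03\nLine 04\n"
BOM_LE = b"\xff\xfe"  # bytes ff fe, which hexdump displays as the word "feff"


def raw_file(with_bom: bool) -> bytes:
    """Encode the test content as UTF-16LE (assumed), optionally with a BOM."""
    body = TEXT.encode("utf-16-le")
    return BOM_LE + body if with_bom else body


def line_after_seek(with_bom: bool) -> str:
    """Return what the second fgets() would see after fseek($fp, ftell($fp))."""
    raw = raw_file(with_bom)
    first_line = TEXT.split("\n", 1)[0] + "\n"  # "Line 01\n"
    # Modelled ftell(): length of the *converted* first line, i.e. 8 bytes.
    reported = len(first_line.encode("iso8859-15"))
    # Modelled fseek(): jump to that offset in the *raw* UTF-16 bytes.
    # Offset 8 is on a 2-byte boundary with or without the 2-byte BOM,
    # so the remaining bytes decode cleanly.
    decoded = raw[reported:].decode("utf-16-le")
    # Like fgets(): everything up to and including the next newline.
    return decoded[: decoded.index("\n") + 1]


if __name__ == "__main__":
    for bom in (False, True):
        line = line_after_seek(bom)
        print(f"with_bom={bom}: string({len(line)}) {line!r}")
```

Under this model, `line_after_seek(False)` returns `" 01\n"` (4 bytes) and `line_after_seek(True)` returns `"e 01\n"` (5 bytes). These match the string(4) and string(5) results reported above. Without a BOM, raw byte 8 is the 5th UTF-16 character. With the 2-byte BOM, it is the 4th. This agrees with the reporter's explanation but does not prove it.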