ID:               49874
 Comment by:       jketterl at chipxonio dot de
 Reported By:      jketterl at chipxonio dot de
 Status:           Feedback
 Bug Type:         Filesystem function related
 Operating System: linux (ubuntu)
 PHP Version:      5.2.11
 New Comment:

thanks for having a look

i tried with and without a BOM. the goal is to get it working without
one, because that's the worst case my app has to deal with, but adding
the BOM doesn't solve the problem either.

$ hexdump test-with-bom.csv
0000000 feff 004c 0069 006e 0065 0020 0030 0031
0000010 000a 004c 0069 006e 0065 0020 0030 0032
0000020 000a 004c 0069 006e 0065 0020 0030 0033
0000030 000a 004c 0069 006e 0065 0020 0030 0034
0000040 000a
0000042

$ php test.php
string(8) "Line 01
"
string(8) "Line 02
"
string(8) "Line 01
"
string(5) "e 01
"

i also tried opening the file including the BOM without a stream
filter, but that just resulted in php reading two extra chars (the
BOM, converted in some way, i guess) at the beginning of the first line.
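
the truncated lines ("e 01" with a BOM, " 01" without) are consistent
with fseek() treating the post-filter offset as a raw byte offset. the
sketch below redoes that arithmetic on in-memory strings, without any
stream filter. it assumes the file is little-endian UTF-16 and that
ftell() reports 8 after the first line; line_at_raw_offset() is a
hypothetical helper written for this illustration, not a PHP function.

```php
<?php
// Hypothetical helper for illustration: decode the first line found when
// starting at a raw byte offset inside UTF-16LE data.
function line_at_raw_offset($raw, $offset)
{
    $rest = iconv('UTF-16LE', 'ISO-8859-15', substr($raw, $offset));
    return substr($rest, 0, strpos($rest, "\n") + 1);
}

$text  = "Line 01\nLine 02\nLine 03\nLine 04\n";
$noBom = iconv('ISO-8859-15', 'UTF-16LE', $text); // assumption: little-endian
$bom   = "\xFF\xFE" . $noBom;                     // same data with a BOM

// Length of the first line after conversion. The assumption here is that
// this is the value ftell() reports on the filtered stream.
$filteredPos = strlen("Line 01\n"); // 8

// Using that value as a raw byte offset lands in the middle of line one.
var_dump(line_at_raw_offset($noBom, $filteredPos)); // 4 chars: " 01" + newline
var_dump(line_at_raw_offset($bom, $filteredPos));   // 5 chars: "e 01" + newline
```

the two extra BOM bytes shift the landing point by exactly one
character, which would explain why the BOM changes the symptom but does
not remove it.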

i thought i'd attach the sample files to this bug, but it seems like i
can't. i've uploaded them here instead:
http://www.djmacgyver.net/tmp/php-ftell/


Previous Comments:
------------------------------------------------------------------------

[2009-10-14 16:40:00] sjo...@php.net

Thank you for your bug report. Does your test.csv file start with a
BOM? You can determine this by viewing the file in a hex editor. If it
starts with fffe or feff, it has a BOM (byte order mark).
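
If no hex editor is at hand, the same check can be sketched in PHP by
reading the first two bytes of the file. detect_utf16_bom() is a
hypothetical helper name, not a built-in function, and the sketch only
looks for the two UTF-16 byte order marks.

```php
<?php
// Hypothetical helper (not part of PHP): report which UTF-16 byte order
// mark, if any, a file starts with. Returns null when none is found or
// the file cannot be opened.
function detect_utf16_bom($path)
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        return null;
    }
    $head = fread($fp, 2); // the BOM, if present, is the first two bytes
    fclose($fp);

    if ($head === "\xFF\xFE") {
        return 'UTF-16LE'; // bytes ff fe
    }
    if ($head === "\xFE\xFF") {
        return 'UTF-16BE'; // bytes fe ff
    }
    return null;
}
```

Calling detect_utf16_bom('test.csv') would then return 'UTF-16LE',
'UTF-16BE', or null for a file without a UTF-16 BOM.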

------------------------------------------------------------------------

[2009-10-14 11:39:39] jketterl at chipxonio dot de

Description:
------------
exact php version: PHP 5.2.11-0.dotdeb.1 with Suhosin-Patch 0.9.7 (cli)
(built: Sep 20 2009 09:41:43)
this bug may also be filter-/stream-related. i just believe it might be
easier to fix on the filesystem side, which is why i chose that category.

when using a php stream filter to convert input from utf-16 into
iso8859 (or most probably from any 2byte-encoded charset into any
single-byte-encoded charset) the ftell() and fseek() functions start to
behave inconsistently.

more precisely: fseek() jumps to exact offsets ignoring the
2byte-encoding, whereas ftell() seems to return the number of bytes read
*after* the filter has been applied. thus it is not possible to fseek()
back to a certain offset that has been stored with ftell() before.
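
one possible way around the mismatch, sketched here as an assumption
rather than a confirmed fix, is to skip the stream filter, read the raw
utf-16 bytes and convert each line with iconv(). ftell() and fseek()
then both work in raw byte offsets. fgets_utf16le() is a hypothetical
helper; the sketch assumes little-endian input without a BOM and writes
its own sample file to a temp path.

```php
<?php
// Hypothetical helper: read one UTF-16LE line from an unfiltered stream
// and return it converted to ISO-8859-15. Returns false at end of file.
function fgets_utf16le($fp)
{
    $raw = '';
    while (($unit = fread($fp, 2)) !== false && strlen($unit) === 2) {
        $raw .= $unit;
        if ($unit === "\n\x00") { // U+000A in little-endian byte order
            break;
        }
    }
    if ($raw === '') {
        return false;
    }
    return iconv('UTF-16LE', 'ISO-8859-15', $raw);
}

// Build a sample file equivalent to the test file (assumed UTF-16LE, no BOM).
$path = tempnam(sys_get_temp_dir(), 'u16');
$text = "Line 01\nLine 02\nLine 03\nLine 04\n";
file_put_contents($path, iconv('ISO-8859-15', 'UTF-16LE', $text));

$fp     = fopen($path, 'rb');
$first  = fgets_utf16le($fp);
$pos    = ftell($fp);          // raw byte offset, no filter attached
$second = fgets_utf16le($fp);
fseek($fp, $pos);              // back to the start of the second line
$again  = fgets_utf16le($fp);  // same content as $second
fclose($fp);
unlink($path);

var_dump($first, $second, $again);
```

the trade-off is that the caller has to know the byte order up front and
skip a BOM itself, which the iconv filter otherwise handles.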

the content of the testfile used in the code examples is as follows:
Line 01
Line 02
Line 03
Line 04

Reproduce code:
---------------
$file = 'test.csv';

$fp = fopen($file, 'r');
stream_filter_append($fp, 'convert.iconv.utf16/iso8859-15');
$line = fgets($fp);
var_dump($line);
$line = fgets($fp);
var_dump($line);
fclose($fp);

$fp = fopen($file, 'r');
stream_filter_append($fp, 'convert.iconv.utf16/iso8859-15');
$line = fgets($fp);
var_dump($line);
fseek($fp, ftell($fp)); // this shouldn't move anything - but it does...
$line = fgets($fp);
var_dump($line);
fclose($fp);

Expected result:
----------------
string(8) "Line 01
"
string(8) "Line 02
"
string(8) "Line 01
"
string(8) "Line 02
"

Actual result:
--------------
string(8) "Line 01
"
string(8) "Line 02
"
string(8) "Line 01
"
string(4) " 01
"


------------------------------------------------------------------------

