Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-29 Thread Patrick Schluter via Digitalmars-d-learn
On Saturday, 28 January 2017 at 15:40:24 UTC, Nestor wrote: On Friday, 27 January 2017 at 04:26:31 UTC, Era Scarecrow wrote: Skipping the BOM is just a matter of skipping the first two bytes identifying it... AFAIK in some cases the BOM takes up to 4 bytes (for UTF-32), so when input encodin

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-28 Thread Nestor via Digitalmars-d-learn
On Friday, 27 January 2017 at 04:26:31 UTC, Era Scarecrow wrote: Skipping the BOM is just a matter of skipping the first two bytes identifying it... AFAIK in some cases the BOM takes up to 4 bytes (for UTF-32), so when input encoding is unknown one must perform some kind of detection in orde
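The BOM-length point above can be made concrete with a small sniffing routine. This is only a sketch, not code from the thread: `BomInfo`, `hasPrefix` and `detectBom` are hypothetical names introduced here. It checks the 4-byte UTF-32 marks before the 2-byte UTF-16 ones, because the UTF-32LE mark (FF FE 00 00) begins with the UTF-16LE mark (FF FE). A UTF-16LE file whose first character is NUL would be misread as UTF-32LE, so the result should be treated as a hint.

```d
import std.stdio : writeln;

/// Result of BOM sniffing: a guessed encoding name and the number of
/// leading bytes to skip. (Hypothetical helper type, not from the thread.)
struct BomInfo
{
    string encoding;
    size_t length;
}

/// True if `data` begins with the bytes in `prefix`.
bool hasPrefix(const(ubyte)[] data, const(ubyte)[] prefix)
{
    return data.length >= prefix.length && data[0 .. prefix.length] == prefix;
}

/// Inspects the first bytes of `data` and reports which BOM, if any, is present.
BomInfo detectBom(const(ubyte)[] data)
{
    static immutable ubyte[] utf32le = [0xFF, 0xFE, 0x00, 0x00];
    static immutable ubyte[] utf32be = [0x00, 0x00, 0xFE, 0xFF];
    static immutable ubyte[] utf8    = [0xEF, 0xBB, 0xBF];
    static immutable ubyte[] utf16le = [0xFF, 0xFE];
    static immutable ubyte[] utf16be = [0xFE, 0xFF];

    // Longest marks first, so FF FE 00 00 is not mistaken for UTF-16LE.
    if (hasPrefix(data, utf32le)) return BomInfo("UTF-32LE", 4);
    if (hasPrefix(data, utf32be)) return BomInfo("UTF-32BE", 4);
    if (hasPrefix(data, utf8))    return BomInfo("UTF-8", 3);
    if (hasPrefix(data, utf16le)) return BomInfo("UTF-16LE", 2);
    if (hasPrefix(data, utf16be)) return BomInfo("UTF-16BE", 2);
    return BomInfo("unknown", 0); // no BOM: the caller must guess or be told
}

void main()
{
    ubyte[] sample = [0xFF, 0xFE, 0x41, 0x00]; // UTF-16LE BOM followed by 'A'
    auto bom = detectBom(sample);
    writeln(bom.encoding, ", skip ", bom.length, " bytes");
}
```

The caller would read the first four bytes of the file, pass them to `detectBom`, and resume reading at offset `length`.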

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-27 Thread Era Scarecrow via Digitalmars-d-learn
On Friday, 27 January 2017 at 07:02:52 UTC, Jack Applegame wrote: On Monday, 16 January 2017 at 14:47:23 UTC, Era Scarecrow wrote: static char[1024*4] buffer; //4k reusable buffer, NOT thread safe Maybe I'm wrong, but I think it's thread safe. Because static mutable non-shared variables

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-26 Thread Jack Applegame via Digitalmars-d-learn
On Monday, 16 January 2017 at 14:47:23 UTC, Era Scarecrow wrote: static char[1024*4] buffer; //4k reusable buffer, NOT thread safe Maybe I'm wrong, but I think it's thread safe. Because static mutable non-shared variables are stored in TLS.
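The claim about thread-local storage can be checked with a short program. This is an illustrative sketch, not code from the thread: `perThread`, `processWide` and `checkThreadLocalStorage` are hypothetical names. It relies on D's default of placing mutable module-level and `static` variables in TLS, with `__gshared` as the opt-out.

```d
import core.thread : Thread;

int perThread;               // mutable, non-shared: one copy per thread (TLS)
__gshared int processWide;   // explicitly global: one copy for the whole process

/// Writes both variables from the main thread and from a second thread,
/// then checks which writes each thread can observe.
void checkThreadLocalStorage()
{
    perThread = 1;
    processWide = 1;

    auto worker = new Thread({
        // The new thread starts with its own default-initialized copy.
        assert(perThread == 0);
        assert(processWide == 1);
        perThread = 42;    // changes only this thread's copy
        processWide = 42;  // visible to every thread
    });
    worker.start();
    worker.join();

    assert(perThread == 1);     // untouched by the worker thread
    assert(processWide == 42);  // modified by the worker thread
}

void main()
{
    checkThreadLocalStorage();
}
```

Under that default, a `static char[1024*4] buffer` inside a function gets one instance per thread, so two threads calling the function do not overwrite each other's buffer. Reusing the buffer between calls within a single thread still invalidates slices returned by earlier calls.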

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-26 Thread Era Scarecrow via Digitalmars-d-learn
On Tuesday, 17 January 2017 at 11:40:15 UTC, Nestor wrote: Thanks, but unfortunately this function does not produce proper UTF8 strings, as a matter of fact the output even starts with the BOM. Also it doesn't handle CRLF, and even for LF terminated lines it doesn't seem to work for lines other

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-17 Thread Nestor via Digitalmars-d-learn
On Monday, 16 January 2017 at 14:47:23 UTC, Era Scarecrow wrote: On Sunday, 15 January 2017 at 19:48:04 UTC, Nestor wrote: I see. So correcting my original doubt: How could I parse a UTF-16LE file line by line (producing a proper string in each iteration) without loading the entire file into

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-16 Thread Era Scarecrow via Digitalmars-d-learn
On Sunday, 15 January 2017 at 19:48:04 UTC, Nestor wrote: I see. So correcting my original doubt: How could I parse a UTF-16LE file line by line (producing a proper string in each iteration) without loading the entire file into memory? Could... roll your own? Although if you wanted it to be
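One way to "roll your own" is sketched below. It is not the code posted in the thread: `eachUtf16leLine` is a hypothetical name, and the 4096-byte chunk size is arbitrary. The function reads the file in fixed-size chunks, pairs bytes into little-endian 16-bit code units, drops a leading BOM, splits on LF (removing a preceding CR), and converts each line to a UTF-8 `string` with `std.conv.to`. Only the current chunk and the current line are held in memory.

```d
import std.conv : to;
import std.stdio : File, writeln;

/// Calls `sink` once per line of the UTF-16LE file at `path`,
/// passing each line as a UTF-8 string without its line terminator.
void eachUtf16leLine(string path, scope void delegate(string) sink)
{
    auto file = File(path, "rb");
    wchar[] line;          // code units of the line being assembled
    bool atStart = true;   // true until the first code unit has been seen
    bool haveLow = false;  // true when `low` holds an unpaired byte
    ubyte low;

    foreach (ubyte[] chunk; file.byChunk(4096))
    {
        foreach (b; chunk)
        {
            // Pair bytes explicitly so an odd chunk boundary cannot split a unit.
            if (!haveLow) { low = b; haveLow = true; continue; }
            haveLow = false;
            wchar unit = cast(wchar)(low | (b << 8)); // little-endian

            if (atStart)
            {
                atStart = false;
                if (unit == 0xFEFF) continue; // skip the BOM
            }

            if (unit == '\n')
            {
                if (line.length && line[$ - 1] == '\r')
                    line = line[0 .. $ - 1];  // CRLF: drop the CR
                sink(line.to!string);         // transcodes UTF-16 to UTF-8
                line.length = 0;
            }
            else
                line ~= unit;
        }
    }
    if (line.length) sink(line.to!string);    // last line without terminator
}

void main(string[] args)
{
    if (args.length < 2) return;
    eachUtf16leLine(args[1], (string line) { writeln(line); });
}
```

Splitting on the 16-bit code unit rather than on the byte 0x0A matters, because in UTF-16 the byte 0x0A can also appear as half of an unrelated character. Surrogate pairs need no special handling during splitting, since neither half can equal LF. The conversion throws if a line contains an unpaired surrogate.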

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-15 Thread Nestor via Digitalmars-d-learn
On Sunday, 15 January 2017 at 16:29:23 UTC, Daniel Kozák wrote: This is because byLine returns a range, so until you do something with it, it does not cause any harm :) I see. So correcting my original doubt: How could I parse a UTF-16LE file line by line (producing a proper string in each

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-15 Thread Daniel Kozák via Digitalmars-d-learn
On Sun, 15 Jan 2017 14:48:12 + Nestor via Digitalmars-d-learn wrote: > On Friday, 6 January 2017 at 11:42:17 UTC, Mike Wey wrote: > > On 01/06/2017 11:33 AM, pineapple wrote: > >> On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote: > > I'm not sure if this works quite as in

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-15 Thread Nestor via Digitalmars-d-learn
On Sunday, 15 January 2017 at 14:48:12 UTC, Nestor wrote: After some testing I realized that byLine was not the one failing, but any string manipulation done to the obtained line. Compile the following example with and without -debug and run to see what I mean: import std.stdio, std.string;

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-15 Thread Nestor via Digitalmars-d-learn
On Friday, 6 January 2017 at 11:42:17 UTC, Mike Wey wrote: On 01/06/2017 11:33 AM, pineapple wrote: On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote: I'm not sure if this works quite as intended, but I was at least able to produce a UTF-16 decode error rather than a UTF-8 decode error

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-06 Thread Mike Wey via Digitalmars-d-learn
On 01/06/2017 11:33 AM, pineapple wrote: On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote: I'm not sure if this works quite as intended, but I was at least able to produce a UTF-16 decode error rather than a UTF-8 decode error by setting the file orientation before reading it. import

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-06 Thread pineapple via Digitalmars-d-learn
On Friday, 6 January 2017 at 06:24:12 UTC, rumbu wrote: I'm not sure if this works quite as intended, but I was at least able to produce a UTF-16 decode error rather than a UTF-8 decode error by setting the file orientation before reading it. import std.stdio; import core.stdc.wchar

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-05 Thread rumbu via Digitalmars-d-learn
I'm not sure if this works quite as intended, but I was at least able to produce a UTF-16 decode error rather than a UTF-8 decode error by setting the file orientation before reading it. import std.stdio; import core.stdc.wchar_ : fwide; void main(){ auto file = File("UTF-

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-04 Thread Daniel Kozák via Digitalmars-d-learn
Nestor via Digitalmars-d-learn wrote on Wed, Jan 4, 2017 at 8:20: On Wednesday, 4 January 2017 at 18:48:59 UTC, Daniel Kozák wrote: Ok, I've done some testing and you are right, byLine is broken, so please file a bug. A bug? I was under the impression that this function was *intended* to work onl

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-04 Thread pineapple via Digitalmars-d-learn
On Wednesday, 4 January 2017 at 19:20:31 UTC, Nestor wrote: On Wednesday, 4 January 2017 at 18:48:59 UTC, Daniel Kozák wrote: Ok, I've done some testing and you are right, byLine is broken, so please file a bug. A bug? I was under the impression that this function was *intended* to work only wi

Re: Parsing a UTF-16LE file line by line, BUG?

2017-01-04 Thread Nestor via Digitalmars-d-learn
On Wednesday, 4 January 2017 at 18:48:59 UTC, Daniel Kozák wrote: Ok, I've done some testing and you are right, byLine is broken, so please file a bug. A bug? I was under the impression that this function was *intended* to work only with UTF-8 encoded files.