Re: [RFC] Tokenizer API

Amos Jeffries Tue, 10 Dec 2013 01:52:54 -0800

On 10/12/2013 9:38 p.m., Robert Collins wrote:
> On 10 December 2013 19:13, Amos Jeffries <[email protected]> wrote:
> 
>> The problem with comparing input strings to a SBuf of characters is that
>> parsing an input of length N against a charset of size M takes O(N*M) time.
> 
> Huh? There are linear time parsers with PEGs. Or maybe I don't
> understand one of your preconditions to come up with an N*M complexity
> here.


The brute-force way is to scan each of the N input positions against each
of the M delimiters individually.
 --> SBuf uses while(0..N) {memchr(M)} ... O(N*M)
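
To make the O(N*M) cost concrete, here is a minimal sketch of that
brute-force loop. It uses std::string rather than SBuf, and
findFirstOfBruteForce() is a hypothetical name for illustration, not
anything in the Squid tree:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not the SBuf API): index of the first character of
// `input` that appears in `delims`, or std::string::npos if there is none.
std::size_t findFirstOfBruteForce(const std::string &input, const std::string &delims)
{
    for (std::size_t i = 0; i < input.size(); ++i) {   // N input positions
        // memchr() may inspect all M delimiter bytes for this one position,
        // so the worst case is N*M byte comparisons overall.
        if (std::memchr(delims.data(), input[i], delims.size()))
            return i;
    }
    return std::string::npos;
}
```

The worst case is an input that contains no delimiter at all: every
memchr() call then walks the full delimiter set before giving up.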


PS. I suspect the linear time parsers you know of are converting the PEG
into a bool-array first, before doing a linear-time scan with it, or are
using multiple CPU threads to do a parallel linear scan on small M.
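
A rough sketch of the bool-array idea, assuming 8-bit characters. The names
CharLookupTable and findFirstOfTable() are made up for illustration and are
not an existing Squid API:

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical class: a 256-entry membership table, one slot per byte value.
class CharLookupTable
{
public:
    explicit CharLookupTable(const std::string &chars) {
        member_.fill(false);
        for (unsigned char c : chars)   // O(M) setup, paid once per charset
            member_[c] = true;
    }

    // O(1) membership test, independent of the charset size M.
    bool contains(char c) const { return member_[static_cast<unsigned char>(c)]; }

private:
    std::array<bool, 256> member_;
};

// Index of the first character of `input` found in `delims`, or npos.
// The scan costs one table lookup per input position, so O(N) in total.
std::size_t findFirstOfTable(const std::string &input, const CharLookupTable &delims)
{
    for (std::size_t i = 0; i < input.size(); ++i) {
        if (delims.contains(input[i]))
            return i;
    }
    return std::string::npos;
}
```

Building the table once and reusing it across many scans brings the total
to O(M + N) per charset instead of O(N*M).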

Amos
