Re: benchmarking Flex practices

2020-01-13 Thread John Naylor
On Tue, Jan 14, 2020 at 4:12 AM Tom Lane wrote: > > John Naylor writes: > > [ v11 patch ] > > I pushed this with some small cosmetic adjustments. Thanks for your help hacking on the token filter. -- John Naylor https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support,

Re: benchmarking Flex practices

2020-01-13 Thread Tom Lane
John Naylor writes: > [ v11 patch ] I pushed this with some small cosmetic adjustments. One non-cosmetic adjustment I experimented with was to change str_udeescape() to overwrite the source string in-place, since we know that's modifiable storage and de-escaping can't make the string longer. I
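
The point that de-escaping can't make the string longer can be illustrated with a small sketch. This is not the PostgreSQL str_udeescape() code: the function udeescape_inplace() and its helper are hypothetical, and the sketch assumes only the four-hex-digit escape form, UTF-8 output, and an escape character that is not itself a hex digit. Each escape consumes five input bytes and emits at most three, so the write pointer never overtakes the read pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Return the value of one hex digit, or -1 if c is not a hex digit. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code).
 * Rewrites s in place: esc followed by four hex digits becomes the UTF-8
 * encoding of that code point, and a doubled esc becomes a single esc.
 * Returns the new length, or -1 on a malformed escape, a zero code point,
 * or a surrogate (surrogate pairs are deliberately not handled here).
 * On error the buffer may already be partially rewritten.
 */
static long udeescape_inplace(char *s, char esc)
{
    char *in = s;   /* read position */
    char *out = s;  /* write position, never ahead of in */

    assert(s != NULL);

    while (*in != '\0') {
        if (*in != esc) {
            *out++ = *in++;
            continue;
        }
        if (in[1] == esc) {          /* doubled escape character */
            *out++ = esc;
            in += 2;
            continue;
        }

        unsigned cp = 0;
        for (int i = 1; i <= 4; i++) {
            int v = hex_val(in[i]);  /* stops at NUL: hex_val('\0') is -1 */
            if (v < 0)
                return -1;
            cp = (cp << 4) | (unsigned) v;
        }
        if (cp == 0 || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        in += 5;                     /* esc plus four digits consumed */

        if (cp < 0x80) {             /* 1 byte of UTF-8 */
            *out++ = (char) cp;
        } else if (cp < 0x800) {     /* 2 bytes of UTF-8 */
            *out++ = (char) (0xC0 | (cp >> 6));
            *out++ = (char) (0x80 | (cp & 0x3F));
        } else {                     /* 3 bytes of UTF-8 */
            *out++ = (char) (0xE0 | (cp >> 12));
            *out++ = (char) (0x80 | ((cp >> 6) & 0x3F));
            *out++ = (char) (0x80 | (cp & 0x3F));
        }
    }
    *out = '\0';
    return (long) (out - s);
}
```

Called on a writable buffer holding d\0061t\\a with backslash as the escape character, the sketch leaves dat\a in the same buffer.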

Re: benchmarking Flex practices

2020-01-13 Thread John Naylor
On Mon, Jan 13, 2020 at 7:57 AM Tom Lane wrote: > > Hmm ... after a bit of research I agree that these functions are not > a portability hazard. They are present at least as far back as flex > 2.5.33 which is as old as we've got in the buildfarm. > > However, I'm less excited about them from a

Re: benchmarking Flex practices

2020-01-12 Thread Tom Lane
John Naylor writes: >> I no longer use state variables to track scanner state, and in fact I >> removed the existing "state_before" variable in ECPG. Instead, I used >> the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state(). >> These have been a feature for a long time, it seems,

Re: benchmarking Flex practices

2020-01-02 Thread John Naylor
I wrote: > I no longer use state variables to track scanner state, and in fact I > removed the existing "state_before" variable in ECPG. Instead, I used > the Flex builtins yy_push_state(), yy_pop_state(), and yy_top_state(). > These have been a feature for a long time, it seems, so I think we're
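
To make the trade concrete, here is a plain-C model of how a start-condition stack can replace a saved "state_before" variable. It is a sketch of the behaviour as I understand it for yy_push_state(), yy_pop_state(), and yy_top_state() (available in Flex under %option stack), not Flex's generated code. The StateStack type, the function names, and the ST_* state names are all invented for illustration.

```c
#include <assert.h>

/* Hypothetical start conditions, invented for this sketch. */
enum { ST_INITIAL = 0, ST_SQL = 1, ST_COMMENT = 2 };

enum { STACK_MAX = 16 };

/* Model of a scanner's current start condition plus its saved-state stack. */
typedef struct {
    int current;            /* the active start condition */
    int depth;              /* number of saved states */
    int saved[STACK_MAX];   /* states to return to, innermost last */
} StateStack;

/* Like yy_push_state(): remember the current state, then switch. */
static void push_state(StateStack *s, int new_state)
{
    assert(s->depth < STACK_MAX);
    s->saved[s->depth++] = s->current;
    s->current = new_state;
}

/* Like yy_pop_state(): return to the most recently saved state. */
static void pop_state(StateStack *s)
{
    assert(s->depth > 0);
    s->current = s->saved[--s->depth];
}

/* Like yy_top_state(): peek at the state a pop would return to. */
static int top_state(const StateStack *s)
{
    assert(s->depth > 0);
    return s->saved[s->depth - 1];
}
```

With this shape, a rule that enters a comment from either ST_INITIAL or ST_SQL does not need to record where it came from; the matching pop restores it, and nested cases work without adding a second "before" variable.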

Re: benchmarking Flex practices

2019-12-03 Thread John Naylor
On Tue, Nov 26, 2019 at 10:32 PM Tom Lane wrote: > I haven't looked closely at what ecpg does with the processed > identifiers. If it just spits them out as-is, a possible solution > is to not do anything about de-escaping, but pass the sequence > U&"..." (plus UESCAPE ... if any), just like

Re: benchmarking Flex practices

2019-11-26 Thread Tom Lane
John Naylor writes: > It seems something is not quite right in v9 with the error position reporting: > SELECT U&'wrong: +0061' UESCAPE '+'; > ERROR: invalid Unicode escape character at or near "'+'" > LINE 1: SELECT U&'wrong: +0061' UESCAPE '+'; > -^ >

Re: benchmarking Flex practices

2019-11-26 Thread John Naylor
On Tue, Nov 26, 2019 at 5:51 AM Tom Lane wrote: > > [ My apologies for being so slow to get back to this ] No worries -- it's a nice-to-have, not something our users are excited about. > It struck me though that there's another solution we haven't discussed, > and that's to make the token

Re: benchmarking Flex practices

2019-11-25 Thread Tom Lane
[ My apologies for being so slow to get back to this ] John Naylor writes: > Now that I think of it, the regression in v7 was largely due to the > fact that the parser has to call the lexer 3 times per string in this > case, and that's going to be slower no matter what we do. Ah, of course.

Re: benchmarking Flex practices

2019-09-25 Thread Tom Lane
Alvaro Herrera writes: > ... it seems this patch needs attention, but I'm not sure from whom. > The tests don't pass whenever the server encoding is not UTF8, so I > suppose we should either have an alternate expected output file to > account for that, or the tests should be removed. But anyway

Re: benchmarking Flex practices

2019-09-25 Thread Alvaro Herrera
... it seems this patch needs attention, but I'm not sure from whom. The tests don't pass whenever the server encoding is not UTF8, so I suppose we should either have an alternate expected output file to account for that, or the tests should be removed. But anyway the code needs to be reviewed.

Re: benchmarking Flex practices

2019-08-01 Thread Thomas Munro
On Thu, Aug 1, 2019 at 8:51 PM John Naylor wrote: > select U&'\de04\d83d'; -- surrogates in wrong order > -psql:test_unicode.sql:10: ERROR: invalid Unicode surrogate pair at > or near "U&'\de04\d83d'" > +psql:test_unicode.sql:10: ERROR: invalid Unicode surrogate pair > LINE 1: select

Re: benchmarking Flex practices

2019-08-01 Thread John Naylor
On Mon, Jul 29, 2019 at 10:40 PM Tom Lane wrote: > > John Naylor writes: > > > The lexer returns UCONST from xus and UIDENT from xui. The grammar has > > rules that are effectively: > > > SCONST { do nothing} > > | UCONST { esc char is backslash } > > | UCONST UESCAPE SCONST { esc char is from

Re: benchmarking Flex practices

2019-07-29 Thread Tom Lane
John Naylor writes: > On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: >> So I'm feeling like maybe we should experiment to see what that >> solution looks like, before we commit to going in this direction. >> What do you think? > Given the above wrinkles, I thought it was worth trying. Attached

Re: benchmarking Flex practices

2019-07-24 Thread Tom Lane
Chapman Flack writes: > On 07/24/19 03:45, John Naylor wrote: >> On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: >>> However, my second reaction was that maybe you were on to something >>> upthread when you speculated about postponing de-escaping of >>> Unicode literals into the grammar. If we

Re: benchmarking Flex practices

2019-07-24 Thread Chapman Flack
On 07/24/19 03:45, John Naylor wrote: > On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: >> However, my second reaction was that maybe you were on to something >> upthread when you speculated about postponing de-escaping of >> Unicode literals into the grammar. If we did it like that then Wow,

Re: benchmarking Flex practices

2019-07-24 Thread John Naylor
On Sun, Jul 21, 2019 at 3:14 AM Tom Lane wrote: > > John Naylor writes: > > The pre-existing ecpg var "state_before" was a bit confusing when > > combined with the new var "state_before_quote_stop", and the former is > > also used with C-comments, so I decided to go with > >

Re: benchmarking Flex practices

2019-07-20 Thread Tom Lane
John Naylor writes: > The pre-existing ecpg var "state_before" was a bit confusing when > combined with the new var "state_before_quote_stop", and the former is > also used with C-comments, so I decided to go with > "state_before_lit_start" and "state_before_lit_stop". Even though > comments

Re: benchmarking Flex practices

2019-07-12 Thread John Naylor
On Wed, Jul 10, 2019 at 3:15 AM Tom Lane wrote: > > John Naylor writes: > > [ v4 patches for trimming lexer table size ] > > I reviewed this and it looks pretty solid. One gripe I have is > that I think it's best to limit backup-prevention tokens such as > quotecontinuefail so that they match

Re: benchmarking Flex practices

2019-07-09 Thread Tom Lane
John Naylor writes: > [ v4 patches for trimming lexer table size ] I reviewed this and it looks pretty solid. One gripe I have is that I think it's best to limit backup-prevention tokens such as quotecontinuefail so that they match only exact prefixes of their "success" tokens. This seems

Re: benchmarking Flex practices

2019-07-05 Thread John Naylor
On Wed, Jul 3, 2019 at 5:35 AM Tom Lane wrote: > > As far as I can see, the point of 0002 is to have just one set of > flex rules for the various variants of quotecontinue processing. > That sounds OK, though I'm a bit surprised it makes this much difference > in the table size. I would suggest

Re: benchmarking Flex practices

2019-07-03 Thread John Naylor
On Wed, Jul 3, 2019 at 5:35 AM Tom Lane wrote: > > John Naylor writes: > > 0001 is a small patch to remove some unneeded generality from the > > current rules. This lowers the number of elements in the yy_transition > > array from 37045 to 36201. > > I don't particularly like 0001. The two bits

Re: benchmarking Flex practices

2019-07-02 Thread Tom Lane
John Naylor writes: > 0001 is a small patch to remove some unneeded generality from the > current rules. This lowers the number of elements in the yy_transition > array from 37045 to 36201. I don't particularly like 0001. The two bits like this -whitespace ({space}+|{comment})

Re: benchmarking Flex practices

2019-06-27 Thread John Naylor
I wrote: > > I found a possible other way to bring the size of the transition table > > under 32k entries while keeping the existing no-backup rules in place: > > Replace the "quotecontinue" rule with a new state. In the attached > > draft patch, when Flex encounters a quote while inside any kind

Re: benchmarking Flex practices

2019-06-24 Thread John Naylor
I wrote: > > I'll look for other rules that could be more > > easily optimized, but I'm not terribly optimistic. > > I found a possible other way to bring the size of the transition table > under 32k entries while keeping the existing no-backup rules in place: > Replace the "quotecontinue" rule

Re: benchmarking Flex practices

2019-06-24 Thread John Naylor
I wrote: > I'll look for other rules that could be more > easily optimized, but I'm not terribly optimistic. I found a possible other way to bring the size of the transition table under 32k entries while keeping the existing no-backup rules in place: Replace the "quotecontinue" rule with a new

Re: benchmarking Flex practices

2019-06-20 Thread Andres Freund
Hi, On 2019-06-20 10:52:54 -0400, Tom Lane wrote: > John Naylor writes: > > It would be nice to have confirmation to make sure I didn't err > > somewhere, and to try a more real-world benchmark. > > I don't see much wrong with using information_schema.sql as a parser/lexer > benchmark case. We

Re: benchmarking Flex practices

2019-06-20 Thread Tom Lane
John Naylor writes: > I decided to do some experiments with how we use Flex. The main > takeaway is that backtracking, which we removed in 2005, doesn't seem > to matter anymore for the core scanner. Also, state table size is of > marginal importance. Huh. That's really interesting, because

benchmarking Flex practices

2019-06-20 Thread John Naylor
I decided to do some experiments with how we use Flex. The main takeaway is that backtracking, which we removed in 2005, doesn't seem to matter anymore for the core scanner. Also, state table size is of marginal importance. Using the information_schema Flex+Bison microbenchmark from Tom [1], I