Re: [PATCH] Optimize json_lex_string by batching character copying

2022-09-01 Thread John Naylor
On Wed, Aug 31, 2022 at 11:17 AM Nathan Bossart wrote: > > On Wed, Aug 31, 2022 at 10:50:39AM +0700, John Naylor wrote: > > Here's the final piece. I debated how many tests to add and decided it > > was probably enough to add one each for checking quotes and > > backslashes in the fast path. There

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-30 Thread Nathan Bossart
On Wed, Aug 31, 2022 at 10:50:39AM +0700, John Naylor wrote: > Here's the final piece. I debated how many tests to add and decided it > was probably enough to add one each for checking quotes and > backslashes in the fast path. There is one cosmetic change in the > code: Before, the vectorized less

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-30 Thread John Naylor
On Tue, Aug 23, 2022 at 1:03 PM John Naylor wrote: > > LGTM overall. My plan is to split out the json piece, adding tests for > that, and commit the infrastructure for it fairly soon. Here's the final piece. I debated how many tests to add and decided it was probably enough to add one each for ch

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-29 Thread Jelte Fennema
> I wonder why copyable_characters_length is not reset after flushing. It's not necessary because of the break statement right after. But this part of the code was refactored away in John's improved patch that's actually merged:  https://github.com/postgres/postgres/commit/3838fa269c15706df2b85c

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-26 Thread John Naylor
On Thu, Aug 25, 2022 at 1:35 PM John Naylor wrote: > > I think I'll go ahead and commit 0001 in a couple days pending further > comments. Pushed with Nathan's correction and some cosmetic rearrangements. -- John Naylor EDB: http://www.enterprisedb.com

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-25 Thread John Naylor
On Fri, Aug 26, 2022 at 10:14 AM Nathan Bossart wrote: > > > +test_lfind8_internal(uint8 key) [...] > > + elog(ERROR, "pg_lfind8() found nonexistent element <= > > '0x%x'", key + 1); > > +} > > nitpick: Shouldn't the elog() calls use "==" instead of "<=" for this one? Good catch, wil

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-25 Thread Nathan Bossart
On Thu, Aug 25, 2022 at 01:35:45PM +0700, John Naylor wrote: > - For the following comment, pgindent will put spaced operands on a > separate line which is not great for readability. and our other > reference to the Stanford bithacks page keeps the in-page link, and I > see no reason to exclude it

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-24 Thread John Naylor
On Wed, Aug 24, 2022 at 11:56 PM Nathan Bossart wrote: > > On Wed, Aug 24, 2022 at 11:59:25AM +0700, John Naylor wrote: > > It seems "scalar" would be a bad choice since it already means > > (confusingly) operating on the least significant element of a vector. > > I'm thinking of *_has and *_has_l

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-24 Thread Nathan Bossart
On Wed, Aug 24, 2022 at 11:59:25AM +0700, John Naylor wrote: > It seems "scalar" would be a bad choice since it already means > (confusingly) operating on the least significant element of a vector. > I'm thinking of *_has and *_has_le, matching the already existing in > the earlier patch *_has_zero

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-23 Thread John Naylor
On Wed, Aug 24, 2022 at 12:15 AM Nathan Bossart wrote: > > On Tue, Aug 23, 2022 at 01:03:03PM +0700, John Naylor wrote: > > On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart > >> Here's a new version of the patch with the 32-bit changes and calls to > >> lfind() removed. > > > > LGTM overall. My pla

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-23 Thread Nathan Bossart
On Tue, Aug 23, 2022 at 01:03:03PM +0700, John Naylor wrote: > On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart >> Here's a new version of the patch with the 32-bit changes and calls to >> lfind() removed. > > LGTM overall. My plan is to split out the json piece, adding tests for > that, and commit

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-22 Thread John Naylor
On Tue, Aug 23, 2022 at 10:32 AM Nathan Bossart wrote: > > On Mon, Aug 22, 2022 at 02:22:29PM -0700, Nathan Bossart wrote: > > On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote: > >> Not at all! However, the 32-bit-element changes are irrelevant for > >> json, and make review more diffic

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-22 Thread Nathan Bossart
On Mon, Aug 22, 2022 at 02:22:29PM -0700, Nathan Bossart wrote: > On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote: >> Not at all! However, the 32-bit-element changes are irrelevant for >> json, and make review more difficult. I would suggest keeping those in >> the other thread starting

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-22 Thread Nathan Bossart
On Mon, Aug 22, 2022 at 09:35:34AM +0700, John Naylor wrote: > Not at all! However, the 32-bit-element changes are irrelevant for > json, and make review more difficult. I would suggest keeping those in > the other thread starting with whatever refactoring is needed. I can > always rebase over that

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-21 Thread John Naylor
On Sun, Aug 21, 2022 at 12:47 PM Nathan Bossart wrote: > > I spent some more time looking at this one, and I had a few ideas that I > thought I'd share. 0001 is your v6 patch with a few additional changes, > including simplying the assertions for readability, splitting out the > Vector type into

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-20 Thread Nathan Bossart
On Fri, Aug 19, 2022 at 01:42:15PM -0700, Nathan Bossart wrote: > On Fri, Aug 19, 2022 at 03:11:36PM +0700, John Naylor wrote: >> This is done. Also: >> - a complete overhaul of the pg_lfind8* tests >> - using a typedef for the vector type >> - some refactoring, name changes and other cleanups (a f

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-19 Thread Nathan Bossart
On Fri, Aug 19, 2022 at 03:11:36PM +0700, John Naylor wrote: > This is done. Also: > - a complete overhaul of the pg_lfind8* tests > - using a typedef for the vector type > - some refactoring, name changes and other cleanups (a few of these > could also be applied to the 32-byte element path, but t

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-19 Thread John Naylor
On Tue, Aug 16, 2022 at 4:23 AM Nathan Bossart wrote: > > On Mon, Aug 15, 2022 at 08:33:21PM +0700, John Naylor wrote: > > +#ifdef USE_SSE2 > > + chunk = _mm_loadu_si128((const __m128i *) &base[i]); > > +#else > > + memcpy(&chunk, &base[i], sizeof(chunk)); > > +#endif

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-15 Thread Nathan Bossart
On Mon, Aug 15, 2022 at 08:33:21PM +0700, John Naylor wrote: > The attached implements the above, more or less, using new pg_lfind8() > and pg_lfind8_le(), which in turn are based on helper functions that > act on a single vector. The pg_lfind* functions have regression tests, > but I haven't done

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-15 Thread Ranier Vilela
Em seg., 15 de ago. de 2022 às 15:34, Ranier Vilela escreveu: > Hi, > > I ran this test. > > DROP TABLE IF EXISTS long_json_as_text; > CREATE TABLE long_json_as_text AS > with long as ( > select repeat(description, 11) > from pg_description > ) > select (select json_agg(row_to_json(long))::text a

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-15 Thread Ranier Vilela
Hi, I ran this test. DROP TABLE IF EXISTS long_json_as_text; CREATE TABLE long_json_as_text AS with long as ( select repeat(description, 11) from pg_description ) select (select json_agg(row_to_json(long))::text as t from long) from generate_series(1, 100); VACUUM FREEZE long_json_as_text; selec

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-08-15 Thread John Naylor
I wrote > On Mon, Jul 11, 2022 at 11:07 PM Andres Freund wrote: > > > I wonder if we can add a somewhat more general function for scanning until > > some characters are found using SIMD? There's plenty other places that could > > be useful. > > In simple cases, we could possibly abstract the enti

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-13 Thread Andrew Dunstan
On 2022-06-24 Fr 20:18, Andres Freund wrote: > Hi, > > On 2022-06-24 08:47:09 +, Jelte Fennema wrote: >> To test performance of this change I used COPY BINARY from a JSONB table >> into another, containing fairly JSONB values of ~15kB. > This will have a lot of other costs included (DML is ex

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread John Naylor
On Mon, Jul 11, 2022 at 11:07 PM Andres Freund wrote: > I wonder if we can add a somewhat more general function for scanning until > some characters are found using SIMD? There's plenty other places that could > be useful. In simple cases, we could possibly abstract the entire loop. With this pa

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread Andres Freund
Hi, On 2022-07-11 11:53:26 -0400, Tom Lane wrote: > Andres Freund writes: > > I wonder if we can't abstract this at least a bit better. If we go that > > route > > a bit further, then add another arch, this code will be pretty much > > unreadable. > > IMO, it's pretty unreadable *now*, for lack

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread Tom Lane
Andres Freund writes: > I wonder if we can't abstract this at least a bit better. If we go that route > a bit further, then add another arch, this code will be pretty much > unreadable. IMO, it's pretty unreadable *now*, for lack of comments about what it's doing and why.

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-11 Thread Andres Freund
Hi, On 2022-07-06 15:58:44 +0700, John Naylor wrote: > From 82e13b6bebd85a152ededcfd75495c0c0f642354 Mon Sep 17 00:00:00 2001 > From: John Naylor > Date: Wed, 6 Jul 2022 15:50:09 +0700 > Subject: [PATCH v4 4/4] Use vectorized lookahead in json_lex_string on x86 > > --- > src/common/jsonapi.c |

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-10 Thread John Naylor
On Fri, Jul 8, 2022 at 3:06 PM John Naylor wrote: > > I've pushed 0001 (although the email seems to have been swallowed > again), and pending additional comments on 0002 and 0003 I'll squash > and push those next week. This is done. > 0004 needs some thought on integrating with > symbols we disc

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-08 Thread John Naylor
I've pushed 0001 (although the email seems to have been swallowed again), and pending additional comments on 0002 and 0003 I'll squash and push those next week. 0004 needs some thought on integrating with symbols we discover during configure. -- John Naylor EDB: http://www.enterprisedb.com

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-06 Thread John Naylor
On Wed, Jul 6, 2022 at 12:18 PM Andres Freund wrote: > I think before committing something along those lines we should make the > relevant bits also be applicable when ->strval is NULL, as several functions > use that (notably json_in IIRC). Afaics we'd just need to move the strval > check to be

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-05 Thread Andres Freund
Hi, On 2022-07-06 12:10:20 +0700, John Naylor wrote: > diff --git a/src/common/jsonapi.c b/src/common/jsonapi.c > index eeedc0645a..ad4858c623 100644 > --- a/src/common/jsonapi.c > +++ b/src/common/jsonapi.c > @@ -851,10 +851,26 @@ json_lex_string(JsonLexContext *lex) > } >

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-07-05 Thread John Naylor
On Sat, Jun 25, 2022 at 8:05 AM Andres Freund wrote: > > I tried your patch with: > > > > DROP TABLE IF EXISTS json_as_text; > > CREATE TABLE json_as_text AS SELECT (SELECT json_agg(row_to_json(pd)) as t > > FROM pg_description pd) FROM generate_series(1, 100); > > VACUUM FREEZE json_as_text; >

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-06-24 Thread Andres Freund
Hi, On 2022-06-24 17:18:10 -0700, Andres Freund wrote: > On 2022-06-24 08:47:09 +, Jelte Fennema wrote: > > To test performance of this change I used COPY BINARY from a JSONB table > > into another, containing fairly JSONB values of ~15kB. > > This will have a lot of other costs included (DML

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-06-24 Thread Andres Freund
Hi, On 2022-06-24 08:47:09 +, Jelte Fennema wrote: > To test performance of this change I used COPY BINARY from a JSONB table > into another, containing fairly JSONB values of ~15kB. This will have a lot of other costs included (DML is expensive). I'd suggest storing the json in a text column

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-06-24 Thread Jelte Fennema
> +           if (copyable_characters_length) > +           { > +               /* flush copyable characters */ > +               appendBinaryStringInfo( > +                                      lex->strval, > +                                      s - copyable_characters_length, > +              

Re: [PATCH] Optimize json_lex_string by batching character copying

2022-06-24 Thread Zhihong Yu
Hi, Looking at the patch, + if (copyable_characters_length) + { + /* flush copyable characters */ + appendBinaryStringInfo( + lex->strval, + s - copyable_characters_length, +

[PATCH] Optimize json_lex_string by batching character copying

2022-06-24 Thread Jelte Fennema
When parsing JSON strings need to be converted from the JSON string format to a c-style string. A simple copy of the buffer does not suffice because of the various escape sequences that that JSON supports. Because of this our JSON parser wrote characters into the c-style string buffer one at a time