Re: [PATCH] Add CANONICAL option to xmlserialize

2024-09-12 Thread Jim Jones
On 10.09.24 19:43, Tom Lane wrote: > How about instead introducing a plain function along the lines of > "xml_canonicalize(xml, bool keep_comments) returns text" ? The SQL > committee will certainly never do that, but we won't regret having > created a plain function whenever they get around to d

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-09-10 Thread Tom Lane
Jim Jones writes: > This patch introduces the CANONICAL option to xmlserialize, which > serializes xml documents in their canonical form - as described in > the W3C Canonical XML Version 1.1 specification. This option can > be used with the additional parameter WITH [NO] COMMENTS to keep > or remo

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-09-03 Thread Jim Jones
v13 attached removes two variables that were left unused after refactoring parsenodes.h and primnodes.h, both booleans related to the INDENT feature of xmlserialize. On 30.08.24 08:05, Jim Jones wrote: > > On 30.08.24 06:46, Pavel Stehule wrote: >> >> On Thu, 29 Aug 2024 at 23:54, Jim Jones >>

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Jim Jones
On 30.08.24 06:46, Pavel Stehule wrote: > > > On Thu, 29 Aug 2024 at 23:54, Jim Jones > wrote: > > > > +SELECT xmlserialize(CONTENT doc AS text CANONICAL) = > > xmlserialize(CONTENT doc AS text CANONICAL WITH COMMENTS) FROM > > xmltest_serialize; > > + ?column? > > +

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Pavel Stehule
On Thu, 29 Aug 2024 at 23:54, Jim Jones wrote: > > > On 29.08.24 20:50, Pavel Stehule wrote: > > > > I know, but theoretically, there can be some benefit for CANONICAL if > > pg supports bytea there. Lot of databases still use non utf8 encoding. > > > > It is a more theoretical question - if

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Jim Jones
On 29.08.24 20:50, Pavel Stehule wrote: > > I know, but theoretically, there can be some benefit for CANONICAL if > pg supports bytea there. Lot of databases still use non utf8 encoding. > > It is a more theoretical question - if pg supports different types > there in future  (because SQL/XML or

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-29 Thread Pavel Stehule
On Tue, 27 Aug 2024 at 13:57, Jim Jones wrote: > > > On 26.08.24 16:59, Pavel Stehule wrote: > > > > 1. what about behaviour of NO INDENT - the implementation is not too > > old, so it can be changed if we want (I think), and it is better to do > > early than too late > > While checking the

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-27 Thread Jim Jones
On 26.08.24 16:59, Pavel Stehule wrote: > > 1. what about behaviour of NO INDENT - the implementation is not too > old, so it can be changed if we want (I think), and it is better to do > early than too late While checking the feasibility of removing indentation with NO INDENT I may have found

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-26 Thread Pavel Stehule
On Mon, 26 Aug 2024 at 16:30, Jim Jones wrote: > > > On 26.08.24 14:15, Pavel Stehule wrote: > > I am not strongly against enhancing XMLSERIALIZE, but it can be nice > > to see some wider concept first. Currently the state looks just random > > - and I didn't see any serious discussion about

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-26 Thread Jim Jones
On 26.08.24 14:15, Pavel Stehule wrote: > I am not strongly against enhancing XMLSERIALIZE, but it can be nice > to see some wider concept first. Currently the state looks just random > - and I didn't see any serious discussion about implementation of > SQL/XML. We don't need to be necessarily c

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-26 Thread Pavel Stehule
On Mon, 26 Aug 2024 at 13:28, Jim Jones wrote: > > > On 26.08.24 12:30, Pavel Stehule wrote: > > I think so there should be specified the target of CANONICAL - it is a > > partial replacement of NO INDENT or it produces format just for > > comparing? The CANONICAL format is not probably ext

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-26 Thread Jim Jones
On 26.08.24 12:30, Pavel Stehule wrote: > I think so there should be specified the target of CANONICAL - it is a > partial replacement of NO INDENT or it produces format  just for > comparing? The CANONICAL format is not probably extra standardized, > because libxml2 removes indenting, but examp

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-26 Thread Pavel Stehule
On Mon, 26 Aug 2024 at 11:32, Jim Jones wrote: > Hi Pavel > > On 25.08.24 20:57, Pavel Stehule wrote: > > > > There is unwanted white space in the patch > > > > -<-><--><-->xmlFreeDoc(doc); > > +<->else if (format == XMLSERIALIZE_CANONICAL || format == > > XMLSERIALIZE_CANONICAL_WITH_NO_COMM

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-26 Thread Jim Jones
Hi Pavel On 25.08.24 20:57, Pavel Stehule wrote: > > There is unwanted white space in the patch > > -<-><--><-->xmlFreeDoc(doc); > +<->else if (format == XMLSERIALIZE_CANONICAL || format == > XMLSERIALIZE_CANONICAL_WITH_NO_COMMENTS) > + <>{ > +<-><-->xmlChar    *xmlbuf = NULL; > +<-><-->int      

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-25 Thread Pavel Stehule
On Sun, 25 Aug 2024 at 20:57, Pavel Stehule wrote: > Hi > > On Sat, 24 Aug 2024 at 7:40, Jim Jones > wrote: > >> >> On 19.06.24 10:59, Jim Jones wrote: >> > On 09.02.24 14:19, Jim Jones wrote: >> >> v9 attached with rebase due to changes done to primnodes.h in 615f5f6 >> >> >> > v10 at

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-08-25 Thread Pavel Stehule
Hi On Sat, 24 Aug 2024 at 7:40, Jim Jones wrote: > > On 19.06.24 10:59, Jim Jones wrote: > > On 09.02.24 14:19, Jim Jones wrote: > >> v9 attached with rebase due to changes done to primnodes.h in 615f5f6 > >> > > v10 attached with rebase due to changes in primnodes, parsenodes.h, and > > gr

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-07-09 Thread Jim Jones
On 19.06.24 10:59, Jim Jones wrote: > On 09.02.24 14:19, Jim Jones wrote: >> v9 attached with rebase due to changes done to primnodes.h in 615f5f6 >> > v10 attached with rebase due to changes in primnodes, parsenodes.h, and > gram.y > v11 attached with rebase due to changes in xml.c -- Jim From

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-06-19 Thread Jim Jones
On 09.02.24 14:19, Jim Jones wrote: > v9 attached with rebase due to changes done to primnodes.h in 615f5f6 > v10 attached with rebase due to changes in primnodes, parsenodes.h, and gram.y -- Jim From fbd98149d50fe19b886b30ed49b9d553a18f30b4 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Wed,

Re: [PATCH] Add CANONICAL option to xmlserialize

2024-02-09 Thread Jim Jones
On 05.10.23 09:38, Jim Jones wrote: > > v8 attached changes the default behaviour to WITH COMMENTS. v9 attached with rebase due to changes done to primnodes.h in 615f5f6 -- Jim From fe51a1826b75b778c21f559236b23d340a10d703 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Fri, 9 Feb 2024 13:51:44 +

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-10-05 Thread Jim Jones
Hi Chap On 04.10.23 23:05, Chapman Flack wrote: I hope I'm not butting in, but I too would be leery of any default behavior that's going to say thing1 and thing2 are the same thing but ignoring (name part of thing here). If that's the comparison I mean to make, and it's as easy as CANONICAL WITH

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-10-04 Thread Chapman Flack
On 2023-10-04 12:19, Jim Jones wrote: On 04.10.23 11:39, vignesh C wrote: 1) Why the default option was chosen without comments shouldn't it be the other way round? I'm not sure it is the way to go. The main idea is to check if two documents have the same content, and comments might be differen

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-10-04 Thread Jim Jones
Hi Vignesh Thanks for the thorough review! On 04.10.23 11:39, vignesh C wrote: Few comments: 1) Why the default option was chosen without comments shouldn't it be the other way round? +opt_xml_serialize_format: + INDENT { $$ = XMLSERIALIZE_INDENT;

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-10-04 Thread vignesh C
On Fri, 17 Mar 2023 at 18:01, Jim Jones wrote: > > After some more testing I realized that v5 was leaking the xmlDocPtr. > > Now fixed in v6. Few comments: 1) Why the default option was chosen without comments shouldn't it be the other way round? +opt_xml_serialize_format: +

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-09-14 Thread Thomas Munro
On Thu, Sep 14, 2023 at 11:54 PM Jim Jones wrote: > The cfbot started complaining about this patch on "macOS - Ventura - Meson" > > 'Persistent worker failed to start the task: tart isolation failed: failed to > create VM cloned from "ghcr.io/cirruslabs/macos-ventura-base:latest": tart > command

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-09-14 Thread Jim Jones
The cfbot started complaining about this patch on "macOS - Ventura - Meson" 'Persistent worker failed to start the task: tart isolation failed: failed to create VM cloned from "ghcr.io/cirruslabs/macos-ventura-base:latest": tart command returned non-zero exit code: ""' Is this a problem in m

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-17 Thread Jim Jones
After some more testing I realized that v5 was leaking the xmlDocPtr. Now fixed in v6. From d04d8fdcbedbd5ed88469bd22e079467c26ab7a4 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Fri, 17 Mar 2023 10:23:48 +0100 Subject: [PATCH v6] Add CANONICAL output format to xmlserialize This patch introduc

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-17 Thread Jim Jones
v5 attached is a rebase over the latest changes in xmlserialize (INDENT output). From 24f045ccf7ac000a509910cb32c54ce4c91e2c33 Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Fri, 17 Mar 2023 10:23:48 +0100 Subject: [PATCH v5] Add CANONICAL output format to xmlserialize This patch introduces the C

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-14 Thread Jim Jones
v4 attached fixes an encoding issue at the xml_parse call. It now uses GetDatabaseEncoding(). Best, Jim From 3ff8e7bd9a9e43194d834ba6b125841539d5df1c Mon Sep 17 00:00:00 2001 From: Jim Jones Date: Mon, 6 Mar 2023 14:08:54 +0100 Subject: [PATCH v4] Add CANONICAL format to xmlserialize This patc

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-06 Thread Jim Jones
On 06.03.23 11:50, I wrote: I guess this confusion is happening because xml_parse() was being called with the database encoding from GetDatabaseEncoding(). I added a condition before calling xml_parse() to check if the xml document has a different encoding than UTF-8 parse_xml_decl(xml_text2

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-06 Thread Jim Jones
On 06.03.23 00:32, Thomas Munro wrote: I couldn't reproduce that locally either, but I just tested on CI with your patch applied saw the failure, and then removed "PYTHONCOERCECLOCALE=0 LANG=C" and it's all green: https://github.com/macdice/postgres/commit/91999f5d13ac2df6f7237a301ed6cf73f2bb5b

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Thomas Munro
On Mon, Mar 6, 2023 at 11:20 AM Jim Jones wrote: > On 05.03.23 22:00, Thomas Munro wrote: > > could be something to do with > > our environment, since .cirrus.yml sets LANG=C in the 32 bit test run > > -- maybe try that locally? > Also using LANGUAGE=C the result is the same for me - all tests pa

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Jim Jones
On 05.03.23 22:00, Thomas Munro wrote: The CI run for that failed in an interesting way, only on Debian + Meson, 32 bit. The diffs appear to show that psql has a different opinion of the column width, while building its header (the "--" you get at the top of psql's output), even though the a

Re: [PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Thomas Munro
On Mon, Mar 6, 2023 at 7:44 AM Jim Jones wrote: > The attached version includes documentation and tests to the patch. The CI run for that failed in an interesting way, only on Debian + Meson, 32 bit. The diffs appear to show that psql has a different opinion of the column width, while building i

[PATCH] Add CANONICAL option to xmlserialize

2023-03-05 Thread Jim Jones
On 27.02.23 14:16, I wrote: Hi, In order to compare pairs of XML documents for equivalence it is necessary to convert them first to their canonical form, as described at W3C Canonical XML 1.1.[1] This spec basically defines a standard physical representation of xml documents that have more th
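
The idea in the opening message (canonicalize both documents first, then compare them) can be sketched outside PostgreSQL. The snippet below is only an illustration and is not code from the patch: it uses Python's standard-library xml.etree.ElementTree.canonicalize, which implements Canonical XML 2.0 rather than the 1.1 specification the patch targets, so its output may differ in detail from what xmlserialize ... CANONICAL would produce. The helper name xml_equivalent and the sample documents are hypothetical; the keep_comments flag loosely mirrors the WITH [NO] COMMENTS choice discussed in the thread.

```python
from xml.etree.ElementTree import canonicalize


def xml_equivalent(a: str, b: str, keep_comments: bool = False) -> bool:
    """Return True if both documents have the same canonical serialization.

    Hypothetical helper for illustration; note that the stdlib implements
    C14N 2.0, not the C14N 1.1 form discussed in the thread.
    """
    return (canonicalize(a, with_comments=keep_comments)
            == canonicalize(b, with_comments=keep_comments))


# Same content, different physical representation: attribute order,
# quote style, empty-element syntax, and a comment in one of them.
doc_a = '<doc b="2" a="1"><!-- note --><item></item></doc>'
doc_b = "<doc a='1' b='2'><item/></doc>"

# Comments dropped before comparing.
same_without_comments = xml_equivalent(doc_a, doc_b)

# Comments kept, so the extra comment in doc_a makes the documents differ.
same_with_comments = xml_equivalent(doc_a, doc_b, keep_comments=True)

print(same_without_comments, same_with_comments)
```

Whether comments should count by default is the point raised later in the thread by vignesh C and Chapman Flack. The sketch makes that choice an explicit parameter instead of settling it.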