[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 09/08/2022 10:51, Charlie Clark wrote: Though, to be honest I suspect writing to a Sqlite database and exporting unique values back to XML is probably going to be easier. I found another way, without relying on SQLite: === parser = et.XMLParser(remove_blank_text=True) tree = et.

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
Thanks for the tip. On 09/08/2022 17:49, Majewski, Steven Dennis (sdm7g) wrote: You can also do this maybe more simply in XQuery. In that case, you may want to remove any whitespace differences on ingest ( or else, use normalize-space() in comparisons ) [ In BaseX, there is an option to strip
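
For readers staying in Python, the whitespace-insensitive comparison mentioned above can be approximated without XQuery. This is only a sketch: `normalize_space` is a hypothetical helper name, and `str.split()` treats a few more characters as whitespace than XPath's `normalize-space()` does.

```python
def normalize_space(text):
    """Rough Python analogue of XPath normalize-space():
    trim both ends and collapse internal whitespace runs to one space."""
    return " ".join(text.split())


# Hypothetical sample strings that differ only in whitespace.
a = "  blah\n   blah "
b = "blah blah"
same = normalize_space(a) == normalize_space(b)
print(same)
```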

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
Thank you. On 09/08/2022 15:56, Charlie Clark wrote: On 9 Aug 2022, at 15:16, Gilles wrote: Here's some working code. I reckon using SQL's UNIQUE and ignoring the error triggered when adding a duplicate is a bit kludgy, but it works. For the task I don't see the need for any kind of

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Charlie Clark
On 9 Aug 2022, at 15:16, Gilles wrote: Here's some working code. I reckon using SQL's UNIQUE and ignoring the error triggered when adding a duplicate is a bit kludgy, but it works. For the task I don't see the need for any kind of keys, they'll just slow things down. Also, it will be faster u
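
A minimal sketch of the key-less variant suggested here, using the standard-library `sqlite3` module: every block is inserted as-is and the duplicates are collapsed when reading back. The table name `entries`, the column name `block`, and the sample data are assumptions for illustration, not the code posted in the thread.

```python
import sqlite3

# Hypothetical sample data: serialized XML blocks, one of them repeated.
blocks = ["<a>1</a>", "<b>2</b>", "<a>1</a>"]

conn = sqlite3.connect(":memory:")
# No UNIQUE constraint and no index: inserts never fail.
conn.execute("CREATE TABLE entries (block TEXT)")
conn.executemany("INSERT INTO entries (block) VALUES (?)", [(b,) for b in blocks])

# Collapse duplicates on the way out, keeping first-seen order.
rows = conn.execute(
    "SELECT block FROM entries GROUP BY block ORDER BY MIN(rowid)"
)
unique_blocks = [row[0] for row in rows]
conn.close()
print(unique_blocks)
```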

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 08/08/2022 22:08, Majewski, Steven Dennis (sdm7g) wrote: Add options: method='c14n2', strip_text=True When you serialize the output. ( pretty_print should also be the default False ) >>> print(etree.tostring(etree.fromstring(ss),method='c14n2', strip_text=True)) b'blah' Thank you.
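
The same idea can be tried without lxml installed. The sketch below swaps in the standard library's `xml.etree.ElementTree.canonicalize()` (Python 3.8+), which also implements C14N 2.0 and accepts `strip_text=True`; the sample document `ss` is made up for illustration.

```python
from xml.etree.ElementTree import canonicalize

# Hypothetical three-line block with indentation whitespace.
ss = "<root>\n  <item>\n    blah\n  </item>\n</root>"

# strip_text=True drops leading/trailing whitespace around text content,
# so the indentation between tags disappears from the output.
single_line = canonicalize(ss, strip_text=True)
print(single_line)
```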

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 09/08/2022 11:40, Charlie Clark wrote: On 9 Aug 2022, at 11:09, Gilles wrote: Nice idea too. I could just ignore the error when trying to insert a duplicate https://www.sqlitetutorial.net/sqlite-unique-constraint/ Sure, though that's a kind of try/except and if you have a lot of data I su

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Charlie Clark
On 9 Aug 2022, at 11:09, Gilles wrote: > Nice idea too. I could just ignore the error when trying to insert a duplicate > > https://www.sqlitetutorial.net/sqlite-unique-constraint/ Sure, though that's a kind of try/except and if you have a lot of data I suspect the aggregate function will be fas
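
For comparison, the UNIQUE-constraint approach discussed here might look like the following sketch. The table name `entries`, the column name `block`, and the sample data are assumptions for illustration; the thread's actual code is not shown in this listing.

```python
import sqlite3

# Hypothetical sample data: serialized XML blocks, one of them repeated.
blocks = ["<a>1</a>", "<b>2</b>", "<a>1</a>"]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (block TEXT UNIQUE)")

for block in blocks:
    try:
        conn.execute("INSERT INTO entries (block) VALUES (?)", (block,))
    except sqlite3.IntegrityError:
        # Duplicate block: the UNIQUE constraint rejected it, so skip it.
        pass

unique_blocks = [
    row[0] for row in conn.execute("SELECT block FROM entries ORDER BY rowid")
]
conn.close()
print(unique_blocks)
```

SQLite's `INSERT OR IGNORE` gives the same result without the try/except, at the cost of silently ignoring other constraint violations as well.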

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 09/08/2022 10:51, Charlie Clark wrote: Though, to be honest I suspect writing to a Sqlite database and exporting unique values back to XML is probably going to be easier. Nice idea too. I could just ignore the error when trying to insert a duplicate https://www.sqlitetutorial.net/sqlite-u

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Charlie Clark
On 9 Aug 2022, at 8:40, Gilles wrote: Thanks mucho. The script fails on this particular line: """ File "remove.dups.py", line 54, in <module> print(f"type(entries.children = {','.join(str(type(c)) for c in entries.getchildren())}") AttributeError: 'NoneType' object has no attribute 'getchild
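
An error of this kind usually means the variable holds `None` rather than an element. One common cause, assumed here because the script itself is not shown, is a `find()` call that matched nothing. The sketch below uses the standard library's ElementTree and a made-up document; it guards against `None` and iterates the element directly instead of calling `getchildren()`, which was removed from the standard library in Python 3.9.

```python
import xml.etree.ElementTree as ET

# Hypothetical document that has no <entries> child at all.
root = ET.fromstring("<doc><other/></doc>")

entries = root.find("entries")  # returns None when nothing matches
if entries is None:
    child_types = []
else:
    # Iterating an element yields its direct children.
    child_types = [type(c).__name__ for c in entries]
print(child_types)
```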