[lxml] Re: Turn three-line block into single?

2022-08-10 Thread Gilles
On 10/08/2022 16:34, Stefan Behnel wrote: Gilles schrieb am 10.08.22 um 15:20: for row in tree.iter("wpt"):      lat,lon = row.attrib.values() Note that this assignment depends on the order of the two attributes in the XML document, i.e. in data that you may not control yourself
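
A minimal sketch of the point about attribute order, assuming a made-up GPX-like sample in which the two waypoints list their attributes in different order. Looking up each attribute by name with `get()` avoids depending on document order:

```python
from lxml import etree as et

# Hypothetical sample: the second waypoint lists lon before lat.
gpx = b'<gpx><wpt lat="48.85" lon="2.35"/><wpt lon="4.83" lat="45.76"/></gpx>'
root = et.fromstring(gpx)

coords = []
for row in root.iter("wpt"):
    # Fetch by name instead of unpacking row.attrib.values(),
    # whose order follows the order in the document.
    lat, lon = row.get("lat"), row.get("lon")
    coords.append((lat, lon))

print(coords)
```

With `lat, lon = row.attrib.values()` the second waypoint would come out swapped.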

[lxml] Re: Turn three-line block into single?

2022-08-10 Thread Gilles
On 10/08/2022 13:30, Charlie Clark wrote: Yes, this should work. However, I don't know if adjusting the tree while looping over it won't cause the same kind of problems as with other sequences in Python. How many elements are there in your tree? Memory use in XML can get very expensive so combining

[lxml] Re: Turn three-line block into single?

2022-08-10 Thread Gilles
On 09/08/2022 20:59, Gilles wrote: On 09/08/2022 10:51, Charlie Clark wrote: Though, to be honest I suspect writing to a Sqlite database and exporting unique values back to XML is probably going to be easier. I found another way, without relying on SQLite: === parser = et.XMLParser(re

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 09/08/2022 10:51, Charlie Clark wrote: Though, to be honest I suspect writing to a Sqlite database and exporting unique values back to XML is probably going to be easier. I found another way, without relying on SQLite: === parser = et.XMLParser(remove_blank_text=True) tree = et.

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
Thanks for the tip. On 09/08/2022 17:49, Majewski, Steven Dennis (sdm7g) wrote: You can also do this maybe more simply in XQuery. In that case, you may want to remove any whitespace differences on ingest ( or else, use normalize-space() in comparisons ) [ In BaseX, there is an option to strip

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
Thank you. On 09/08/2022 15:56, Charlie Clark wrote: On 9 Aug 2022, at 15:16, Gilles wrote: Here's some working code. I reckon using SQL's UNIQUE and ignoring the error triggered when adding a duplicate is a bit kludgy, but it works. For the task I don't see the ne

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 08/08/2022 22:08, Majewski, Steven Dennis (sdm7g) wrote: Add options: method='c14n2', strip_text=True when you serialize the output. (pretty_print should also be left at its default, False.) >>> print(etree.tostring(etree.fromstring(ss), method='c14n2', strip_text=True)) b'blah' Thank you.
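
A small sketch of the suggestion above, assuming a reasonably recent lxml (C14N 2.0 serialization is not available in old releases) and two made-up inputs that differ only in indentation. It compares the canonical forms rather than relying on one exact output:

```python
from lxml import etree

# Hypothetical inputs: same content, different whitespace.
spread_out = etree.fromstring(b"<a>\n  <b>blah</b>\n</a>")
one_line = etree.fromstring(b"<a><b>blah</b></a>")

canon_1 = etree.tostring(spread_out, method="c14n2", strip_text=True)
canon_2 = etree.tostring(one_line, method="c14n2", strip_text=True)

# If whitespace is stripped as described, both serialize identically.
print(canon_1 == canon_2)
```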

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 09/08/2022 11:40, Charlie Clark wrote: On 9 Aug 2022, at 11:09, Gilles wrote: Nice idea too. I could just ignore the error when trying to insert a duplicate https://www.sqlitetutorial.net/sqlite-unique-constraint/ Sure, though that's a kind of try/except and if you have a lot of d

[lxml] Re: Turn three-line block into single?

2022-08-09 Thread Gilles
On 09/08/2022 10:51, Charlie Clark wrote: Though, to be honest I suspect writing to a Sqlite database and exporting unique values back to XML is probably going to be easier. Nice idea too. I could just ignore the error when trying to insert a duplicate https://www.sqlitetutorial.net/sqlite-u
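
A sketch of the SQLite idea using only the standard library. The table and column names are made up for illustration; `INSERT OR IGNORE` lets SQLite skip duplicates itself, so no try/except is needed around each insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one UNIQUE text column holding a serialized block.
conn.execute("CREATE TABLE blocks (content TEXT UNIQUE)")

rows = ["<a>blah</a>", "<a>other</a>", "<a>blah</a>"]
for row in rows:
    # Rows that violate the UNIQUE constraint are silently skipped.
    conn.execute("INSERT OR IGNORE INTO blocks (content) VALUES (?)", (row,))

unique_rows = [r[0] for r in conn.execute("SELECT content FROM blocks ORDER BY rowid")]
print(unique_rows)
```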

[lxml] Re: Turn three-line block into single?

2022-08-08 Thread Gilles
tribute 'getchildren' """ print(f"type(entries.children = {','.join(str(type(c)) for c in entries.getchildren())}") On 09/08/2022 00:55, Adrian Bool wrote: Hi Gilles, I guess you're intending on using 'sort -u' on your data?  An alterna

[lxml] Turn three-line block into single?

2022-08-08 Thread Gilles
Hello, Before I resort to a regex, I figured I should ask here. To find and remove possible duplicates, I need to turn each block into a single line: FROM       blah   TO   blah Do you know of a way to do this in lxml? Thank you.

[lxml] Re: Why does it fail cleaning GPX file?

2022-07-22 Thread Gilles
On 22/07/2022 10:00, holger.jo...@lbbw.de wrote: The reason is that lxml (sensibly) uses fully qualified tag names in Clark notation (see http://www.jclark.com/xml/xmlns.htm) Thanks much for the info + example.
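
A minimal sketch of what Clark notation means in practice. It assumes a tiny made-up document that declares the GPX 1.1 namespace as its default; the element names are illustrative, not taken from the original input file:

```python
from lxml import etree as et

# Assumed namespace URI (GPX 1.1) and a hypothetical two-element document.
NS = "http://www.topografix.com/GPX/1/1"
gpx = f'<gpx xmlns="{NS}"><wpt lat="1" lon="2"/><trk/></gpx>'.encode()
root = et.fromstring(gpx)

plain = root.findall("wpt")               # bare name: no match
qualified = root.findall(f"{{{NS}}}wpt")  # "{namespace}tag": matches

for el in qualified:
    root.remove(el)

print(len(plain), len(qualified), [et.QName(c).localname for c in root])
```

Searching for the bare tag name finds nothing, which would explain a script that leaves the file unchanged.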

[lxml] Why does it fail cleaning GPX file?

2022-07-21 Thread Gilles
Hello, I run this script to remove unneeded elements. For some reason, the input file is left as-is when I try to get rid of the block; it works as expected when I ignore that element. Any idea why? Thank you. == INPUT.GPX http://www.acme.com"; xmlns="http://www.topografix.com/

[lxml] Re: : Re: getparent() fails with "AttributeError: 'list' object has no attribute 'getparent' "

2022-05-30 Thread Gilles
Thank you, and to Jens too. The newbie in me (wrongly) assumed that xpath() returned a pointer to the element if only one was found in the HTML file, like find(). On 30/05/2022 09:30, holger.jo...@lbbw.de wrote: Through trial and error, it looks like xpath() returns an array, even if only on
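
A short sketch of the difference, using a made-up HTML snippet: `find()` returns a single element (or `None`), while `xpath()` returns a list for node-selecting expressions, so an index is needed before calling `getparent()`:

```python
from lxml import etree as et

root = et.fromstring(b"<html><body><p>hi</p></body></html>")

found = root.find(".//p")    # an element, or None if nothing matches
matches = root.xpath("//p")  # a list, even with a single match

# matches.getparent() would raise AttributeError; index into the list first.
parent = matches[0].getparent()
print(type(matches).__name__, found.tag, parent.tag)
```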

[lxml] Re: [HTML] How to get text of attribute?

2022-05-28 Thread Gilles
Even better! Thanks. On 28/05/2022 12:28, Pedro Andres Aranda Gutierrez wrote: Maybe even easier: string() in the XPATH: === description = tree.xpath('string(//meta[@name="description"]/@content)') print(description) === best, /PA On Fri, 27 May 2022 at 19:03, wrote: For others' benefi

[lxml] [HTML] How to get text of attribute?

2022-05-27 Thread Gilles
Hello, My XPath skills being what they are… in an HTML file, I can't figure out how to grab the text of the second attribute in a meta element: == with open("input.html") as tempfile:     parser = et.HTMLParser(remove_blank_text=True,recover=True)     currfile_tree = et.parse(

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-17 Thread Gilles
Sure, I could remove all CRLF's in paragraphs that don't end with a like they should, although I'm happy with just using a file handle instead. What's MWE? On 17/05/2022 09:16, Pedro Andres Aranda Gutierrez wrote: Hi Gilles, just FYI, you will need to do the cleaning befo

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-15 Thread Gilles
gSur Those are really annoying Best, /PA On Fri, 13 May 2022 at 12:47, Gilles wrote: On 12/05/2022 22:32, Adrian Bool wrote: On 12 May 2022, at 10:26, Gilles wrote:   File "src\lxml\parser.pxi", line 652, in lxml.etree._raiseParseError

[lxml] Re: Adding block of HTML?

2022-05-13 Thread Gilles
On 13/05/2022 12:51, Xavier Morel wrote: You're parsing an HTML document. An HTML document necessarily has a root and a body, so that's part of the error recovery of HTML parsers. If you don't want to parse an HTML document, you should probably use `fragment_fromstring`. Thanks for the po
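
A sketch of the `fragment_fromstring` suggestion, assuming a made-up page and a header block with a single top-level element. Parsing the block as a fragment yields just that element, with no extra html/body wrapper around it:

```python
import lxml.html

# Hypothetical page and header block.
page = lxml.html.fromstring("<html><body><p>My title</p></body></html>")
header = lxml.html.fragment_fromstring('<div id="header">header here</div>')

body = page.find("body")
body.insert(0, header)  # first child of <body>

print(lxml.html.tostring(page).decode())
```

If the block has several top-level elements, `fragment_fromstring` needs `create_parent=True` (or `fragments_fromstring` can be used instead).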

[lxml] Re: Adding block of HTML?

2022-05-13 Thread Gilles
On 12/05/2022 22:19, Adrian Bool wrote: On 12 May 2022, at 13:08, Gilles wrote: 1. Is there a way to tell lxml _not_ to add and when inserting the header right after ? You're not loading the header data with the HTMLParser are you?? It is the HTMLParser that adds etc. Just pars

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-13 Thread Gilles
On 12/05/2022 22:32, Adrian Bool wrote: On 12 May 2022, at 10:26, Gilles wrote:   File "src\lxml\parser.pxi", line 652, in lxml.etree._raiseParseError OSError: Error reading file*'* Look at the last line above - you're giving parse() a string containing XML data which
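
To illustrate the distinction, a minimal sketch with made-up content: `parse()` expects a filename or file-like object, so data already held in memory goes through `fromstring()`, or is wrapped in a file-like object first:

```python
from io import BytesIO
from lxml import etree as et

content = b"<root><item>1</item></root>"  # hypothetical data read earlier

# et.parse(content) would treat the data as a filename and fail to open it.
root = et.fromstring(content)      # parse in-memory data directly
tree = et.parse(BytesIO(content))  # or wrap it in a file-like object

print(root.tag, tree.getroot()[0].text)
```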

[lxml] Re: Adding block of HTML?

2022-05-12 Thread Gilles
On 12/05/2022 09:25, Adrian Bool wrote: More XML fun in the morning! Almost there 1. Is there a way to tell lxml _not_ to add and when inserting the header right after ? [header here]My title Here's the code: == body = root.find("body") if len(body) == 0:     raise Exception(" not f

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-12 Thread Gilles
On 12/05/2022 11:47, Charlie Clark wrote: On 12 May 2022, at 11:26, Gilles wrote: →   tree = et.parse(StringIO(content), parser) Why StringIO? XML should always be bytes but there also shouldn't be a need to convert what you've read from the file. I don't know. I'm

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-12 Thread Gilles
On 12/05/2022 08:45, Adrian Bool wrote: Sure, I just tend to use pathlib for all my file handling as its really useful and has been part of Python's standard library for a good while now — so no extra package to install. Good to know. I'll use pathlib from now on, as well as avoid reading the

[lxml] Re: Adding block of HTML?

2022-05-12 Thread Gilles
n index of 0 to insert our content as the first # child element of the wrapper's body element... wrapper_body.insert(index=0, element=content_root) print(et.tostring(wrapper_root, pretty_print=True).decode('utf8')) On 11 May 2022, at 11:59, Gilles wrote: Hello, I need to add ~twent

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-11 Thread Gilles
Turns out there's no need to use the pathlib module: The issue with " " is gone when 1) first reading HTML into a variable 2) before parsing it, even with the standard open(): """ OK from pathlib import Path with Path(f).open() as tempfile:     tree = et.parse(tempfile, parser=pars

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-11 Thread Gilles
On 11/05/2022 13:57, Paul Higgs wrote: Without looking at the tidy source, I would expect that it is looking for closing tags, i.e. Thanks for the tip. Tidy still reports an error with this: ==     == Using "-ashtml" solved the issue. Thanks!

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-11 Thread Gilles
On 11/05/2022 12:19, Charlie Clark wrote: It could always be a bug, but really we need a sample file to test. Which version of lxml and Python are you using? Here it is: https://we.tl/t-WowFCDBp5A Python 3.8.8, lxml 4.6.3.0 But, if all you want is pretty printing then I recommend simply usin

[lxml] Adding block of HTML?

2022-05-11 Thread Gilles
Hello, I need to add ~twenty lines of HTML right after the tag. Does lxml provide a way to read that data from a variable, to keep things simple? for body in root.xpath('//body[@*]'):     et.SubElement(body,"",HTML_block) Thank you.

[lxml] Re: [newbie] lxml adds before each end of line

2022-05-11 Thread Gilles
Adrian: Thanks for the code. The output is now correct. Am I using lxml incorrectly, or is it some issue with its HTML parser? Can I do without using an extra package (pathlib.Path)? Charlie Clark : The output from "et.tostring()" has " " added before each carriage return (which is 0D0A since

[lxml] [newbie] Easier way to check if element exists, edit if exists, insert if not?

2021-09-16 Thread Gilles
Hello, Out of curiosity, is there a more compact way to check if an element exists, edit if it does, insert if it doesn't ? #=== Edit/insert top-most name r = root.find('./Document/name') #if et.iselement(r) if r is not None:     print("Name exists: ",r.text)     r.text = BASENAME else
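
One possible way to make this more compact is a small helper. The sketch below uses a hypothetical `set_child_text` function (not part of lxml) and a made-up KML-like document; note that `SubElement` appends the new child at the end of the parent:

```python
from lxml import etree as et

def set_child_text(parent, tag, text):
    """Hypothetical helper: edit the child if present, create it otherwise."""
    child = parent.find(tag)
    if child is None:
        child = et.SubElement(parent, tag)  # appended as the last child
    child.text = text
    return child

root = et.fromstring(b"<kml><Document><name>old</name></Document></kml>")
doc = root.find("./Document")

set_child_text(doc, "name", "new name")      # exists: edited
set_child_text(doc, "description", "added")  # missing: inserted

print(et.tostring(root).decode())
```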

[lxml] Re: [newbie] Why can't find elements? What's the difference between // and .//?

2021-09-13 Thread Gilles
On 12/09/2021 23:31, Stefan Behnel wrote: Am 12. September 2021 19:14:40 MESZ schrieb Gilles: tracks = root.findall('.//snipet|.//LookAt|.//Style|.//StyleMap') Just use root.iter('snippet', 'LookAt', 'Style', 'StyleMap') It's cer
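
A sketch of the `iter()` suggestion on a made-up KML-like tree. The matches are collected into a list first, so the tree is not modified while it is being iterated:

```python
from lxml import etree as et

# Hypothetical document containing the four unwanted element types.
root = et.fromstring(
    b"<Document><snippet/><Placemark><LookAt/><name>x</name><Style/></Placemark>"
    b"<StyleMap/></Document>"
)

# iter() accepts several tag names; list() freezes the matches before removal.
for el in list(root.iter("snippet", "LookAt", "Style", "StyleMap")):
    el.getparent().remove(el)

remaining = [el.tag for el in root.iter()]
print(remaining)
```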

[lxml] Re: [newbie] Why can't find elements? What's the difference between // and .//?

2021-09-12 Thread Gilles
Found it: findall() simply doesn't support that syntax, while xpath() works: #BAD for el in root.findall('./snippet/*|./LookAt/*|./Style/*|./StyleMap/*'): for el in root.xpath('.//snippet/*|.//LookAt/*|.//Style/*|.//StyleMap/*'):         print(el.tag, el.text) On
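
A small sketch of the two options, on a made-up document: the `|` union operator goes through `xpath()`, while with `findall()` each path can be queried separately and the results concatenated:

```python
from lxml import etree as et

root = et.fromstring(
    b"<Document><snippet><a>1</a></snippet><Style><b>2</b></Style></Document>"
)
paths = ["./snippet/*", "./Style/*"]

# Full XPath: one expression with the union operator.
via_xpath = root.xpath("|".join(paths))

# findall(): one call per path, results joined by hand.
via_findall = [el for path in paths for el in root.findall(path)]

print([el.tag for el in via_xpath], [el.tag for el in via_findall])
```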

[lxml] [newbie] Why can't find elements? What's the difference between // and .//?

2021-09-12 Thread Gilles
Hello, A couple more newbie questions: 1. Using findall or xpath, what's the difference between "//" and ".//"? I see both in examples. https://www.w3schools.com/xml/xpath_syntax.asp 2. Why does the following script fail finding the elements I wish to remove from the tree? ***
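
On the first question, a minimal sketch with a made-up document: with `xpath()`, `.//` searches below the element the call is made on, while `//` starts from the document root regardless of that element:

```python
from lxml import etree as et

root = et.fromstring(
    b"<kml><Folder><name>A</name></Folder><Folder><name>B</name></Folder></kml>"
)
first_folder = root[0]

relative = first_folder.xpath(".//name")  # descendants of first_folder only
absolute = first_folder.xpath("//name")   # every <name> in the document

print([n.text for n in relative], [n.text for n in absolute])
```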

[lxml] Re: [newbie] Removing all elements that match string?

2021-07-30 Thread Gilles
root.xpath(f'.//{tag}')     for item in items:     if item.text == text:     parent = item.getparent()     parent.remove(item)     return root # -- Also look at element.iter, element.iterdescendants, etc. Dave On Fri 30 Jul 2021 11:02:59 AM PDT, Gil
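
The code in this reply is cut off at the top, so here is a self-contained sketch along the same lines. The function name `remove_by_text` and the `ele` tag are assumptions made for illustration:

```python
from lxml import etree as et

def remove_by_text(root, tag, text):
    """Remove every <tag> descendant whose text equals `text` (hypothetical helper)."""
    # xpath() returns a list, so removing while looping is safe.
    for item in root.xpath(f".//{tag}"):
        if item.text == text:
            item.getparent().remove(item)
    return root

root = et.fromstring(b"<trk><ele>100.25</ele><ele>7</ele><ele>100.25</ele></trk>")
remove_by_text(root, "ele", "100.25")

print(et.tostring(root).decode())
```

Note that `remove()` also drops the element's tail text, which matters for mixed content.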

[lxml] [newbie] Removing all elements that match string?

2021-07-30 Thread Gilles
Hello, I'm only getting started with (l)xml, and was curious to know if someone had an example of how to remove all items in an XML file that match the following string: ^\t*100.25\r\n Thank you.