On 10/08/2022 16:34, Stefan Behnel wrote:
Gilles schrieb am 10.08.22 um 15:20:
for row in tree.iter("wpt"):
lat,lon = row.attrib.values()
Note that this assignment depends on the order of the two attributes
in the XML document, i.e. in data that you may not control yourself
On 10/08/2022 13:30, Charlie Clark wrote:
Yes, this should work. However, I don't know if adjusting the tree
while looping over it won't the same kind of problems as with other
sequences in Python.
How many elements are there in your tree? Memory use in XML can get very
expensive so combining
On 09/08/2022 20:59, Gilles wrote:
On 09/08/2022 10:51, Charlie Clark wrote:
Though, to be honest I suspect writing to a Sqlite database and
exporting unique values back to XML is probably going to be easier.
I found another way, without relying on SQLite:
===
parser = et.XMLParser(re
On 09/08/2022 10:51, Charlie Clark wrote:
Though, to be honest I suspect writing to a Sqlite database and
exporting unique values back to XML is probably going to be easier.
I found another way, without relying on SQLite:
===
parser = et.XMLParser(remove_blank_text=True)
tree = et.
Thanks for the tip.
On 09/08/2022 17:49, Majewski, Steven Dennis (sdm7g) wrote:
You can also do this maybe more simply in XQuery.
In that case, you may want to remove any whitespace differences on ingest ( or
else, use normalize-space() in comparisons ) [ In BaseX, there is an option to
strip
Thank you.
On 09/08/2022 15:56, Charlie Clark wrote:
On 9 Aug 2022, at 15:16, Gilles wrote:
Here's some working code. I recon using SQL's UNIQUE and ignoring
the error triggered when adding a duplicate is a bit kludgy, but
it works
For the task I don't see the ne
On 08/08/2022 22:08, Majewski, Steven Dennis (sdm7g) wrote:
Add options: method=‘c14n2’, strip_text=True
When you serialize the output.
( pretty_print should also be the default False )
>>> print(etree.tostring(etree.fromstring(ss),method='c14n2',
strip_text=True))
b'blah'
Thank you.___
On 09/08/2022 11:40, Charlie Clark wrote:
On 9 Aug 2022, at 11:09, Gilles wrote:
Nice idea too. I could just ignore the error when trying to insert a duplicate
https://www.sqlitetutorial.net/sqlite-unique-constraint/
Sure, though that's a kind of try/except and if you have a lot of d
On 09/08/2022 11:40, Charlie Clark wrote:
On 9 Aug 2022, at 11:09, Gilles wrote:
Nice idea too. I could just ignore the error when trying to insert a duplicate
https://www.sqlitetutorial.net/sqlite-unique-constraint/
Sure, though that's a kind of try/except and if you have a lot of d
On 09/08/2022 10:51, Charlie Clark wrote:
Though, to be honest I suspect writing to a Sqlite database and
exporting unique values back to XML is probably going to be easier.
Nice idea too. I could just ignore the error when trying to insert a
duplicate
https://www.sqlitetutorial.net/sqlite-u
tribute 'getchildren'
"""
print(f"type(entries.children = {','.join(str(type(c)) for c in
entries.getchildren())}")
On 09/08/2022 00:55, Adrian Bool wrote:
Hi Gilles,
I guess you're intending on using 'sort -u' on your data? An
alterna
Hello,
Before I resort to a regex, I figured I should ask here.
To find and remove possible duplicates, I need to turn each block into a
single line:
FROM
blah
TO
blah
Do you know of a way to do this in lxml?
Thank you.
___
lxml -
On 22/07/2022 10:00, holger.jo...@lbbw.de wrote:
The reason is that lxml (sensibly) uses fully qualified tag names in
Clark notation
(see http://www.jclark.com/xml/xmlns.htm)
Thank much for the infos + example.
___
lxml - The Python XML Toolkit mai
Hello,
I run this script to remove unneeded elements.
For some reason, the input file is left as-is, when I try to get rid of
the block; If works as expected when I ignore that element.
Any idea why?
Thank you.
== INPUT.GPX
http://www.acme.com";
xmlns="http://www.topografix.com/
Thank you, and to Jens too.
The newbie in me (wrongly) assumed that xpath() returned a pointer to
the element if only one was found in the HTML file, like find().
On 30/05/2022 09:30, holger.jo...@lbbw.de wrote:
Through trial and error, it looks like xpath() returns an array, even if only
on
Even better!
Thanks.
On 28/05/2022 12:28, Pedro Andres Aranda Gutierrez wrote:
Maybe even easier: string() in the XPATH:
===
description = tree.xpath('string(//meta[@name="description"]/@content)')
print(description)
===
best, /PA
On Fri, 27 May 2022 at 19:03, wrote:
For others' benefi
Hello,
My XPath skills being what they are… in an HTML file, I can't figure out
how to grab the text of the second attribute in a meta element:
==
with open("input.html") as tempfile:
parser = et.HTMLParser(remove_blank_text=True,recover=True)
currfile_tree = et.parse(
Sure, I could remove all CRLF's in paragaphs that don't end with a
like they should, although I'm happy with just using a file handle instead.
What's MWE?
On 17/05/2022 09:16, Pedro Andres Aranda Gutierrez wrote:
Hi Gilles,
just FYI, you will need to do the cleaning befo
gSur
Those
are really annoying
Best, /PA
On Fri, 13 May 2022 at 12:47, Gilles wrote:
On 12/05/2022 22:32, Adrian Bool wrote:
On 12 May 2022, at 10:26, Gilles wrote:
File "src\lxml\parser.pxi", line 652, in
lxml.etree._raiseParseError
On 13/05/2022 12:51, Xavier Morel wrote:
You're parsing an HTML document. An HTML document necessarily has a
root and a body, so that's part of the error recovery of HTML
parsers.
If you don't want to parse an HTML document, you should probably use
`fragment_fromstring`.
Thanks for the po
On 12/05/2022 22:19, Adrian Bool wrote:
On 12 May 2022, at 13:08, Gilles wrote:
1. Is there a way to tell lxml _not_ to add and when
inserting the header right after ?
You're not loading the header data with the HTMLParser are you?? It is the
HTMLParser that adds etc. Just pars
On 12/05/2022 22:32, Adrian Bool wrote:
On 12 May 2022, at 10:26, Gilles wrote:
File "src\lxml\parser.pxi", line 652, in lxml.etree._raiseParseError
OSError: Error reading file*'*
Look at the last line above - you're giving parse() a string
containing XML data which
On 12/05/2022 09:25, Adrian Bool wrote:
More XML fun in the morning!
Almost there
1. Is there a way to tell lxml _not_ to add and
when inserting the header right after ?
[header here]My title
Here's the code:
==
body = root.find("body")
if len(body) == 0:
raise Exception(" not f
On 12/05/2022 11:47, Charlie Clark wrote:
On 12 May 2022, at 11:26, Gilles wrote:
→ tree = et.parse(StringIO(content), parser)
Why StringIO? XML should always be bytes but there also shouldn't be a need to
convert what you've read from the file.
I don't know. I'm
On 12/05/2022 08:45, Adrian Bool wrote:
Sure, I just tend to use pathlib for all my file handling as its
really useful and has been part of Python's standard library for a
good while now — so no extra package to install.
Good to know. I'll use pathlib from now on, as well as avoid reading the
n index of 0 to insert our content as the first
# child element of the wrapper's body element...
wrapper_body.insert(index=0, element=content_root)
print(et.tostring(wrapper_root, pretty_print=True).decode('utf8'))
On 11 May 2022, at 11:59, Gilles wrote:
Hello,
I need to add ~twent
Tuens out there's no need to use the pathlib module: The issue with
"
" is gone when 1) first reading HTML into a variable 2) before
parsing it, even with the standard open():
""" OK
from pathlib import Path
with Path(f).open() as tempfile:
tree = et.parse(tempfile, parser=pars
On 11/05/2022 13:57, Paul Higgs wrote:
Without looking at the tidtly source, I would expect that it is
looking for closing tags, I. E.
Thanks for the tip.
Tidy still reports an error with this:
==
==
Using "-ashtml" solved the issue. Thanks!
On 11/05/2022 12:19, Charlie Clark wrote:
It could always be a bug, but really we need a sample file to test.
Which version of lxml and Python are you using?
Here it is:
https://we.tl/t-WowFCDBp5A
Python 3.8.8, lxml 4.6.3.0
But, if all you want is pretty printing then I recommend simply usin
Hello,
I need to add ~twenty lines of HTML right after the tag.
Does lxml provide a way to read that data from a variable, to keep
things simple?
for body in root.xpath('//body[@*]'):
et.SubElement(body,"",HTML_block)
Thank you.
__
Adrian: Thanks for the code. The output is now correct. Am I using lxml
incorrectly, or is it some issue with its HTML parser? Can I do without
using an extra package (Path.pathlib)?
Charlie Clark : The output from "et.tostring()" has "
" added before
each carriage return (which is 0D0A since
Hello,
Out of curiosity, is there a more compact way to check if an element
exists, edit if it does, insert if it doesn't ?
#=== Edit/insert top-most name
r = root.find('./Document/name')
#if et.iselement(r)
if r is not None:
print("Name exists: ",r.text)
r.text = BASENAME
else
On 12/09/2021 23:31, Stefan Behnel wrote:
Am 12. September 2021 19:14:40 MESZ schrieb Gilles:
tracks = root.findall('.//snipet|.//LookAt|.//Style|.//StyleMap')
Just use
root.iter('snippet', 'LookAt', 'Style', 'StyleMap')
It's cer
Found it: findall() simply doesn't support that syntax, while xpath() works:
#BAD for el in
root.findall('./snippet/*|./LookAt/*|./Style/*|./StyleMap/*'):
for el in root.xpath('.//snippet/*|.//LookAt/*|.//Style/*|.//StyleMap/*'):
print(el.tag, el.text)
On
Hello,
A couple more newbie questions:
1. Using findall or xpath, what's the difference between "//" and ".//"?
I see both in examples.
https://www.w3schools.com/xml/xpath_syntax.asp
2. Why does the following script fail finding the elements I wish to
remove from the tree?
***
root.xpath(f'.//{tag}')
for item in items:
if item.text == text:
parent = item.getparent()
parent.remove(item)
return root
# --
Also look at element.iter, element.iterdescendants, etc.
Dave
On Fri 30 Jul 2021 11:02:59 AM PDT, Gil
Hello,
I'm only getting started with (l)xml, and was curious to know if someone
had an example of how to remove all items in an XML file that match the
following string:
^\t*100.25\r\n
Thank you.
___
lxml - The Python XML Toolkit mailing list -- l
37 matches
Mail list logo