Hello,
I'm still learning about lxml, and was wondering if there's a way to get the
tree from the root to avoid writing the file to disk before re-reading it just
for that:
INPUTFILE = "input.kml"
#get rid of NS
with open(INPUTFILE) as reader:
content = reader.read()
conte
Hello,
While still learning about lxml and xpath, I'm not clear as to why there are
different ways to find elements in a tree:
=
name = root.xpath('//name')
print("xpath/name is ",name[0].text)
name=root.findall('.//name')
print("findall/name is ",name[0].text)
for name in root.ite
Thank you.
I strip the namespace because 1) I'm not clear about what namespaces are for,
2) they make it harder to search, and 3) I'm dealing with very simple XML files
so it doesn't look like it makes a difference if I strip the ns from the source
file.
Here's some code I found to strip namespaces after parsing, without relying on
a regex:
```
# Remove namespace prefixes
#Source: https://stackoverflow.com/questions/60486563/
tree = et.parse(INPUTFILE)
root = tree.getroot()
for elem in root.getiterator():
#ValueError: Invalid input tag of
Thanks very much!
___
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/
Member address: arch...@mail-archive.com
Hello,
This is a newbie question.
While editing HTML files on Windows, ie. line ends with 0D0A, lxml adds
before each end of line:
#tried different things, to no avail
parser = et.HTMLParser(remove_blank_text=True,strip_cdata=False)
parser = et.HTMLParser(remove_blank_tex
As a work-around, I can always remove the offending substring, but it's a
kludge.
output = str(et.tostring(root, pretty_print=True)).replace('
', '')
print(output)
___
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an
For others' benefit:
===
description = tree.xpath("//meta[@name='description']/@content")
print(description[0])
===
___
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://ma
Hello,
I'd like to find and replace an element in an HTML file. I can't figure out why
getparent() doesn't work as expected:
==
import lxml.html
from lxml.html import builder as E
import lxml.etree as et
import lxml.etree as et
parser = et.HTMLParser(remove_blank_text=True,recover=T
Here's template.tmpl:
===
…
===
___
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/
Member address: arc
Through trial and error, it looks like xpath() returns an array, even if only
one element is found in the tree.
This works:
HERE = template_tree.xpath('//here')
if len(HERE):
print("HERE:",HERE)
parent = HERE[0].getparent()
html_tree = lxml.html.fragment_fromstring("blah",
Hello,
Some columns in a DB have badly formed HTML, to the point BeautifulSoup (lxml?)
fails:
=
#Some records start with 0A
soup = BeautifulSoup("\n", 'lxml')
#AttributeError: 'NoneType' object has no attribute 'text'
print(soup.body.text)
=
What would be a nice way to s
As a work-around, if there's only a handful of wrong records, catching the
error and fixing the records in the DB does the job:
===
try:
#file.write(soup.body.text)
text = soup.body.text
except AttributeError as error:
file.write(str(error))
__
Hello,
I can't find how to tell lxml/BS to preserve carriage returns in an HTML
snippet when calling soup.body.text: After removing 's, it also removes
the CRLF that follows.
==
builder = LXMLTreeBuilderForXML(preserve_whitespace_tags=["body"])
rows = cur.fetchall()
for row in rows:
My mistake, I'm sorry. All the carriage returns were stripped in the input
file. BS/lxml weren't to blame. Problem solved.
___
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mai
Hello,
I'm no lxml expert, so it could be a newbie error…but the following web
scrawler script sometimes breaks (see "BUG") while trying to find the number of
provinces/properties, even after two one-second sleeps:
==
import requests
from lxml import html
import re
import math
import ti
Thanks for the idea. I'll add some code to check the input.
___
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an email to lxml-le...@python.org
https://mail.python.org/mailman3/lists/lxml.python.org/
Member address: arc
17 matches
Mail list logo