from:"codecomplete"

[lxml] [newbie] Way to get tree from root?

2021-09-02 Thread codecomplete

Hello, I'm still learning about lxml, and was wondering if there's a way to get the tree from the root to avoid writing the file to disk before re-reading it just for that: INPUTFILE = "input.kml" #get rid of NS with open(INPUTFILE) as reader: content = reader.read() conte
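One way to avoid the round trip through disk, sketched under the assumption that the cleaned-up document is already held in memory: `etree.fromstring()` returns the root element, and `getroottree()` on that root returns the wrapping tree. The sample `content` bytes below are a hypothetical stand-in for the poster's file, not taken from the thread.

```python
from io import BytesIO
from lxml import etree

# Hypothetical stand-in for the contents of input.kml after cleanup.
content = b"<kml><Document><name>Demo</name></Document></kml>"

# Parse from memory; fromstring() returns the root element.
root = etree.fromstring(content)

# getroottree() returns the ElementTree that wraps this root.
tree = root.getroottree()

# Alternatively, parse() accepts a file-like object and returns the tree directly.
tree2 = etree.parse(BytesIO(content))

print(tree.getroot().tag)
print(tree2.getroot().tag)
```

Either route gives a tree object without writing a temporary file; which one reads better depends on whether the root or the tree is needed first.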

[lxml] [newbie] Different ways to find elements

2021-09-03 Thread codecomplete

Hello, While still learning about lxml and XPath, I'm not clear as to why there are different ways to find elements in a tree: name = root.xpath('//name') print("xpath/name is ",name[0].text) name=root.findall('.//name') print("findall/name is ",name[0].text) for name in root.ite
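A small side-by-side comparison may help here. As far as I understand it, `xpath()` evaluates full XPath 1.0 expressions, `find()`/`findall()` use the simpler ElementPath subset inherited from ElementTree, and `iter()` walks the tree lazily by tag name. The sample document is made up for illustration.

```python
from lxml import etree

# Made-up sample document with two <name> elements.
root = etree.fromstring(
    b"<kml><Document><name>Top</name>"
    b"<Placemark><name>Spot</name></Placemark></Document></kml>"
)

# Full XPath 1.0: always returns a list for node-set expressions.
via_xpath = [e.text for e in root.xpath("//name")]

# ElementPath subset: findall() returns a list, find() returns the first match or None.
via_findall = [e.text for e in root.findall(".//name")]
first = root.find(".//name")

# iter() yields matching elements lazily, in document order.
via_iter = [e.text for e in root.iter("name")]

# Only xpath() can evaluate functions and return non-element results.
has_spot = root.xpath("boolean(//Placemark[name='Spot'])")

print(via_xpath, via_findall, via_iter, first.text, has_spot)
```

For simple tag lookups the three give the same elements; `xpath()` becomes necessary once predicates with functions, axes, or non-element results are involved.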

[lxml] Re: [newbie] Way to get tree from root?

2021-09-03 Thread codecomplete

Thank you. I strip the namespace because 1) I'm not clear about what namespaces are for, 2) they make it harder to search, and 3) I'm dealing with very simple XML files so it doesn't look like it makes a difference if I strip the ns from the source file.

[lxml] Re: [newbie] Way to get tree from root?

2021-09-04 Thread codecomplete

Here's some code I found to strip namespaces after parsing, without relying on a regex: ``` # Remove namespace prefixes #Source: https://stackoverflow.com/questions/60486563/ tree = et.parse(INPUTFILE) root = tree.getroot() for elem in root.getiterator(): #ValueError: Invalid input tag of
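A sketch of stripping namespaces after parsing, assuming the `ValueError` in the snippet above comes from comment or processing-instruction nodes (their `.tag` is a function rather than a string). That cause is a guess, since the original snippet is cut off. The KML sample is invented for the example.

```python
from lxml import etree

# Invented sample: a default namespace plus a comment node.
KML = b"""<kml xmlns="http://www.opengis.net/kml/2.2">
  <!-- a comment node has no string tag -->
  <Document><name>Demo</name></Document>
</kml>"""

root = etree.fromstring(KML)

for elem in root.iter():
    # Comments and processing instructions do not have a string tag; skip them.
    if not isinstance(elem.tag, str):
        continue
    # QName splits "{namespace}local" and keeps only the local part.
    elem.tag = etree.QName(elem).localname

# Drop namespace declarations that are no longer used.
etree.cleanup_namespaces(root)

names = [e.text for e in root.findall(".//name")]
print(root.tag, names)
```

This uses `iter()` rather than `getiterator()`, which lxml documents as deprecated.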

[lxml] Re: [newbie] Different ways to find elements

2021-09-04 Thread codecomplete

Thanks very much! ___ lxml - The Python XML Toolkit mailing list -- lxml@python.org To unsubscribe send an email to lxml-le...@python.org https://mail.python.org/mailman3/lists/lxml.python.org/ Member address: arch...@mail-archive.com

[lxml] [newbie] lxml adds &#13; before each end of line

2022-05-10 Thread codecomplete

Hello, This is a newbie question. While editing HTML files on Windows, i.e. where lines end with 0D0A, lxml adds &#13; before each end of line: #tried different things, to no avail parser = et.HTMLParser(remove_blank_text=True,strip_cdata=False) parser = et.HTMLParser(remove_blank_tex

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-10 Thread codecomplete

As a work-around, I can always remove the offending substring, but it's a kludge. output = str(et.tostring(root, pretty_print=True)).replace('&#13;', '') print(output)
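An alternative to patching the serialized output, sketched on the assumption that the carriage returns come from CRLF line endings in the source: normalize the line endings before parsing, so there is no CR left for the serializer to escape. The `raw` bytes are a made-up sample.

```python
from lxml import etree

# Made-up sample with a Windows-style CRLF inside a text node.
raw = b"<html><body><p>line one\r\nline two</p></body></html>"

# Normalize CRLF to LF before the parser ever sees the data.
normalized = raw.replace(b"\r\n", b"\n")

parser = etree.HTMLParser()
root = etree.fromstring(normalized, parser)

# encoding="unicode" returns a str directly, so no str(bytes) conversion is needed.
output = etree.tostring(root, pretty_print=True, encoding="unicode")
print(output)
```

Note that calling `str()` on the bytes returned by `tostring()` produces the `b'...'` representation rather than decoded text; passing `encoding="unicode"` or calling `.decode()` avoids that.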

[lxml] Re: [HTML] How to get text of attribute?

2022-05-27 Thread codecomplete

For others' benefit: === description = tree.xpath("//meta[@name='description']/@content") print(description[0]) ===
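A slightly more defensive variant of the same lookup, as a sketch: `xpath()` returns an empty list when the `<meta>` tag is missing, so indexing with `[0]` would raise `IndexError`. The sample page is invented.

```python
from lxml import html

# Invented sample page.
page = html.fromstring(
    '<html><head><meta name="description" content="A demo page"></head>'
    "<body><p>Hi</p></body></html>"
)

# An attribute XPath returns a list of strings; guard against an empty list.
matches = page.xpath("//meta[@name='description']/@content")
description = matches[0] if matches else None

# Equivalent without an attribute XPath: find the element, then read the attribute.
meta = page.find(".//meta[@name='description']")
via_get = meta.get("content") if meta is not None else None

print(description, via_get)
```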

[lxml] getparent() fails with "AttributeError: 'list' object has no attribute 'getparent' "

2022-05-29 Thread codecomplete

Hello, I'd like to find and replace an element in an HTML file. I can't figure out why getparent() doesn't work as expected: == import lxml.html from lxml.html import builder as E import lxml.etree as et import lxml.etree as et parser = et.HTMLParser(remove_blank_text=True,recover=T

[lxml] Re: getparent() fails with "AttributeError: 'list' object has no attribute 'getparent' "

2022-05-29 Thread codecomplete

Here's template.tmpl: === … ===

[lxml] Re: getparent() fails with "AttributeError: 'list' object has no attribute 'getparent' "

2022-05-29 Thread codecomplete

Through trial and error, it looks like xpath() returns a list, even if only one element is found in the tree. This works: HERE = template_tree.xpath('//here') if len(HERE): print("HERE:",HERE) parent = HERE[0].getparent() html_tree = lxml.html.fragment_fromstring("blah",
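The find-and-replace step could be completed roughly as follows. This is a sketch, not the poster's final code: the template string and the `<p>blah</p>` fragment are placeholders, since both the template and the `fragment_fromstring()` call are cut off in the archive.

```python
import lxml.html

# Placeholder template containing a custom <here> marker element.
template_tree = lxml.html.fromstring(
    "<html><body><div><here></here></div></body></html>"
)

# xpath() returns a list, so take the first match before calling element methods.
found = template_tree.xpath("//here")
if found:
    placeholder = found[0]
    parent = placeholder.getparent()
    # Placeholder replacement content.
    new_node = lxml.html.fragment_fromstring("<p>blah</p>")
    parent.replace(placeholder, new_node)

result = lxml.html.tostring(template_tree, encoding="unicode")
print(result)
```

When only one match is ever expected, `template_tree.find(".//here")` returns the element itself (or `None`), which sidesteps the list indexing.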

[lxml] Good way to remove/catch wrong tags?

2023-01-28 Thread codecomplete

Hello, Some columns in a DB have badly formed HTML, to the point BeautifulSoup (lxml?) fails: = #Some records start with 0A soup = BeautifulSoup("\n", 'lxml') #AttributeError: 'NoneType' object has no attribute 'text' print(soup.body.text) = What would be a nice way to s

[lxml] Re: Good way to remove/catch wrong tags?

2023-01-28 Thread codecomplete

As a work-around, if there's only a handful of wrong records, catching the error and fixing the records in the DB does the job: === try: #file.write(soup.body.text) text = soup.body.text except AttributeError as error: file.write(str(error))
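Instead of catching the `AttributeError` after the fact, the empty records can be filtered before parsing. The sketch below uses `lxml.html` directly rather than BeautifulSoup, so it is a swapped-in technique and not the poster's code; `body_text` is a hypothetical helper name. With BeautifulSoup, the analogous guard would be checking that `soup.body` is not `None` before reading `.text`.

```python
import lxml.html
from lxml import etree

def body_text(raw):
    """Return the text of an HTML snippet, or "" for empty or unparseable input."""
    # Records that are empty or contain only whitespace (e.g. a lone 0A) have no body.
    if raw is None or not raw.strip():
        return ""
    try:
        doc = lxml.html.fromstring(raw)
    except etree.ParserError:
        # Raised by lxml.html when the parser finds no document at all.
        return ""
    return doc.text_content()

# Made-up records standing in for the database column.
samples = ["\n", "", "<p>Hello <b>world</b></p>"]
texts = [body_text(s) for s in samples]
print(texts)
```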

[lxml] [newbie] Preserving carriage returns when calling soup.body.text?

2023-02-10 Thread codecomplete

Hello, I can't find how to tell lxml/BS to preserve carriage returns in an HTML snippet when calling soup.body.text: After removing 's, it also removes the CRLF that follows. == builder = LXMLTreeBuilderForXML(preserve_whitespace_tags=["body"]) rows = cur.fetchall() for row in rows:

[lxml] Re: [newbie] Preserving carriage returns when calling soup.body.text?

2023-02-10 Thread codecomplete

My mistake, I'm sorry. All the carriage returns were stripped in the input file. BS/lxml weren't to blame. Problem solved.

[lxml] Time-out? Wrong code?

2023-07-09 Thread codecomplete

Hello, I'm no lxml expert, so it could be a newbie error, but the following web scraper script sometimes breaks (see "BUG") while trying to find the number of provinces/properties, even after two one-second sleeps: == import requests from lxml import html import re import math import ti

[lxml] Re: Time-out? Wrong code?

2023-07-09 Thread codecomplete

Thanks for the idea. I'll add some code to check the input.
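A sketch of what such an input check might look like. Everything specific here is hypothetical, because the original script is truncated in the archive: the `extract_count` helper, the `span` with class `count`, and the sample pages are all invented for illustration. The idea is only to return `None` when the expected element is absent instead of indexing into an empty XPath result.

```python
import re
from lxml import html

def extract_count(page_source):
    """Return the first integer in the (hypothetical) count element, or None."""
    # An empty response body cannot be parsed into a useful tree.
    if not page_source or not page_source.strip():
        return None
    tree = html.fromstring(page_source)
    # Hypothetical selector; the real page structure is not shown in the thread.
    nodes = tree.xpath("//span[@class='count']/text()")
    if not nodes:
        # The page loaded but lacks the element, e.g. an error or rate-limit page.
        return None
    match = re.search(r"\d+", nodes[0])
    return int(match.group()) if match else None

print(extract_count('<div><span class="count">42 properties</span></div>'))
print(extract_count("<div>Please retry later</div>"))
```

A caller can then retry or log the raw response whenever `None` comes back, which makes it easier to tell a slow server from a wrong selector.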


17 matches
