[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Charlie Clark
On 10 May 2022, at 13:47, [email protected] wrote: Hello, This is a newbie question. While editing HTML files on Windows, i.e. with lines ending in 0D0A, lxml adds &#13; before each end of line: I'm not quite sure what you mean. The lines end with the string "0D0A"? Or with the \r\n (carriage retur

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Pedro Andres Aranda Gutierrez
Hi, Charlie I think he refers to *both cases* the bytes 0x0d0x0a or b'\r\n' produce the entity when pretty_printing... Rgds, /PA On Wed, 11 May 2022 at 10:22, Charlie Clark < [email protected]> wrote: > On 10 May 2022, at 13:47, [email protected] wrote: > > Hello, > > This

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Gilles
Adrian: Thanks for the code. The output is now correct. Am I using lxml incorrectly, or is it some issue with its HTML parser? Can I do without using an extra package (Path.pathlib)? Charlie Clark : The output from "et.tostring()" has "&#13;" added before each carriage return (which is 0D0A since

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Charlie Clark
On 11 May 2022, at 11:53, Gilles wrote: Adrian: Thanks for the code. The output is now correct. Am I using lxml incorrectly, or is it some issue with its HTML parser? Can I do without using an extra package (Path.pathlib)? Charlie Clark : The output from "et.tostring()" has "&#13;" added before e

[lxml] Adding block of HTML?

2022-05-11 Thread Gilles
Hello, I need to add about twenty lines of HTML right after the <body> tag. Does lxml provide a way to read that data from a variable, to keep things simple? for body in root.xpath('//body[@*]'):     et.SubElement(body,"",HTML_block) Thank you. __

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Gilles
On 11/05/2022 12:19, Charlie Clark wrote: It could always be a bug, but really we need a sample file to test. Which version of lxml and Python are you using? Here it is: https://we.tl/t-WowFCDBp5A Python 3.8.8, lxml 4.6.3.0 But, if all you want is pretty printing then I recommend simply usin

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Charlie Clark
On 11 May 2022, at 12:52, Gilles wrote: > I tried it before asking, but tidy fails with a few errors I don't > understand. Here's the output from a full file (not the sample I uploaded): > > line 9 column 1 - Error: unexpected in > > line 68 column 87 - Error: unexpected in > > line 73 column 1

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Paul Higgs
Without looking at the tidy source, I would expect that it is looking for closing tags, i.e. From: Gilles Sent: Wednesday, May 11, 2022 11:52:31 AM Cc: [email protected] Subject: [lxml] Re: [newbie] lxml adds &#13; before each end of line On 11/05/2022 12:19, Ch

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Gilles
On 11/05/2022 13:57, Paul Higgs wrote: Without looking at the tidy source, I would expect that it is looking for closing tags, i.e. Thanks for the tip. Tidy still reports an error with this: ==     == Using "-ashtml" solved the issue. Thanks!

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Gilles
Turns out there's no need to use the pathlib module: The issue with "&#13;" is gone when 1) first reading HTML into a variable 2) before parsing it, even with the standard open(): """ OK from pathlib import Path with Path(f).open() as tempfile:     tree = et.parse(tempfile, parser=pars

[lxml] Re: [newbie] lxml adds &#13; before each end of line

2022-05-11 Thread Adrian Bool
Hi Gilles > On 11 May 2022, at 14:03, Gilles wrote: > Turns out there's no need to use the pathlib module: The issue with "&#13;" > is gone when 1) first reading HTML into a variable 2) before parsing it, even > with the standard open(): Sure, I just tend to use pathlib for all my file handling a
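
A plausible reason why reading the file into a variable first helps, offered here as an assumption rather than something the thread confirms: Python's text-mode open() translates "\r\n" to "\n" by default, so the parser never sees a carriage return to escape. The sketch below uses only the standard library; the file name and sample markup are hypothetical.

```python
import os
import tempfile

# Hypothetical sample: a tiny HTML document saved with Windows (CRLF) line endings.
sample = b"<html>\r\n<body>\r\n<p>Hello</p>\r\n</body>\r\n</html>\r\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.html")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(sample)

    # Binary mode returns the bytes exactly as stored, carriage returns included.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with the default newline=None translates "\r\n" to "\n" on read.
    with open(path, encoding="utf-8") as f:
        text = f.read()

print("carriage return in raw bytes:", b"\r" in raw)
print("carriage return in decoded text:", "\r" in text)
```

The string held in text can then be handed to the parser, which matches the approach reported as working above.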