ound. Thanks to Stefan for
helping me sort it out. I think your code may be too aggressive. It might
help to look at the openpyxl worksheet parser, which has to handle what
happens if you do additional processing within nodes.
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Res
tinuously
> grow as it parses?
>
> Have you run a memory profiler on your code? Or a (statistical) line profiler
> to see where the time is spent
Excellent suggestions: memory_profiler and pympler are useful tools for this.
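Both of those are third-party packages. As a rough first check that needs nothing installed, the standard library's tracemalloc can show whether memory keeps growing. This is only a sketch: `build_items` is a hypothetical stand-in for the code under investigation, and tracemalloc only sees allocations made through Python's allocator, so memory held by libxml2 itself will not show up here.

```python
import tracemalloc


def build_items(n):
    # Hypothetical stand-in for the parsing code being investigated.
    return [{"id": i, "name": f"item-{i}"} for i in range(n)]


tracemalloc.start()
before = tracemalloc.take_snapshot()

items = build_items(50_000)

current, peak = tracemalloc.get_traced_memory()
after = tracemalloc.take_snapshot()
tracemalloc.stop()

print(f"current: {current / 1e6:.1f} MB, peak: {peak / 1e6:.1f} MB")
# Show the source lines responsible for the largest growth.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

For memory allocated inside the C libraries, watching the resident size of the Python process is probably the more reliable number.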
Charlie
    try:
        while True:
            el = (yield)
            if el is True:
                yield xf
            xf.write(el)
    except GeneratorExit:
        pass

def writer(out_stream, in_stream):
    with xmlfile(out_stream) as xf:
        for el in in_stre
htly related note, is there any way of getting the parser to treat some
attributes as numbers to avoid casting in Python?
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226
_
On 18 Jan 2024, at 18:10, Charlie Clark wrote:
> Apart from the fact that this currently doesn't work, I imagine that both
> Elements and their children would happily be passed to the write, which could
> lead to an almighty mess. Getting this to work properly, possibly rewritte
or you choose to do all the state keeping yourself and take
the bare parse events, and then have full control over the amount of
state that you keep. Whatever is better for your use case.
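One possible reading of "do all the state keeping yourself" is sketched below: take the start and end events from iterparse and keep nothing but a stack of open tag names. The sample document and the tag names are made up for illustration, and the import falls back to the standard library's compatible API if lxml is not available.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # fall back to the standard library's compatible API
    import xml.etree.ElementTree as etree

# Hypothetical sample document.
XML = b"<root><a><b>1</b><b>2</b></a><c><b>3</b></c></root>"

path = []   # the only state we keep: tag names of the currently open elements
found = []  # (path, text) for every <b> element

for event, el in etree.iterparse(BytesIO(XML), events=("start", "end")):
    if event == "start":
        path.append(el.tag)
    else:
        if el.tag == "b":
            found.append(("/".join(path), el.text))
        path.pop()
        el.clear()  # drop content that has already been handled

print(found)
# [('root/a/b', '1'), ('root/a/b', '2'), ('root/c/b', '3')]
```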
I have a feeling that it's this "state" thing that I don't
. Has anyone had any experience
with this?
Charlie
___
lxml - The Python XML Toolkit mai
ffecting popular libraries isn't unusual.
Charlie
ee https://lxml.de/parsing.html#iterparse-and-iterwalk
Charlie
xml
> wheels are built using
>
> cython, etc...
Check out https://github.com/lxml/lxml-wheels for configuration options but I
don't think there is anything special. And you can always check the versions of
libxml2, etc.
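For reference, the versions can be read from lxml.etree at runtime. A small sketch follows; the helper name `lxml_versions` is made up here, and it returns an empty dict if lxml is not installed.

```python
try:
    from lxml import etree
except ImportError:  # lxml is not installed in this environment
    etree = None


def lxml_versions():
    """Return the version tuples lxml reports, or {} if lxml is missing."""
    if etree is None:
        return {}
    return {
        "lxml.etree": etree.LXML_VERSION,
        "libxml used": etree.LIBXML_VERSION,
        "libxml compiled": etree.LIBXML_COMPILED_VERSION,
        "libxslt used": etree.LIBXSLT_VERSION,
        "libxslt compiled": etree.LIBXSLT_COMPILED_VERSION,
    }


versions = lxml_versions()
for name, version in versions.items():
    print(f"{name:17}: {version}")
```

A mismatch between the "used" and "compiled" tuples is usually the first thing worth looking at.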
Charlie
any elements are there in your tree? Memory use in XML can get very
expensive so combining iterparse with xmlfile would be an alternative. Also, if
you're only interested in duplicate names, use a set rather than a dictionary.
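A minimal sketch of the set-based approach, reading with iterparse so the whole tree never has to sit in memory. The `wp` element and `name` attribute are assumptions made for the example, and the xmlfile half (writing the result back out) is not shown.

```python
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # fall back to the standard library's compatible API
    import xml.etree.ElementTree as etree

# Hypothetical sample data.
XML = b"""<waypoints>
  <wp name="A"/><wp name="B"/><wp name="A"/><wp name="C"/><wp name="B"/>
</waypoints>"""

seen = set()
duplicates = set()

for _, el in etree.iterparse(BytesIO(XML), events=("end",)):
    if el.tag == "wp":
        name = el.get("name")
        if name in seen:
            duplicates.add(name)
        seen.add(name)
        el.clear()  # the element is no longer needed once its name is recorded

print(sorted(duplicates))
# ['A', 'B']
```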
Charlie
enerator
cursor.execute("BEGIN")
cursor.executemany(
    "INSERT INTO wp (name, latitude, longitude) VALUES (?, ?, ?)", rows
)
cursor.execute("COMMIT")
cursor.execute(
    "SELECT name, latitude, longitude FROM wp GROUP BY latitude, longitude"
)
cursor.fetchall()
nction will be faster for this kind of one off.
Charlie
writing to a Sqlite database and
exporting unique values back to XML is probably going to be easier.
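Roughly what that round trip could look like, as a sketch only. The table layout, the element and attribute names, and the sample rows are all invented for the example; MIN(name) is used so that the name picked for each group is well defined.

```python
import sqlite3

try:
    from lxml import etree
except ImportError:  # fall back to the standard library's compatible API
    import xml.etree.ElementTree as etree

# Hypothetical rows, as they might come out of the parsed XML.
rows = [
    ("Home", 51.3, 6.7),
    ("Home again", 51.3, 6.7),  # same coordinates as "Home"
    ("Office", 51.4, 6.8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp (name TEXT, latitude REAL, longitude REAL)")
with conn:  # commits on success, rolls back on error
    conn.executemany(
        "INSERT INTO wp (name, latitude, longitude) VALUES (?, ?, ?)", rows
    )

# One row per distinct coordinate pair.
unique = conn.execute(
    "SELECT MIN(name), latitude, longitude FROM wp "
    "GROUP BY latitude, longitude ORDER BY latitude, longitude"
).fetchall()
conn.close()

# Export the unique rows back to XML.
root = etree.Element("waypoints")
for name, lat, lon in unique:
    etree.SubElement(root, "wp", name=name, lat=str(lat), lon=str(lon))

xml_bytes = etree.tostring(root)
print(xml_bytes.decode())
```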
Charlie
8.xml",
> line 3663
>
> lxml.etree.XMLSyntaxError: ID N01868-0011-3105 already defined, line 3663,
> column 74
Does your parser work with Jens' example? If so, I'd suggest you post a small
sample from one of your files.
Charlie
*a lot** like this
https://mail.python.org/archives/list/lxml@python.org/thread/LCTOSIIWGGALAMSZAYHRRYUWYDRESCUO/
Can you update your version of libxml2?
Charlie
On 1 Jun 2022, at 10:32, Charlie Clark wrote:
> I'll keep an eye on it and let you know. I guess I could try replicating the
> env locally but my Docker foo is somewhat limited and, assuming openpyxl
> isn't the only project with the dependency, hopefully someone else will be
> prepared to _make it so_! ;-)
Charlie
hile trying to install package.
╰─> lxml
```
I'm wondering if there is anything that can be done about this?
Presumably inform the maintainer?
Charlie
On 12 May 2022, at 11:26, Gilles wrote:
> → tree = et.parse(StringIO(content), parser)
Why StringIO? XML should always be bytes but there also shouldn't be a need to
convert what you've read from the file.
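To illustrate the point about bytes, here is a sketch that reads a file in binary mode and hands the bytes straight to the parser, so the encoding declared in the XML declaration is honoured. The file name and content are made up, and the import falls back to the standard library if lxml is missing.

```python
import os
import tempfile
from io import BytesIO

try:
    from lxml import etree
except ImportError:  # fall back to the standard library's compatible API
    import xml.etree.ElementTree as etree

# Hypothetical sample document in a non-UTF-8 encoding.
doc = '<?xml version="1.0" encoding="ISO-8859-1"?><city>Düsseldorf</city>'

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "sample.xml")
    with open(filename, "wb") as f:
        f.write(doc.encode("iso-8859-1"))

    # Read in binary mode: no decoding and no newline translation.
    with open(filename, "rb") as f:
        data = f.read()
    root = etree.parse(BytesIO(data)).getroot()

    # Or skip the manual read and give the filename to the parser.
    same_root = etree.parse(filename).getroot()

print(root.text)
```

Neither variant needs StringIO or pathlib.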
Charlie
use heuristics to guess which form of HTML you've passed it and you
should use the configuration options to reduce the errors it reports. But it
looks like you've passed it a PHP source file, which isn't HTML, so it's not
surprising it isn't entirely happy.
Please prov
On 11 May 2022, at 11:53, Gilles wrote:
Adrian: Thanks for the code. The output is now correct. Am I using
lxml incorrectly, or is it some issue with its HTML parser? Can I do
without using an extra package (pathlib.Path)?
Charlie Clark : The output from "et.tostring()" has "
\r\n (carriage return, line feed)? If the latter then you
probably need to load the file in binary mode.
It would be easier to help if you could provide a small sample file.
Charlie
ore you've
processed them. This is especially important when you have recursive functions,
as your code does.
But you should also provide more information about memory use: how much memory
does your system have, and how much memory is the Python process using when it
crashes?
Charlie
l.
Charlie
___
lxml - The Python XML Toolkit mailing list -- lxml@python.org
To unsubscribe send an e
On 23 Feb 2022, at 10:17, Charlie Clark wrote:
> Updated my local ports repo and things are looking better! There's a note
> about keeping the Python bindings in sync so I'll check that and submit a PR.
See
https://github.com/macports/macports-ports/pull/14096
Charlie
On 23 Feb 2022, at 9:06, Charlie Clark wrote:
Grumble, grumble about note missing on the homepage and FTP server.
Updated my local ports repo and things are looking better! There's a
note about keeping the Python bindings in sync so I'll check that and
submit a PR
On 23 Feb 2022, at 8:59, Charlie Clark wrote:
> I've started preparing for a PR for this but I'm stumped because 2.9.13
> doesn't appear to be on the FTP-server! Stefan, am I looking in the right
> place? Or has something gone wrong with their release management?
On 22 Feb 2022, at 17:51, Charlie Clark wrote:
> However, it sounds very much like a known issue that will hopefully disappear
> once 2.9.13 is released. MacPorts is normally pretty up to date, but I see
> that this hasn't been updated for nine months but 2.9.13 was only released
fully behind
MacPorts and spend less time fiddling with all the posix stuff! Well, one is
allowed to wish!
Charlie
how to!
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Waldlehne 23
Düsseldorf
D- 40489
Mobile: +49-178-782-6226
anyway.
However, it sounds very much like a known issue that will hopefully disappear
once 2.9.13 is released. MacPorts is normally pretty up to date. I see that
this port hasn't been updated for nine months, but then 2.9.13 was only
released on the 19th of February.
Charlie
on versions. Easy enough
to check against those, though. Jens, did you see the same behaviour with
different versions of Python?
Stefan, do you suspect anything in particular that could be responsible for the
repetition?
Charlie
us know. Now that it's at least known to be reproducible, tracking the problem
down should be easier.
Charlie
rial=0)
lxml.etree : (4, 8, 0, 0)
libxml used : (2, 9, 12)
libxml compiled : (2, 9, 12)
libxslt used: (1, 1, 34)
libxslt compiled: (1, 1, 34)
3
b'\nbaz\n
\n '
b'\nbaz\n
\n '
b'\nbaz\n
\n'
All libraries have t