[xml] This mailing list will be retired by the end of Oct 2022

2022-10-21 Thread Nick Wellnhofer via xml
According to [1], GNOME's Mailman platform is being decommissioned which 
probably means that this mailing list will go away soon.


Nick

[1] https://mail.gnome.org/archives/foundation-list/2022-October/msg2.html
___
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
https://mail.gnome.org/mailman/listinfo/xml


[xml] Release of libxml2 2.10.3

2022-10-14 Thread Nick Wellnhofer via xml

Version 2.10.3 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

### Security

- [CVE-2022-40304] Fix dict corruption caused by entity reference cycles
- [CVE-2022-40303] Fix integer overflows with XML_PARSE_HUGE
- Fix overflow check in SAX2.c

### Portability

- win32: Fix build with VS2013

### Build system

- cmake: Set SOVERSION

Nick



[xml] Release of libxml2 2.10.2

2022-08-29 Thread Nick Wellnhofer via xml

Version 2.10.2 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

This should really fix the build with Python 3.10.

### Improvements

- Remove set-but-unused variable in xmlXPathScanName
- Silence -Warray-bounds warning

### Build system

- build: require automake-1.16.3 or later (Xi Ruoyao)
- Remove generated files from distribution

### Test suite

- Don't create missing.xml when running testapi

Nick



[xml] Release of libxml2 2.10.1

2022-08-25 Thread Nick Wellnhofer via xml

Version 2.10.1 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

This fixes some showstoppers reported by early adopters of 2.10.0. Thanks for
the reports!


### Regressions

- Fix xmlCtxtReadDoc with encoding

### Bug fixes

- Fix HTML parser with threads and --without-legacy

### Build system

- Fix build with Python 3.10
- cmake: Disable version script on macOS
- Remove Makefile rule to build testapi.c

### Documentation

- Switch back to HTML output for API documentation
- Port doc/examples/index.py to Python 3
- Fix order of exports in libxml2-api.xml
- Remove libxml2-refs.xml

Nick



[xml] Release of libxml2 2.10.0

2022-08-17 Thread Nick Wellnhofer via xml

Version 2.10.0 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.10/

In this release, I started to remove a lot of old cruft like build systems for 
outdated platforms which haven't been touched in 10+ years.


The Docbook parser module has been removed completely. As far as I can tell, 
this was experimental code which never really worked and generated a 
deprecation warning for 15+ years.


Some other modules are now disabled by default and will eventually be removed 
completely:


- Support for XPointer locations (ranges and points): This was based on
  a W3C specification which never got beyond Working Draft status. To my
  knowledge, there's no software supporting this spec which is still
  maintained. You now have to enable this code by passing the
  `--with-xptr-locs` configuration option. Be warned that this part of
  the code base is buggy and had many security issues in the past.

- Support for the built-in FTP client (`--with-ftp`).

- Support for "legacy" functions (`--with-legacy`).
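
Whether a given build was configured with these options can be checked at compile time. The following is a minimal sketch, assuming the `LIBXML_XPTR_LOCS_ENABLED`, `LIBXML_FTP_ENABLED` and `LIBXML_LEGACY_ENABLED` macros from `<libxml/xmlversion.h>`. The `have_*` helper names are hypothetical, and the `__has_include` guard is only there so the sketch still compiles when the header is not on the include path.

```c
/* Sketch only: report which optional modules a libxml2 build provides.
 * The header guard is an assumption made so that this file still compiles
 * when <libxml/xmlversion.h> is not on the include path. */
#if defined(__has_include)
#  if __has_include(<libxml/xmlversion.h>)
#    include <libxml/xmlversion.h>
#  endif
#endif

/* The three helper names below are hypothetical, not libxml2 API. */

/* Returns 1 if the build was configured with --with-xptr-locs, else 0. */
int have_xptr_locs(void) {
#ifdef LIBXML_XPTR_LOCS_ENABLED
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the build was configured with --with-ftp, else 0. */
int have_ftp(void) {
#ifdef LIBXML_FTP_ENABLED
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if the build was configured with --with-legacy, else 0. */
int have_legacy(void) {
#ifdef LIBXML_LEGACY_ENABLED
    return 1;
#else
    return 0;
#endif
}
```

Code that depends on one of these modules can wrap the relevant calls in the same `#ifdef` so that it keeps building against a default 2.10 configuration.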

I also started to deprecate several functions of the public API. Most of them 
should be completely unused and will generate a deprecation warning now.
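
For readers unfamiliar with how such warnings are produced, here is a minimal sketch of one common way to build a deprecation macro in C. It is not libxml2's actual definition of `XML_DEPRECATED`. The `MYLIB_DEPRECATED` macro and both function names are hypothetical.

```c
/* Hypothetical macro; expands to the compiler's deprecation attribute
 * where one is known, and to nothing otherwise. */
#if defined(__GNUC__) || defined(__clang__)
#  define MYLIB_DEPRECATED __attribute__((deprecated))
#elif defined(_MSC_VER)
#  define MYLIB_DEPRECATED __declspec(deprecated)
#else
#  define MYLIB_DEPRECATED
#endif

/* Current API. */
int mylib_square(int x) {
    return x * x;
}

/* Old entry point, kept so existing callers still link. Any call site
 * compiled against this declaration gets a compile-time warning. */
MYLIB_DEPRECATED int mylib_sqr(int x);

int mylib_sqr(int x) {
    return mylib_square(x);
}
```

Callers of `mylib_sqr` keep working but see a warning pointing at each call site, which is the behavior described above for the deprecated functions.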


Special thanks to David Seifert and Daniel Engberg who contributed many 
improvements to the build system, and to David Kilzer for many patches that 
harden security.


It's likely that this release will break a few things. If you're concerned 
about stability, I'd suggest waiting for 2.10.1, which I plan to release in 6-8
weeks. Going forward, patch releases will only contain important bug fixes. My 
plan is to bump the minor version about every six months and possibly make bug 
fix releases for older branches as well.


Here's the full changelog:

### Security

- [CVE-2022-2309] Reset nsNr in xmlCtxtReset
- Reserve byte for NUL terminator and report errors consistently in xmlBuf and
  xmlBuffer (David Kilzer)
- Fix missing NUL terminators in xmlBuf and xmlBuffer functions (David Kilzer)
- Fix integer overflow in xmlBufferDump() (David Kilzer)
- xmlBufAvail() should return length without including a byte for NUL
  terminator (David Kilzer)
- Fix ownership of xmlNodePtr & xmlAttrPtr fields in xmlSetTreeDoc() (David
  Kilzer)
- Use xmlNewDocText in xmlXIncludeCopyRange
- Fix use-after-free bugs when calling xmlTextReaderClose() before
  xmlFreeTextReader() on post-validating parser (David Kilzer)
- Use UPDATE_COMPAT() consistently in buf.c (David Kilzer)
- fix: xmlXPathParserContext could be double-delete in  OOM case. (jinsub ahn)

### Removals and deprecations

- Disable XPointer location support by default
- Remove outdated xml2Conf.sh
- Deprecate module init and cleanup functions
- Remove obsolete XML Software Autoupdate (XSA) file
- Remove DOCBparser
- Remove obsolete Python test framework
- Remove broken VxWorks support
- Remove broken Mac OS 9 support
- Remove broken bakefile support
- Remove broken Visual Studio 2010 support
- Remove broken Windows CE support
- Deprecate IDREF-related functions in valid.h
- Deprecate legacy functions
- Disable legacy support by default
- Deprecate all functions in nanoftp.h
- Disable FTP support by default
- Add XML_DEPRECATED macro
- Remove elfgcchack.h

### Regressions

- Skip incorrectly opened HTML comments
- Restore behavior of htmlDocContentDumpFormatOutput() (David Kilzer)

### Bug fixes

- Fix memory leak with invalid XSD
- Make XPath depth check work with recursive invocations
- Fix memory leak in xmlLoadEntityContent error path
- Avoid double-free if malloc fails in inputPush
- Properly fold whitespace around the QName value when validating an XSD
  schema. (Damjan Jovanovic)
- Add whitespace folding for some atomic data types that it's missing on.
  (Damjan Jovanovic)
- Don't add IDs containing unexpanded entity references

### Improvements

- Avoid calling xmlSetTreeDoc
- Simplify xmlFreeNode
- Don't reset nsDef when changing node content
- Fix unintended fall-through in xmlNodeAddContentLen
- Remove unused xmlBuf functions (David Kilzer)
- Implement xpath1() XPointer scheme
- Add configuration flag for XPointer locations support
- Fix compiler warnings in Python code
- Mark more static data as `const` (David Kilzer)
- Make xmlStaticCopyNode non-recursive
- Clean up encoding switching code
- Simplify recursive pthread mutex
- Use non-recursive mutex in dict.c
- Fix parser progress checks
- Avoid arithmetic on freed pointers
- Improve buffer allocation scheme
- Remove unneeded #includes
- Add support for some non-standard escapes in regular expressions. (Damjan
  Jovanovic)
- htmlParseComment: handle abruptly-closed comments (Mike Dalessio)
- Add let variable tag support (Oliver Diehl)
- Add value-of tag support (Oliver Diehl)
- Remove useless call to xmlRelaxNGCleanupTypes
- Don't include ICU headers in public headers
- Update `xmlStrlen()` to use POSIX / ISO C `strlen()` (Mike Dalessio)
- Fix unused variable warnings with disabled features
- Only warn on invalid redeclarations of 

Re: [xml] How can I parse an XML file whose filesystem path is a Unicode string?

2022-08-02 Thread Nick Wellnhofer via xml

On 31/07/2022 17:40, Paul Kinnucan via xml wrote:
My Xerces-c implementation uses a custom entity resolver to 
resolve file entities. I might need a custom entity resolver to fix the 
problem with the libxml2 implementation. However, libxml2 does not seem to 
support custom entity resolvers. At least, I have not been able to find this
feature in the doc or the libxml2 code base on GitHub.


You can install a custom entity loader with xmlSetExternalEntityLoader:

https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-parser.html#xmlSetExternalEntityLoader

Another option is to use "input callbacks":

https://gnome.pages.gitlab.gnome.org/libxml2/devhelp/libxml2-xmlIO.html#xmlRegisterInputCallbacks

Nick



Re: [xml] https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43743

2022-06-27 Thread Nick Wellnhofer via xml

On 24/06/2022 21:48, enh via xml wrote:
did anyone report https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=43743 
 to libxml2 directly?


No, this wasn't reported. For now, these issues should be reported to the 
libxml2 bug tracker. That said, I will resubscribe to OSS-Fuzz soon and handle 
new issues directly.


sadly, it looks like there are actually a bunch of fuzzer-found bugs that may 
never have been reported upstream? (i haven't checked; i'm just guessing.) see 
https://bugs.chromium.org/p/oss-fuzz/issues/list?q=libxml2=2 
 for example.


Most of the timeout and OOM issues are hard to fix. I'll try to address some 
of them in the next months.


Nick


[xml] Release of libxml2 2.9.14

2022-05-02 Thread Nick Wellnhofer via xml

Version 2.9.14 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.9/

Note that starting with 2.9.13, libxml2 tarballs are published on 
download.gnome.org instead of ftp.xmlsoft.org.


### Security

- [CVE-2022-29824] Integer overflow in xmlBuf and xmlBuffer
- Fix potential double-free in xmlXPtrStringRangeFunction
- Fix memory leak in xmlFindCharEncodingHandler
- Normalize XPath strings in-place
- Prevent integer-overflow in htmlSkipBlankChars() and xmlSkipBlankChars()
  (David Kilzer)
- Fix leak of xmlElementContent (David Kilzer)

### Bug fixes

- Fix parsing of subtracted regex character classes
- Fix recursion check in xinclude.c
- Reset last error in xmlCleanupGlobals
- Fix certain combinations of regex range quantifiers
- Fix range quantifier on subregex

### Improvements

- Fix recovery from invalid HTML start tags

### Build system, portability

- Define LFS macros before including system headers
- Initialize XPath floating-point globals
- configure: check for icu DEFS (James Hilliard)
- configure.ac: produce tar.xz only (GNOME policy) (David Seifert)
- CMakeLists.txt: Fix LIBXML_VERSION_NUMBER
- Fix build with older Python versions
- Fix --without-valid build

Thanks to all contributors!

Nick


[xml] Is anyone still using XPointer ranges?

2022-04-07 Thread Nick Wellnhofer via xml
I'm curious if there are people out there who still use XPointer ranges, 
specifically things like the range-to XPath extension function. This part of 
the code base is extremely buggy and the latest spec seems to be a Working 
Draft from 2002 which was never finished [1]. The xpointer() scheme is listed 
as "being reviewed" in the XPointer registry since 2006 [2]. I couldn't find 
any other projects that are still maintained and implement this feature. Here 
are some that don't:


- Xerces: "The XPointer xpointer() Scheme is currently not supported." [3]
- Mvp.Xml: " XPointer xpointer() Scheme (XPath subset only)" [4]

Since I have no plans to work on this part of the code base, I'm thinking 
about phasing out support for this feature. xpointer() expressions will 
continue to work but without any XPath extensions for locations, ranges and 
points. Just like the xpath1() scheme which we should start to support as well.


Nick


[1] https://www.w3.org/TR/xptr-xpointer/
[2] https://www.w3.org/2005/04/xpointer-schemes/
[3] https://xerces.apache.org/xerces2-j/faq-xinclude.html
[4] http://mvp-xml.sourceforge.net/xinclude/




Re: [xml] Euro sign in xml:id

2022-04-06 Thread Nick Wellnhofer via xml

On 06/04/2022 00:40, Einhard Leichtfuß wrote:

I noticed that xmllint complains about the Euro sign ("€") in an xml:id.
  - "validity error : xml:id : attribute value € is not an NCName"

The W3C's XML specification, however, seems to allow this:
  - https://www.w3.org/TR/xml-id/#processing
  - https://www.w3.org/TR/xmlschema-2/#ID
  - https://www.w3.org/TR/xml-names/#NT-NCName
  - https://www.w3.org/TR/xml/#NT-NameStartChar
  * '€' is #x20ac which is in the range [#x2070-#x218F], a subset of
NameStartChar, and may, therefore, occur anywhere in an NCName.

Am I mistaken above, should I look at another specification, or is this
a bug?


This is a bug. The xmlValidate*Name functions in tree.c weren't updated to XML 
1.0, Fifth Edition which includes the following change:


https://www.w3.org/XML/xml-V10-4e-errata#E09

This issue is now tracked here:

https://gitlab.gnome.org/GNOME/libxml2/-/issues/364
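
The range check from the report can be sketched in C as follows. This is an illustration of the NameStartChar production from XML 1.0 (Fifth Edition) with ':' excluded for NCName, not the code in tree.c. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if code point c may start an NCName: NameStartChar from
 * XML 1.0 (Fifth Edition), minus ':' which NCName does not allow. */
bool is_ncname_start_char(uint32_t c) {
    return (c >= 'A' && c <= 'Z') ||
           c == '_' ||
           (c >= 'a' && c <= 'z') ||
           (c >= 0xC0 && c <= 0xD6) ||
           (c >= 0xD8 && c <= 0xF6) ||
           (c >= 0xF8 && c <= 0x2FF) ||
           (c >= 0x370 && c <= 0x37D) ||
           (c >= 0x37F && c <= 0x1FFF) ||
           (c >= 0x200C && c <= 0x200D) ||
           (c >= 0x2070 && c <= 0x218F) ||   /* contains U+20AC, the Euro sign */
           (c >= 0x2C00 && c <= 0x2FEF) ||
           (c >= 0x3001 && c <= 0xD7FF) ||
           (c >= 0xF900 && c <= 0xFDCF) ||
           (c >= 0xFDF0 && c <= 0xFFFD) ||
           (c >= 0x10000 && c <= 0xEFFFF);
}
```

With this table, `is_ncname_start_char(0x20AC)` is true, which matches the reporter's reading of the specification.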

Nick



Re: [xml] Support libxml2 and libxslt on Open Collective

2022-02-27 Thread Nick Wellnhofer via xml

On 23/02/2022 23:39, Eberhard wrote:

Dumb question.  How do I contribute in dollars?  I get Euros and no option
to change.  E


Everything should be set to USD now.

Nick


Re: [xml] Release of libxml2 2.9.13

2022-02-23 Thread Nick Wellnhofer via xml

On 23/02/2022 08:17, Stefan Behnel wrote:
Could you make the archives available in a (second) format that matches all 
(previous) releases?


The archives are automatically converted to .tar.xz when uploaded to the GNOME 
download server. I have no influence on that. Personally, I'd prefer .tar.gz 
for compatibility reasons, but I don't have a strong opinion.


I asked on GNOME infra if it is possible to offer .tar.gz downloads, but this 
would require changes to the upload script.


https://gitlab.gnome.org/Infrastructure/Infrastructure/-/issues/768

Nick


Re: [xml] Release of libxml2 2.9.13

2022-02-21 Thread Nick Wellnhofer via xml

On 21/02/2022 14:57, Mike Dalessio wrote:
I'm not asking specifically for a CVSS score for this vulnerability, and I'm 
certainly not asking you to create a CVE for every memory fix that's found. 
I'm only asking for a more accessible explanation of the conditions under 
which an application might be vulnerable to this already-published CVE.


From my limited analysis, there are two scenarios:

1. When using the reader API (xmlreader.h, xmlTextReader)

  Conditions:

  - Create a reader with parser option XML_PARSE_DTDVALID (or "parser
property" XML_PARSER_VALIDATE) but without parser option XML_PARSE_NOENT
(XML_PARSER_SUBST_ENTITIES)
  - Parse an untrusted document

  Impact:

  - Crash (DoS)
  - Memory disclosure via error channel

2. When using another parser API

  Conditions:

  - Parse an untrusted document with XML_PARSE_DTDVALID but without
XML_PARSE_NOENT
  - Delete a portion of the resulting document
  - Call xmlGetID on the document

  Potential impact:

  - Crash (DoS)
  - Arbitrary memory disclosure
  - Arbitrary code execution
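
Both scenarios share one precondition on the parser options, which a downstream project could test for when triaging. The sketch below is a hedged illustration only: the `SKETCH_*` constants are local stand-ins whose values are assumed to mirror `XML_PARSE_NOENT` and `XML_PARSE_DTDVALID` in `<libxml/parser.h>`, and the function name is hypothetical. It covers only the option flags. It does not check whether the input is untrusted, which API is used, or whether nodes are deleted later.

```c
#include <stdbool.h>

/* Local stand-ins for the libxml2 parser options named above. The values
 * are assumed to match xmlParserOption in <libxml/parser.h>; real code
 * should include that header instead of redefining them. */
enum {
    SKETCH_PARSE_NOENT    = 1 << 1,  /* substitute entities */
    SKETCH_PARSE_DTDVALID = 1 << 4   /* validate with the DTD */
};

/* True when DTD validation is enabled while entity substitution is
 * disabled, the option combination common to both scenarios. */
bool has_risky_option_combination(int options) {
    return (options & SKETCH_PARSE_DTDVALID) != 0 &&
           (options & SKETCH_PARSE_NOENT) == 0;
}
```

A true result means only that the option precondition is met. The remaining conditions in each scenario still have to be checked by reading the calling code.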

Would this be an appropriate explanation for me to include in my security 
advisory?


 > An application may be vulnerable to a denial-of-service attack if it parses 
an untrusted document with parse options `DTDVALID` on, and `NOENT` off.


No, that's understating the severity. As I tried to explain, it's impossible 
to assess the severity without auditing each and every downstream project. 
Since clever exploitation of use-after-free errors can result in code 
execution, I have to assume the worst case if you force me to make a general 
statement.


DISCLAIMER: I make no guarantees regarding the accuracy and completeness of my 
statements above.


Nick


[xml] Support libxml2 and libxslt on Open Collective

2022-02-21 Thread Nick Wellnhofer via xml

Hello,

You can now support libxml2 and libxslt financially on Open Collective:

https://opencollective.com/libxml2

All donations go through the Open Source Collective, a non-profit organization 
providing financial and legal infrastructure for thousands of open source 
projects.


https://www.oscollective.org/

If you prefer, you can also support me directly. Just get in touch by email.

Nick



Re: [xml] Release of libxml2 2.9.13

2022-02-20 Thread Nick Wellnhofer via xml

On 20/02/2022 20:50, Mike Dalessio wrote:
Is there any additional information about CVE-2022-23308 (other than the 
commit log) that would help downstream projects triage? Was there a CVSS score 
calculated or severity assigned?


In this case, the CVE record is managed by a third party. It should be made 
public soon, but I have no influence on that. In my personal opinion, the 
whole CVE system is severely flawed with regard to OSS projects. Basically, 
anyone can request a CVE ID for arbitrary projects without having to 
coordinate with maintainers.


It's often hard, if not impossible, to come up with meaningful CVSS scores for 
vulnerabilities in software libraries. If there's a flaw in a certain library 
function, it really depends on how this function is used by downstream projects.
If you look at major Linux distros, there are 500+ projects with a direct 
dependency on libxml2, and thousands with an indirect dependency. Most of them 
don't call the vulnerable functions at all, some others are libraries 
themselves, so it all depends on their users.


There are quite a few preconditions to be met to trigger a use-after-free in 
this particular case, so I'm not overly concerned. Even then, it seems 
anything but trivial to come up with a serious exploit. But I'm not really an
expert and you never can tell without auditing tens or hundreds of downstream 
projects. Besides, I only have limited resources to assess the impact of 
security issues, and it's always possible that I missed something.


Note that for some reason, GitLab truncates the commit message after ~1000 
characters with no obvious way to expand it, at least on gitlab.gnome.org. You 
can see the full commit message on the GitHub mirror:



https://github.com/GNOME/libxml2/commit/652dd12a858989b14eed4e84e453059cd3ba340e

Nick





[xml] Release of libxml2 2.9.13

2022-02-20 Thread Nick Wellnhofer via xml

Version 2.9.13 of libxml2 is available at:

https://download.gnome.org/sources/libxml2/2.9/

Note that starting with this release, libxml2 tarballs are published on 
download.gnome.org instead of ftp.xmlsoft.org.


### Security

- [CVE-2022-23308] Use-after-free of ID and IDREF attributes
  (Thanks to Shinji Sato for the report)
- Use-after-free in xmlXIncludeCopyRange (David Kilzer)
- Fix null deref in xmlSchemaGetComponentTargetNs (huangduirong)
- Fix memory leak in xmlXPathCompNodeTest
- Fix null pointer deref in xmlStringGetNodeList
- Fix several memory leaks found by Coverity (David King)

### Fixed regressions

- Fix regression in RelaxNG pattern matching
- Properly handle nested documents in xmlFreeNode
- Fix regression with PEs in external DTD
- Fix random dropping of characters on dumping ASCII encoded XML (Mohammad
  Razavi)
- Revert "Make schema validation fail with multiple top-level elements"
- Fix regression when parsing invalid HTML tags in push mode
- Fix regression parsing public IDs literals in HTML
- Fix buffering in xmlOutputBufferWrite
- Fix whitespace when serializing empty HTML documents
- Fix XPath recursion limit
- Fix regression in xmlNodeDumpOutputInternal
- Work around lxml API abuse

### Bug fixes

- Fix xmlSetTreeDoc with entity references
- Fix double counting of CRLF in comments
- Make sure to grow input buffer in xmlParseMisc
- Don't ignore xmllint options after "-"
- Don't normalize namespace URIs in XPointer xmlns() scheme
- Fix handling of XSD with empty namespace
- Also register HTML document nodes
- Make xmllint return an error if arguments are missing
- Fix handling of ctxt->base in xmlXPtrEvalXPtrPart
- Fix xmllint --maxmem
- Fix htmlReadFd, which was using a mix of xml and html context functions
  (Finn Barber)
- Move current position before possible calling of ctxt->sax->characters
  (Yulin Li)
- Fix parse failure when 4-byte character in UTF-16 BE is split across a chunk
  (David Kilzer)
- Patch to forbid epsilon-reduction of final states (Arne Becker)
- Avoid segfault at exit when using custom memory functions (Mike Dalessio)

### Tests, code quality, fuzzing

- Remove .travis.yml
- Make xmlFuzzReadString return a zero size in error case
- Fix unused function warning in testapi.c
- Update NewsML DTD in test suite
- Add more checks for malloc failures in xmllint.c
- Avoid potential integer overflow in xmlstring.c
- Run CI tests with UBSan implicit-conversion checks
- Fix casting of line numbers in SAX2.c
- Fix integer conversion warnings in hash.c
- Add explicit casts in runtest.c
- Fix integer conversion warning in xmlIconvWrapper
- Add suffix to unsigned constant in xmlmemory.c
- Add explicit casts in testchar.c
- Fix integer conversion warnings in xmlstring.c
- Add explicit cast in xmlURIUnescapeString
- Remove unused variable in xmlCharEncOutFunc (David King)

### Build system, portability

- Remove xmlwin32version.h
- Fix fuzzer test with VPATH build
- Support custom prefix when installing Python module
- Remove Makefile.win
- Remove CVS and SVN-related code
- Port python 3.x module to Windows and improve distutils (Chun-wei Fan)
- Correctly install the HTML examples into their subdirectory (Mattia Rizzolo)
- Refactor the settings of $docdir (Mattia Rizzolo)
- Remove unused configure checks (Ben Boeckel)
- python/Makefile.am: use *_LIBADD, not *_LDFLAGS for LIBS (Sam James)
- Fix check for libtool in autogen.sh
- Use version in configure.ac for CMake (Timothy Lyanguzov)
- Add CMake alias targets for embedded projects (Markus Rickert)

### Documentation

- Remove SVN keyword anchors
- Rework README
- Remove README.cvs-commits
- Remove old ChangeLog
- Update hyperlinks
- Remove README.docs
- Remove MAINTAINERS
- Remove xmltutorial.pdf
- Upload documentation to GitLab pages
- Document how to escape XML_CATALOG_FILES
- Fix libxml2.doap
- Update URL for libxml++ C++ binding (Kjell Ahlstedt)
- Generate devhelp2 index file (Emmanuele Bassi)
- Mention XML_CATALOG_FILES is space-separated (Jan Tojnar)
- Add documentation for xmllint exit code 10 (Rainer Canavan)
- Fix some validation errors in the FAQ (David King)
- Add instructions on how to use CMake to compile libxml (Markus Rickert)

Thanks to all contributors!

Nick



[xml] Intent to remove build systems for outdated platforms

2022-02-16 Thread Nick Wellnhofer via xml
I plan to remove several directories from the libxml2 repo containing build 
systems for outdated platforms.

VxWorks

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/VxWorks

Bakefile

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/bakefile

MacOS 9

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/macos

VMS

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/vms

Windows CE

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/win32/wince

Visual Studio 2010

https://gitlab.gnome.org/GNOME/libxml2/-/tree/master/win32/VC10

These files haven’t been updated in 10+ years and are most likely broken.

Nick



Re: [xml] Schema validation skipping IDC

2022-02-09 Thread Nick Wellnhofer via xml

On 09/02/2022 14:48, Stefan de Konink wrote:

On Wednesday, February 9, 2022 1:25:41 PM CET, Nick Wellnhofer wrote:
I'm always reluctant to add new features, especially if it sounds like it 
only solves a problem for a single user. Do you want to disable checking of 
identity constraints for performance reasons or is there another use case?


They are indeed based on performance reasons, where the syntax validation is 
extremely fast and powerful (even single threaded, as expected), but IDC is 
(for the size of our documents) costly.


Can you provide more detail about the performance problem? Ideally by opening a
GitLab issue.


Like Eric pointed out, supporting this use case now requires two schemas,
one with and one without. Since our schema consists of 384 individual XSDs,
that is less trivial to search and replace on the fly.


It seems that you only have to remove certain elements from the XSDs which 
should be easy to automate.


Nick



Re: [xml] Schema validation skipping IDC

2022-02-09 Thread Nick Wellnhofer via xml

On 01/02/2022 13:39, Stefan de Konink wrote:

Hi,

Would a patch be accepted that would create an option to disable identity 
constraints at runtime? Use case: only syntactically validate a file.


I'm always reluctant to add new features, especially if it sounds like it only 
solves a problem for a single user. Do you want to disable checking of 
identity constraints for performance reasons or is there another use case?


Nick



Re: [xml] Resuming maintenance

2022-01-14 Thread Nick Wellnhofer via xml

On 12/01/2022 17:30, Stefan de Konink wrote:
If you're seeing degraded performance on large documents, it's likely 
another issue with quadratic runtime. Fixing such issues algorithmically 
should typically yield much better results than trying to work around them 
with multi-threading.


What can I do to identify these things in a usable way? Would a profiler help
in this case?


Yes, profiling is usually the quickest way to see which part of the code is 
causing performance issues. Then you could try to isolate the problem and come 
up with a test case where doubling the input size results in quadrupling the 
execution time or shows other superlinear behavior.
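
The doubling test described above can be sketched as a small helper. This is a minimal illustration, not a profiling tool: the type and function names are hypothetical and the ratio thresholds (2.5 and 3.5) are arbitrary assumptions that would need tuning against measurement noise.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

typedef enum {
    GROWTH_ROUGHLY_LINEAR,      /* doubling the input about doubles the cost */
    GROWTH_SUPERLINEAR,         /* clearly more than double */
    GROWTH_QUADRATIC_OR_WORSE   /* about quadruple or more */
} growth_t;

/* Classify growth from the cost at input size n and at size 2n.
 * The thresholds are assumptions chosen for illustration. */
growth_t classify_growth(double cost_n, double cost_2n) {
    assert(cost_n > 0.0);
    double ratio = cost_2n / cost_n;
    if (ratio >= 3.5)
        return GROWTH_QUADRATIC_OR_WORSE;
    if (ratio >= 2.5)
        return GROWTH_SUPERLINEAR;
    return GROWTH_ROUGHLY_LINEAR;
}

/* Measure the CPU time, in seconds, of one call to fn with input size n. */
double time_once(void (*fn)(size_t), size_t n) {
    clock_t start = clock();
    fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A test case would call `time_once` with the isolated code path at sizes n and 2n and pass both results to `classify_growth`. Repeating the measurement a few times helps, since a single timing can be noisy.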


Nick


Re: [xml] Resuming maintenance

2022-01-12 Thread Nick Wellnhofer via xml

On 11/01/2022 11:38, Daniel Veillard wrote:

  So you want to reintegrate libxml2 within the GNOME framework ? TBH
now that I have very limited bandwidth that's probably the right thing
to do.


I didn't mean the GNOME desktop environment itself, but the infrastructure 
that the GNOME Foundation offers. Mostly the GitLab instance which could be 
used to create and distribute releases and the GitLab Wiki which could be used 
for documentation.


It seems like a historical accident that libxml2 ended up under the GNOME 
umbrella, but why shouldn't we use the features we are offered? It certainly 
makes collaboration easier than maintaining your own website. It's also nice 
to have a self-hosted platform compared to something like GitHub.



  Happy to help you with any steps you may need to take over,


For now, it's enough to receive some formal blessing from you to start making 
releases on my own.


Nick


Re: [xml] Resuming maintenance

2022-01-12 Thread Nick Wellnhofer via xml

On 10/01/2022 20:47, Mike Dalessio wrote:
Although I'm relieved, the potential loss of maintainers from the project 
 was 
alarming. Perhaps another goal to consider for the year is to expand the pool 
of contributors and maintainers. I (and others, I assume) am interested in 
volunteering more time so that the burden isn't carried by you alone, and so 
that if in the future you're unable to secure funding the user community will 
be able to sustain that loss.


Thanks again, and please think about what work volunteers can pick up to get 
more involved.


Anyone is invited to help with maintenance. But I can't think of many simple 
issues for people to get started. Fixing bugs and reviewing merge requests 
often requires deep knowledge of the code base, which in turn requires
investing considerable amounts of time. On the other hand, everyone has to start
somewhere. The best way is probably to start working on interesting issues, 
learn from any mistakes you make, and repeat.


Personally, I think the main problem is funding. The pool of competent 
programmers willing to spend months of their time to work on a rather outdated 
code base implementing mostly legacy technology for free is tiny or even 
non-existent. It's really the large corporations who could make a difference 
by sponsoring OSS maintenance directly. I'm sure you can find people like me 
who would work on OSS at a discount, but not without any monetary compensation.


Nick


Re: [xml] Resuming maintenance

2022-01-12 Thread Nick Wellnhofer via xml

On 10/01/2022 16:51, Stefan de Konink wrote:
This is great news, thanks Google for acknowledging the importance of 
maintaining core open source products. Your previous improvements on XSD 
validation made a great difference, but from my prototype in Python (LXML) I 
assume that multithreaded constraint validation and a more efficient way of 
storage would gain additional performance on files larger than 500MB. One may 
ask if no 'green fund' would be able to donate money for these types of
improvements.


I didn't make any performance improvements to the XSD code personally. You're 
probably seeing improvements from the following commit which wasn't authored 
by me:


https://gitlab.gnome.org/GNOME/libxml2/-/commit/faea2fa9

If you're seeing degraded performance on large documents, it's likely another 
issue with quadratic runtime. Fixing such issues algorithmically should 
typically yield much better results than trying to work around them with 
multi-threading.
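
As a generic illustration of what an algorithmic fix looks like, consider building a linked list by appending nodes one at a time. This is a hedged sketch with hypothetical names and is not taken from the libxml2 code base. Walking to the end of the list on every append costs quadratic time overall, while caching the tail pointer makes the same job linear.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct node { struct node *next; } node;
typedef struct { node *head; node *tail; } list;

/* Append by walking from the head: O(length) per call.
 * Returns the number of pointer-chasing steps taken. */
size_t append_walk(list *l, node *n) {
    size_t steps = 0;
    n->next = NULL;
    if (l->head == NULL) {
        l->head = l->tail = n;
        return 0;
    }
    node *cur = l->head;
    while (cur->next != NULL) {
        cur = cur->next;
        steps++;
    }
    cur->next = n;
    l->tail = n;
    return steps;
}

/* Append using the cached tail pointer: O(1) per call, zero steps. */
size_t append_tail(list *l, node *n) {
    n->next = NULL;
    if (l->tail == NULL)
        l->head = n;
    else
        l->tail->next = n;
    l->tail = n;
    return 0;
}

/* Total steps needed to build a list of `count` nodes with either strategy. */
size_t build_cost(size_t count, int use_tail) {
    list l = { NULL, NULL };
    size_t total = 0;
    node *nodes = calloc(count, sizeof *nodes);
    if (nodes == NULL)
        return 0;
    for (size_t i = 0; i < count; i++)
        total += use_tail ? append_tail(&l, &nodes[i])
                          : append_walk(&l, &nodes[i]);
    free(nodes);
    return total;
}
```

Doubling `count` from 100 to 200 raises the walking cost from 4851 to 19701 steps, roughly a factor of four, while the tail-pointer version stays at zero. Splitting the quadratic version across threads would at best divide its cost by the number of cores.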


Nick


[xml] Resuming maintenance

2022-01-10 Thread Nick Wellnhofer via xml

Hello,

Thanks to a donation from Google, I'm able to resume maintenance of libxml2 
(and libxslt) for the remainder of 2022.


My immediate plans are:

- Make a bug fix release fixing many regressions.
- Establish a new release schedule, possibly with multiple branches being
  maintained.
- Move releases from the old FTP server to GNOME's GitLab infrastructure.
- Move documentation to GNOME infrastructure.
- Set up an official way to sponsor libxml2 maintainers.

In the future I'll focus less on security improvements and more on typical 
maintenance duties like bug fixes and modernizing the code base in a few ways.


Thanks (again) to Google for making this possible.

Nick


Re: [xml] userdata for SAX parsing with schema validation

2022-01-03 Thread Nick Wellnhofer via xml

On 23/12/2021 20:14, Lara Blatchford wrote:
Hi - I have a simple SAX handler set up, and schema validation errors are 
being caught by my structured error handler.  So far so good.


It appears that the userdata argument to xmlSAXUserParseMemory /must/ be the 
xmlSchemaSAXPlugPtr returned by the call to xmlSchemaSAXPlug, and that this 
pointer is passed as the ctx pointer to the SAX handler callbacks.


This is correct.

Is there any way for me to make a userdata pointer of my choosing available to 
my SAX handler callbacks while still getting schema validation?


From a quick look at the code, it seems that you can simply pass your user 
data pointer to xmlSchemaSAXPlug.



    // userdata arg is set to the pointer to the original SAX user data pointer
    // ('sax' stands for the xmlSAXHandlerPtr variable)
    xmlSchemaValidCtxtPtr oldXsdValidCtxt = NULL;
    void *ctxptr = &oldXsdValidCtxt;
    xmlSchemaSAXPlugPtr saxPlug = xmlSchemaSAXPlug(xsdValidCtxt,
            &sax, &ctxptr);


You should pass your user data pointer here instead of a NULL 
xmlSchemaValidCtxtPtr:


/* 'sax' stands for your xmlSAXHandlerPtr variable */
void *user_data = my_user_data;
xmlSchemaSAXPlugPtr saxPlug = xmlSchemaSAXPlug(xsdValidCtxt,
        &sax, &user_data);

xmlSchemaSAXPlug will then swap the user data pointer with its own one which 
you have to use when calling xmlSAXUserParseMemory. The SAX callbacks, 
however, should receive the original pointer.


Nick


[xml] Stepping down

2021-07-22 Thread Nick Wellnhofer via xml
I never really asked for it, but over the last few years I became the de facto 
maintainer of both libxml2 and libxslt. Luckily, I was able to fund my 
involvement through Chrome VRP bug bounties and OSS-Fuzz integration rewards. 
Big thanks to Google for these outstanding programs.


Unfortunately, returns from security research are diminishing quickly and I 
see no way to obtain a minimal level of funding anymore. So I'm stepping down 
as contributor and maintainer.


Thanks to everyone who reported bugs and contributed patches!

Nick



Re: [xml] Release of libxml2 2.9.11

2021-05-14 Thread Nick Wellnhofer via xml

On 13/05/2021 23:13, Stefan Behnel wrote:

Difficult to say if this is an improvement or deliberate breakage.
Technically, it's not a semantic change in the XML output, rather a byte
level change in ignorable whitespace. But I'll need to look into it further
to understand what the best adaptation to this change is.


This is caused by one of my changes. I can have a look and revert to the old 
behavior.



More importantly, there also seem to be issues where additional closing
tags or duplicated PIs and comments are being written, e.g.


This is tracked here:

https://gitlab.gnome.org/GNOME/libxml2/-/issues/255

Nick



Re: [xml] warning: cast from 'unsigned char *' to 'unsigned short *'

2021-03-23 Thread Nick Wellnhofer via xml

On 23/03/2021 00:38, Jeffrey Walton via xml wrote:

encoding.c:500:26: warning: cast from 'const unsigned char *' to
   'unsigned short *' increases required alignment from 1 to 2 [-Wcast-align]
 unsigned short* in = (unsigned short*) inb;



If the buffers are aligned, then you can use the following to squash
the warning:


This is a known issue. Internal use of these functions should be safe but the 
encoding API is also exposed publicly and could be used with unaligned 
pointers. So the warning is valid.
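
For illustration, one common way to read 16-bit code units from a byte buffer without casting to 'unsigned short *' is to assemble the value from individual bytes, or to copy it with memcpy. This is only a sketch of the general technique, not the code in encoding.c; the function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/*
 * Read a little-endian 16-bit value from a byte pointer that may be
 * unaligned. Only byte accesses are used, so alignment doesn't matter.
 */
static uint16_t
read_u16le(const unsigned char *p) {
    return (uint16_t) (p[0] | (p[1] << 8));
}

/*
 * Read a 16-bit value in native byte order. memcpy has no alignment
 * requirement, and compilers typically turn this into a single load
 * on platforms that allow unaligned access.
 */
static uint16_t
read_u16_native(const unsigned char *p) {
    uint16_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

Either helper can be called on 'inb + 1' or any other odd address, which is exactly the case the -Wcast-align warning is about.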


Nick


Re: [xml] libxml2 2.9.10 and Hang after Testing parser : 61 of 70 functions

2021-03-22 Thread Nick Wellnhofer via xml

On 22/03/2021 05:21, Jeffrey Walton via xml wrote:

I'm working on my old PowerMac G5, powerpc-apple-darwin9.8.0. I'm
trying to build an updated OpenSSH. libxml2 2.9.10 is a distant
dependency.


First of all, it's great to hear that libxml2 compiled at all and that most of 
the tests seem to pass.



libxml2's make check is hanging at:

 ...
 Testing nanoftp : 14 of 22 functions ...
 Testing nanohttp : 13 of 17 functions ...
 Testing parser : 61 of 70 functions ...
 

Does anyone have an idea what may be going sideways?


That's the 'testapi' test which causes the same problem on Windows. The test 
should complete eventually. It's just incredibly slow. One possible 
explanation is that somewhere an array is reallocated every time an element is 
appended. Some Linux allocators can handle repeated reallocations in linear 
time, but in general, you have to expect quadratic behavior. I just haven't 
found the time to investigate the issue.
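
To illustrate the suspected pattern: growing an array by one slot per append can copy the whole array on every call, whereas doubling the capacity keeps the total work linear. The following is a minimal sketch with hypothetical names (IntVec, intVecPush), not code from testapi or libxml2.

```c
#include <stdlib.h>

typedef struct {
    int *items;
    size_t size;
    size_t capacity;
} IntVec;

/*
 * Append with geometric growth. The array is reallocated only when it
 * is full, and the capacity doubles each time, so n appends cause
 * O(log n) reallocations instead of n.
 * Returns 0 on success, -1 on allocation failure.
 */
static int
intVecPush(IntVec *vec, int value) {
    if (vec->size == vec->capacity) {
        size_t newCap = vec->capacity ? vec->capacity * 2 : 16;
        int *tmp;

        if (newCap > (size_t) -1 / sizeof(int))
            return -1;              /* byte size would overflow */
        tmp = realloc(vec->items, newCap * sizeof(int));
        if (tmp == NULL)
            return -1;              /* old block is still valid */
        vec->items = tmp;
        vec->capacity = newCap;
    }
    vec->items[vec->size++] = value;
    return 0;
}
```

Start from a zero-initialized IntVec and free 'items' when done.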


Nick


Re: [xml] [PATCH] fix memory leak when xmlRegStatePush failed

2021-03-13 Thread Nick Wellnhofer via xml

On 12/01/2021 10:42, zhuyan (M) wrote:


In the function xmlRegStatePush, if xmlMalloc or xmlRealloc fails,


Yes, there are many issues that arise from poor handling of malloc failures. 
Fortunately, similar issues can be found quite effectively by changing the 
fuzzers to inject malloc failures. I already started to address these errors 
in a more systematic way, but I want to hold off further commits until after 
the next release.


Note that in this particular case, it is easier to make the static function 
xmlRegStatePush free the 'to' state on error.
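
As a rough sketch of both ideas, the code below uses an allocator wrapper that starts failing after a chosen number of calls, and a push function that takes ownership of its argument and frees it on failure. All names are hypothetical and nothing here is taken from xmlregexp.c or the libxml2 fuzzers. In libxml2 itself, custom allocators can be installed with xmlMemSetup.

```c
#include <stdlib.h>
#include <string.h>

static long allocsUntilFailure = -1;   /* negative: never fail */
static long liveBlocks = 0;            /* crude leak counter */

/* malloc wrapper that returns NULL once the countdown reaches zero */
static void *
testMalloc(size_t size) {
    void *p;

    if (allocsUntilFailure == 0)
        return NULL;
    if (allocsUntilFailure > 0)
        allocsUntilFailure--;
    p = malloc(size);
    if (p != NULL)
        liveBlocks++;
    return p;
}

static void
testFree(void *p) {
    if (p != NULL) {
        liveBlocks--;
        free(p);
    }
}

typedef struct {
    void **items;
    size_t size;
    size_t capacity;
} PtrList;

/*
 * Takes ownership of 'item'. If the list cannot grow, 'item' is freed
 * here, so callers have nothing to clean up on the error path.
 */
static int
ptrListPush(PtrList *list, void *item) {
    if (list->size == list->capacity) {
        size_t newCap = list->capacity ? list->capacity * 2 : 4;
        void **tmp = testMalloc(newCap * sizeof(void *));

        if (tmp == NULL) {
            testFree(item);
            return -1;
        }
        if (list->size > 0)
            memcpy(tmp, list->items, list->size * sizeof(void *));
        testFree(list->items);
        list->items = tmp;
        list->capacity = newCap;
    }
    list->items[list->size++] = item;
    return 0;
}
```

A test harness can loop over increasing values of allocsUntilFailure and check that liveBlocks returns to zero after each run.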


Nick


Re: [xml] xmlGetNodePath() returns invalid path for XML_DTD_NODE

2021-03-13 Thread Nick Wellnhofer via xml

On 08/02/2021 18:01, Christoph M. Becker wrote:

On 08.02.2021 at 17:23, Nick Wellnhofer wrote:

This should be fixed for other node types as well. Does the attached
patch work for you?


Yes, that works fine.  Thank you!


This is fixed in master now:

https://gitlab.gnome.org/GNOME/libxml2/-/commit/e20c9c148c725e2933efa143ee6a543a5cae4204

Nick



Re: [xml] about xmlReadMemory()

2021-03-03 Thread Nick Wellnhofer via xml

On 03/03/2021 09:30, nicolas bats wrote:

Hi Nick,
I've experimented with xmlReadIO and it's cool.
This message is just to check I'm doing it right:
- I register an xmlInputReadCallback of type: size_t myCallback(void* context, 
char* buffer, int length)
- I do my stuff in the callback, and if the data I use exceeds the length of 
the buffer, I realloc it.

Is this scheme good?
Do I need to set size_t as the return type of myCallback?


No, the read callback is supposed to fill the buffer with up to 'length' 
bytes. Try something like:


#include <string.h>
#include <libxml/parser.h>

typedef struct {
    const char *ptr;
    size_t remaining;
} myContext;

/* Copy up to 'len' bytes into 'buffer'; returning 0 signals end of input. */
static int
myReadCallback(void *vcontext, char *buffer, int len) {
    myContext *context = vcontext;

    if (context->remaining < (size_t) len)
        len = (int) context->remaining;
    memcpy(buffer, context->ptr, len);
    context->ptr += len;
    context->remaining -= len;

    return len;
}

xmlDocPtr
myReadMemory(const char *buffer, size_t size, const char *URL,
             const char *encoding, int options) {
    myContext context;

    context.ptr = buffer;
    context.remaining = size;

    return xmlReadIO(myReadCallback, NULL, &context, URL, encoding, options);
}
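
To see how such a callback behaves when it is called repeatedly with a small buffer, here is a self-contained sketch that doesn't need libxml2. The drain() loop is only a stand-in for the parser, and the names MemSource, memSourceRead and drain are made up for this example.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *ptr;
    size_t remaining;      /* size_t, so inputs larger than INT_MAX work */
} MemSource;

/* Same contract as a read callback: fill 'buffer' with up to 'len' bytes. */
static int
memSourceRead(void *vsrc, char *buffer, int len) {
    MemSource *src = vsrc;
    size_t n;

    if (len <= 0)
        return 0;
    n = (size_t) len;
    if (src->remaining < n)
        n = src->remaining;
    memcpy(buffer, src->ptr, n);
    src->ptr += n;
    src->remaining -= n;

    return (int) n;        /* 0 once the input is exhausted */
}

/* Stand-in for the parser: pull 'chunk' bytes at a time until EOF. */
static size_t
drain(MemSource *src, char *out, int chunk) {
    size_t total = 0;
    int n;

    while ((n = memSourceRead(src, out + total, chunk)) > 0)
        total += (size_t) n;
    return total;
}
```

The callback never needs to reallocate anything: it only hands out the next slice of the caller's memory and reports how many bytes it copied.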


Re: [xml] about xmlReadMemory()

2021-03-02 Thread Nick Wellnhofer via xml

On 02/03/2021 16:28, nicolas bats via xml wrote:

Hi,
is there a reason why xmlReadMemory() accepts int as the size of the array to 
transform to xmlDocPtr?

no doubt there's one...


That's simply a design mistake. The API was created 20 years ago when 64-bit 
systems were rare.


and in that case how could I retrieve an xmlDocPtr from 
memory whose size is of type size_t?


If you want to process memory buffers larger than INT_MAX, you can use 
xmlReadIO with a custom read callback that uses a size_t to store the offset.


Nick


Re: [xml] xmlGetNodePath() returns invalid path for XML_DTD_NODE

2021-02-08 Thread Nick Wellnhofer via xml

On 28/01/2021 14:51, Christoph M. Becker via xml wrote:

-if ((node == NULL) || (node->type == XML_NAMESPACE_DECL))
+if ((node == NULL) || (node->type == XML_NAMESPACE_DECL)
+|| (node->type == XML_DTD_NODE))
  return (NULL);


This should be fixed for other node types as well. Does the attached patch 
work for you?


Nick

diff --git a/tree.c b/tree.c
index d2347dfdf..636f81fed 100644
--- a/tree.c
+++ b/tree.c
@@ -4881,7 +4881,9 @@ xmlGetNodePath(const xmlNode *node)
 }
 next = ((xmlAttrPtr) cur)->parent;
 } else {
-next = cur->parent;
+xmlFree(buf);
+xmlFree(buffer);
+return (NULL);
 }
 
 /*


Re: [xml] Issue in building for arm...

2021-01-18 Thread Nick Wellnhofer via xml

On 18/01/2021 12:30, Abu Muttalib via xml wrote:

In file included from /usr/include/python2.7/Python.h:8:0,
                  from libxml.c:15:
/usr/include/python2.7/pyconfig.h:14:54: fatal error: 
arm-linux-gnueabihf/python2.7/pyconfig.h: No such file or directory

compilation terminated.


Simply disable the Python bindings:

./configure --without-python

Nick



Re: [xml] Constraint validation for huge documents

2021-01-05 Thread Nick Wellnhofer via xml
The XML Schemas code hasn't been actively maintained for more than 10 years, 
so you're unlikely to receive a helpful answer regarding the code.


There was a recent patch which might help:


https://gitlab.gnome.org/GNOME/libxml2/-/commit/faea2fa9b890cc329f33ce518dfa1648e64e14d6

Other than that, you'll have to dig through the sources yourself.

Nick


Re: [xml] Fwd: Windows libxml2.lib missing?

2020-12-09 Thread Nick Wellnhofer via xml

On 09/12/2020 01:49, Pro Turm via xml wrote:
Do you know why the provided Windows binaries don't contain any .lib files? No 
.lib has been provided here

http://xmlsoft.org/sources/win32/64bit/ 



It's explained in readme.txt.

Nick



Re: [xml] [PATCH] encoding: fix memleak in xmlRegisterCharEncodingHandler()

2020-12-07 Thread Nick Wellnhofer via xml

On 07/12/2020 13:19, Xiaoming Ni wrote:

The return type of xmlRegisterCharEncodingHandler() is void. The invoker
cannot determine whether xmlRegisterCharEncodingHandler() is executed
successfully. When nbCharEncodingHandler >= MAX_ENCODING_HANDLERS, the
"handler" is not added to the array "handlers". As a result, the memory
of "handler" cannot be managed and released: a memory leak.

So add "xmlFree(handler)" to fix the memory leak on the failure branch of
xmlRegisterCharEncodingHandler().

Reported-by: wuqing 
Signed-off-by: Xiaoming Ni 
---
  encoding.c | 13 +++--
  1 file changed, 11 insertions(+), 2 deletions(-)


Merged here: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/649d02eaa419fa72ae6b131718a4ac77063d7a5a


Nick



Re: [xml] [PATCH] xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add, check "facet->val"

2020-12-07 Thread Nick Wellnhofer via xml

On 07/12/2020 13:17, Xiaoming Ni wrote:

The xmlSchemaGetFacetValueAsULong() API is an external API.
The validity of external input parameters must be strictly verified.
Before accessing "facet->val->value", we need to check whether "facet->val"
is a null pointer.

Signed-off-by: wuqing 
Signed-off-by: Xiaoming Ni 
---
  xmlschemastypes.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Merged here: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/cb7a572b3e7f568f1ebc8d91b1b8826a8ce3baa8


Nick



Re: [xml] ping //Re: [PATCH] xmlschemastypes.c: xmlSchemaGetFacetValueAsULong add check "facet->val"

2020-12-06 Thread Nick Wellnhofer via xml

On 01/12/2020 08:05, Xiaoming Ni wrote:

ping


Your previous email didn't make it to the mailing list.


On 2020/11/24 14:55, Xiaoming Ni wrote:

The xmlSchemaGetFacetValueAsULong() API is an external API.
The validity of external input parameters must be strictly verified.
Before accessing "facet->val->value", we need to check whether "facet->val"
is a null pointer.

Signed-off-by: wuqing 
Signed-off-by: Xiaoming Ni 
---
  xmlschemastypes.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Please resend the full patch formatted with "git format-patch" or create a 
merge request on https://gitlab.gnome.org/GNOME/libxml2


Nick



Re: [xml] [PATCH] Fix xmlURIEscape memory leaks.

2020-11-09 Thread Nick Wellnhofer via xml
Merged here: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/7c06d99e1f4f853e3c5b307c0dc79c8a32a09855


Nick

On 27/10/2020 19:33, enh via xml wrote:

Found by running the fuzz/uri.c fuzzer under asan (internal Android bug
171610679).

Always free `ret` when exiting on failure. I've moved the definition of
NULLCHK down past where ret is always initialized to make it clear that
this is safe.

This patch also fixes the indentation of two of the NULLCHK call sites
to make it more obvious that NULLCHK isn't `if`-like.
---
  uri.c | 17 +
  1 file changed, 9 insertions(+), 8 deletions(-)



Re: [xml] [PATCH] Fix xmlURIEscape memory leaks.

2020-11-06 Thread Nick Wellnhofer via xml

On 06/11/2020 00:54, enh via xml wrote:

ping?

(let me know if this should be a pull request somewhere instead...)


Sending patches to the mailing list is fine. It might take another week or 
two, but the issue will be addressed eventually.


Nick


Re: [xml] Why does libxml2 limit port numbers to 999,999,999?

2020-10-17 Thread Nick Wellnhofer via xml
On Oct 17, 2020, at 12:24, Richard W.M. Jones via xml wrote:
> It seems like libxml2 chose to do this for convenience rather than
> correctness.

Yes, this is an arbitrary limit introduced to avoid integer overflow.
 
> I think it should accept port numbers at least up to
> signed int (the type used to store port numbers), and give an error if
> the port number overflows.

This is fixed now: 
https://gitlab.gnome.org/GNOME/libxml2/-/commit/b46016b8705b041c0678dd45e445dc73674b75d0
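
For readers who want to see the general idea, here is a minimal sketch of parsing a decimal port into an int with an explicit overflow check instead of a digit-count limit. It isn't the code from the commit above; parsePort and its return convention are assumptions made for this example.

```c
#include <limits.h>

/*
 * Parse leading decimal digits of 'str' into '*port'.
 * Returns the number of characters consumed (0 if there are no digits),
 * or -1 if the value doesn't fit into an int. '*port' is only written
 * on success.
 */
static int
parsePort(const char *str, int *port) {
    int value = 0;
    int i = 0;

    while (str[i] >= '0' && str[i] <= '9') {
        int digit = str[i] - '0';

        /* value * 10 + digit would exceed INT_MAX */
        if (value > (INT_MAX - digit) / 10)
            return -1;
        value = value * 10 + digit;
        i++;
    }

    *port = value;
    return i;
}
```

The check is done before the multiplication, so the signed arithmetic itself never overflows.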

> Also could the uri->port field be changed to unsigned int without
> breaking ABI?

It’s a public struct member, so strictly speaking, no. But the risk of 
breaking anything seems low.

Nick



Re: [xml] Fix character column number of XML parse error on line with closing tag of element with namespace preceding it

2020-08-09 Thread Nick Wellnhofer via xml
On Jun 15, 2020, at 17:29, Frederic Vancraeyveldt wrote:
> I traced the code and I have a suggested fix in libxml_parser.patch.

Thanks, this should be fixed now with this commit:

https://gitlab.gnome.org/GNOME/libxml2/-/commit/b82fa3dd26a72c89ced293d06269eb97bb252d76
 
> I also modified xmllint a little bit to be able to show the error using that 
> tool.
> 
> That modification is in patch (libxml_error.patch)

This changes the format of error messages for all users of libxml2 and could 
break things if someone tries to parse the error message, for example.

> I just joined this mailing list. Please advise me if there is a better way of 
> reporting these issues.

The best way is to use GitLab: https://gitlab.gnome.org/GNOME/libxml2

Nick
