On 12/11/2023 9:06 PM, William F Hammond wrote:
Hello Nasser,

You don't give us much to go on.  But it does provoke my curiosity.


Sorry, but I did send Michal detailed information on this.
I just added the bug for tracking and did not think anyone else would
be interested in all the boring details of my build.

I assume that you are able to build the 57,000 page pdf from the tex source
that you want to process with tex4ht.


Oh, yes, of course. The file builds OK in lualatex. Here is the link

<https://12000.org/my_notes/CAS_integration_tests/reports/summer_2023_Rubi_4_17_3/test_cases/210_Hebisch/report.htm>

There are over 10,000 subsections, and tex4ht breaks down on
reportsubsection1100

Which is this

<https://12000.org/my_notes/CAS_integration_tests/reports/summer_2023_Rubi_4_17_3/test_cases/210_Hebisch/reportsubsection1100.htm#x1117-109610003.10.84>

If you click <NEXT> at the top of the above page, you get a "link not found"
error, since no subsections are processed after that one. Almost 9,000 more
subsections should be there. None of them are generated.
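
A quick way to confirm where the output stops is to list which subsection pages were actually written. The following is only a sketch: it assumes the pages follow the `reportsubsection<N>.htm` naming seen above and that numbering starts at 1 without gaps, which this thread does not confirm. The helper names are made up for illustration.

```python
import re
from pathlib import Path

# Assumed naming scheme, taken from the file name quoted in this thread.
PATTERN = re.compile(r"reportsubsection(\d+)\.htm")


def generated_numbers(names):
    """Collect the subsection numbers found in an iterable of file names."""
    numbers = set()
    for name in names:
        match = PATTERN.fullmatch(name)
        if match:
            numbers.add(int(match.group(1)))
    return numbers


def summarize(names):
    """Return (highest number found, numbers missing below it)."""
    numbers = generated_numbers(names)
    if not numbers:
        return None, []
    highest = max(numbers)
    # Assumption: numbering starts at 1 and is contiguous.
    missing = [n for n in range(1, highest + 1) if n not in numbers]
    return highest, missing


def scan_directory(directory):
    """Apply summarize() to the file names in an output directory."""
    return summarize(path.name for path in Path(directory).iterdir())
```

Calling `scan_directory(".")` in the build directory would report the last subsection page that exists, which can then be compared with the number of subsections expected from the PDF.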


Is html output the final tex4ht target?  I'm assuming it is.


Yes, only HTML (mathjax) mode.

You say:

[INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter: ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete XML Document [char=33675]

From this I deduce that the 57,000 page document is being written in HTML
pieces by tex4ht, "reportsubsection1100.htm" is one of those pieces, and
perhaps not all expected pieces have been generated.

Have you checked whether "reportsubsection1100.htm" is well-formed XML
using, say, the tool "xmlwf" found in the expat distribution?
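
If xmlwf is not installed, a similar check can be sketched with Python's standard `xml.parsers.expat` module, which wraps the same expat parser. This is only an illustration, not part of make4ht or LuaXML; the function names are made up. Note that a strict XML parser will also reject things that are legal in HTML but not in XML (for example named entities such as `&nbsp;` without a DTD), so a reported error is a hint rather than proof that the file is truncated.

```python
import xml.parsers.expat


def check_well_formed(data: bytes):
    """Return None if data is well-formed XML, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input, so an
        # unfinished document is reported as an error.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None


def check_file(path: str):
    """Run the well-formedness check on a file read as raw bytes."""
    with open(path, "rb") as handle:
        return check_well_formed(handle.read())
```

For example, `check_file("reportsubsection1100.htm")` would return `None` for a well-formed page, or the position and message of the first error otherwise.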


I built the code in reportsubsection1100.htm on its own, and it builds OK
with make4ht. The problem appears only when the code is part of report.tex
(the full LaTeX file, which includes everything).

I do not know xmlwf. I just see these domfilter and XML messages show
up and that is all. I really know very little about these.

But to help Michal, I sent him a ZIP file with everything in it so he can
reproduce this on his computer as well.

It seems related to the use of tables, since that is the place where it fails.

--Nasser

             -- Bill


William F Hammond
Email: gel...@gmail.com
https://www.facebook.com/william.f.hammond
http://www.albany.edu/~hammond/

𝑻𝒉𝒆 𝒕𝒊𝒎𝒆 𝒕𝒐 𝒔𝒂𝒗𝒆 𝒂 𝒅𝒆𝒎𝒐𝒄𝒓𝒂𝒄𝒚 𝒊𝒔 𝒃𝒆𝒇𝒐𝒓𝒆 𝒊𝒕
𝒊𝒔 𝒍𝒐𝒔𝒕.   -- 𝐊𝐞𝐧 𝐁𝐮𝐫𝐧𝐬




On Mon, Dec 11, 2023 at 5:04 PM Nasser M. Abbasi <puszcza-hack...@gnu.org.ua>
wrote:

URL:
   <http://puszcza.gnu.org.ua/bugs/?618>

                  Summary: Incomplete XML Document, domfilter error, truncated build on large file.
                  Project: tex4ht
             Submitted by: nma123
             Submitted on: Tue Dec 12 01:04:12 2023
                 Category: None
                 Priority: 5 - Normal
                 Severity: 7 - Important
                   Status: None
                  Privacy: Public
              Assigned to: None
         Originator Email:
              Open/Closed: Open
          Discussion Lock: Any

     _______________________________________________________

Details:

I have been working with Michal on this via private email, but thought I would
enter a bug report just for tracking and documentation.

I have one large file (57,000 PDF pages). Compiling it with tex4ht takes
14 hrs, and at about 10% of the way through generating the final HTML pages it
gets an XML error and stops.

I.e., the remaining 90% of the sections are missing from the final web pages.

-------------------------------------------------------

[INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter: ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete XML Document [char=33675]

[INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm
[WARNING] domfilter: DOM parsing of reportsubsection1100.htm failed:
[WARNING] domfilter: ...ive/2023/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Incomplete XML Document [char=33675]

[INFO]    make4ht-lib: parse_lg process file: reportsubsection1100.htm

----------------------------------

I've just sent Michal a link to a complete, self-contained ZIP file (450 MB)
with instructions on how to run it standalone in order to see these errors on
his end.


I tried this on the latest TeX Live 2023 on a new Linux installation.

I will work with Michal to provide any additional information he needs from
me, to hopefully find the cause of this problem.

This happens only with this file. I think it may be due to the large size,
since the LaTeX code is all generated by the same program and only this file
gives this error.

--Nasser





     _______________________________________________________

Reply to this item at:

   <http://puszcza.gnu.org.ua/bugs/?618>

_______________________________________________
   Message sent via/by Puszcza
   http://puszcza.gnu.org.ua/



