On 4/22/2022 10:11 AM, Michal Hoftich wrote:
Hi Nasser,


It now compiled the MWE given below. It does take a long
time. This is another issue for another email. But tex4ht took
110 minutes to compile this file (which is empty, but has
lots of empty chapters and sections), while lualatex took about 2 minutes.

Tex4ht is really slow.



The main problem is that your files are so extremely huge. TeX4ht
needs some extra memory, because it declares link destinations for all
sections and for each cut page.


Hello Michal. Thanks for taking the time to look at the problem.

I understand the document is long. (Actually, the MWE I gave
is empty, but has lots of chapters/sections/subsections to
show the problem.)

But I do have more than enough physical RAM. This is a new PC with one of
the fastest Intel CPUs and a maximum of 128 GB RAM.

I monitor the RAM all the time while tex4ht is compiling, and
it does not use more than 8 GB to 10 GB. The output below shows the current
RAM usage while compiling with split level "4", including 2 other tex4ht
builds running at the same time in separate terminals:

free -mh
               total        used        free      shared  buff/cache   available
Mem:          100Gi       8.7Gi        83Gi       0.0Ki       8.4Gi        90Gi
Swap:          26Gi          0B        26Gi

So I fail to see how this is a physical memory issue at all.

Please help me understand how this is due to memory not being available
for tex4ht to compile the document. If tex4ht runs out of memory,
would it not give an error and stop?
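
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch that runs one build command and reports its wall time and the peak resident memory of the child processes. It is not part of make4ht; the helper name run_and_measure is made up for illustration, and it assumes a Unix system because it relies on Python's resource module (ru_maxrss is reported in kilobytes on Linux).

```python
import resource
import subprocess
import time


def run_and_measure(cmd):
    """Run cmd and return (exit code, wall seconds, peak child RSS).

    The peak value is ru_maxrss for RUSAGE_CHILDREN: the largest resident
    set size among the child processes this Python process has waited for
    so far. Assumption: this script runs only one build, so the value
    belongs to that build. Units are kilobytes on Linux.
    """
    start = time.monotonic()
    proc = subprocess.run(cmd)
    elapsed = time.monotonic() - start
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return proc.returncode, elapsed, peak
```

For example, run_and_measure(["make4ht", "foo.tex", "mathjax,htm,fn-in,3"]) would time one of the builds compared in this message. The number is the high-water mark of the largest single child, not a sum over all processes, so it complements rather than replaces watching free.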


Now for the issue of losing links.

I used this command

make4ht  -ulm default foo.tex "mathjax,htm,fn-in,4"

Try

make4ht  -ulm draft foo.tex "mathjax,htm,fn-in,3"

It will produce separate files for sections, but not for subsections.
You should save a lot of memory and compilation time using this.


Thanks for the suggestion. Yes, this compiles much faster and does
not produce the missing links that I get with "4" instead of "3".

Just to be clear what we are talking about:

===================
make4ht  foo.tex "mathjax,htm,fn-in,4"
Takes over 2 hrs to compile, produces DOM errors

[INFO]    make4ht-lib: parse_lg process file: fosu36492.htm
[WARNING] domfilter: DOM parsing of fosu36492.htm failed:
[WARNING] domfilter: ...ive/2021/texmf-dist/tex/luatex/luaxml/luaxml-mod-xml.lua:175: Error Parsing XML [char=5325]

It also produces wrong links that lead to no pages, as discussed earlier.
The links go dead after chapter 19, section 3, subsection 93, as
I said before.

make4ht  foo.tex "mathjax,htm,fn-in,3"
Takes only 12 minutes and produces no bad links.
============================

But this difference on its own clearly shows that there is a serious
problem in tex4ht in this area.

Why would it take 10 or 20 times longer to compile when
using "4" vs. "3"?

A little more time is to be expected, but this much longer? Maybe
the algorithm or data structure used by tex4ht to handle and manage
the splitting is not optimized? Does this mean using "5" will now
take 100 times longer to compile? I am afraid to try :)

I remember you once mentioned that the code for splitting in tex4ht
is very old and hard to understand, which is something I understand
and appreciate.

This workaround of changing "4" to "3" on this document
can't really be a solution for me. The issue is not how long
it takes to compile. I can wait, as long as I get correct pages
generated that load fast.

The issue is the missing links and missing sections in the table
of contents, and the seemingly corrupt pages that show up with large
blank spaces in them, which the MWE shows.
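
To pin down exactly which links are dead, something like the following could be run over the output directory. This is only a minimal sketch under stated assumptions: the function name find_dead_links is hypothetical, the pages are assumed to be the *.htm files produced by the build, and only the existence of the target file is checked, not whether the anchor after "#" exists inside it.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlparse


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in one HTML file."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_dead_links(out_dir, pattern="*.htm"):
    """Return (source file, href) pairs whose local target file is missing."""
    out_dir = Path(out_dir)
    dead = []
    for page in sorted(out_dir.glob(pattern)):
        collector = _LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for href in collector.hrefs:
            target, _fragment = urldefrag(href)
            parsed = urlparse(target)
            # Skip same-page anchors and anything that is not a local file.
            if not target or parsed.scheme or parsed.netloc:
                continue
            if not (page.parent / unquote(parsed.path)).exists():
                dead.append((page.name, href))
    return dead
```

Calling find_dead_links(".") in the build directory returns a list of (page, href) pairs, which makes it easy to compare the "4" and "3" builds and to see where the links start to fail.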

I have sections that have many subsections (up to 50 or 100) in
them. Each subsection is also large. Having them all show on the same
webpage means the page becomes very slow to open, even
with mathjax for math, which is not practical.

No one will want to browse a website that takes 5 minutes each time
to open a large page. Also, it is harder to browse very
long pages.

That is why I wanted to have each subsection on its own separate
and smaller webpage, so they open fast and are easier to view.

Best regards,
Michal

If you have any other ideas or something I should try, please
feel free to let me know. Right now, I am no longer able to build
my long document because of this.

Thanks,
--Nasser
