2017-07-09 23:58 GMT+02:00 Jean-Francois Nifenecker < jean-francois.nifenec...@laposte.net>:
> Hello Gilles,
>
> On 09/07/2017 at 19:20, Gilles wrote:
>
>> Hello,
>>
>> This PDF file
>> <https://www.legifrance.gouv.fr/download_code_pdf.do?cidTexte=LEGITEXT000006074228&dlType=pdf>
>> has no Table of Contents, and I was wondering if LO could grab all the
>> headers and build a TOC.
>
> In order to create a PDF with a TOC/index you'll have to set heading
> styles to the appropriate paragraphs.
>
> Opening a PDF with LibO won't go anywhere as the tool for that is Draw
> which can't set styles for a text processor.
>
> I can't see a way to do that quickly, I'm afraid: a copy/paste from the
> PDF document to Writer is possible but you'll have to fix a lot of things
> (e.g. useless carriage returns) and apply heading styles by hand. On a 400+
> page document this is a big PITA.
>
> Hopefully someone else will come up with brighter ideas.

You want brighter ideas? Say no more!

So... hmm... I'm afraid there won't be many fully automated tools that can
build a TOC for you. A PDF basically contains a lot of individual elements
that are arranged to look like something coherent.

For the document you linked, it would theoretically be possible to write a
tool that splits out every page, grabs the raw text, uses a regex to find the
actual titles, builds a TOC, and injects it into the PDF.

This would assume:
- Text extraction works correctly (that's not always the case with PDF)
- Titles always follow the same format

But on this kind of document, you could definitely get some acceptable
results.

I experimented a bit. The output is here:
http://www.cjoint.com/c/GGjw0OtPkGc

And for the curious, the "script" I used is here:
https://pastebin.com/icQSZxQr

As you'll see, it is VERY specific to this document, but it shows that
something can be done.
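
For readers who want a feel for the approach described above (page-by-page text, a regex for titles, a TOC built from the matches), here is a minimal sketch in Python. It is not the pastebin script. The heading pattern (lines starting with "Livre", "Titre", "Chapitre" or "Section") is an assumption about how titles look in this kind of document, and the names `HEADING_RE`, `TocEntry`, `build_toc` and `add_outline` are made up for illustration. The last function assumes the third-party pypdf library is installed, and it adds PDF bookmarks (an outline) rather than a printed TOC page.

```python
import re
from typing import List, NamedTuple

# Assumed nesting of heading keywords; adjust to the actual document.
LEVELS = {"livre": 0, "titre": 1, "chapitre": 2, "section": 3}

# Assumed title format: a keyword, a number, and an optional ": label",
# alone on one line. Real extracted text may need a looser pattern.
HEADING_RE = re.compile(
    r"^\s*(Livre|Titre|Chapitre|Section)\s+(\S+)\s*(?::\s*(.*\S))?\s*$",
    re.IGNORECASE,
)


class TocEntry(NamedTuple):
    level: int   # 0 = outermost heading
    title: str   # heading line with whitespace normalised
    page: int    # 0-based page index


def build_toc(pages: List[str]) -> List[TocEntry]:
    """Scan the text of each page and collect lines that look like titles."""
    entries = []
    for page_index, text in enumerate(pages):
        for line in text.splitlines():
            match = HEADING_RE.match(line)
            if match:
                level = LEVELS[match.group(1).lower()]
                entries.append(TocEntry(level, " ".join(line.split()), page_index))
    return entries


def add_outline(src_path: str, dst_path: str) -> None:
    """Copy a PDF and attach the detected titles as bookmarks (needs pypdf)."""
    from pypdf import PdfReader, PdfWriter  # third-party; assumed installed

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for page in reader.pages:
        writer.add_page(page)

    # Text extraction may fail or return nothing on some pages.
    entries = build_toc([page.extract_text() or "" for page in reader.pages])

    parents = {}  # most recent outline item seen at each level
    for entry in entries:
        # Attach to the nearest shallower heading, if there is one.
        parent = None
        for level in range(entry.level - 1, -1, -1):
            if level in parents:
                parent = parents[level]
                break
        parents[entry.level] = writer.add_outline_item(
            entry.title, entry.page, parent=parent
        )
        # Forget deeper headings that belonged to the previous branch.
        for level in [lvl for lvl in parents if lvl > entry.level]:
            del parents[level]

    with open(dst_path, "wb") as handle:
        writer.write(handle)
```

`build_toc` only needs a list of strings, one per page, so the title regex can be tried on text pasted from a few pages before any PDF library is involved. Both assumptions listed in the message still apply: if the extracted text is scrambled, or titles do not follow one format, the result will be incomplete.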