[ https://issues.apache.org/jira/browse/SOLR-10934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058339#comment-16058339 ]
Hoss Man edited comment on SOLR-10934 at 6/21/17 10:13 PM:
-----------------------------------------------------------

based on the code in this SO post, it looks like we should be able to...

https://stackoverflow.com/a/38846776/689372

* loop over all PDAnnotations in each PDPage
* if the annotation is a PDAnnotationLink then we can access its PDAction and PDDestination
* PDActionURI is an external link, PDActionGoTo is an intra-document link (a link within the same PDF)
* PDActionGoTo can point at either a PDPageDestination (page num?) or a PDNamedDestination (named anchor?)
* we look up PDNamedDestination instances in the document catalog.

that _should_ enable us to vet that all intra-document links point to a valid anchor.

one thing i'm not sure about is if it would be possible to check for the "anchor used more than once in diff adoc files" type problem -- i suspect that the catalog's list of PDNamedDestination doesn't allow dups, so that info may already be lost as part of the PDF creation??
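As a rough illustration of the check described above, here is a minimal PDFBox-free sketch in plain Java. Everything in it (PdfLinkCheckSketch, Link, Kind, findBrokenAnchors, buildCatalog) is a hypothetical name made up for this example, not PDFBox or Solr API. The Link objects stand in for what would be read from each PDAnnotationLink, and the name set stands in for the named destinations in the document catalog. buildCatalog assumes the catalog behaves like a name-keyed map, which is only the suspicion voiced in the comment, not something confirmed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the proposed check; none of these names are PDFBox or Solr API.
class PdfLinkCheckSketch {

  // Rough stand-ins for PDActionURI, PDPageDestination and PDNamedDestination.
  enum Kind { EXTERNAL_URI, PAGE, NAMED }

  // One link annotation, reduced to the two facts the check needs.
  static final class Link {
    final Kind kind;
    final String target; // a URI, a page number as text, or an anchor name

    Link(Kind kind, String target) {
      this.kind = kind;
      this.target = target;
    }
  }

  // Returns every anchor name that a NAMED link points at but the catalog does not define.
  static List<String> findBrokenAnchors(List<Link> links, Set<String> catalogNames) {
    List<String> broken = new ArrayList<>();
    for (Link link : links) {
      if (link.kind == Kind.NAMED && !catalogNames.contains(link.target)) {
        broken.add(link.target);
      }
    }
    return broken;
  }

  // ASSUMPTION: the catalog is keyed by anchor name, so a second definition of the
  // same name silently replaces the first and the duplicate can no longer be seen.
  static Map<String, String> buildCatalog(String[][] anchorAndSourceFile) {
    Map<String, String> catalog = new LinkedHashMap<>();
    for (String[] pair : anchorAndSourceFile) {
      catalog.put(pair[0], pair[1]);
    }
    return catalog;
  }

  public static void main(String[] args) {
    // "intro" is defined in two (made up) adoc files; only one entry survives.
    Map<String, String> catalog = buildCatalog(new String[][] {
        {"intro", "a.adoc"}, {"intro", "b.adoc"}, {"setup", "a.adoc"}});

    List<Link> links = Arrays.asList(
        new Link(Kind.EXTERNAL_URI, "https://example.org/"),
        new Link(Kind.PAGE, "3"),
        new Link(Kind.NAMED, "intro"),
        new Link(Kind.NAMED, "missing"));

    System.out.println("catalog entries: " + catalog.keySet());
    System.out.println("broken anchors: " + findBrokenAnchors(links, catalog.keySet()));
  }
}
```

If that assumption holds, broken anchors can still be detected from the PDF alone, but duplicate anchors would have to be caught earlier, e.g. by scanning the adoc sources before the PDF is generated.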
> create a link+anchor checker for the ref-guide PDF using PDFBox
> ---------------------------------------------------------------
>
> Key: SOLR-10934
> URL: https://issues.apache.org/jira/browse/SOLR-10934
> Project: Solr
> Issue Type: Sub-task
> Security Level: Public(Default Security Level. Issues are Public)
> Components: documentation
> Reporter: Hoss Man
>
> We currently have CheckLinksAndAnchors.java which is automatically run
> against the ref-guide HTML as part of the build to use JSoup to find bad
> links/anchors that asciidoctor doesn't complain about -- but not everyone
> does/can build the HTML version of the ref-guide since it requires
> manually installing jekyll.
> The PDF build only requires things installed by ivy (via JRuby) and we
> already have some PDFBox based code in ReducePDFSize.java that operates on
> this PDF every time it's run -- so if we can find a way to do similar checks
> using the PDFBox API we could catch these broken links faster.