This one time, at band camp, Ken Foskey wrote:

> I am currently revising the http://udk.openoffice.org website to fix the
> wording, spelling, and formatting of the website. I am tripping over a
> lot of dead documents in the website, so I want to start with a map and
> publish a list of all documents not in the map for confirmation.
> So what do I need:
>
> Is there a way of identifying all pages that are not linked to?
>
> My thought was to start with a website map that shows what links to what,
> and then graphviz this to show things that are orphans, or branches that
> are orphans.
>
> Any suggestions on tools that extract links from websites? The website is
> checked out on my hard disk, so file-based is fine.
lynx -dump and some evil glue code to generate a graphviz file springs to
mind. Actually, some Perl or Python with a regex on <a> tags would work
too.

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
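A minimal Python sketch of the regex-on-<a>-tags idea: walk the checked-out
tree, pull relative href targets out of each HTML file, emit a Graphviz DOT
file, and list files that nothing links to. All function names are my own,
and treating "orphan" as "no inbound link except the start page" is an
assumption; the regex is deliberately crude glue code, not a real HTML
parser.

```python
import os
import re

# Crude href extractor for <a> tags -- good enough for a quick orphan
# scan, not a substitute for a real HTML parser.
HREF_RE = re.compile(r'<a\s[^>]*href\s*=\s*["\']([^"\'#]+)', re.IGNORECASE)

def extract_links(html):
    """Return the href targets found in the given HTML text."""
    return HREF_RE.findall(html)

def build_link_graph(root):
    """Map each HTML file under `root` to the local files it links to.
    External (http://, mailto:) links are skipped."""
    graph = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                links = extract_links(fh.read())
            targets = []
            for link in links:
                if "://" in link or link.startswith("mailto:"):
                    continue  # not a local file
                targets.append(os.path.normpath(os.path.join(dirpath, link)))
            graph[os.path.normpath(path)] = targets
    return graph

def orphans(graph, start):
    """Files that no other file links to (the start page is exempt)."""
    linked = {t for targets in graph.values() for t in targets}
    return sorted(p for p in graph if p not in linked and p != start)

def to_dot(graph):
    """Emit the link graph in Graphviz DOT format."""
    lines = ["digraph site {"]
    for src, targets in graph.items():
        for dst in targets:
            lines.append('  "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)
```

Feed the DOT output to `dot -Tpng site.dot -o site.png` to eyeball the
orphan branches, or just act on the list that orphans() prints.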