This one time, at band camp, Ken Foskey wrote:
>
>I am currently revising the http://udk.openoffice.org website to fix the
>wording, spelling, and formatting of the site.  I am tripping over a
>lot of dead documents on the website, so I want to start with a map and
>publish a list of all documents not in the map for confirmation.
>So, what do I need?
>
>Is there a way of identifying all pages that are not linked to?
>
>My thought was to start with a website map that shows what links to what,
>and then run it through graphviz to show pages or whole branches that
>are orphaned.
>
>Any suggestions on tools that extract links from websites?  The website is
>checked out on my hard disk, so file-based is fine.

lynx -dump and some evil glue code to generate a graphviz file springs to
mind.  Alternatively, some perl or python with a regex on <a> tags would
work too.
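Along the lines of the regex-on-<a>-tags idea, here's a rough Python sketch
(the site root path and the crude regex are my assumptions; a real HTML
parser would be more robust, but a regex matches the spirit of the
suggestion).  It walks a checked-out site, collects local hrefs, and emits
a graphviz DOT file you can feed to dot or neato:

```python
import os
import re
import sys

# Hypothetical path to the checked-out site; adjust to taste.
SITE_ROOT = "udk-site"

# Crude regex on <a href="..."> tags; drops fragments and query strings.
HREF_RE = re.compile(r'<a\s[^>]*href=["\']([^"\'#?]+)', re.IGNORECASE)

def find_links(root):
    """Yield (source_page, target) pairs for every local href found."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith((".html", ".htm")):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            with open(path, encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            for target in HREF_RE.findall(text):
                # Skip off-site links; only local structure matters here.
                if target.startswith(("http:", "https:", "mailto:")):
                    continue
                yield rel, target

def to_dot(pairs):
    """Render the link pairs as a graphviz digraph."""
    lines = ["digraph site {"]
    for src, dst in pairs:
        lines.append('  "%s" -> "%s";' % (src, dst))
    lines.append("}")
    return "\n".join(lines)

if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else SITE_ROOT
    print(to_dot(find_links(root)))
```

Pages that never appear on the right-hand side of an arrow (and aren't the
front page) are your orphan candidates; graphviz will render disconnected
branches visibly off to one side.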

-- 
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
