On Apr 2, 2008, at 4:43 PM, thebigdog wrote:
Adrian Holovaty (creator of ChicagoCrime.org and Django) has a Python script called templatemaker[1][2], which in theory would do what I want. You feed it a bunch of similar web pages and it produces a template with "holes" where the data differed from page to page. In practice, it's too granular; it doesn't recognize HTML. It looks at every [...] I don't care about spaces between tags. I only care about substantial content differences across pages. Everything else can be moved to the template.
You could try running everything through HTML Tidy first and see if that
normalizes whitespace and such. Then run templatemaker and see how
that works out.
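
For anyone who wants to try the idea without installing Tidy, here is a minimal sketch of the whitespace part only. It is a rough stand-in, not Tidy itself: the function name normalize_html is made up for illustration, and it would also squash whitespace inside <pre> blocks, which a real Tidy pass handles more carefully.

```python
import re


def normalize_html(html: str) -> str:
    """Rough stand-in for a Tidy pass (hypothetical helper, not part of Tidy).

    Collapses whitespace so that indentation and line breaks stop showing up
    as differences between otherwise identical pages.
    """
    html = re.sub(r"\s+", " ", html)     # collapse every run of whitespace to one space
    html = re.sub(r">\s+<", "><", html)  # drop whitespace between adjacent tags
    return html.strip()


if __name__ == "__main__":
    print(normalize_html("<p>\n  hi   there </p>\n\n<p>x</p>"))
```

The normalized pages can then be fed to whatever comparison step comes next.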

You could use a diff program to find out where they are different and then kinda do the reverse and come up with the similarities. However, I would do it after running it all through Tidy first.
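
The "diff, then reverse it" idea can be sketched with Python's standard difflib module. This is an illustration under some assumptions, not how templatemaker works: pages are split into whole tags and text runs (so whitespace between tags is ignored), and the names HOLE, tokenize, merge and make_template are invented for the example.

```python
import re
from difflib import SequenceMatcher

HOLE = "{{HOLE}}"  # hypothetical marker for a spot where pages differ


def tokenize(html):
    """Split a page into whole tags and text runs, dropping whitespace-only runs."""
    tokens = re.findall(r"<[^>]+>|[^<]+", html)
    return [t.strip() for t in tokens if t.strip()]


def merge(template, page):
    """Keep the tokens both sequences share; turn each differing run into one HOLE."""
    result = []
    matcher = SequenceMatcher(None, template, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(template[i1:i2])
        elif not result or result[-1] != HOLE:
            result.append(HOLE)
    return result


def make_template(pages):
    """Fold every page into the template, starting from the first page."""
    template = tokenize(pages[0])
    for page in pages[1:]:
        template = merge(template, tokenize(page))
    return template


if __name__ == "__main__":
    pages = [
        "<html><body><h1>Alice</h1><p>Age: 30</p></body></html>",
        "<html>\n <body>\n  <h1>Bob</h1>\n  <p>Age: 41</p>\n </body>\n</html>",
    ]
    print("".join(make_template(pages)))
```

Because the comparison runs on whole tags and text runs rather than characters, a hole always covers a complete piece of content, which is closer to the "substantial content differences" being asked for.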

If it were up to me, I would look at taking one page and creating a template from it, and then extract all the data you need to populate other pages with that template.
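
One way to picture the extraction step: once a template exists, it can be turned into a regular expression and matched against each page to pull the data back out. This is only a sketch; the {{HOLE}} marker and the function names are assumptions made for the example, and it expects pages that have already been normalized so they line up with the template exactly.

```python
import re


def template_to_regex(template, hole="{{HOLE}}"):
    """Escape the fixed parts of the template and put a capture group at each hole."""
    parts = [re.escape(part) for part in template.split(hole)]
    return re.compile("(.*?)".join(parts), re.DOTALL)


def extract(template, page):
    """Return the text found at each hole, or None if the page does not fit."""
    match = template_to_regex(template).fullmatch(page)
    return list(match.groups()) if match else None


if __name__ == "__main__":
    template = "<h1>{{HOLE}}</h1><p>Age: {{HOLE}}</p>"
    print(extract(template, "<h1>Bob</h1><p>Age: 41</p>"))
```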

Thanks, Justin and Ray. Good ideas.


_______________________________________________

UPHPU mailing list
[email protected]
http://uphpu.org/mailman/listinfo/uphpu
IRC: #uphpu on irc.freenode.net
