Gentlemen,

> > The safest approach is probably to pass the html through tidy, and
> > then into DOM, and traverse and count the length of text nodes, but
> > that would be quite slow if you ran it on every request.
>
> Right, +1 for Tidy and DOM, it's the "real" way to do it. You won't
> need to do it on every request -- you can either store the summary
> itself as a separate text field, or store the length of the summary as
> an integer.
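
For illustration, here is a minimal sketch of the idea quoted above: walk the markup, count only the characters in text nodes, stop at a limit, and close whatever tags are still open. It is not the Tidy/DOM code discussed in this thread; it assumes Python's standard `html.parser` as a stand-in, and the names `Excerpter` and `excerpt` are hypothetical. It also counts whitespace and the contents of script/style blocks as display characters, which a real implementation would probably want to refine.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class Excerpter(HTMLParser):
    """Hypothetical helper: copy markup until `limit` text characters are emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # display characters still allowed
        self.out = []            # output fragments, joined at the end
        self.open_tags = []      # stack of elements that still need closing
        self.done = False        # set once the limit has been reached

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        rendered = "".join(
            f' {name}' if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close any unclosed inner elements on the way to the matching tag.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.remaining]          # only text counts toward the limit
        self.out.append(escape(piece, quote=False))
        self.remaining -= len(piece)
        if self.remaining <= 0:
            self.done = True

    def result(self):
        # Close whatever was still open when the limit was hit.
        closing = "".join(f"</{t}>" for t in reversed(self.open_tags))
        return "".join(self.out) + closing


def excerpt(markup, limit):
    """Return the first `limit` display characters of `markup`, tags intact."""
    parser = Excerpter(limit)
    parser.feed(markup)
    parser.close()   # flush any buffered trailing text
    return parser.result()


print(excerpt("<p>Hello <b>big</b> world</p>", 8))
```

The result could then be computed once and stored as a separate field, as suggested above, rather than rebuilt on every request.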

I tried this, working through both DOM and Tidy, and combinations of each - no luck. The problem is getting the differential between the two versions of the text.

> This is crying out for a web service: The Excerpter. POST markup, get
> the first X display characters back as a response, with embedded HTML
> intact.

Yeah, I agree - this has turned into a royal problem, and one that seems as though it would have had to be solved already. At the end of the day, what would be very handy is a library - an object/etc that would store the text in various forms and include various manipulation methods on it, metadata, etc. I had written something like this for MIME, but would not look forward to doing it for HTML/etc.

H

_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

NYPHPCon 2006 Presentations Online
http://www.nyphpcon.com

Show Your Participation in New York PHP
http://www.nyphp.org/show_participation.php
