Gentlemen,

> > The safest approach is probably to pass the html through tidy, and
> > then into DOM, and traverse and count the length of text nodes, but
> > that would be quite slow if you ran it on every request.
> 
> Right, +1 for Tidy and DOM, it's the "real" way to do it. You won't
> need to do it on every request -- you can either store the summary
> itself as a separate text field, or store the length of the summary as
> an integer.

I tried this, working with both DOM and Tidy, and combinations of the two, with
no luck.  The problem is getting the differential between the two versions of
the text.

> This is crying out for a web service: The Excerpter. POST markup, get
> the first X display characters back as a response, with embedded HTML
> intact.

Yeah, I agree - this has turned into a royal problem, and one that seems as
though it would have had to be solved already.
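
For what it's worth, here is a rough sketch of what such an excerpter might
look like. It is only an illustration, written in Python rather than PHP, using
the standard library's html.parser. The names Excerpter and excerpt are made up
for this example. It counts every character of text (whitespace included) as a
display character, and it assumes reasonably well-formed input, so messy markup
would still want a pass through Tidy first.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class Excerpter(HTMLParser):
    """Collect markup until `limit` display characters have been emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.limit = limit          # display characters still allowed
        self.out = []               # output fragments, joined at the end
        self.open_tags = []         # stack of currently open elements
        self.done = limit <= 0      # True once the budget is used up

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        rendered = "".join(
            f' {k}="{escape(v, quote=True)}"' if v is not None else f" {k}"
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")
        if tag not in VOID:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if self.done or tag not in self.open_tags:
            return
        # Close anything left open inside this element, then the element.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.limit]
        # Text arrives unescaped, so re-escape it on the way out.
        self.out.append(escape(piece, quote=False))
        self.limit -= len(piece)
        if self.limit <= 0:
            self.done = True

    def result(self):
        # Close whatever is still open, innermost first.
        closing = "".join(f"</{t}>" for t in reversed(self.open_tags))
        return "".join(self.out) + closing


def excerpt(markup, limit):
    """Return the first `limit` display characters of `markup`, tags intact."""
    parser = Excerpter(limit)
    parser.feed(markup)
    parser.close()
    return parser.result()
```

Calling excerpt('<p>Hello <b>big</b> world</p>', 8) cuts the text inside the
bold element and then closes both open tags, so the result is still balanced
markup. The same tag-stack idea should carry over to a DOM traversal in PHP.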

At the end of the day, what would be very handy is a library: an object that
stores the text in various forms and includes manipulation methods, metadata,
and so on.  I had written something like this for MIME, but would not look
forward to doing it for HTML.

H


_______________________________________________
New York PHP Community Talk Mailing List
http://lists.nyphp.org/mailman/listinfo/talk

