On 9/9/2010 at 09:43 PM, Karl Eichwalder <[email protected]> wrote:
> Karl Eichwalder <[email protected]> writes:
>
> > Johannes Meixner <[email protected]> writes:
> >
> >> On Sep 8 04:42 Tim Serong wrote (shortened):
> >>> # rpm -q --qf '%{DESCRIPTION}' $PACKAGE | sed \
> >>>     -e 's/\&/\&amp;/g' \
> >>>     -e 's/</\&lt;/g' \
> >>>     -e 's/>/\&gt;/g' | awk \
> >>>     'BEGIN { print "<pre>" } { print } END { print "</pre>" }'
> >
> > Thanks, but <pre> is a no-op. These days, various PDAs with small
> > displays are in use, and then, there is this ncurses yast interface...
>
> BTW, the BS is also one of these "players" and it keeps line-breaks, but
> uses a proportional font... For example, see
> https://build.opensuse.org/package/binary?arch=i586&filename=opensuse-manuals_en-11.3-31.1.noarch.rpm&package=opensuse-manuals_en&project=Documentation&repository=openSUSE_11.3
>
> This is why formatted text without self-explaining markup sucks.

Yeah. You could probably get incredibly close to generally acceptable
in most cases with parsing rules like:

0) In general, assume there is an intent to produce paragraphs of text
   (<p> if you're displaying with HTML).

1) Double linebreaks are paragraph markers, unless appearing between
   list items per rules 3-7.

2) Single linebreaks are replaced with spaces, again unless appearing
   between list items per rules 3-7.

3) Any line starting with '*', '-', 'o', '+' followed by a space is a
   bulleted list item. Any leading space before the list marker is the
   list indent, provided there is a previous list item with a smaller
   (or zero) indent.

4) Any lines immediately following a list item are a continuation of
   that list item if they have leading whitespace.

5) If, after applying rules 3 and 4, there is only one list item
   (following lines are regular text or EOF), then it's not really a
   list item and should be treated instead as a paragraph (think: notes
   down the bottom of some text, marked with '*').

6) Any lines starting with '[0-9]+\.?' followed by a space are numeric
   list items, for which rules 3-5 apply the same as they do for
   bulleted lists.

7) After a single line which matches the case-insensitive regex
   '^\s*Authors:?\s*$', each line with leading whitespace is a bulleted
   list item.

8) Anything that looks like an email address or URL (per suitable
   regexes) is to be treated as an email address or URL as appropriate.

But I guarantee there will still be plain text that the above rules
will mis-display in some annoying fashion. And if you change the rules
to fix those, it'll break others.

Regards,

Tim

--
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.
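
For illustration only, here is a rough Python sketch of how rules 0-6 above might be approximated. It is not code from this thread: the names ITEM, parse_blocks and to_html are made up for the example, nested list indents (second half of rule 3), the "Authors:" case (rule 7) and link detection (rule 8) are left out, and all bullet or numbered items that follow each other are simply merged into one flat list.

```python
import html
import re

# Rules 3 and 6: optional indent, a bullet or a number, then a space.
# Groups: (marker, body). Hypothetical helper, not from any library.
ITEM = re.compile(r"^\s*([*\-o+]|[0-9]+\.?) (.*)$")


def parse_blocks(text):
    """Return blocks shaped ["p", str], ["ul", items] or ["ol", items].

    List items are [marker, body] pairs so rule 5 can restore the marker.
    """
    blocks = []
    gap = True  # True at the start and right after a blank line
    for line in text.splitlines():
        if not line.strip():
            gap = True  # rule 1: a blank line ends the current paragraph
            continue
        last = blocks[-1] if blocks else None
        item = ITEM.match(line)
        if item:
            marker, body = item.groups()
            kind = "ol" if marker[0].isdigit() else "ul"
            if last and last[0] == kind:
                # Blank lines between items do not split the list.
                last[1].append([marker, body])
            else:
                blocks.append([kind, [[marker, body]]])
        elif last and last[0] != "p" and not gap and line[0].isspace():
            # Rule 4: indented line right after an item continues it.
            last[1][-1][1] += " " + line.strip()
        elif last and last[0] == "p" and not gap:
            # Rule 2: a single linebreak becomes a space.
            last[1] += " " + line.strip()
        else:
            blocks.append(["p", line.strip()])
        gap = False
    # Rule 5: a "list" with one item is really a paragraph (marker kept).
    return [
        ["p", " ".join(b[1][0])] if b[0] != "p" and len(b[1]) == 1 else b
        for b in blocks
    ]


def to_html(text):
    """Render the parsed blocks as <p>, <ul> and <ol> elements."""
    out = []
    for kind, payload in parse_blocks(text):
        if kind == "p":
            out.append("<p>" + html.escape(payload) + "</p>")
        else:
            items = "".join(
                "<li>" + html.escape(body) + "</li>" for _, body in payload
            )
            out.append(f"<{kind}>{items}</{kind}>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "Intro line\nwrapped here.\n\n- one\n  more\n- two\n\nClosing <text>.\n* just a note"
    print(to_html(sample))
```

Even this small subset shows the problem: a wrapped paragraph line that happens to start with "o " or with a number and a space is picked up as a list item, and only rule 5 rescues it when it stands alone.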
