On 9/9/2010 at 09:43 PM, Karl Eichwalder <[email protected]> wrote: 
> Karl Eichwalder <[email protected]> writes: 
>  
> > Johannes Meixner <[email protected]> writes: 
> > 
> >> On Sep 8 04:42 Tim Serong wrote (shortened): 
> >>>  # rpm -q --qf '%{DESCRIPTION}' $PACKAGE | sed \ 
> >>>        -e 's/\&/\&amp;/g' \ 
> >>>        -e 's/</\&lt;/g' \ 
> >>>        -e 's/>/\&gt;/g' | awk \ 
> >>>        'BEGIN { print "<pre>" } { print } END { print "</pre>" }' 
> > 
> > Thanks, but <pre> is a no-op.  These days, various PDAs with small 
> > displays are in use, and then, there is this ncurses yast interface... 
>  
> BTW, the Build Service (BS) is also one of these "players" and it keeps line-breaks, but 
> uses a proportional font...  For example, see 
> https://build.opensuse.org/package/binary?arch=i586&filename=opensuse-manuals_en-11.3-31.1.noarch.rpm&package=opensuse-manuals_en&project=Documentation&repository=openSUSE_11.3 
>  
> This is why formatted text without self-explaining markup sucks. 

Yeah.  You could probably get very close to generally acceptable
output in most cases with parsing rules like:

0) In general, assume there is an intent to produce paragraphs of
   text (<p> if you're displaying with HTML).
1) Double linebreaks are paragraph markers, unless appearing between
   list items per rules 3-7.
2) Single linebreaks are replaced with spaces, again unless appearing
   between list items per rules 3-7.
3) Any line starting with '*', '-', 'o' or '+' followed by a space
   is a bulleted list item.  Any leading space before the list marker
   is the list indent, provided there is a previous list item with
   a smaller (or zero) indent.
4) Any lines immediately following a list item are a continuation
   of that list item if they have leading whitespace.
5) If, after applying rules 3 and 4, there is only one list item
   (following lines are regular text or EOF), then it's not really
   a list item and should instead be treated as a paragraph (think:
   notes at the bottom of some text, marked with '*').
6) Any lines starting with '[0-9]+\.?' followed by a space are numeric
   list items, for which rules 3-5 apply the same as they do for
   bulleted lists.
7) After a single line which matches the case-insensitive regex
   '^\s*Authors:?\s*$', each line with leading whitespace is a
   bulleted list item.
8) Anything that looks like an email address or URL (per suitable
   regexes) is to be treated as an email address or URL as
   appropriate.
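
As a rough illustration only, rules 0-4 could be approximated with a
small sh/awk filter like the one below.  It reuses the HTML escaping
from the sed command quoted above, then joins lines into <p> paragraphs
and <ul>/<li> bullet lists.  It is a sketch under simplifying
assumptions: the function name desc2html is hypothetical, lists are
never nested (the indent part of rule 3 is ignored), and rules 5-8 are
not implemented at all.

```sh
# desc2html (hypothetical name): rough sketch of rules 0-4 only.
# No nested lists, no rules 5-8.  Reads plain text on stdin, writes HTML.
desc2html() {
    # Escape HTML metacharacters first, as in the sed command quoted above.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' | awk '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    function flush_para() { if (para != "") print "<p>" para "</p>"; para = "" }
    function close_item() { if (item != "") print "<li>" item "</li>"; item = "" }
    function close_list() { close_item(); if (inlist) print "</ul>"; inlist = 0 }

    # Rule 1: a blank line ends the current paragraph or list item,
    # but does not end the list itself.
    /^[ \t]*$/ { flush_para(); close_item(); next }

    # Rule 3: "*", "-", "o" or "+" followed by a space starts a bullet item.
    /^[ \t]*[-*o+] / {
        flush_para(); close_item()
        if (!inlist) { print "<ul>"; inlist = 1 }
        line = $0
        sub(/^[ \t]*[-*o+] /, "", line)
        item = trim(line)
        next
    }

    # Rule 4: an indented line directly after an item continues that item.
    item != "" && /^[ \t]/ { item = item " " trim($0); next }

    # Rules 0 and 2: anything else is paragraph text, and single line
    # breaks become spaces.  Plain text also ends any open list.
    {
        close_list()
        if (para == "") para = trim($0)
        else para = para " " trim($0)
    }

    END { close_list(); flush_para() }
    '
}
```

Usage would be something like
rpm -q --qf '%{DESCRIPTION}' "$PACKAGE" | desc2html.  Rule 5 would
need lookahead (buffering a list until a second item shows up), which
a line-at-a-time filter like this does not attempt.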

But I guarantee there will still be plain text that the above rules
will mis-display in some annoying fashion.  And if you change the
rules to fix those, it'll break others.

Regards,

Tim


-- 
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.


