On Sat, Dec 12, 2015 at 7:24 AM, Geoff Canyon <gcan...@gmail.com> wrote:

>
> > I don't think I do agree with 'trying to do something sensible with the
> > whitespace' as I don't really see why that would be useful.
>

> I agree that "sensible" is difficult to define here.


OK, quickly off the top of my head I can think of 3 likely outputs, but
before I do I'll suggest one case where it might be useful. The ability to
take a basic HTML document, set an LC field's htmlText to it, and then
take the text of the field is a very neat and powerful way to extract the
plain text from an HTML page. I deal with a lot of electronic documents that
are produced by people with varying levels of computer skills. One of my
pet peeves is how many people don't know how to use tab markers in
documents and instead build tabulated data by using multiple spaces. Many
of the documents I deal with have passed through OCR software, which has an
obsessive tendency to add multiple spaces between words, not just in
tabulated data but also in fully justified paragraphs. I spend a lot of
computer cycles going through text and either converting multiple spaces to
tabs or single spaces. Having multiple adjacent white space characters is
not normal, so any process that happened to automatically remove such
instances would be a bonus.
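
To make the clean-up I'm describing concrete, here is a minimal sketch in
Python (not LiveCode, and not how I actually do it; the function names are
made up for illustration). It assumes "white space" means only spaces and
tabs within a line.

```python
import re

def collapse_whitespace(text):
    """Replace every run of spaces and/or tabs with a single space."""
    return re.sub(r"[ \t]+", " ", text)

def spaces_to_tabs(text, min_run=2):
    """Replace every run of min_run or more spaces with one tab.

    Single spaces between ordinary words are left alone, so this suits
    'tabulated' data that was laid out with multiple spaces.
    """
    return re.sub(r" {%d,}" % min_run, "\t", text)

print(repr(collapse_whitespace("justified   text\t\there")))
print(repr(spaces_to_tabs("Name    Age   City")))
```

Which of the two you want depends on whether the runs of spaces were meant
as column separators or are just OCR noise.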

So the 3 possibilities are:

1) The example I gave in my last post. The number of tabs and spaces would
remain the same, but they would be sorted, tabs before spaces, with the
segments (words) placed between them. There would only be one white space
character between each segment (word), so some segments might have tabs
between them and others spaces. You might end up with multiple tabs/spaces
at the beginning of the output, just as you end up with multiple empty
items at the beginning of a sorted list if there are multiple empty items.
This is ugly.

2) The list is output with a single space between each segment. This
would mean that if there happened to be tabs or multiple spaces between
certain segments, these would be removed/converted. This is helpful.

3) A straight reshuffle, where the actual segments are reordered whilst
preserving the white space location:

[tab][tab]Mark[space][space]Geoff[tab]Kevin[space][tab]Richard[space][space]

would become:

[tab][tab]Geoff[space][space]Kevin[tab]Mark[space][tab]Richard[space][space]

This last case, although less helpful to me, would arguably be the
computer-logical thing to do: all you've asked is to sort the segments
(words). The number of characters remains the same, the location of the
white space remains the same, and the only thing that changes is the order
in which the words appear, and that's what you asked for. When you think
about it, that's really all that sort by line or item does: it leaves the
CRs and commas in place and just shuffles things about.
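
For anyone who wants to see options 2 and 3 side by side, here is a rough
sketch in Python rather than LiveCode. The function names are hypothetical,
"word" is assumed to mean any run of non-white-space characters, and
Python's default sort is case-sensitive, which may not be what LC would
choose. Option 1 is left out because it depends on the example in my
earlier post.

```python
import re

def sort_words_single_space(text):
    """Option 2: sort the words and join them with single spaces.

    All original tabs and runs of spaces are discarded.
    """
    return " ".join(sorted(text.split()))

def sort_words_in_place(text):
    """Option 3: sort the words but leave every white space run where it is."""
    # Collect the words, sort them, then drop them back into the
    # original word positions one at a time.
    words = iter(sorted(re.findall(r"\S+", text)))
    return re.sub(r"\S+", lambda match: next(words), text)

sample = "\t\tMark  Geoff\tKevin \tRichard  "
print(repr(sort_words_single_space(sample)))
print(repr(sort_words_in_place(sample)))
```

With the sample above (real tab and space characters in place of the
[tab]/[space] markers), the option 3 function gives the same result as the
"would become" line: same length, same white space, words reordered.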

All LC has to do is pick one, implement it and then publish it. If people
don't like the choice then they have to roll their own, but they have to
roll their own now anyway. Whilst I'd think most people needing to sort
words would like the 'benefits' of option 2, I hate to say it, but I think
option 3 would be the 'safer' road LC could go down.
