Just curious: why do you use "read from file F for 1 line"?
Instead, why not read the file in one go, load its contents
into a variable, and then use "repeat for each line j in myVar"?
Wouldn't that run faster?

Best,
jbv


On 2022-07-25 02:42, Geoff Canyon via use-livecode wrote:
As a meta point, I wrote a version closer to the actual requirements --
lowercase everything, process an external input line by line to allow for
arbitrary input size. The result is about 8-10x slower than most other
languages -- not as bad as I feared, not as good as I hoped. Here's the
code for that version:

on mouseUp
   answer file "choose input:"
   if it is empty then exit mouseUp
   put it into F
   lock screen
   put the long seconds into T
   open file F for read
   repeat
      read from file F for 1 line
      repeat for each word w in toLower(it)
         add 1 to R[w]
      end repeat
      if the result is not empty then exit repeat
   end repeat
   combine R using cr and tab
   sort R numeric descending by word 2 of each
   put the long seconds - T into T1
   put R into fld "output"
   put the long seconds - T into T2
   put T1 && T2
   close file F
end mouseUp
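
For comparison, here is a minimal Python sketch of the two reading strategies discussed in this thread: going through the input one line at a time, as the handler above does, versus loading the whole file into a variable first. The function names and the UTF-8 encoding are assumptions made for illustration; this is not a translation of the LC handler and says nothing about relative speed in LC itself.

```python
import collections


def count_streaming(path):
    """Read one line at a time, so memory use is bounded by the longest line."""
    counts = collections.Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.lower().split())
    return counts


def count_whole(path):
    """Read the entire file into memory first, then split it in one pass."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return collections.Counter(text.lower().split())
```

Both functions return the same counts; the trade-off is that the second needs the whole input to fit in memory, while the first handles arbitrary input size.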

On Sun, Jul 24, 2022 at 11:01 PM Geoff Canyon <gcan...@gmail.com> wrote:

On this Hacker News thread <https://news.ycombinator.com/item?id=32214419>,
I read this programming interview question
<https://benhoyt.com/writings/count-words/>. Roughly, the challenge is to
count the frequency of words in input, and return a list with counts,
sorted from most to least frequent. So input like this:

The foo the foo the
defenestration the

would produce output like this:

the 4
foo 2
defenestration 1

Of course I smiled because LC is literally built for this problem. I took
well under two minutes to write this function:

function wordCount X
   repeat for each word w in X
      add 1 to R[w]
   end repeat
   combine R using cr and tab
   sort R numeric descending by word 2 of each
   return R
end wordCount

There are quibbles -- the examples given in the article work line by line,
so input size isn't an issue, and of course quotes would cause an issue
(LC treats a quoted string as a single word), and LC is case-insensitive,
so it works, but the output would look like this:

The 4
foo 2
defenestration 1
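
To make the case quibble concrete, here is a minimal Python sketch that mimics that behaviour. It assumes the array matches keys case-insensitively and keeps the spelling of whichever form it sees first, which is what the output above implies. The function name is hypothetical.

```python
def word_count_first_spelling(text):
    """Count words case-insensitively, reporting each under its first-seen spelling."""
    spelling = {}  # folded word -> spelling of its first occurrence
    counts = {}    # folded word -> number of occurrences
    for word in text.split():
        key = word.lower()
        spelling.setdefault(key, word)
        counts[key] = counts.get(key, 0) + 1
    # Most frequent first; sorted() is stable, so ties keep first-seen order.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(spelling[key], n) for key, n in ranked]
```

Called on the sample input, this returns "The" with a count of 4, because "The" is the first spelling encountered. Lowercasing each word before counting avoids the problem.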

But generally, it works, and is super-easy to code. But for the sake of
argument, consider this Python solution given:

import collections
import sys

counts = collections.Counter()
for line in sys.stdin:
    words = line.lower().split()
    counts.update(words)

for word, count in counts.most_common():
    print(word, count)

That requires a library, but it's also super-easy to code and understand,
and it requires just the same number of lines. So, daydreaming extensions
to LC syntax, this comes to mind:

function wordCount X
   add 1 to R[w] for each word w in X
   return R combined using cr and tab and sorted numeric descending by word 2 of each
end wordCount

or if you prefer:

function wordCount X
   for each word w in X add 1 to R[w]
   return (R combined using cr and tab) sorted numeric descending by word 2 of each
end wordCount

Or to really apply ourselves:

function wordCount X
   return the count of each word in X using cr and tab sorted numeric descending by word 2 of each
end wordCount
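
For reference, the Python library mentioned earlier already gets close to that last one-statement form. A minimal sketch, assuming the same "word, tab, count" output as the LC function (the name word_count is made up here):

```python
import collections


def word_count(text):
    """Return 'word<TAB>count' lines, most frequent first."""
    ranked = collections.Counter(text.lower().split()).most_common()
    return "\n".join(f"{word}\t{count}" for word, count in ranked)
```

Counter.most_common() does the counting and the descending sort in one call, which is roughly what the daydreamed "the count of each word ... sorted" would do.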

So: the xTalk syntax is over thirty years old; when was the last
significant syntax update?

(I'm not at all core to the process, so feel free to tell me how much I've
missed lately!)


_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
