Just curious: why do you use "read from file F for 1 line"?
Instead, why don't you read the file in one go, load the content
into a variable and then use "repeat for each line j in myVar"?
Wouldn't that run faster?
Best,
jbv
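To make the trade-off in jbv's question concrete outside LiveCode, here is a minimal Python sketch of the "read it all, then loop over lines" approach. The function name is hypothetical and UTF-8 input is assumed. It does a single read instead of one read per line, but it has to hold the whole file in memory, which is what the line-by-line version was written to avoid. Whether it is actually faster in LiveCode would have to be measured.

```python
from collections import Counter


def count_whole_file(path):
    """Hypothetical helper: load the entire file once, then count words per line in memory."""
    # One read call for the whole file (assumes UTF-8 text that fits in memory).
    with open(path, encoding="utf-8") as f:
        text = f.read()
    counts = Counter()
    # Loop over the in-memory lines, like "repeat for each line j in myVar".
    for line in text.splitlines():
        counts.update(line.lower().split())
    return counts
```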
On 2022-07-25 02:42, Geoff Canyon via use-livecode wrote:
As a meta point, I wrote a version closer to the actual requirements --
lowercase everything, and process an external input line by line to allow
for arbitrary input size. The result is about 8-10x slower than most other
languages -- not as bad as I feared, not as good as I hoped. Here's the
code for that version:
on mouseUp
   answer file "choose input:"
   if it is empty then exit mouseUp
   put it into F
   lock screen
   put the long seconds into T
   open file F for read
   repeat
      -- read one line at a time so the whole file never has to be in memory
      read from file F for 1 line
      repeat for each word w in toLower(it)
         add 1 to R[w]
      end repeat
      -- the result is "eof" once the end of the file has been reached
      if the result is not empty then exit repeat
   end repeat
   combine R using cr and tab
   sort R numeric descending by word 2 of each
   put the long seconds - T into T1 -- time to read, count and sort
   put R into fld "output"
   put the long seconds - T into T2 -- T1 plus the time to display
   put T1 && T2
   close file F
end mouseUp
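Since the comparison above is against other languages, a rough Python counterpart of this handler may help when reproducing the timing. It is a sketch, not the benchmark code from the article: the function name is hypothetical, UTF-8 input is assumed, and the elapsed time it returns covers reading, counting and sorting, roughly what T1 measures.

```python
import time
from collections import Counter


def count_words_streaming(path):
    """Hypothetical Python analogue of the mouseUp handler above."""
    start = time.perf_counter()
    counts = Counter()
    # Read one line per iteration, like "read from file F for 1 line",
    # so the whole file never has to be held in memory.
    with open(path, encoding="utf-8") as f:
        for line in f:
            counts.update(line.lower().split())  # toLower, then tally each word
    # Equivalent of "combine R using cr and tab" plus
    # "sort R numeric descending by word 2 of each".
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    report = "\n".join(f"{word}\t{n}" for word, n in ranked)
    return report, time.perf_counter() - start
```

Python's sort is stable, including with `reverse=True`, so words with equal counts stay in first-seen order.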
On Sun, Jul 24, 2022 at 11:01 PM Geoff Canyon <gcan...@gmail.com> wrote:
On this Hacker News thread <https://news.ycombinator.com/item?id=32214419>,
I read this programming interview question
<https://benhoyt.com/writings/count-words/>. Roughly, the challenge is to
count how often each word occurs in the input and return a list of words
with their counts, sorted from most to least frequent. So input like this:
The foo the foo the
defenestration the
would produce output like this:
the 4
foo 2
defenestration 1
Of course I smiled, because LC is literally built for this problem. It took
me well under two minutes to write this function:
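function wordCount X
   -- tally each word; array keys are case-insensitive by default
   repeat for each word w in X
      add 1 to R[w]
   end repeat
   -- turn the array into "word<tab>count" lines, most frequent first
   combine R using cr and tab
   sort R numeric descending by word 2 of each
   return R
end wordCount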
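For readers coming from other languages, here is a line-for-line Python rendering of wordCount. It is only a sketch for comparison, and the name word_count is hypothetical. One difference worth noting: Python's str.split() gives quotes no special meaning and string keys are case-sensitive, so "The" and "the" are counted separately here, unlike in LC.

```python
def word_count(x):
    """Hypothetical Python rendering of the LC wordCount function."""
    r = {}
    for w in x.split():           # repeat for each word w in X
        r[w] = r.get(w, 0) + 1    # add 1 to R[w]
    # combine R using cr and tab -> one "word<TAB>count" line per key
    lines = [f"{w}\t{n}" for w, n in r.items()]
    # sort R numeric descending by word 2 of each -> sort on the count field
    lines.sort(key=lambda line: int(line.split("\t")[1]), reverse=True)
    return "\n".join(lines)
```

On the sample input, word_count("The foo the foo the\ndefenestration the") reports "the" 3 times and "The" once, which is exactly the case issue discussed next.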
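There are quibbles. The examples given in the article work line by line,
so input size isn't an issue for them, whereas this function takes the
whole input at once. Quotes would cause an issue, since LC treats text
inside double quotes as a single word. And LC is case-insensitive, so the
counting works, but the output would look like this: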
The 4
foo 2
defenestration 1
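A case-sensitive language has to fold case explicitly to get this shape of result. The sketch below (hypothetical name) counts case-insensitively and reports each word under the first spelling it encountered, which reproduces the output above. That LC keeps the first-seen spelling as the array key is an assumption inferred from that output, not something checked against the engine.

```python
def word_count_ci(x):
    """Hypothetical: count words case-insensitively, keeping the first-seen spelling."""
    counts = {}    # case-folded word -> count
    spelling = {}  # case-folded word -> first spelling encountered (assumed LC behavior)
    for w in x.split():
        key = w.casefold()
        spelling.setdefault(key, w)
        counts[key] = counts.get(key, 0) + 1
    # Most frequent first; the stable sort keeps ties in first-seen order.
    ranked = sorted(counts, key=counts.get, reverse=True)
    return "\n".join(f"{spelling[k]}\t{counts[k]}" for k in ranked)
```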
But generally, it works, and it is super-easy to code. For the sake of
argument, though, consider this Python solution given in the article:
import collections
import sys

counts = collections.Counter()
for line in sys.stdin:
    words = line.lower().split()
    counts.update(words)
for word, count in counts.most_common():
    print(word, count)
That requires a library, but it's also super-easy to code and understand,
and it takes just the same number of lines. So, daydreaming about
extensions to LC syntax, this comes to mind:
function wordCount X
   add 1 to R[w] for each word w in X
   return R combined using cr and tab and sorted numeric descending by word 2 of each
end wordCount
Or, if you prefer:
function wordCount X
   for each word w in X add 1 to R[w]
   return (R combined using cr and tab) sorted numeric descending by word 2 of each
end wordCount
Or to really apply ourselves:
function wordCount X
   return the count of each word in X using cr and tab sorted numeric descending by word 2 of each
end wordCount
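For comparison, this daydreamed one-liner is close to what Python's standard library already allows: collections.Counter.most_common() returns (element, count) pairs from most to least common, with ties in the order first encountered. A sketch only; the function name is hypothetical, and unlike the LC proposals it is case-sensitive unless the input is lowercased first.

```python
from collections import Counter


def word_count_expr(x):
    # One expression: count, rank, and format as "word<TAB>count" lines.
    return "\n".join(f"{w}\t{n}" for w, n in Counter(x.split()).most_common())
```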
So: the xTalk syntax is over thirty years old; when was the last
significant syntax update?
(I'm not at all core to the process, so feel free to tell me how much
I've missed lately!)
_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode