Thanks for the tips!

2008/7/31 viktoras didziulis <[EMAIL PROTECTED]>
> Hi David,
>
> you might wish to discard the 1000 most frequently used words from your
> list:
> English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf
> German: http://german.about.com/library/blwfreq01.htm
>
> Another approach is statistical - take the whole text and sort the words by
> their frequency (count) of appearance in the text. If you put them on a
> graph you would notice a characteristic 'power law' distribution. Set the
> absolute or relative frequency or count value at which to cut the tail.
> This tail is what holds all the rare or interesting words of the text. For
> example, if the text is large you may discard the first 500-1000 words in
> the list sorted by word count. All words that remain should be the ones
> that are more or less interesting.
>
> The easy way to produce such a frequency list is by using arrays. The
> principle is like this:
>
> local arrayWords
> repeat for each word myWord in theText
>    add 1 to arrayWords[myWord]  -- count each occurrence of the word
> end repeat
>
> Now the keys are words and the values are word counts in arrayWords.
>
> Best wishes
> Viktoras
>
>
> David Bovill wrote:
>
>> Is there a resource/index that anyone knows of for plain uninteresting
>> dull words? I want to take arbitrary chunks of text and search for
>> "interesting" words - that is, domain-specific words that might be useful
>> as links for creating dictionary entries. This would mean creating a list
>> of words and stripping "the", "it" etc. I am imagining it working like a
>> spelling dictionary with the ability to manually edit entries - but I'd
>> like a good starting list?
>> Not sure what to search for :)

_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your
subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
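
For reference, the frequency-cutoff idea quoted above could be sketched roughly like this. It is only a minimal sketch: the handler name rareWords and the parameters pText and pSkipCount are hypothetical, not something from Viktoras's message. It also assumes the built-in "word" chunk is good enough, even though punctuation stays attached to words ("text." and "text" are counted separately) and a quoted phrase counts as one word.

```livecode
-- Sketch only: handler and parameter names are hypothetical.
function rareWords pText, pSkipCount
   local tCounts, tList, tWord
   -- count occurrences: the array keys are words, the values are counts
   repeat for each word tWord in pText
      add 1 to tCounts[toLower(tWord)]
   end repeat
   -- flatten the array into "word<tab>count" lines so it can be sorted
   repeat for each line tWord in the keys of tCounts
      put tWord & tab & tCounts[tWord] & return after tList
   end repeat
   delete the last char of tList
   set the itemDelimiter to tab
   -- most frequent words first
   sort lines of tList descending numeric by item 2 of each
   -- cut off the pSkipCount most frequent words; the tail is what remains
   if pSkipCount > 0 then delete line 1 to pSkipCount of tList
   return tList
end function
```

Something like put rareWords(field "Source", 500) into field "Result" would then list the rarer words with their counts (the field names are made up). The cutoff of 500 is just the lower end of the range suggested in the quoted message and would need tuning per text.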