Re: Semi-automatic Index generation?

2008-07-31 Thread Devin Asay
On Jul 31, 2008, at 2:12 AM, viktoras didziulis wrote:
Hi David, you might wish to discard the 1000 most frequently used words from your list:
English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf
German: http://german.about.com/library/blwfreq01.htm
Another approach is statistical -

Re: Semi-automatic Index generation?

2008-07-31 Thread David Bovill
Thanks for the tips!
2008/7/31 viktoras didziulis:
> Hi David,
>
> you might wish to discard the 1000 most frequently used words from your list:
> English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf
> German: http://german.about.com/library/blwfreq01.htm
>
> Another ap

Re: Semi-automatic Index generation?

2008-07-31 Thread viktoras didziulis
Hi David, you might wish to discard the 1000 most frequently used words from your list:
English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf
German: http://german.about.com/library/blwfreq01.htm
Another approach is statistical - take the whole text, sort words by their frequency (count)
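The first step of the statistical approach, sorting words by how often they occur, can be sketched in Python. This is only an illustration: the regex tokenizer and the helper name `word_frequencies` are assumptions, not something given in the thread.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return (word, count) pairs, most frequent first; ties sorted alphabetically.

    Hypothetical helper for illustration. Words are runs of ASCII letters,
    optionally with one internal apostrophe (e.g. "don't").
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    counts = Counter(words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

sample = "The cat and the dog and the bird"
for word, count in word_frequencies(sample):
    print(count, word)
```

The words at the top of such a list are usually the common ones that a frequent-word list would also catch.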

Re: Semi-automatic Index generation?

2008-07-30 Thread David Bovill
Thanks Eric!
2008/7/30 Eric Chatonet:
> Hello David,
>
> On 30 Jul 08 at 16:08, David Bovill wrote:
>
>> Is there a resource/index that anyone knows of for plain, uninteresting,
>> dull words? I want to take arbitrary chunks of text and search for
>> "interesting" words

Re: Semi-automatic Index generation?

2008-07-30 Thread Eric Chatonet
Hello David,
On 30 Jul 08 at 16:08, David Bovill wrote:
Is there a resource/index that anyone knows of for plain, uninteresting, dull words? I want to take arbitrary chunks of text and search for "interesting" words - that is, domain-specific words that might be useful to links to crea

Semi-automatic Index generation?

2008-07-30 Thread David Bovill
Is there a resource/index that anyone knows of for plain, uninteresting, dull words? I want to take arbitrary chunks of text and search for "interesting" words - that is, domain-specific words that might be useful to links to create dictionary entries. This would mean creating a list of words and st
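Filtering a chunk of text against a list of dull words could look roughly like the Python sketch below. The function name `interesting_words` and the tiny `DULL` set are hypothetical stand-ins; a real stop list would be much larger.

```python
import re

def interesting_words(text, dull_words):
    """Return the distinct words of `text` that are not in `dull_words`,
    in order of first appearance. Hypothetical helper for illustration."""
    dull = {w.lower() for w in dull_words}
    seen = set()
    result = []
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
        if word not in dull and word not in seen:
            seen.add(word)
            result.append(word)
    return result

# Tiny stand-in stop list (an assumption, not a real resource).
DULL = {"the", "a", "of", "is", "and", "to", "in"}

print(interesting_words("The kernel of the operating system is a scheduler", DULL))
```

Whatever survives the filter is only a candidate list for index entries, so a manual pass over it is still needed.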