Yes! That sounds like it's on the same track as I was thinking, except I'm not sure it would end up with every combination. It would need to consider not only "in the", "in the beginning", etc., but also "the beginning", etc. That means not only making the phrases longer and longer, but also starting from each consecutive word and using every length from every starting word, as long as (of course) the phrase doesn't run past the end of the book.
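A minimal Python sketch of the enumeration described above: every word is used as a starting point, every phrase length is tried from it, and no phrase runs past the end of the book. It assumes the book is already a plain list of words. The function name `all_phrases` and the optional `max_len` cap are hypothetical illustrations, not part of SWORD. Positions are 1-based word numbers within the book; turning them into chapter:verse:word references would need a separate lookup table.

```python
from collections import defaultdict


def all_phrases(words, max_len=None):
    """Map every phrase (a tuple of words) to the 1-based word numbers where it starts.

    Hypothetical helper, not part of SWORD. Every word is used as a starting
    point, and every length from 1 upward is tried, stopping where the phrase
    would run past the end of the book (or at max_len, if given).
    """
    n = len(words)
    limit = n if max_len is None else min(max_len, n)
    index = defaultdict(list)
    for start in range(n):
        # Longest phrase from this start that still stays inside the book.
        longest = min(limit, n - start)
        for length in range(1, longest + 1):
            phrase = tuple(words[start:start + length])
            index[phrase].append(start + 1)  # 1-based word number
    return index


if __name__ == "__main__":
    words = "in the beginning god created the heaven and the earth".split()
    index = all_phrases(words, max_len=3)
    print(index[("the",)])                    # [2, 6, 9]
    print(index[("in", "the", "beginning")])  # [1]
```

Without a cap, a book of n words produces n(n+1)/2 phrase occurrences, so the index grows quadratically with the length of the book; a cap (for example 20 words) keeps it manageable.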
The results might need an additional reference number for the position of the word within the verse (book:chapter:verse:word), so "in", "in the", "in the beginning", "the", "the beginning", etc. would be referenced as 1:1:1:1 (1-word phrase), 1:1:1:1 (2-word phrase), 1:1:1:1 (3-word phrase), 1:1:1:2 (1-word phrase), 1:1:1:2 (2-word phrase), etc.

If a list of every unique word, each with an ID number, were used together with a copy of the book in which every word is replaced by its number, iterating through all the combinations would probably be less processor-intensive (processing cost was a major problem I ran into). Using numbers instead of words might also make analysis of the finished product easier.

I hope my words are clear enough. If you'd like further thoughts from me, please ask.

JB

----- Original Message -----
From: "Troy A. Griffitts" <[EMAIL PROTECTED]>
To: "SWORD Developers' Collaboration Forum" <[email protected]>
Sent: Friday, June 23, 2006 6:06 PM
Subject: Re: [sword-devel] phrasal concordance

> Jeremy,
> It would be interesting to write a text analysis program that followed
> some algorithm like:
>
> search("in the"), results? write an entry: ["in the" : result
> verses] and add a word
> search("in the beginning")....
>
> if no results, drop one word at the front, search("the beginning") and
> continue adding words and writing entries until no results.
>
>
> Not sure if this would be the best way to produce such research, but it
> would be neat to see. Maybe a first pass that scores every word
> by the total number of times it is used. Then you could score phrases higher
> the more words they contain and the less frequent those words are.
>
> It sounds like it might produce interesting research data.
>
> -Troy.
>
>
>
> Jeremy Bickel wrote:
>> Hello all. I really hope this is the right place for this. If not,
>> please forgive me.
>> :-D
>>
>> What about a hard-coded (smallish, if possible) concordance of every
>> sized phrase in any book (Genesis, Ruth, Josephus, etc.), from 1 to x
>> words (where 1-word phrases would be a traditional concordance)? Then
>> this could be used, for instance, to quickly identify similarities of
>> text. A 20-word phrase within a single book (one that doesn't cross
>> between books), found in multiple places, perhaps in multiple books,
>> might shed light on significant phrases that might otherwise stay obscure.
>>
>> At first glance, this might not seem important, because search is
>> already built into Sword. But having thought about this a while, I
>> can see real potential in it.
>>
>> Thanks.
>> ###########################
>> God is love Himself. God is completely just. Fear Him and be at peace.

_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page
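One way to combine the two ideas in this thread, sketched in Python under stated assumptions: each distinct word gets an integer ID (Jeremy's suggestion of a book "full of numbers"), and phrases are grown one word at a time only while they still occur more than once (close to Troy's "continue adding words until no results"). The names `encode`, `decode`, `repeated_phrases` and the `min_count` parameter are hypothetical. The pruning relies on a simple fact: a phrase can repeat only if its shorter prefix also repeats, so only those start positions need extending.

```python
from collections import defaultdict


def encode(words):
    """Assign each distinct word an integer ID; return (vocab, id_sequence).

    Hypothetical helper, not part of SWORD.
    """
    vocab = {}
    ids = []
    for w in words:
        ids.append(vocab.setdefault(w, len(vocab)))
    return vocab, ids


def decode(phrase, vocab):
    """Turn a tuple of word IDs back into text."""
    words_by_id = {i: w for w, i in vocab.items()}
    return " ".join(words_by_id[i] for i in phrase)


def repeated_phrases(ids, min_count=2):
    """Return {phrase_of_ids: [0-based start positions]} for every phrase
    occurring at least min_count times.

    Phrases grow one word at a time. A phrase of length n + 1 can only repeat
    if its first n words repeat, so only surviving start positions are
    extended; growth stops once no phrase of the current length repeats.
    """
    results = {}
    current = defaultdict(list)
    for pos, wid in enumerate(ids):
        current[(wid,)].append(pos)
    length = 1
    while True:
        survivors = {p: s for p, s in current.items() if len(s) >= min_count}
        if not survivors:
            break
        results.update(survivors)
        extended = defaultdict(list)
        for phrase, starts in survivors.items():
            for start in starts:
                nxt = start + length  # index of the word that would be added
                if nxt < len(ids):  # never run past the end of the book
                    extended[phrase + (ids[nxt],)].append(start)
        current = extended
        length += 1
    return results


if __name__ == "__main__":
    vocab, ids = encode("a b c a b c a b d".split())
    reps = repeated_phrases(ids)
    longest = max(reps, key=len)
    print(decode(longest, vocab), reps[longest])  # a b c a b [0, 3]
```

Only repeated phrases are kept, which matches the cross-referencing use case; one-off phrases are dropped at each step instead of being stored. Positions here are 0-based. How much the integer IDs save depends on the language: in Python they mainly give compact keys, while in a compiled language comparing fixed-width integers is typically cheaper than comparing strings.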
