On 6/5/05 4:14 AM, jbv wrote:

Hi list,

I'm trying to build the fastest possible algorithm for the
following task:
I have a variable containing reference words; each "word"
in this variable is considered as an item.
I also have a list of "sentences" (each sentence being a
list of "words" separated by spaces), 1 per line. Each
sentence can contain 1 or more words from the reference,
as well as other words.

I need to determine for each sentence, which words from
the reference are included in it, in which order, and output
a list of offsets.
For example:
    reference variable: W1,W2,W3,W4
    sentence 1: Wx W2 Wy Wz W3
    output: 2,3

And last but not least, I need to keep only sentences that
contain more than 1 word from the reference.

My stab at it:

function calcwords tReference,tSentence
  --  tReference should be the comma-delimited word list, i.e.: "w1,w2,w3"
  --  tSentence is the user's space-delimited entry, i.e.: "wx w2 wy wz w3"
  put tReference into tRef -- so we can manipulate a copy
  replace comma with comma & cr in tRef
  split tRef by cr and comma
  replace space with comma & cr in tSentence
  split tSentence by cr and comma
  intersect tSentence with tRef -- it now has the right keys

  -- match whole items only, so "w1" is not found inside "w10"
  set the wholeMatches to true
  repeat for each line l in keys(tSentence)
    put itemoffset(l,tReference) & comma after tOutput
  end repeat
  delete last char of tOutput -- the comma
  if comma is in tOutput then return tOutput
  else return empty
end calcwords
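
One caveat: as far as I know, keys() doesn't promise to return the
words in the order they appear in the sentence, so the offsets above
may come back shuffled. If the order matters, here is a rough, untimed
sketch that walks the sentence instead and looks each word up in an
array. The handler name calcWordsInOrder and its variable names are
made up for illustration, and it assumes a word repeated in the
sentence should be counted each time it appears:

function calcWordsInOrder pReference,pSentence
  local tLookup, tOutput
  -- build a lookup array: reference word -> its item number
  put 0 into tIndex
  repeat for each item tWord in pReference
    add 1 to tIndex
    put tIndex into tLookup[tWord]
  end repeat

  -- walk the sentence left to right so offsets keep sentence order
  put 0 into tCount
  repeat for each word tWord in pSentence
    if tLookup[tWord] is not empty then
      put tLookup[tWord] & comma after tOutput
      add 1 to tCount
    end if
  end repeat
  delete last char of tOutput -- the trailing comma
  if tCount > 1 then return tOutput
  else return empty
end calcWordsInOrder

With the example from the original question,
calcWordsInOrder("W1,W2,W3,W4","Wx W2 Wy Wz W3") should return 2,3,
and a sentence with only one reference word should return empty.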

I didn't time it, but it seems like it should be faster.

--
Jacqueline Landman Gay         |     [EMAIL PROTECTED]
HyperActive Software           |     http://www.hyperactivesw.com
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
http://lists.runrev.com/mailman/listinfo/use-revolution
