Re: How to filter a big list

Alex Tweedly Thu, 22 Oct 2009 15:33:53 -0700

Richard Gaskin wrote:

> Jérôme Rosat wrote:
>> I explained in my message that I wish to filter a list of names and addresses dynamically when I type a name in a field. This list contains 400'000 lines like this: Mme [TAB] DOS SANTOS albertina [TAB] rue GOURGAS 23BIS [TAB] 1205 Genève
>>
>> I made various tests using the "repeat for each" loop and the "filter ... with" command. Filtering takes the most time when I type the first and the second letter: approximately 800 milliseconds for the first character and about 570 milliseconds for the second. The repeat loop with the "contains" operator is a little bit slower (by about 50 milliseconds) than "filter ... with". There is no significant difference once the third or later characters are typed. Of course, I filter a variable before putting it into the list field.
>>
>> Admittedly, 800 milliseconds to filter a list of 400'000 lines is fast. But it is too slow for what I want to do: filtering would need to take less than 300 milliseconds so that the user is not slowed down while typing.
>
> Would it be practical to break your list into 26 sublists by first letter?

That's a pragmatic approach - but I think it's the wrong one.

The fundamental problem is that the idea of scanning an entire list at keystroke speed is not robust. Even if splitting into multiple lists works for now, there's no guarantee that it will work tomorrow: when the database doubles in size, or the data becomes skewed because it contains too many people with the same first letter, or .... or the users demand a similar feature for addresses as well as surnames, or they want to match a string anywhere within the name, or ....

What you ought to do (imnsho) is change the algorithm to one which is inherently responsive, using either 'send' or 'wait-with-messages' to ensure that the matching process does not interfere with responsiveness. In this case, I think it's easier to use wait-with-messages.


So, in outline:

   each time the match data changes, you restart the matching process;

   the matching process checks a fixed, and relatively small, number of possible matches,
      updates the field showing the user what matches have been found,
      and then allows other things to happen before continuing with the matching.

I'd have a single handler that is always called whenever the user input changes, and which kicks off a new matching process (by sending a message to the match handler). Then, within the match handler, I'd periodically check whether there is a pending message to start a new match; if there is, the current run simply exits.


So a brief version of the whole script would be:

local sShow, sData, sMatch
global gData

on keyUp
   matchStringHasChanged
   pass keyUp
end keyUp

on matchStringHasChanged
   -- queue a fresh match; a run already in progress will see it
   -- in the pendingMessages and exit
   send "processamatch" to me in 0 millisecs
end matchStringHasChanged

on processamatch
   local tCount
   put gData into sData
   put the text of field "Field" into sMatch
   put empty into field "Show"
   put empty into sShow
   -- the name is the 2nd tab-delimited item of each record
   -- (Mme <tab> DOS SANTOS albertina <tab> rue GOURGAS 23BIS <tab> ...)
   set the itemDelimiter to tab
   repeat for each line L in sData
      add 1 to tCount
      if item 2 of L begins with sMatch then
         put L & CR after sShow
      end if
      -- every 100 lines: show progress, let keystrokes through,
      -- and give up if a newer match has been queued
      if tCount mod 100 = 0 then
         put sShow & "....." & CR into field "Show"
         wait 0 millisecs with messages
         if the pendingMessages contains ",processamatch," then
            put "exiting" & CR after field "StatusLog"
            exit processamatch
         end if
      end if
   end repeat
   put sShow into field "Show"
   put "Completed" && the number of lines in sShow & CR after field "StatusLog"
end processamatch

Note the use of "....." to indicate that work is still in progress and that more matches may be coming.


You could easily add refinements to this

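1a. If a matching process has completed (rather than exited), and the previous match string is a prefix of the new match string (i.e. the user has just typed more characters), then instead of starting with

         put gData into sData

you could instead do

         put sShow into sData

(i.e. re-use the filtered list; but be sure to remember that if you exit before completing, or if the match string changes in any other way, you need to restart with the whole of gData). With "begins with" matching it really must be a prefix: lines beginning with "JAN" need not begin with "AN", even though "AN" is a substring of "JAN". (With "contains" matching, any substring would do.)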

1b. If you do 1a, then if you are *nearly* complete with a match when the match string changes, just go ahead and complete it, so that you get to work on the subset.

(good luck deciding what *nearly* means :-)
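A minimal sketch of refinement 1a, assuming the script above. The names sLastMatch, sCompleted and sourceForMatch are hypothetical additions, not part of the original script; the helper only encodes the rule that a completed result may be reused when the new match string begins with the old one, and otherwise falls back to the full list.

```livecode
-- Hypothetical additions for refinement 1a (not in the original script).
local sLastMatch, sCompleted

-- Pick the list to scan: the previous result is safe to reuse only if
-- the previous run completed AND the new match string extends the old one.
function sourceForMatch pNewMatch, pLastMatch, pCompleted, pLastResult, pAll
   if pCompleted is true and pNewMatch begins with pLastMatch then
      return pLastResult
   end if
   return pAll
end sourceForMatch

-- In processamatch, choose the source BEFORE sShow is cleared:
--    put the text of field "Field" into sMatch
--    put sourceForMatch(sMatch, sLastMatch, sCompleted, sShow, gData) into sData
--    put false into sCompleted
--    put sMatch into sLastMatch
--    put empty into sShow
-- and, only after the repeat loop finishes normally (not on exit):
--    put true into sCompleted
```

Because sCompleted is set to true only after a full pass, an early "exit processamatch" automatically forces the next run back onto gData.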

btw - I don't think there is any magic 'split'-based method possible here.
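For comparison, a minimal sketch of the 'send' variant mentioned earlier, under stated assumptions: it reuses the field names and the global gData from the script above and would replace that script rather than sit beside it (both define keyUp); kChunkSize, startMatch, processChunk, matchChunk and sGeneration are hypothetical names; and, as in Jérôme's sample line, the name is taken to be the second tab-delimited item. Each call scans one chunk and then sends itself the next one; a generation counter makes any chunk still queued from an older keystroke exit immediately. The split here only turns the list into an array keyed 1..N so a scan can resume by index; it is not a filtering trick.

```livecode
-- Hypothetical "send"-based variant; replaces the script above.
global gData
local sLines, sCount, sPos, sMatch, sShow, sGeneration

constant kChunkSize = 500 -- assumed value; tune so one chunk is quick

on keyUp
   startMatch
   pass keyUp
end keyUp

on startMatch
   if sCount is empty then
      -- one-time conversion to an array keyed 1..N so a scan can resume
      -- by index; assumes gData does not change afterwards
      put gData into sLines
      split sLines by return
      put the number of elements of sLines into sCount
   end if
   add 1 to sGeneration -- any chunk already queued is now stale
   put the text of field "Field" into sMatch
   put 1 into sPos
   put empty into sShow
   put empty into field "Show"
   send "processChunk" && sGeneration to me in 0 millisecs
end startMatch

on processChunk pGeneration
   local tLast
   if pGeneration <> sGeneration then exit processChunk -- superseded
   put min(sPos + kChunkSize - 1, sCount) into tLast
   put matchChunk(sLines, sPos, tLast, sMatch) after sShow
   put tLast + 1 into sPos
   if sPos > sCount then
      put sShow into field "Show" -- finished
   else
      put sShow & "....." & return into field "Show"
      send "processChunk" && sGeneration to me in 0 millisecs
   end if
end processChunk

-- Returns elements pFrom..pTo of the array pLines whose name (assumed
-- to be tab-delimited item 2) begins with pMatch, one per line.
-- pLines is passed by reference to avoid copying the whole array.
function matchChunk @pLines, pFrom, pTo, pMatch
   local tResult, i
   set the itemDelimiter to tab
   repeat with i = pFrom to pTo
      if item 2 of pLines[i] begins with pMatch then
         put pLines[i] & return after tResult
      end if
   end repeat
   return tResult
end matchChunk
```

Compared with the wait-with-messages loop, nothing runs nested inside the matching handler: each chunk runs to completion and the engine handles keystrokes between chunks.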


-- Alex.
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
