Re: [U2] General guidelines on indexing

Martin Phillips Wed, 08 Jul 2009 09:58:55 -0700

Hi all,

I don't agree. Disk access is inherently slower than RAM access.

I think that this discussion started for Unidata and then got UniVerse involved too, but it might have been the other way around. Sadly, there is no internals training material for Unidata, so we have to guess what goes on.

Different multivalue products approach string management in varying ways. In UniVerse, strings are stored as contiguous memory. If I write a statement such as

  X<-1> = 'ABC'

the run machine has to work out how big the new string will be, allocate memory, copy the old value of X to the new area appending ABC to it, and then release the original memory used by X.

As you append successive fields, the string to be moved gets longer and longer. We tend to think of computers as being blindingly fast but copying a big string is still a slow process. If I have a string that starts empty and I add a million fields, each of 3 bytes plus the delimiter, I will end up copying a total of 1,999,998,000,000 bytes - hardly an insignificant task.
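The arithmetic can be checked with a short sketch. It is written in Python rather than BASIC, and it assumes the simplified model described here: every append first copies the whole existing string, and every field occupies its own length plus one delimiter byte. The function names are invented for illustration.

```python
def bytes_copied_closed_form(n_fields, field_len):
    """Total bytes copied when each append first copies the existing string.

    Assumption: each stored field takes field_len bytes plus one delimiter byte.
    """
    per_field = field_len + 1
    # Before append number k (counting from 0) the string already holds k fields,
    # so the total is per_field * (0 + 1 + ... + (n_fields - 1)).
    return per_field * n_fields * (n_fields - 1) // 2


def bytes_copied_by_loop(n_fields, field_len):
    """Same total, counted one append at a time (slower, but easier to follow)."""
    per_field = field_len + 1
    current_len = 0
    total = 0
    for _ in range(n_fields):
        total += current_len      # copy the old value to the new area
        current_len += per_field  # the new string is one field longer
    return total


if __name__ == "__main__":
    # A million 3-byte fields, each with a delimiter.
    print(bytes_copied_closed_form(1_000_000, 3))  # 1999998000000
```

The cost grows with the square of the number of fields, so doubling the field count roughly quadruples the bytes copied.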

From my own experiments some time ago, I believe that Unidata also uses contiguous strings but I have no direct proof of this. The alternative (adopted by our QM product, by PI/open, Information and perhaps others) is to use "chunked strings" where a string is stored as a series of chunks. In this model, appending a field requires only addition of a new chunk or, for better performance, replacement of the final chunk.

Of course, the performance gain of chunked strings in this example may be offset by their decreased performance for things like substring extraction which is now more complex than a simple indexing operation.
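As a rough illustration of that trade-off, here is a toy chunked string in Python. It is only a sketch of the general idea, not how QM or any other product actually implements it; the class name, the chunk size and the bytes_moved counter are all made up for the example.

```python
class ChunkedString:
    """Toy model of a chunked string: a list of bytearrays of bounded size."""

    def __init__(self, chunk_size=4096):
        self.chunk_size = chunk_size  # hypothetical chunk capacity
        self.chunks = []
        self.bytes_moved = 0          # bytes written so far, for comparison

    def append(self, data):
        """Fill the final chunk, then add new chunks; earlier chunks are untouched."""
        pos = 0
        while pos < len(data):
            if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
                self.chunks.append(bytearray())
            last = self.chunks[-1]
            room = self.chunk_size - len(last)
            piece = data[pos:pos + room]
            last.extend(piece)
            self.bytes_moved += len(piece)
            pos += len(piece)

    def __len__(self):
        return sum(len(chunk) for chunk in self.chunks)

    def byte_at(self, index):
        """Indexing has to walk the chunks instead of doing one address calculation."""
        if index < 0:
            raise IndexError(index)
        for chunk in self.chunks:
            if index < len(chunk):
                return chunk[index]
            index -= len(chunk)
        raise IndexError("index out of range")

    def to_bytes(self):
        return b"".join(bytes(chunk) for chunk in self.chunks)
```

In this model each appended byte is written once, so bytes_moved grows linearly with the data added, while byte_at shows where the extra cost of extraction comes from.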


By way of a simple example, I just tried the following program...
  s = ''
  z = str('*', 1000)
  t1 = time()
  for i = 1 to 100000
     s<-1> = z
  next i
  t2 = time()
  crt t2 - t1

This took six seconds on QM but 32 minutes on UniVerse. I do not have a Unidata system available at the moment to try. To be fair, I am sure that I could construct an example that reversed the performance difference.

Writing to a sequential file is somewhat similar to the chunked string model as it buffers data until it has a good sized chunk and then writes it out, continuing with an empty buffer.
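The buffering idea can be sketched in a few lines of Python. This is a generic illustration, not the actual sequential file code of any multivalue product; the class name, the buffer size and the flushes counter are assumptions for the example.

```python
import io


class BufferedAppender:
    """Collects small writes and passes them on in larger pieces."""

    def __init__(self, sink, buffer_size=8192):
        self.sink = sink                # any object with a write(bytes) method
        self.buffer_size = buffer_size  # hypothetical "good sized chunk"
        self.buffer = bytearray()
        self.flushes = 0                # number of real writes to the sink

    def write(self, data):
        self.buffer.extend(data)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        """Write out whatever is buffered and continue with an empty buffer."""
        if self.buffer:
            self.sink.write(bytes(self.buffer))
            self.buffer.clear()
            self.flushes += 1


if __name__ == "__main__":
    sink = io.BytesIO()
    out = BufferedAppender(sink, buffer_size=10)
    for _ in range(7):
        out.write(b"ABCD")
    out.flush()  # push out the final partial buffer
    print(out.flushes, len(sink.getvalue()))
```

As with the chunked string, data that has already been written out is never copied again.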



Martin Phillips
Ladybridge Systems Ltd
17b Coldstream Lane, Hardingstone, Northampton, NN4 6DB

+44-(0)1604-709200

_______________________________________________
U2-Users mailing list
U2-Users@listserver.u2ug.org
http://listserver.u2ug.org/mailman/listinfo/u2-users
