[Unicon-group] Venn's revenge - a data normalization problem

Art Eschenlauer Sun, 14 Sep 2008 04:19:48 -0700

Is there a clever (i.e., cheap) approach for searching a group of sets for
a "test" set?
I suppose another way of asking this question is whether there is an
Icon-like way of hashing sets (by using hashing that is built into the
implementation, avoiding brute-force explicit computation of a hash in
Icon code)?


This is the essence a subproblem of a problem that I am working on (the
problem happens to be inventorying versions of files on many computers,
many of them similar).  Here is the long description of this subproblem:

I want to maintain a group of properties related to instances of an entity.

Many instances share exactly the same group of property values.
There may be many instances of the entity, and the number of attributes is
not known a priori.
It may be (and frequently is) that there is a different number of
properties applicable to different instances.
Because there are so many instances, it is too expensive to maintain a
list of all property values for each instance.

Each property value consists of a property identifier and three values,
which are grouped in an Icon record.
(Unicon objects may be an acceptable alternative.)
I collect the properties for a given instance of the entity in a
non-deterministic order.

What I would like to do is to save memory by associating a reference to a
structure, so that each of the many instances with the same group of
property values will only need to store a reference to the structure
representing that group.
To simplify the problem, I am assuming fewer than 256 attributes; however,
I am assuming more than 256 groups.

My thought was to "normalize" my data using an Icon set structure for each
group of property value records.
(I believe that a table structure could work just as well.)
Each entity record would have a field referencing such a set.
Each time that I learn a new property value for the instance represented
by the entity record, I would go to a master list of sets, look for or
construct a set consisting of the union of the newly acquired property
value and the entity record's current set, and assign a reference to this
other set to the entity record's properties field.
"Look for" means constructing a set that is a union of the record's
current set and the new property value and then testing each set in the
master list for equivalence to the constructed set.
(I am thinking of using the seteq procedure from sets.icn in the IPL to
test for set equivalence.)

The problem that I see with such an approach is that, if N were the
maximum size of each set and M were the number of sets, then this search
would be O(M*N).
So, this approach doesn't scale well.

Is there a clever alternative for searching a group of sets for a "test" set?
It is not a requirement that the structure actually be an Icon set.
One issue that confounds my problem is that since the properties may not
become known in a particular order, I cannot see how to use a finite state
machine (e.g., construct a tree that I can walk until I find the node that
I am looking for or find that node to be missing and add it).

Any ideas?



-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Unicon-group mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/unicon-group

[Unicon-group] Venn's revenge - a data normalization problem

Reply via email to