Re: [Unicon-group] Venn's revenge - a data normalization problem

Clinton Jeffery Sun, 14 Sep 2008 10:52:16 -0700

I liked Hugh's answers (and questions) in reply to your question. Perhaps I
can
add a little value or a different perspective.


It would be easy to add access to Icon's built-in hash function, there is
probably
an IPL procedure which does the same thing, but Icon's built-in hash
function
is useless for sets, because sets are mutable.  If you insert a set A into a
set
of sets G, and then insert a new member into set A, how would hashing work?
Instead of hashing on the aggregate of the contents of structures, Icon and
Unicon
just hash on their reference (pointer) or serial #, and if a later set
happens to have
the same size and contents, hashing (and the member() function) do not
consider
them to match.

Some options: A) we could all get together and implement immutable sets (and
other
structure types) for which hashing on contents would be safe.  B) map your
sets down onto
another type such as strings or bit vectors which can be hashed.  C) get
real fancy, and
modify the existing set type to behave in some reasonable way when handling
elements
that are sets, or other structure types (but this would be hard to sell to
the community).
I guess in the past I have used Option B with reasonable success, but I am
not claiming
it will be cheap or trivial.  Others may have other good suggestions, like
Hugh did.

Cheers,
Clint

On Sun, Sep 14, 2008 at 4:19 AM, Art Eschenlauer <[EMAIL PROTECTED]> wrote:

> Is there a clever (i.e., cheap) approach for searching a group of sets for
> a "test" set?
> I suppose another way of asking this question is whether there is an
> Icon-like way of hashing sets (by using hashing that is built into the
> implementation, avoiding brute-force explicit computation of a hash in
> Icon code)?
>
> This is the essence a subproblem of a problem that I am working on (the
> problem happens to be inventorying versions of files on many computers,
> many of them similar).  Here is the long description of this subproblem:
>
> I want to maintain a group of properties related to instances of an entity.
>
> Many instances share exactly the same group of property values.
> There may be many instances of the entity, and the number of attributes is
> not known a priori.
> It may be (and frequently is) that there is a different number of
> properties applicable to different instances.
> Because there are so many instances, it is too expensive to maintain a
> list of all property values for each instance.
>
> Each property value consists of a property identifier and three values,
> which are grouped in an Icon record.
> (Unicon objects may be an acceptable alternative.)
> I collect the properties for a given instance of the entity in a
> non-deterministic order.
>
> What I would like to do is to save memory by associating a reference to a
> structure, so that each of the many instances with the same group of
> property values will only need to store a reference to the structure
> representing that group.
> To simplify the problem, I am assuming fewer than 256 attributes; however,
> I am assuming more than 256 groups.
>
> My thought was to "normalize" my data using an Icon set structure for each
> group of property value records.
> (I believe that a table structure could work just as well.)
> Each entity record would have a field referencing such a set.
> Each time that I learn a new property value for the instance represented
> by the entity record, I would go to a master list of sets, look for or
> construct a set consisting of the union of the newly acquired property
> value and the entity record's current set, and assign a reference to this
> other set to the entity record's properties field.
> "Look for" means constructing a set that is a union of the record's
> current set and the new property value and then testing each set in the
> master list for equivalence to the constructed set.
> (I am thinking of using the seteq procedure from sets.icn in the IPL to
> test for set equivalence.)
>
> The problem that I see with such an approach is that, if N were the
> maximum size of each set and M were the number of sets, then this search
> would be O(M*N).
> So, this approach doesn't scale well.
>
> Is there a clever alternative for searching a group of sets for a "test"
> set?
> It is not a requirement that the structure actually be an Icon set.
> One issue that confounds my problem is that since the properties may not
> become known in a particular order, I cannot see how to use a finite state
> machine (e.g., construct a tree that I can walk until I find the node that
> I am looking for or find that node to be missing and add it).
>
> Any ideas?
>
>
>
> -------------------------------------------------------------------------
> This SF.Net email is sponsored by the Moblin Your Move Developer's
> challenge
> Build the coolest Linux based applications with Moblin SDK & win great
> prizes
> Grand prize is a trip for two to an Open Source event anywhere in the world
> http://moblin-contest.org/redirect.php?banner_id=100&url=/
> _______________________________________________
> Unicon-group mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/unicon-group
>

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/

_______________________________________________
Unicon-group mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/unicon-group

Re: [Unicon-group] Venn's revenge - a data normalization problem

Reply via email to