[O-MPI devel] New data support subsystem for ORTE

Ralph H. Castain Mon, 6 Feb 2006 23:08:59 -0500

Hello all

After several months of development, I have merged the new data support subsystem for ORTE into the trunk. I must offer one caveat: I have made every effort to test the revised system, but cannot guarantee its operation under every condition and on every system. For one, I don't have access to every type of system to which ORTE/OMPI has been ported...and, to be honest, the trunk moves so quickly that I would never get this merged if I kept chasing the latest trunk version. Hence, you may see some degree of instability - hopefully minimal or non-existent, but it could happen.

Those of you primarily interested in the MPI layer need read no further unless you intend to use any of the ORTE data types. Everyone else, please read on.


The primary changes in this revision were:

1. Redefinition of several key data types, including the orte_data_value_t, orte_gpr_value_t, and orte_gpr_keyval_t structures. This was done in order to eliminate ALL knowledge of data types from the registry - the registry now has no knowledge of what is being stored. This allowed the second change...

2. Completely localize all data type functionality. In the prior version, a developer who changed a data type definition (e.g., adding an element to a defined structure) was required to make corresponding changes to the functions that copied, deleted, compared, and printed the data type in a number of places. In particular, this was required in at least three locations within the registry subsystem! This level of complexity caused a number of errors, typically when someone changed a structure and missed the necessary changes elsewhere. The result was unstable behavior that was very hard to debug and fix.

The new data support subsystem resolves this problem by requiring the definer of a data type to provide several key functions:

a. compare - how to compare two instances of the data type, returning one of three results: equal, value 1 greater, or value 2 greater. These three results are now defined constants to ensure consistency throughout the code base - please USE THEM.

b. copy - how to copy one instance into a new data location, allocating memory dynamically to provide the necessary storage

c. print - method to pretty-print the contents of the data type, essential for debugging and/or use by the registry "dump" functions

d. size - method to compute the size of the specified data type instance, including the size of any non-static fields (e.g., a string variable)

e. release - method for releasing a dynamically-allocated instance of the data type. In most cases, this function either does a free or an OBJ_RELEASE, but it could be used (for example) to provide a debugging version of a release function

f. pack/unpack - how to pack an instance into an ORTE buffer for transmission, and how to unpack it from one
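To make the list above concrete, here is a minimal C sketch of all six functions for one hypothetical type. Every name in it (ex_proc_t, ex_buffer_t, the EX_* compare constants, and the ex_proc_* functions) is invented for illustration and is NOT the actual ORTE/DSS API; the real signatures live in orte/dss. The buffer is a fixed-size stand-in for an ORTE buffer, and pack does no byte-order conversion.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the shared, defined compare results. */
typedef enum { EX_EQUAL, EX_VALUE1_GREATER, EX_VALUE2_GREATER } ex_cmp_t;

/* Hypothetical user-defined type with one non-static (heap) field. */
typedef struct {
    int   rank;
    char *host;
} ex_proc_t;

/* Hypothetical fixed-size stand-in for an ORTE buffer. */
typedef struct {
    unsigned char bytes[256];
    size_t used;   /* bytes written by pack */
    size_t pos;    /* read position for unpack */
} ex_buffer_t;

static char *ex_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL) memcpy(d, s, n);
    return d;
}

/* a. compare: order by rank, then by host name */
static ex_cmp_t ex_proc_compare(const ex_proc_t *v1, const ex_proc_t *v2) {
    assert(v1 != NULL && v2 != NULL);
    if (v1->rank != v2->rank)
        return v1->rank > v2->rank ? EX_VALUE1_GREATER : EX_VALUE2_GREATER;
    int c = strcmp(v1->host, v2->host);
    if (c == 0) return EX_EQUAL;
    return c > 0 ? EX_VALUE1_GREATER : EX_VALUE2_GREATER;
}

/* e. release: free the heap field, then the instance itself */
static void ex_proc_release(ex_proc_t *p) {
    if (p == NULL) return;
    free(p->host);
    free(p);
}

/* b. copy: allocate a new instance and deep-copy the string */
static int ex_proc_copy(ex_proc_t **dest, const ex_proc_t *src) {
    ex_proc_t *p = malloc(sizeof(*p));
    if (p == NULL) return -1;
    p->rank = src->rank;
    p->host = ex_strdup(src->host);
    if (p->host == NULL) { free(p); return -1; }
    *dest = p;
    return 0;
}

/* c. print: allocate a human-readable description for the caller to free */
static int ex_proc_print(char **out, const ex_proc_t *p) {
    int n = snprintf(NULL, 0, "rank=%d host=%s", p->rank, p->host);
    if (n < 0 || (*out = malloc((size_t)n + 1)) == NULL) return -1;
    snprintf(*out, (size_t)n + 1, "rank=%d host=%s", p->rank, p->host);
    return 0;
}

/* d. size: the struct itself plus its non-static string storage */
static size_t ex_proc_size(const ex_proc_t *p) {
    return sizeof(*p) + strlen(p->host) + 1;
}

/* f. pack: rank, string length, then the string bytes */
static int ex_proc_pack(ex_buffer_t *buf, const ex_proc_t *p) {
    size_t len = strlen(p->host);
    if (buf->used + sizeof(int) + sizeof(size_t) + len > sizeof(buf->bytes)) return -1;
    memcpy(buf->bytes + buf->used, &p->rank, sizeof(int)); buf->used += sizeof(int);
    memcpy(buf->bytes + buf->used, &len, sizeof(size_t));  buf->used += sizeof(size_t);
    memcpy(buf->bytes + buf->used, p->host, len);          buf->used += len;
    return 0;
}

/* f. unpack: rebuild a dynamically allocated instance from the buffer */
static int ex_proc_unpack(ex_buffer_t *buf, ex_proc_t **dest) {
    ex_proc_t tmp;
    size_t len;
    if (buf->pos + sizeof(int) + sizeof(size_t) > buf->used) return -1;
    memcpy(&tmp.rank, buf->bytes + buf->pos, sizeof(int)); buf->pos += sizeof(int);
    memcpy(&len, buf->bytes + buf->pos, sizeof(size_t));   buf->pos += sizeof(size_t);
    if (len > buf->used - buf->pos || (tmp.host = malloc(len + 1)) == NULL) return -1;
    memcpy(tmp.host, buf->bytes + buf->pos, len);
    buf->pos += len;
    tmp.host[len] = '\0';
    int rc = ex_proc_copy(dest, &tmp);   /* reuse copy for the final allocation */
    free(tmp.host);
    return rc;
}
```

The point of the design is that everything touching the layout of ex_proc_t sits next to its definition, so adding a field means editing these functions and nothing in the registry.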


In addition, the data type definition requires that two values be provided:

a. boolean flag indicating whether the data type is structured or not. This was provided in addition to the release function to allow a developer to (for example) define a debugging release independent of the "flavor" (i.e., structured or not) of the data type


b. a name for the data type. This is required to be unique.
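The sketch below shows how a type table keyed by a unique name and carrying a structured flag lets generic code copy and release values through function pointers without knowing what the type is. As before, all names (ex_type_info_t, ex_register_type, ex_copy, ex_release) are hypothetical and are not the real DSS registration call; only a plain int type is registered here, and a structured type would be registered the same way with the flag set to true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Generic (void *) signatures, so the table never needs to know a concrete type. */
typedef int  (*ex_copy_fn)(void **dest, const void *src);
typedef void (*ex_release_fn)(void *value);

/* Hypothetical per-type descriptor */
typedef struct {
    const char   *name;        /* must be unique across all registered types */
    bool          structured;  /* true for structs/objects, false for plain values */
    ex_copy_fn    copy;
    ex_release_fn release;
} ex_type_info_t;

#define EX_MAX_TYPES 32
static ex_type_info_t ex_types[EX_MAX_TYPES];
static int ex_num_types = 0;

/* Returns the new type id, or -1 if the name is already taken or the table is full. */
static int ex_register_type(const char *name, bool structured,
                            ex_copy_fn copy, ex_release_fn release) {
    assert(name != NULL && copy != NULL && release != NULL);
    for (int i = 0; i < ex_num_types; i++) {
        if (strcmp(ex_types[i].name, name) == 0) return -1;
    }
    if (ex_num_types == EX_MAX_TYPES) return -1;
    ex_types[ex_num_types] = (ex_type_info_t){ name, structured, copy, release };
    return ex_num_types++;
}

/* A plain (non-structured) int type to register */
static int ex_int_copy(void **dest, const void *src) {
    int *p = malloc(sizeof(int));
    if (p == NULL) return -1;
    *p = *(const int *)src;
    *dest = p;
    return 0;
}
static void ex_int_release(void *value) { free(value); }

/* Generic entry points: dispatch through the table by type id. */
static int ex_copy(int type, void **dest, const void *src) {
    if (type < 0 || type >= ex_num_types) return -1;
    return ex_types[type].copy(dest, src);
}
static void ex_release(int type, void *value) {
    if (type >= 0 && type < ex_num_types) ex_types[type].release(value);
}
```

Registering the same name twice fails, which is one simple way to enforce the uniqueness requirement.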

All of these functions have been provided for the "standard" data types (ints, bool, etc.), so you don't have to worry about those. For examples of these functions, you can look either at the orte/dss functions (where the standard data types are supported) or at the orte/mca/gpr/base/data_type_support directory, where more complex types are defined. The orte/dss/dss_open_close.c and orte/mca/gpr/base/gpr_base_open.c files contain the data type registration calls. I have also provided the functions for all of the current ORTE-defined data types.

Two other entry points (set and get) were added to the data support subsystem to mimic true object-oriented access to the orte_data_value_t object. There are times in the code where it is more convenient to work with statically-defined variables. However, using the "copy" function to move data from one object to another causes memory to be dynamically allocated. The set/get functions provide a "safe" method for doing this statically.
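A minimal sketch of the set/get idea follows. It assumes (an assumption, not something stated above) that set simply records a caller-owned pointer plus its type, and that get hands the pointer back only after checking the requested type. ex_data_value_t, ex_set, ex_get, and the EX_* type ids are hypothetical stand-ins, not the real orte_data_value_t or DSS functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for orte_data_value_t */
typedef struct {
    int   type;
    void *data;
} ex_data_value_t;

/* Hypothetical type ids */
enum { EX_UNDEF = 0, EX_INT32 = 1, EX_STRING = 2 };

/* set: point the value at caller-owned storage. Nothing is allocated, so the
 * caller keeps ownership and must not release the data through this value. */
static int ex_set(ex_data_value_t *value, void *data, int type) {
    if (value == NULL || data == NULL) return -1;
    assert(type != EX_UNDEF);   /* setting an undefined type is a programming error */
    value->type = type;
    value->data = data;
    return 0;
}

/* get: return the stored pointer only if the caller asks for the matching type */
static int ex_get(void **data, const ex_data_value_t *value, int type) {
    if (data == NULL || value == NULL) return -1;
    if (value->type != type) return -2;   /* type mismatch */
    *data = value->data;
    return 0;
}
```

Because no memory is allocated, a value filled in this way can live on the stack next to the variable it points at, which is the situation the "copy" function handles poorly.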

In addition to changing the data type definitions, two "helper" functions were created to support the gpr_value and gpr_keyval structures. In working through the code, I found a number of instances where people had forgotten to completely define these structures, leaving some fields unintentionally "blank". This appeared to cause problems at times, and definitely caused headaches when making this transition. In addition, there was a lot of duplicative and painful code due to all the error checking required while building one of these structures.

To simplify things, I created two new gpr API functions: create_value and create_keyval. Each of these takes as arguments the values to be placed in the respective fields and returns a fully built structure, with all the desired error checking for memory availability, etc. Using these functions will also protect you against any future changes to the system. The only negative is that these functions dynamically allocate the required memory.
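The pattern behind such a helper can be sketched as follows. ex_keyval_t, ex_create_keyval, and ex_keyval_release are hypothetical and do not reproduce the real gpr structures or argument lists; the sketch only shows the idea of filling every field in one call and cleaning up partial allocations on failure, so a caller never sees a half-built structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyval: every field is set by the create function */
typedef struct {
    char  *key;
    int    type;
    void  *value;   /* dynamically allocated copy of the caller's data */
    size_t len;
} ex_keyval_t;

static void ex_keyval_release(ex_keyval_t *kv) {
    if (kv == NULL) return;
    free(kv->key);
    free(kv->value);
    free(kv);
}

/* Build a keyval with all fields set; on any failure, free whatever was
 * already allocated, leave *kv as NULL, and return -1. */
static int ex_create_keyval(ex_keyval_t **kv, const char *key, int type,
                            const void *data, size_t len) {
    if (kv == NULL) return -1;
    *kv = NULL;
    if (key == NULL || (data == NULL && len > 0)) return -1;
    ex_keyval_t *p = calloc(1, sizeof(*p));   /* zeroed: no "blank" garbage fields */
    if (p == NULL) return -1;
    size_t klen = strlen(key) + 1;
    p->key = malloc(klen);
    if (p->key == NULL) { ex_keyval_release(p); return -1; }
    memcpy(p->key, key, klen);
    if (len > 0) {
        p->value = malloc(len);
        if (p->value == NULL) { ex_keyval_release(p); return -1; }
        memcpy(p->value, data, len);
    }
    p->type = type;
    p->len = len;
    *kv = p;
    return 0;
}
```

As the message notes, the trade-off is that the result is always heap-allocated, so each successful create must be paired with a release.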

I hope that helps to explain the changes. As you can see from the commit, this hit a large number of functions. I have provided unit tests for all the data types within the revised data support system that help illustrate how that system is used. In particular, you can look at test/dss and at test/mca/gpr (the gpr_dt_xxx functions) for examples.

Please feel free to holler with questions - and do please let me know if you find any problems with the revisions.

Ralph
