Hello all

After several months of development, I have merged the new data support subsystem for ORTE into the trunk. One caveat: I have made every effort to test the revised system, but I cannot guarantee its operation under every condition and on every system. For one, I don't have access to every type of system to which ORTE/OMPI has been ported... and, to be honest, the trunk moves so quickly that I would never get this merged if I kept chasing the latest trunk version. Hence, you may see some degree of instability. Hopefully this will be minimal or non-existent, but it could happen.

Those of you primarily interested in the MPI layer need read no further unless you intend to use any of the ORTE data types. For everyone else, please read on.

The primary changes in this revision were:

1. redefinition of several key data types, including the orte_data_value_t, orte_gpr_value_t, and orte_gpr_keyval_t structures. This was done in order to eliminate ALL knowledge of data types from the registry - the registry now has no knowledge of what is being stored. This allowed the second change...

2. complete localization of all data type functionality. In the prior version, a developer who changed a data type definition (e.g., adding an element to a defined structure) was required to make corresponding changes, in a number of places, to the functions that copied, deleted, compared, and printed the data type. In particular, this was required in at least three locations within the registry subsystem! This complexity led to a number of errors when someone changed a structure and did not catch the necessary changes everywhere else. The result was unstable behavior that was very hard to debug and fix.

The new data support subsystem resolves this problem by requiring the definer of a data type to provide several key functions:

a. compare - how to compare two instances of the data type, returning one of three results: equal, value 1 greater, or value 2 greater. These three results are now defined values to ensure compatibility throughout the code base - please USE THEM.

b. copy - how to copy one instance into a new data location, allocating memory dynamically to provide the necessary storage

c. print - method to pretty-print the contents of the data type, essential for debugging and/or use by the registry "dump" functions

d. size - method to compute the size of the specified data type instance, including the size of any non-static fields (e.g., a string variable)

e. release - method for releasing a dynamically-allocated instance of the data type. In most cases, this function either does a free or an OBJ_RELEASE, but it could be used (for example) to provide a debugging version of a release function

f. pack/unpack - how to pack/unpack an instance into an ORTE buffer for transmission
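
As a rough illustration of what such a function set can look like, here is a minimal sketch for a made-up structured type. Everything in it (demo_proc_t, the DEMO_* result values, and the function names and signatures) is hypothetical and is not the actual ORTE API; pack/unpack is left out because it depends on the ORTE buffer interface.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the three defined comparison results. */
enum { DEMO_EQUAL = 0, DEMO_VALUE1_GREATER = 1, DEMO_VALUE2_GREATER = -1 };

/* Hypothetical structured type with one non-static field (the string). */
typedef struct {
    int   rank;
    char *name;   /* heap-allocated in copies, may be NULL */
} demo_proc_t;

/* compare: order two instances by rank, returning a defined result value */
static int demo_proc_compare(const demo_proc_t *a, const demo_proc_t *b)
{
    assert(NULL != a && NULL != b);
    if (a->rank > b->rank) return DEMO_VALUE1_GREATER;
    if (b->rank > a->rank) return DEMO_VALUE2_GREATER;
    return DEMO_EQUAL;
}

/* copy: allocate a new instance and deep-copy src; 0 on success, -1 on failure */
static int demo_proc_copy(demo_proc_t **dest, const demo_proc_t *src)
{
    demo_proc_t *p = malloc(sizeof(*p));
    if (NULL == p) return -1;
    p->rank = src->rank;
    p->name = NULL;
    if (NULL != src->name) {
        p->name = malloc(strlen(src->name) + 1);
        if (NULL == p->name) { free(p); return -1; }
        strcpy(p->name, src->name);
    }
    *dest = p;
    return 0;
}

/* size: the struct itself plus the storage behind the string field */
static size_t demo_proc_size(const demo_proc_t *p)
{
    size_t n = sizeof(*p);
    if (NULL != p->name) n += strlen(p->name) + 1;
    return n;
}

/* release: free an instance that was produced by demo_proc_copy */
static void demo_proc_release(demo_proc_t *p)
{
    if (NULL == p) return;
    free(p->name);
    free(p);
}

/* print: pretty-print into a caller-supplied buffer (snprintf semantics) */
static int demo_proc_print(char *out, size_t len, const demo_proc_t *p)
{
    return snprintf(out, len, "demo_proc_t: rank=%d name=%s",
                    p->rank, NULL != p->name ? p->name : "(null)");
}
```

The point of keeping all of these next to the type definition is that adding a field to demo_proc_t means touching only this one group of functions.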

In addition, the data type definition requires that two values be provided:

a. boolean flag indicating whether the data type is structured or not. This was provided in addition to the release function to allow a developer to (for example) define a debugging release independent of the "flavor" (i.e., structured or not) of the data type

b. a name for the data type. This is required to be unique.
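
To make the registration idea concrete, here is a small sketch of a type table that stores the function pointers together with the structured flag and rejects duplicate names. The table, the demo_* names, and the signatures are all assumptions made for illustration; the real registration calls are the ones in the ORTE source files, and only two of the per-type functions are shown to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_TYPES 16

/* Hypothetical per-type function signatures (only two shown for brevity). */
typedef int  (*demo_compare_fn_t)(const void *a, const void *b);
typedef void (*demo_release_fn_t)(void *obj);

/* Hypothetical registration record: functions plus the two required values. */
typedef struct {
    const char       *name;        /* must be unique; pointer is stored, not copied */
    bool              structured;  /* "flavor" of the type */
    demo_compare_fn_t compare;
    demo_release_fn_t release;
} demo_type_info_t;

static demo_type_info_t demo_types[DEMO_MAX_TYPES];
static int demo_num_types = 0;

/* Returns the new type id (>= 0), or -1 if the record is incomplete,
 * the table is full, or the name is already registered. */
static int demo_register_type(const demo_type_info_t *info)
{
    int i;
    if (NULL == info || NULL == info->name ||
        NULL == info->compare || NULL == info->release) return -1;
    if (demo_num_types >= DEMO_MAX_TYPES) return -1;
    for (i = 0; i < demo_num_types; i++) {
        if (0 == strcmp(demo_types[i].name, info->name)) return -1;
    }
    demo_types[demo_num_types] = *info;
    return demo_num_types++;
}

/* Returns the record for a type id, or NULL if the id is unknown. */
static const demo_type_info_t *demo_lookup_type(int id)
{
    if (id < 0 || id >= demo_num_types) return NULL;
    assert(NULL != demo_types[id].name);  /* invariant enforced at registration */
    return &demo_types[id];
}

/* Sample functions for an unstructured int type. */
static int demo_int_compare(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);  /* 1, -1, or 0 */
}

static void demo_int_release(void *obj)
{
    free(obj);  /* unstructured type: a plain free is enough */
}
```

Because the caller of a generic release goes through the table, a debugging release function can be swapped in for one type without touching the structured flag or any other code.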

All of these functions have been provided for the "standard" data types (ints, bool, etc.), so you don't have to worry about those. For an example of these functions, you can look either at the orte/dss functions (where the standard data types are supported) or at the orte/mca/gpr/base/data_type_support directory, where more complex types are defined. The orte/dss/dss_open_close.c and orte/mca/gpr/base/gpr_base_open.c files contain the data type registration calls. I have also provided the functions for all of the current ORTE-defined data types.

Two other entry points (set and get) were added to the data support subsystem. They are intended to mimic true object-oriented programming for the orte_data_value_t object. There are times in the code where it is more convenient to work with statically-defined variables. Using the "copy" function to move data from one object to another, however, causes memory to be dynamically allocated. The set/get functions provide a "safe" method for doing this statically.
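
The set/get idea can be sketched as follows. This is only an assumption-laden illustration: demo_data_value_t, demo_set, demo_get, and the DEMO_* type codes are hypothetical names, and the real orte_data_value_t accessors may differ in signature and behavior. The sketch shows the key property: the value object merely points at the caller's storage, so nothing is allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type codes. */
enum { DEMO_UNDEF = 0, DEMO_INT32 = 1, DEMO_STRING = 2 };

/* Hypothetical value object: a type tag plus a pointer it does not own. */
typedef struct {
    int   type;
    void *data;
} demo_data_value_t;

/* set: attach caller-owned storage to the value without copying it.
 * Refuses to overwrite a value that already holds data. */
static int demo_set(demo_data_value_t *value, void *data, int type)
{
    assert(NULL != value);            /* passing no object is a caller bug */
    if (NULL == data) return -1;
    if (NULL != value->data) return -1;
    value->type = type;
    value->data = data;
    return 0;
}

/* get: hand back the stored pointer, but only if the caller asks for
 * the type that was actually stored. */
static int demo_get(void **data, const demo_data_value_t *value, int type)
{
    assert(NULL != data && NULL != value);
    if (NULL == value->data) return -1;
    if (type != value->type) return -1;
    *data = value->data;
    return 0;
}
```

The type check in the getter is what makes this "safe" in the sketch: a caller that expects a string cannot accidentally read an int. The trade-off is that the caller's variable must outlive the value object.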

In addition to changing the data type definitions, two "helper" functions were created to support the gpr_value and gpr_keyval structures. In working through the code, I found a number of instances where people had forgotten to completely define these structures, leaving some fields unintentionally "blank". This appeared to cause problems at times, and definitely caused headaches when making this transition. In addition, there was a lot of duplicative and painful code due to all the error checking required while building one of these structures.

To simplify things, I created two new gpr API functions: create_value and create_keyval. Each of these takes as arguments the values to be placed in their respective fields, and will return to you a fully built structure with all the desired error checking for memory availability etc. Using these functions will also protect you against any future changes to the system. The only negative is that these functions dynamically allocate the required memory.
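
For readers who want a feel for what such a helper does, here is a minimal sketch of a create-style function that fills every field and performs the allocation checks in one place. The structure layout, demo_create_keyval, and demo_release_keyval are hypothetical and are not the gpr API; consult the gpr headers for the real signatures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical keyval layout, for illustration only. */
typedef struct {
    char  *key;
    int    type;
    void  *data;
    size_t size;
} demo_keyval_t;

/* Build a fully initialized keyval. On success returns 0 and stores the
 * new object in *out; on any failure returns -1, leaves *out NULL, and
 * frees whatever was allocated along the way. */
static int demo_create_keyval(demo_keyval_t **out, const char *key,
                              int type, const void *data, size_t size)
{
    demo_keyval_t *kv;

    assert(NULL != out);
    *out = NULL;
    if (NULL == key || (NULL == data && 0 != size)) return -1;

    kv = calloc(1, sizeof(*kv));      /* calloc: no field is left "blank" */
    if (NULL == kv) return -1;

    kv->key = malloc(strlen(key) + 1);
    if (NULL == kv->key) { free(kv); return -1; }
    strcpy(kv->key, key);
    kv->type = type;

    if (0 != size) {
        kv->data = malloc(size);
        if (NULL == kv->data) { free(kv->key); free(kv); return -1; }
        memcpy(kv->data, data, size);
        kv->size = size;
    }

    *out = kv;
    return 0;
}

/* Release a keyval built by demo_create_keyval. */
static void demo_release_keyval(demo_keyval_t *kv)
{
    if (NULL == kv) return;
    free(kv->data);
    free(kv->key);
    free(kv);
}
```

A caller then needs a single return-code check instead of one per field, and a field added to the structure later gets its default inside the helper rather than at every call site.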

I hope that helps to explain the changes. As you can see from the commit, this hit a large number of functions. I have provided unit tests for all the data types within the revised data support system that help illustrate how that system is used. In particular, you can look at test/dss and at test/mca/gpr (the gpr_dt_xxx functions) for examples.

Please feel free to holler with questions - and do please let me know if you find any problems with the revisions.
Ralph

