We need to be able to validate DTD and XML Schema datatypes.
I consider the DTD datatypes core and essential because they
are defined as part of the XML syntax in the XML 1.0 spec.
The XML Schema datatypes are defined for use with XML Schema
structures but have also been borrowed by other schema 
languages (e.g. RelaxNG).

So I will develop an interface for validating datatype values
in stages. First, let's start with the absolute simplest
interface:

  public interface DatatypeValidator {
    public boolean validate(String content);
  }

The interface has a single method that returns true or
false depending on whether the content is valid or not.
This will do for most datatypes but what about types that
require some extra knowledge, or "state"? I'm talking about
ENTITY, ID/IDREF, and NOTATION. Each needs to be able to
query (and sometimes store) additional information.
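
To make the stateless case concrete, here is a minimal sketch
of how the simple interface could be implemented for NMTOKEN.
The class name NmtokenValidator is hypothetical, and the
character check assumes an ASCII-only subset of the XML 1.0
name characters (the real production allows many more):

```java
// Sketch only: the boolean-returning interface from above.
interface DatatypeValidator {
    boolean validate(String content);
}

/** Hypothetical stateless validator for the DTD NMTOKEN type. */
final class NmtokenValidator implements DatatypeValidator {
    @Override
    public boolean validate(String content) {
        if (content == null || content.isEmpty()) {
            return false; // an NMTOKEN needs at least one character
        }
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            // Assumption: ASCII subset of the XML name characters only.
            boolean ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '_' || c == ':';
            if (!ok) {
                return false;
            }
        }
        return true;
    }
}
```

Nothing here needs outside information, which is why the
simple signature is enough for this kind of type.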

It would NOT be a good idea to have the datatypes explicitly
know about the validator because: 1) it would introduce an
unneeded dependency; and 2) it would mean that the library
is not standalone. So let's define some "state" interfaces
so that these special datatypes can retrieve the information
they need in a general way:

  public interface EntityState {
    public boolean isUnparsedEntity(String name);
  }

  public interface IdState {
    public boolean isDeclaredId(String name);
    public void addId(String name);
  }

  public interface IdRefState {
    public void addIdRef(String name);
  }

  public interface NotationState {
    public boolean isNotation(String name);
  }

It would be the responsibility of whoever calls the datatype
validators to provide implementations of these interfaces.
I won't go into the details now, though, because it is just
an implementation detail. Let's concentrate on the interfaces 
for the time being.

We now have interfaces defining the state for the special
datatypes so that they can do their job. But how do the
datatypes "know" about the state? We could add a method to
the interface for setting the state, but each validator
would then hold per-document data, which would prevent
validator instances from being shared. So the only way to do
this is to pass the state object to the validate method.
This changes our datatype validator interface.

  public interface DatatypeValidator {
    public boolean validate(String content, Object state);
  }
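
As a rough sketch of how a stateful type might use this
signature, consider ID. The names IdValidator and
SimpleIdState are hypothetical, treating a missing or
wrong-typed state object as "invalid" is my assumption, and
the check that the value is a legal XML Name is left out to
keep the example short:

```java
import java.util.HashSet;
import java.util.Set;

interface IdState {
    boolean isDeclaredId(String name);
    void addId(String name);
}

interface DatatypeValidator {
    boolean validate(String content, Object state);
}

/** Hypothetical ID validator: it has no fields, so one instance can be shared. */
final class IdValidator implements DatatypeValidator {
    @Override
    public boolean validate(String content, Object state) {
        // Assumption: a missing or wrong-typed state object means "invalid".
        if (content == null || content.isEmpty() || !(state instanceof IdState)) {
            return false;
        }
        IdState ids = (IdState) state;
        if (ids.isDeclaredId(content)) {
            return false; // an ID value may appear only once per document
        }
        ids.addId(content);
        return true;
    }
}

/** Hypothetical caller-side state; the caller creates one per document. */
final class SimpleIdState implements IdState {
    private final Set<String> ids = new HashSet<>();

    @Override
    public boolean isDeclaredId(String name) {
        return ids.contains(name);
    }

    @Override
    public void addId(String name) {
        ids.add(name);
    }
}
```

Because all per-document data lives in the state object, the
same IdValidator instance can validate any number of
documents, each with its own SimpleIdState.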

[Q] Should the "state" interfaces be defined as inner
    interfaces of DatatypeValidator?

Since the state interfaces are related to the "core"
datatypes and only used within that context, I think that 
it would be a good idea.
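
For illustration, the nested layout might look like the
following sketch. The interface bodies are the ones given
earlier; NotationValidator is a hypothetical class that only
shows how the nested names are referenced:

```java
interface DatatypeValidator {
    boolean validate(String content, Object state);

    // Interfaces nested in an interface are implicitly public and static.
    interface EntityState {
        boolean isUnparsedEntity(String name);
    }

    interface IdState {
        boolean isDeclaredId(String name);
        void addId(String name);
    }

    interface IdRefState {
        void addIdRef(String name);
    }

    interface NotationState {
        boolean isNotation(String name);
    }
}

/** Hypothetical validator showing how a nested state interface is used. */
final class NotationValidator implements DatatypeValidator {
    @Override
    public boolean validate(String content, Object state) {
        return state instanceof DatatypeValidator.NotationState
                && ((DatatypeValidator.NotationState) state).isNotation(content);
    }
}
```

Callers would then refer to DatatypeValidator.NotationState
and so on, which keeps the state interfaces visibly tied to
the core datatypes.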

Okay, we have a basic interface that works for all of the
datatypes but there are other things we want to support. For
example, access to the native type objects. To do this, we 
could have the validate method return an Object reference 
to the native type. So if the datatype primitive was 
integer, the method could return an Integer object. And if 
the datatype was defined as NMTOKENS, the method could 
return an array of String objects.

But... now we need a way to signal that the content was
valid or invalid. There are two ways to do this. First, a
null return value could indicate that the content was
invalid, but in the case of a list of values (returned as an
array) you wouldn't be able to tell which value was bad.
Second, the method could throw an exception that carries the
offending value. For the first reason alone I would rather
define an exception for invalid values that is thrown from
the method.

  public interface DatatypeValidator {
    public Object validate(String content, Object state)
      throws InvalidDatatypeValueException;
  }

  public class InvalidDatatypeValueException extends Exception {
    public InvalidDatatypeValueException(String value) {
      super(value);
    }
  }
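
Here is a sketch of how NMTOKENS could look with this version
of the interface. NmtokensValidator is a hypothetical name,
the token check again assumes an ASCII-only subset of the XML
name characters, and passing the bad token as the exception
message is my assumption about how the value would be
reported:

```java
class InvalidDatatypeValueException extends Exception {
    InvalidDatatypeValueException(String value) {
        super(value); // assumption: the offending value is the message
    }
}

interface DatatypeValidator {
    Object validate(String content, Object state)
            throws InvalidDatatypeValueException;
}

/** Hypothetical NMTOKENS validator; returns the tokens as a String[]. */
final class NmtokensValidator implements DatatypeValidator {
    @Override
    public Object validate(String content, Object state)
            throws InvalidDatatypeValueException {
        String trimmed = (content == null) ? "" : content.trim();
        if (trimmed.isEmpty()) {
            // NMTOKENS requires at least one token.
            throw new InvalidDatatypeValueException(String.valueOf(content));
        }
        String[] tokens = trimmed.split("\\s+");
        for (String token : tokens) {
            // Assumption: ASCII subset of the XML name characters only.
            if (!token.matches("[A-Za-z0-9._:-]+")) {
                throw new InvalidDatatypeValueException(token);
            }
        }
        return tokens; // the "native" value for a list type
    }
}
```

The caller casts the result to String[] on success, and on
failure the exception identifies exactly which token in the
list was bad, which a null return could not do.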

[Q] What about value normalization?

This is a serious matter but I'm torn on whether datatype 
value normalization (e.g. removing spans of whitespace 
between values in a list) should be done within the
validator instance. [Perhaps we can discuss this as a 
separate sub-thread of this design discussion.]
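
To be clear about what I mean by normalization, here is a
small sketch of the whitespace case for list values. The
class ListNormalizer is hypothetical and says nothing about
where the operation should live; it only shows the operation
itself, using the four XML whitespace characters:

```java
/** Hypothetical helper: collapses whitespace in a list-valued attribute. */
final class ListNormalizer {
    private ListNormalizer() {
    }

    /** Strips leading/trailing whitespace and collapses inner runs to one space. */
    static String normalize(String content) {
        StringBuilder out = new StringBuilder(content.length());
        boolean pendingSpace = false;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                // Remember the gap, but only once something has been emitted.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Whether this runs before validate is called or inside it is
exactly the open question.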

Comments, as always, are welcome.

-- 
Andy Clark * IBM, TRL - Japan * [EMAIL PROTECTED]
