On 12/09/2013 04:13 PM, Amos Jeffries wrote:

> Two requests for additional scope:

> * can we place this is a separate src/parse/ library please?
>  - we have other generic parse code the deserves to all be bundled up
> together instead of spread out. Might as well start that collection
> process now.

No objections.


> * Lets do the charset boolean array earlier rather than later.

No objections.


I hope I will not be the one implementing the proposed API or the above
additions though. :-)


>   CharacterSet(const char * const c, size_t len)

You do not want the len argument. These character sets will be nearly
always initialized once, from constant hard-coded c-strings.

For esoteric cases, I would add an add(const char c) method to add a
single character to the set (and use the merge operation to produce more
elaborate sets as needed, see below for a sketch).


It is a good idea to add a public label/name argument/member though, for
debugging/triaging:

  CharacterSet(const char *label, const char *characters);
  ...
  const char *name; ///< a label for debugging


>   /// add all characters from the given CharacterSet to this one
>   void merge(const CharacterSet &src) const {

Please provide a '+=' operator [that will use merge()]. Here is how I
envision some sets might be created:

const CharacterSet &MySpaces() {
    static CharacterSet cs;
    if (!cs.name) {
        cs = CharacterSet("MySpaces", "\n\t "); // RFC 12345
        cs += OtherProtocolSpaces();
        if (Config.weNeedToBeSpecial)
            cs.add('\r');
    }
    return cs;
}


> private:
>   bool match_[256];

I suggest using std::vector<bool> to avoid sprinkling the code with 256
constants or adding another member to store the hard-coded length.

I would also call this member chars_ instead of match_ because some sets
will be used in negative sense and "match" will be a little confusing in
that negative context.


> NP: most of the time we will be wanting to define these CharacterSet as
> global once-off objects. So I'm not sure if the merge() method is
> useful, but shown here for completeness in case we want it for
> generating composite character sets.

Agreed on all counts.


Cheers,

Alex.

Reply via email to