On 27/01/2013 8:27 a.m., Tsantilas Christos wrote:
On 01/26/2013 06:05 AM, Amos Jeffries wrote:
On 26/01/2013 10:39 a.m., Tsantilas Christos wrote:
Hi all,

While we are working in a project we seen that the helpers do wrong
usage of the Notes interface.

This is maybe it is our mistake (and mostly my mistake) because we do
not have make good documentation of this interface.
In any case I believe that we should fix some thinks.

In the Notes.h file there are the following classes:

class Note
-----------

This is a class used to store a note configuration not a note it self.
It is a note name plus a list of possible values with an ACL list in
each value.
It is used to store configurations for the following cfg lines:
    note key value acl1 acl2


class Notes
------------
It is used to store a list of Notes. A list like the following:
    note key valueA acl1a acl2a
    note key valueB acl1b acl2b

This is used to add custom notes (or metaHeaders) in an HTTP request
With the ACL list being optional and the word "note" being dropped by
the config parser what is actually stored is:

  NotesList{
       Note{key,value,NULL},
       Note{key,value,NULL}
       ...
}
Well, not actually. What it is stored is the following:
  NoteList {
       Note {key, Values{{value1, NULL}, NULL} }
       Note {key, Values{{value1, NULL}, NULL} }
       ...
  }

A Note keep a note name plus a list of values/ACLs. The Acls used to
select the correct value for a Note.
It is designed to store configuration not a note pair.

Moreover to get the Value of a Note a new method added the
Notes::firstValue() to retrieve the first value of the list. It is not
so cool. It is a little confusing...

... which is clearly a list of kv-pairs without any specific information
about delimiter,  display semantics, or input source location. Nowhere
in ths implementation or documentation is it clear that this list
structure is *only* ever to be used for parsing the config file. Rather
the simplicity of the information stored implies strongly that it is
created as re-usable storage structure.
  ***As was requested when you were planning the Notes feature.***
Of course it can be used to store kv-pairs. What I am saying is that the
NotePairs class is better structure to keep this list.


class NotePairs
-----------------
This is an HttpHeader kid class  used to store the notes (or
MetaHeaders). We select HttpHeader class because it has some interesting
features. For example assume that two different subsystems add the same
note:
   X-SQUID-ProcessedBy: ReWriter
   X-SQUID-ProcessedBy: Adaptation

Then this is will be merged to
   X-SQUID-ProcessedBy: ReWriter, Adaptation
Yes NotePairs is clearly and specifically a Mime syntax version of
kv-pairs.

The interesting and useful fetures are *only* useful when outputting
kv-pairs as ICAP or HTTP headers in Mime format. Those features are not
useful in any other context Squid uses kv-pairs. ... which is why the
helpers avoid that type until the final logging stage where it has no
choice.
So we are agree that the NotePairs class is more useful to keep kv-pairs
at least for the ICAP/HTTP headers. But this is does not mean that it
can not be used for storing helpers kv-pairs.
It is just better to use the same structures for storing similar
information. It is better to just use NotePairs for storing notes,
because it can handle more cases (kv-pairs, ICAP/eCAP metaheaders and
maybe other type of notes in the future)

Also storing all notes in the same object maybe it is good policy
because  they can be used in other than the logging cases.

Currently all notes merged just before logging because they are used
only here.

Squid helper Notes problems
-------------------------------
The changes made in trunk revno:12490 and revno:12495

Problems I am seeing:
   - it uses class Notes to store the notes/metaHeaders not the NotePairs.
It adds the "Notes HelperReply:notes" and the "Notes
HttpRequest::helperNotes" members.
This is intentional. see above.

The first implementation attempt used a NotePairs list for helpers. The
exact same behaviour of merging keys together which make NotePair
wonderful for ICAP header display and logging display make this NotePair
type difficult to use as a kv-pair list for helpers and any other use
where HTTP field-value syntax is needed.
Why?

  The other difficulty was the whole or at least a large segment of the
HTTP parser becomes a dependency, which was not worth the cost and
effort for the simplicity of NotePairs. This was probably not noticable
when integrating with ICAP due to it already having that dependency in
place for ICAP header parsing.
You do not have to use any HTTP parser. You can just add a kv-pair (a
key/value pair), simply using the NotePairs::putExt(key, value).
Why the Notes::add(key, value) is better?


The need for multiple de-references ( A->value[X].value()->value ) when
accessing NotesList is accepted as a cost (for now) in exchange for the
benefit of *not* merging key values together and not having to link in
the HTTP parser.
I did not found such cases in the existing code.
What I am seeing is the following usage:
  Note::Pointer statusNote = reply.notes.find("status");
   ...
  const char * result = statusNote->firstValue();
  status = (http_status) atoi(result)
  ...
  urlNote = reply.notes.find("rewrite-url");
  if (urlNote != NULL && strcmp(urlNote->firstValue(), http->uri)
    ....

Which requires two local Pointers to access the value.
firstValue() is the accessor Alex had me add to hide the worst part of the de-referencing.

Without it we would have:

  reply.notes.find("status")->values[0]->value.termedBuf()

where it is preferrable to go:
   reply.notes.find("status")->firstValue();



The above if we use  NotePairs instead of Notes can be easily written as
follows:
  String status = reply.notes.getByName("status");
  status = (http_status) atoi(status.termedBuf())

Your example works for numeric values because ',' is not a valid digit and atoi() will abort before parsing the already parsed values.

It gets worse when handling URLs or any other value type which coudl ontain whitespace or ',' values internally:

1)
 helper reply:  OK url=http://example.com/ url=http://invalid.invalid/

translates to a single URL. String uri = reply.notes.getByName("uri"); will produce URL "http://example.com/, http://invalid.invalid/"; which the lenient URL parser (default behaviour) will see as "http://example.com/,%20http://invalid.invalid/";.

2)
helper reply: OK user="J, Smith" group="group1,A" group=group2 user=alias group=A

String group --> "group1,A, group2, group3" ... how does one determine that a group match is supposed to be equal to "group1,A" and not match "group1" or "A" individually, BUT must match "group2" or "B" individually?

String user ---> "J, Smith, alias" ... what username is this? it won' match any, and splitting on ',' will also likey not match any. But "J, Smith" could match.

3)
  helper reply:  OK tag=1 tag=something tag=magic

Will produce a single tag String "1, something, magic". What does the tag ACL type do?

Notice that all these are parsing the strings a second time just to interpret the way NotePairs has merged (using a memory copy probably) into a single String.

Things like this is where the "desired" behaviour of NotesPairs key merging is more of a problem to be worked around than a good thing.


I hope you can see that these are *not* desired behaviours. But they are an integral part of NotePairs due to its implementation using objects optimized for Mime header semantics. NotesList is not quite right either but offers cleaner access suitable for all these use cases.

If you know of an interface which can store and use the kv-pair from helpers in a way which the helpers kv-pairs need to be accessed I would be happy to audit the patch.


  ...
  status urlNote = reply.notes.getByName("rewrite-url");
  if (!urlNote.empty() && strcmp(urlNote->termedBuf(), http->uri)

Moreover we can add a member in NotePairs the
   const char *NotePairs::find(const char *key)
to simplify it more.

We could. Notice that this is already present in NotesList.

Also we have an interface in configured Notes to deny user to use some
names for notes which are reserved. We can add here the helper reserved
names like "status", "rewrite-url" etc.

They are not forbidden for use. They are in fact *expected* for the helper to use. The only significance a reserved key has in the helper protocol is that Squid performs special processing for them. With side effects that may or may not be wanted, but are up to the users choice.


Maybe we should require by the user to use always the X- prefix in his
kv-pairs keys. For example the "status" is a key which can be selected
by anyone. The user is better to use the "X-Status" as kv-pair key.

Which would make x-status the reserved key name and status key name mean nothing. Which brings us back to the problem of reserving X-Status ...

... "user" is not what you seem to think. "user" in this case is an automated helper. The only possible limits on key name are that it may not contain whitespace, CR, LF, \0, or '=' characters. *anything* else confirming to the key=["]value["] syntax is acceptible.

X-* Mime tokens are nothing special and due to Notes use in ICAP headers X-* is a potential overlap with reserved key names. For example; possibly an external_acl_type helper may be wanted to produce X-Client-Username:foo for relay to ICAP dynamically. The helpers interface defines '_' suffix as the reserved marker for custom local key names on all helper interfaces. This is intentionally specific to the helper interface and intentionally invalid Mime header character to prevent the above cases unexpectedly adding protocol injection vulnerabilities from helper supplied keys. Crossover between notes and ICAP/HTTP in future could still do injection if desirable and valid header names used as keys, but explicit use of _ for custom prevents mistakes causing problems.



  If you are solving problems in the Notes structure systems please
resolve that dereference issue.

   - It adds some extra methods which does not make sense and confuse more
the Notes interface:
         * NotePairs::append(const Notes::NotesList &src)
         * NotePairs::append(const NotePairs *src)
    These functions used to add a Notes list to a NotePairs List. The
HttpRequest::helperNotes to AccessLogEntry::notes.
By 'add' you mean 'append list S to this list'. Which is why it is
called 'append()'. Where is the problem? naming, semantics and operation
conform to standard conventions.
The problem is the semantics here.  A Notes list holds many informations
which there is not any way to add them in a NotePairs list. It holds a
list of "key plus a value/ACL list", not just a key/value pair list.
Only that.

Then it is not a NotesList it is a NotesPlusOtherList. Possibly this is part of the confusion. I was under the impression that we defined a list that could store annotations adde to the transaction.

However, I see no problem with using Note without ACL member set as a storage type for transaction annotations.



The first above converts a NotesList into a NotesPairs list.
Specifically because the NotePairs list is hard-coded to utilize the
MIME format key-value pairs with colon (:) delimiter - which in Squid
are stored in HttpHeader class types.
The HttpHeader holds a list of HttpHeaderEntry which consist my a name
and a value. It is a key/value pair.

Yes. Well it is a key:value-list set. Which if I am understanding you right is better stored in NotesList since NotesList stores multiple values per key name .... or not.

HttpHeader is a sub-type of kv-pair with protocol-specific semantics built into its accessor tools (such as thet values list merging you liked for ICAP usage). NotesList is a sub-type of kv-pair with generic access semantics built into its accessors. You see why I chose the one with simpler more-generic API for storing generic kv-pairs?

The NotesList implementation could be polished for better access, but that does not make it the wrong structure to use for storing a generic kv-pair set.


The second is added such that any existing AccessLogEntry list can be
appended to by other sources producing NotePairs lists (ie RESPMOD).

But It is easier to use NotePairs for HttpRequest::helperNotes member
and just use the (pre-)existing NotePairs::append (the
HttpHeaders::append)
Causing the need to covert a key=value string pair into an HttpHeader
object. Causing the need to define said HttpHeader as a custom header
and fill out its value set. All of which adds HTTP protocol operations
(and validity checks) for the sake of parsing the very non-HTTP input
"OK foo=hello".
If the HttpHeader is not a good structure to base the NotePairs  we
should use a different structure. But I still believe that the cost
adding a kv-pair is not so huge...

Also HttpHeader/HttpHeaderEntry uses mempools, which is a plus.

All the Note* objects should as well. This is something else to look at upgrading in them.



  Not to mention that "foo_=hello" (_ being defined for use by custom key
names) is an invalid HTTP header name rejected by the HttpHeader parser
infrastructure, and I have no intention of altering that HTTP-specific
parser (potentially affecting on-wire HTTP validity checks) for the sake
of unrelated code that does not even use HTTP or Mime syntax.
The HTTP headers parser will not be used. The foo_ will be used as key
name without any problem.
But even if there is problem using "_" in the kv-pair keys and it is so
important to use it in key names we should rewrite the NotePairs class
to support it.

AFAIK, all of the accessor helper functions for HttpHeader object type are used by some part of the HTTP parser. Their operational semantics are defined by RFC 2616 and the HTTPbis update Drafts. Using NotePairs as children of HttpHeader means either duplicating the accessors in a slightly different way, or altering these core pieces of the HTTP protocol parser. Neither of which is a good choice.

This is not a permanent fixed relationship though. The HttpHeader could be implemented as a child of NotePair with specialized accessors. Making a kvPair generic object type for kv-pairs, which is used for all key-values in use by Squid.




        * Notes::add(const String &noteKey, const String &noteValue);
        * Notes::add(const Notes &src);
        * Notes::find(const String &noteKey)
      To add and search for a note in Notes List. But it is wrong. We do
not store such Notes here. We are storing Notes configuration, while
parsing the cfg file.
So we have a linked list of key-value pairs writen with an generic
interface which is not allowed to be used anywhere outside of Parsing?
That goes against the design requirements I requested back at the
beginning. That Notes and NotesList be a generic list of kv-pairs we
could add to arbitrarily and output with various format delimiters.
Again read my comments above. The Note does not keep only a key/value
pair. Of course it can be used but it is not so normal, it is designed
to store other information ...

Not *only* no. However, the ACL field is optional so does not matter. Conceptually it is equivalent and has a better generic accessor set with less unwanted behavours.



We have four desired uses for NotesList in Squid:

  1) [A-Z]key ':' ' ' value - I/O for ICAP in Mime header format
  2) key '=' value - I/O for ICAP or HTTP in HTTP header field-value syntax
  3) key '=' value - helper I/O syntax
  4) key ' ' value - squid.conf 'notes' directive syntax

NP: (1) has limitations on key character set, and optional (desired 1
SP) whitespace after the ':' delimiter. The others do not.
Nor the ":" nor the space stored in HttpHeader list, only the Header
name and the header value.


We should use NotePairs for HttpRequest::helperNotes and
HelperReply::notes members and use the NotePairs::putExt or
NotePairs::addEntry(new HttpHeaderEntry(HDR_OTHER, name, value)); to
adding note/value pairs.

   - Initially we had select the AccessLogEntry to store notes. The
AccessLogEntry selected because it considered as good structure to hold
HTTP request X-data. (But we had problem on this).
I am not sure if the HttpRequest::helperNotes is the correct position to
add Notes. We should consider if we can add notes to
AccessLogEntry::notes, or select an other X-data structure.
HttpRequest is currently abused as a transport mechanism to relay the
notes through to AccessLogEntry, which is not directly accessible from
the helper handlers.
If ALE can be made accessible to both helpers and ACL match system in a
useful way the HttpRequest member can be removed.
We need to select which structure will/can play the role of the X-data.
Of course I agree we can hold all required information in HttpRequest
object.

NP: adding straight to the ALE list of NotePairs adds an extra problem,
that the kv-pair is now *copied* into Mime syntax. Once the note has
been converted to Mime header syntax it becomes much more difficult to
process the list and match a single kv-pair out of a set of identical
ones. ACLs for example do not have access to the ALE object to test
particular key values and even if they did they can only retrieve the
entire list of all values associated with each key name.

I am working on changing the above problems. Also maybe we need to
change the names of some classes to avoid confusion in the future, for
example rename Note and Notes to NoteRule and NoteRules or similar.

Any opinions on the above?
When writing the helper interface use of Notes the problem was only that
Notes* classes were hard-coded for a specific use-case output.

We had long discussions during the design phase and auditing about using
Notes as both header format "Key:value" and field format "key=value" - I
was under the impression that you had taken those usage requirements on
board and designed Note class which clearly takes a Key and Value
separately (without any additional details like ACLs!).
We may had select wrong name for Note and Notes class. But this is the
only problem. Please see my comments above, and my first mail of this
thread which explains the classes.



Other than the link with ACLs, class Note is fine for use as a generic
kv-pair node.
Other than your documentation of it, class NotesList is reasonably okay
for use as a generic list of Note kv-pair.  One problem is the use of
std::list<Node::Pointer> which causes a double-indirection layer when
trying to access any node position of the list.
  For example:  list->values[0]->value().keyValue()   instead of just
list[0]->keyValue().


To resolve what you see as an API abuse "problem" I think all that is
needed:
  1) make Note the base type representing just a kv-pair
  2) make a sub-class associating ACLs as a config specific sub-type of it.
  2) make a "Packer" for NotesList that outputs in each of the earlier
mentioned syntax formats.



To resolve the indirection problems in NotesList I propose adding a
template to Squid sources which presents the *next pointer and
linked-list accessors and iterators. Such that any class can inherit
from it to be a node in a linked-list.
  We can then unify all the classes and structures which are performing
their own linked-list implementations into inheritence from this template.
  - NotesList would then be a class inheriting from ListNode<> and Note.

Sound reasonable?
Nope.
I still believe that the best is something like the following:
  - Rename Note and Notes to NoteRule and NoteRules, to avoid confusion
in the future.

To be clearly a config-specific object type it needs to have Config or Cfg clearly in its naming.

However for what they store this would be a bit excessive. Why define a global NotesList type if it were 100% specific to storing a directives configuration details. Define a global instance and a NotesConfig which stores a list of directive NotePlusAclList objects one for each config file line.


  - Maybe rename NotePairs to Notes, for the same reason.
  - Use NotePairs for HttpReply::notes and HttpRequest::helperNotes members
  - If the HttpHeaders is not a good choice as base class of NotePairs we
should re-implement the NotePairs to fit our needs.  But I still believe
that it is not so bad ...

It would possibly not be bad if it were re-implemented.

Amos

Reply via email to