I think I would prefer a declarative specification of body handling as the normative part. But if we have a good processing algorithm to show, I would like to see it included, perhaps as a non-normative appendix. Its a good demonstration that the declarative spec is self consistent, and it gives developers a good starting point.

        Thanks,
        Paul

DRAGE, Keith (Keith) wrote:
I would note that if this does end up in the specification, it can carry
a rider that implementations should exhibit behaviour as demonstrated by
the algorithm, not that they need to implement the algorithm itself.
However I leave it to the working group to ascertain whether they prefer
this as a means of specification.

Regards

Keith

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Paul Kyzivat
Sent: Tuesday, September 30, 2008 2:53 PM
To: [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: [Sip] Is body handling NP-complete?

Dale,

I like the thorough treatment. It seems plausible to me.

One implication of this is that body parts need to be processed twice in a content-type+content-disposition+context dependent way: once to extract references, and again for final effect. (Hmm - is extracting of references dependent on the disposition and context?)

If you imagine that various content types are handled by plugins, then the plugins will each require two entry points - one for each of these processing modes. While that is certainly possible, it might mean that plugins that already exist for non-sip applications (e.g. email or html) may not be reusable for sip. (I am not familiar with the content-type plugins for other protocols such as email. But I presume that email plugins may be useful in some cases, at least for MESSAGE.)

Do plugins for email have this complexity?

An alternative is to do all the processing in one pass, and then at the end check if all body parts have been processed. BUT that means all the side effects of processing will have been realized before discovering that the message is in error. That is certainly not ideal.

This example also points out that the requirement that references appear before the parts they reference is needed to make
the problem
tractable. Otherwise, there could be circular dependencies between the choices for different multipart/alternatives, which might be impossible to analyze by an algorithm as simple as a tree-walk.
This requirement is satisfied by lots of simple cases. But it means that it is impossible to describe any kind of circular structure. I'm not familiar enough with the use of multipart/related to know if it is a restriction that is tolerable there. Perhaps someone can comment on that.

        Thanks,
        Paul

[EMAIL PROTECTED] wrote:
Referring back to last March, in

http://www.ietf.org/mail-archive/web/sip/current/msg22426.html, I was
contemplating whether the rules for testing a message body required searching through combinations of choices regarding which
alternatives
of multipart/alternatives are chosen. As I then conceptualized the problem, the question was "Is there a choice of
alternatives such that
all the constraints for 'successful processing' can be
met?" It turns
out that if you phrase the question that way, the problem is NP-complete, which means in practice that any algorithm to solve it must do backtracking.

The new formulation of the problem attempts to avoid this
by dividing
the problem into three phases: (1) assemble a tree which
represents
the dependencies of super-parts' being processable upon that of sub-parts, (2) walk the tree, tagging parts from the bottom
up as to
whether they are processable, (3) if the overall tree is
processable,
selecting for processing the correct subset of parts as directed by the tags in the tree. As long as the size of the tree is
commensurate
with the number of body parts, the algorithm is inherently
linear, and
clearly it doesn't involve backtracking.

In regard to step (1), the tree is assembled:

- the root node represents the message body
- a node for a multipart/alternative part has one child for each
  component
- a node for a multipart/mixed part has one child for each
  component
- a note for a multipart/related part is not automatically given
  children for its components
- a header in a part that refers to another component generates a
  child of that part that represents the referred-to component
- similarly, a reference within a part that refers to another
  component generates a child of that part that represents the
  referred-to component (This is how multipart/related nodes get
  children.)

Of course, if a component is referred to multiple times,
the component
is represented by multiple nodes in the tree, one node for
each of its
roles.

However, there are subtleties I might be overlooking, so I
am going to
revisit the embedding of the NP-complete problem "3-satisfiability"
into MIME to see what it reveals about my sketch of an algorithm.

Referring to the previous message:

    The reduction is as follows:

    http://en.wikipedia.org/wiki/NP-complete
    http://en.wikipedia.org/wiki/Reduction_%28complexity%29
    http://en.wikipedia.org/wiki/3-satisfiability#3-satisfiability

Choose an instance of the 3-satisfiability problem,
that is: a set of
    boolean variables A, B, C, etc. and a formula composed of:
            a series of "clauses" joined by AND, consisting of
a series of variables and their negations
joined by OR.
The problem is to determine if there is any assignment
of truth values
to the variables which makes the formula true, that is,
which makes
    true one of the variables or negations in each of the clauses.

    For example, choose 4 variables A, B, C, and D, and the formula

(A OR B OR C) AND (B OR NOT-C OR D) AND (NOT-A OR NOT-B
OR NOT-D)
This can be satisfied by setting A=TRUE, B=FALSE,
C=TRUE, and D=FALSE.
    Construct a multipart/mixed with each component labeled
handling=required. Within that, for each variable,
have a part which
is multipart/alternate with 2 parts. If processing
chooses the first
part of the part corresponding to a variable, that will
model the
variable being TRUE. If processing chooses the second
part of the
part corresponding to a variable, that will model the
variable being
    FALSE.

    For example, our body starts:

            multipart/mixed
                    multipart/alternative;handling=required
first part corresponding to A (not
yet specified)
                            second part corresponding to NOT-A
                    multipart/alternative;handling=required
                            first part corresponding to B
                            second part corresponding to NOT-B
                    multipart/alternative;handling=required
                            first part corresponding to C
                            second part corresponding to NOT-C
                    multipart/alternative;handling=required
                            first part corresponding to D
                            second part corresponding to NOT-D

For each clause of the formula, add an additional body
part, marked
"Content-Disposition: by-reference; handling=required".
 Give the part
a 'cid' URL. The ability of the recipient to
successfully process
    this part will model making the corresponding clause TRUE.

    For example, our body is now:

            multipart/mixed
                    multipart/alternative;handling=required
first part corresponding to A (not
yet specified)
                            second part corresponding to NOT-A
                    multipart/alternative;handling=required
                            first part corresponding to B
                            second part corresponding to NOT-B
                    multipart/alternative;handling=required
                            first part corresponding to C
                            second part corresponding to NOT-C
                    multipart/alternative;handling=required
                            first part corresponding to D
                            second part corresponding to NOT-D
part (not otherwise
specified);by-reference;handling=required;cid=clause1
part (not otherwise
specified);by-reference;handling=required;cid=clause2
part (not otherwise specified);by-reference;handling=required;cid=clause3

We now want to arrange things so that one of the last 3
parts can be
successfully processed only if we choose for processing
one of the
earlier parts that corresponds to a variable value that
will make the
    corresponding clause true.  This is done by embedding forward
    references in the variable-corresponding parts that refer to the
    clause-corresponding parts.

    For example:

 1          multipart/mixed
 2                  multipart/alternative;handling=required
 3                          first part corresponding to A
                                    references cid:clause1
 4                          second part corresponding to NOT-A
                                    references cid:clause3
 5                  multipart/alternative;handling=required
 6                          first part corresponding to B
                                    references cid:clause1
                                    references cid:clause2
 7                          second part corresponding to NOT-B
                                    references cid:clause3
 8                  multipart/alternative;handling=required
 9                          first part corresponding to C
                                    references cid:clause1
10                          second part corresponding to NOT-C
                                    references cid:clause2
11                  multipart/alternative;handling=required
12                          first part corresponding to D
                                    references cid:clause2
13                          second part corresponding to NOT-D
                                    references cid:clause3
14                  part;by-reference;handling=required;cid=clause1
15                  part;by-reference;handling=required;cid=clause2
16                  part;by-reference;handling=required;cid=clause3

In order to process this body, the recipient must
choose exactly one
of each pair of bodies in the multipart/alternative
body parts.  This
choice is equivalent to choosing a set of values for
the variables.
The references embedded in the chosen parts refer to
the body parts
    that correspond to the clauses which are made true by the
corresponding variable's value. In order to
successfully process the
final body parts, the recipient must have chosen a set
of variable
    values that makes all of the clauses true.

The resulting tree is:

        1 -+- 2 --+- 3 ---- 14
           |      |
           |      +- 4 ---- 16
           |
           +- 5 --+- 6 --+- 14
           |      |      |
           |      |      +- 15
           |      |
           |      +- 7 ---- 16
           |
           +- 8 --+- 9 ---- 14
           |      |
           |      +- 10 --- 15
           |
           +- 11 -+- 12 --- 15
           |      |
           |      +- 13 --- 16
           |
           +- 14
           |
           +- 15
           |
           +- 16

Though the number of nodes is greater than the number of
body parts,
it's limited approximately by the number of appearances of
variables
in the original boolean formula, so it's linear in the
number of body
parts.  So in building the tree, we don't have exponential effort.

In the new approach, we walk the tree, tagging each node as
to whether
it is processable. In the above model, we assume that all
the simple
parts are processable, so all nodes labeled 2 through 16 are marked processable. And as we tag the nodes for processability, we choose for each multipart/alternative node which child will be processed (namely, the last child that is processable). In this tree, the choices are 2 -> 4, 5 -> 7, 8 -> 10, and 11 -> 13.

Where things get interesting is deciding if node 1 is processable.
All of its children are processable, of course. But the
crux of the
construction is how the "by-reference;handling=required"
constraints
on nodes 14, 15, and 16 are handled. In this algorithm, we check whether each part is referenced by a part that will be processed. Fortunately, all references to these nodes must have been
tree-walked
before node 1 is examined, so can determine unambiguously
whether used
references point to those nodes. In this case, we have chosen to process 2, 4, 5, 7, 8, 10, 11, and 13, which means that
there are used
references to 16, 16, 15, and 16. This leaves 17
unreferenced, and so
node 1 is determined to be unprocessable.

Note that the new algorithm avoids being NP-complete essentially by making choices as soon as they are seen. In this model,
values NOT-A,
NOT-B, NOT-C, and NOT-D are chosen outright. Since the example formula is not satisfied, the body that encodes it is not
processable.
This example also points out that the requirement that references appear before the parts they reference is needed to make
the problem
tractable. Otherwise, there could be circular dependencies between the choices for different multipart/alternatives, which might be impossible to analyze by an algorithm as simple as a tree-walk.

Dale
_______________________________________________
Sip mailing list  https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol Use [EMAIL PROTECTED] for questions on current sip Use [EMAIL PROTECTED] for new developments on the application of sip

_______________________________________________
Sip mailing list  https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol Use [EMAIL PROTECTED] for questions on current sip Use [EMAIL PROTECTED] for new developments on the application of sip

_______________________________________________
Sip mailing list  https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use [EMAIL PROTECTED] for questions on current sip
Use [EMAIL PROTECTED] for new developments on the application of sip

_______________________________________________
Sip mailing list  https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol
Use [EMAIL PROTECTED] for questions on current sip
Use [EMAIL PROTECTED] for new developments on the application of sip

Reply via email to