-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Paul Kyzivat
Sent: Tuesday, September 30, 2008 2:53 PM
To: [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: [Sip] Is body handling NP-complete?
Dale,
I like the thorough treatment. It seems plausible to me.
One implication of this is that body parts need to be
processed twice in a content-type+content-disposition+context
dependent way: once to extract references, and again for
final effect. (Hmm - is extracting of references dependent on
the disposition and context?)
If you imagine that various content types are handled by
plugins, then the plugins will each require two entry points
- one for each of these processing modes. While that is
certainly possible, it might mean that plugins that already
exist for non-sip applications (e.g. email or html) may not
be reusable for sip. (I am not familiar with the content-type
plugins for other protocols such as email. But I presume that
email plugins may be useful in some cases, at least for MESSAGE.)
Do plugins for email have this complexity?
An alternative is to do all the processing in one pass, and
then at the end check if all body parts have been processed.
BUT that means all the side effects of processing will have
been realized before discovering that the message is in
error. That is certainly not ideal.
This example also points out that the requirement that references
appear before the parts they reference is needed to make
the problem
tractable. Otherwise, there could be circular dependencies between
the choices for different multipart/alternatives, which might be
impossible to analyze by an algorithm as simple as a tree-walk.
This requirement is satisfied by lots of simple cases. But it
means that it is impossible to describe any kind of circular
structure. I'm not familiar enough with the use of
multipart/related to know if it is a restriction that is
tolerable there. Perhaps someone can comment on that.
Thanks,
Paul
[EMAIL PROTECTED] wrote:
Referring back to last March, in
http://www.ietf.org/mail-archive/web/sip/current/msg22426.html, I was
contemplating whether the rules for testing a message body required
searching through combinations of choices regarding which
alternatives
of multipart/alternatives are chosen. As I then conceptualized the
problem, the question was "Is there a choice of
alternatives such that
all the constraints for 'successful processing' can be
met?" It turns
out that if you phrase the question that way, the problem is
NP-complete, which means in practice that any algorithm to solve it
must do backtracking.
The new formulation of the problem attempts to avoid this
by dividing
the problem into three phases: (1) assemble a tree which
represents
the dependencies of super-parts' being processable upon that of
sub-parts, (2) walk the tree, tagging parts from the bottom
up as to
whether they are processable, (3) if the overall tree is
processable,
selecting for processing the correct subset of parts as directed by
the tags in the tree. As long as the size of the tree is
commensurate
with the number of body parts, the algorithm is inherently
linear, and
clearly it doesn't involve backtracking.
In regard to step (1), the tree is assembled:
- the root node represents the message body
- a node for a multipart/alternative part has one child for each
component
- a node for a multipart/mixed part has one child for each
component
- a note for a multipart/related part is not automatically given
children for its components
- a header in a part that refers to another component generates a
child of that part that represents the referred-to component
- similarly, a reference within a part that refers to another
component generates a child of that part that represents the
referred-to component (This is how multipart/related nodes get
children.)
Of course, if a component is referred to multiple times,
the component
is represented by multiple nodes in the tree, one node for
each of its
roles.
However, there are subtleties I might be overlooking, so I
am going to
revisit the embedding of the NP-complete problem "3-satisfiability"
into MIME to see what it reveals about my sketch of an algorithm.
Referring to the previous message:
The reduction is as follows:
http://en.wikipedia.org/wiki/NP-complete
http://en.wikipedia.org/wiki/Reduction_%28complexity%29
http://en.wikipedia.org/wiki/3-satisfiability#3-satisfiability
Choose an instance of the 3-satisfiability problem,
that is: a set of
boolean variables A, B, C, etc. and a formula composed of:
a series of "clauses" joined by AND, consisting of
a series of variables and their negations
joined by OR.
The problem is to determine if there is any assignment
of truth values
to the variables which makes the formula true, that is,
which makes
true one of the variables or negations in each of the clauses.
For example, choose 4 variables A, B, C, and D, and the formula
(A OR B OR C) AND (B OR NOT-C OR D) AND (NOT-A OR NOT-B
OR NOT-D)
This can be satisfied by setting A=TRUE, B=FALSE,
C=TRUE, and D=FALSE.
Construct a multipart/mixed with each component labeled
handling=required. Within that, for each variable,
have a part which
is multipart/alternate with 2 parts. If processing
chooses the first
part of the part corresponding to a variable, that will
model the
variable being TRUE. If processing chooses the second
part of the
part corresponding to a variable, that will model the
variable being
FALSE.
For example, our body starts:
multipart/mixed
multipart/alternative;handling=required
first part corresponding to A (not
yet specified)
second part corresponding to NOT-A
multipart/alternative;handling=required
first part corresponding to B
second part corresponding to NOT-B
multipart/alternative;handling=required
first part corresponding to C
second part corresponding to NOT-C
multipart/alternative;handling=required
first part corresponding to D
second part corresponding to NOT-D
For each clause of the formula, add an additional body
part, marked
"Content-Disposition: by-reference; handling=required".
Give the part
a 'cid' URL. The ability of the recipient to
successfully process
this part will model making the corresponding clause TRUE.
For example, our body is now:
multipart/mixed
multipart/alternative;handling=required
first part corresponding to A (not
yet specified)
second part corresponding to NOT-A
multipart/alternative;handling=required
first part corresponding to B
second part corresponding to NOT-B
multipart/alternative;handling=required
first part corresponding to C
second part corresponding to NOT-C
multipart/alternative;handling=required
first part corresponding to D
second part corresponding to NOT-D
part (not otherwise
specified);by-reference;handling=required;cid=clause1
part (not otherwise
specified);by-reference;handling=required;cid=clause2
part (not otherwise
specified);by-reference;handling=required;cid=clause3
We now want to arrange things so that one of the last 3
parts can be
successfully processed only if we choose for processing
one of the
earlier parts that corresponds to a variable value that
will make the
corresponding clause true. This is done by embedding forward
references in the variable-corresponding parts that refer to the
clause-corresponding parts.
For example:
1 multipart/mixed
2 multipart/alternative;handling=required
3 first part corresponding to A
references cid:clause1
4 second part corresponding to NOT-A
references cid:clause3
5 multipart/alternative;handling=required
6 first part corresponding to B
references cid:clause1
references cid:clause2
7 second part corresponding to NOT-B
references cid:clause3
8 multipart/alternative;handling=required
9 first part corresponding to C
references cid:clause1
10 second part corresponding to NOT-C
references cid:clause2
11 multipart/alternative;handling=required
12 first part corresponding to D
references cid:clause2
13 second part corresponding to NOT-D
references cid:clause3
14 part;by-reference;handling=required;cid=clause1
15 part;by-reference;handling=required;cid=clause2
16 part;by-reference;handling=required;cid=clause3
In order to process this body, the recipient must
choose exactly one
of each pair of bodies in the multipart/alternative
body parts. This
choice is equivalent to choosing a set of values for
the variables.
The references embedded in the chosen parts refer to
the body parts
that correspond to the clauses which are made true by the
corresponding variable's value. In order to
successfully process the
final body parts, the recipient must have chosen a set
of variable
values that makes all of the clauses true.
The resulting tree is:
1 -+- 2 --+- 3 ---- 14
| |
| +- 4 ---- 16
|
+- 5 --+- 6 --+- 14
| | |
| | +- 15
| |
| +- 7 ---- 16
|
+- 8 --+- 9 ---- 14
| |
| +- 10 --- 15
|
+- 11 -+- 12 --- 15
| |
| +- 13 --- 16
|
+- 14
|
+- 15
|
+- 16
Though the number of nodes is greater than the number of
body parts,
it's limited approximately by the number of appearances of
variables
in the original boolean formula, so it's linear in the
number of body
parts. So in building the tree, we don't have exponential effort.
In the new approach, we walk the tree, tagging each node as
to whether
it is processable. In the above model, we assume that all
the simple
parts are processable, so all nodes labeled 2 through 16 are marked
processable. And as we tag the nodes for processability, we choose
for each multipart/alternative node which child will be processed
(namely, the last child that is processable). In this tree, the
choices are 2 -> 4, 5 -> 7, 8 -> 10, and 11 -> 13.
Where things get interesting is deciding if node 1 is processable.
All of its children are processable, of course. But the
crux of the
construction is how the "by-reference;handling=required"
constraints
on nodes 14, 15, and 16 are handled. In this algorithm, we check
whether each part is referenced by a part that will be processed.
Fortunately, all references to these nodes must have been
tree-walked
before node 1 is examined, so can determine unambiguously
whether used
references point to those nodes. In this case, we have chosen to
process 2, 4, 5, 7, 8, 10, 11, and 13, which means that
there are used
references to 16, 16, 15, and 16. This leaves 17
unreferenced, and so
node 1 is determined to be unprocessable.
Note that the new algorithm avoids being NP-complete essentially by
making choices as soon as they are seen. In this model,
values NOT-A,
NOT-B, NOT-C, and NOT-D are chosen outright. Since the example
formula is not satisfied, the body that encodes it is not
processable.
This example also points out that the requirement that references
appear before the parts they reference is needed to make
the problem
tractable. Otherwise, there could be circular dependencies between
the choices for different multipart/alternatives, which might be
impossible to analyze by an algorithm as simple as a tree-walk.
Dale
_______________________________________________
Sip mailing list https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol Use
[EMAIL PROTECTED] for questions on current sip Use
[EMAIL PROTECTED] for new developments on the application of sip
_______________________________________________
Sip mailing list https://www.ietf.org/mailman/listinfo/sip
This list is for NEW development of the core SIP Protocol Use
[EMAIL PROTECTED] for questions on current sip
Use [EMAIL PROTECTED] for new developments on the application of sip