RE: Directory Studio: Backslash in DN breaks studio

Pepersack, Bob G Fri, 11 Mar 2016 05:19:10 -0800

Is there a configuration where I can set my keystore settings with a text 
editor, or some other kind of editor?

When I build the project from "directory-studio-trunk.zip" with "mvn clean 
install -Dmaven.test.skip=true", it throws this exception:

[INFO] Scanning for projects...
[ERROR] Internal error: org.eclipse.tycho.core.osgitools.OsgiManifestParserException: Exception parsing OSGi MANIFEST C:\bit9prog\dev\Installation\directory-studio-trunk\plugins\aciitemeditor\META-INF\MANIFEST.MF: Manifest file not found -> [Help 1]
org.apache.maven.InternalErrorException: Internal error: org.eclipse.tycho.core.osgitools.OsgiManifestParserException: Exception parsing OSGi MANIFEST C:\bit9prog\dev\Installation\directory-studio-trunk\plugins\aciitemeditor\META-INF\MANIFEST.MF: Manifest file not found
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:166)
        at org.apache.maven.cli.MavenCli.execute(MavenCli.java:582)
        at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:214)
        at org.apache.maven.cli.MavenCli.main(MavenCli.java:158)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
        at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
        at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
        at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
Caused by: org.eclipse.tycho.core.osgitools.OsgiManifestParserException: Exception parsing OSGi MANIFEST C:\bit9prog\dev\Installation\directory-studio-trunk\plugins\aciitemeditor\META-INF\MANIFEST.MF: Manifest file not found
        at org.eclipse.tycho.core.osgitools.DefaultBundleReader.loadManifestFromDirectory(DefaultBundleReader.java:95)
        at org.eclipse.tycho.core.osgitools.DefaultBundleReader.doLoadManifest(DefaultBundleReader.java:59)
        at org.eclipse.tycho.core.osgitools.DefaultBundleReader.loadManifest(DefaultBundleReader.java:50)
        at org.eclipse.tycho.core.osgitools.OsgiBundleProject.readArtifactKey(OsgiBundleProject.java:147)
        at org.eclipse.tycho.core.osgitools.OsgiBundleProject.setupProject(OsgiBundleProject.java:142)
        at org.eclipse.tycho.core.resolver.DefaultTychoResolver.setupProject(DefaultTychoResolver.java:74)
        at org.eclipse.tycho.core.maven.TychoMavenLifecycleParticipant.afterProjectsRead(TychoMavenLifecycleParticipant.java:90)
        at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:310)
        at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:154)
        ... 11 more

-----Original Message-----
From: Emmanuel Lécharny [mailto:[email protected]] 
Sent: Thursday, March 10, 2016 7:18 PM
To: [email protected]; [email protected]
Subject: Re: Directory Studio: Backslash in DN breaks studio

Le 10/03/16 22:58, Stefan Seelmann a écrit :
> On 03/09/2016 07:59 PM, Emmanuel Lécharny wrote:
>> Le 09/03/16 18:54, Philip Peake a écrit :
>> Can you be a bit more explicit ?
>>
> Probably same cause as in
> https://issues.apache.org/jira/browse/DIRSTUDIO-1087 and
> https://issues.apache.org/jira/browse/DIRSERVER-2109
>
I took some time last week-end to re-think the whole problem. There are a lot 
of things we are doing wrong, IMO. Don't get me wrong though :
most of the time, it simply works.

FTR, I send this mail to the dev list, copying it to the users list.

<this is going to be a long mail...>

First of all, we need to distinguish the clients from the server. They are two 
different beasts, and we should assume the server *always* receives data that 
is potentially harmful and incorrect.

Then we also need to distinguish String values and Binary values. The reason we 
make this distinction is that String values are encoded in UTF-8, and thus may 
use several bytes per character, and also because we need to convert them from 
UTF-8 to Unicode (and back).

Let's put aside the binary data at the moment.

The server
==========

Value
-----

We receive UTF-8 Strings, we convert them to Unicode and now we can process 
them in Java. We do need this conversion because we need to check the values 
before injecting them in the backend. Doing such checks directly on the UTF-8 
bytes would be very impractical.

There is one critical operation that is performed on values when we process 
them : most of the time, we need to compare them to another value, typically 
when we have an index associated with this value, or when we have a search 
filter. Sadly, comparing two values is not as simple as doing a lexicographic 
comparison. We need to 'prepare' the values according to some very specific 
rules, and we should also 'normalize' those values according to some syntax.

A comparison is done following this process :

Val 1 -> normalization -> preparation-+
                                       \
                                        .--> Comparison
                                       /
Val 2 -> normalization -> preparation-+

We can save some processing if one of the two values has already been 
normalized or prepared. Actually, we should do that only once for each value : 
when it is injected into the server for the first time. But doing so would also 
impose a constraint : disk usage (saving several forms of a value costs space, 
and time when it comes to reading them from disk). This is all about balance...
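The comparison process above can be sketched as follows. This is only an illustration under loud assumptions : the class name, the `normalize` rule (a made-up case-insensitive attribute) and the `prepare` rule (a crude whitespace collapse standing in for the real string preparation) are all hypothetical, and none of this is the actual ApacheDS code.

```java
import java.text.Normalizer;
import java.util.Locale;

final class ValueComparison {
    // ASSUMPTION: normalization rule for a made-up case-insensitive attribute
    // (Unicode NFKC followed by lower-casing). Real rules depend on the AttributeType.
    static String normalize(String value) {
        return Normalizer.normalize(value, Normalizer.Form.NFKC).toLowerCase(Locale.ROOT);
    }

    // ASSUMPTION: a crude stand-in for string preparation, the same for all values:
    // trim both ends and collapse every whitespace run into a single space.
    static String prepare(String value) {
        return value.trim().replaceAll("\\s+", " ");
    }

    // Val 1 and Val 2 both go through normalization, then preparation,
    // and only the resulting forms are compared.
    static boolean matches(String val1, String val2) {
        return prepare(normalize(val1)).equals(prepare(normalize(val2)));
    }
}
```

With these rules, `ValueComparison.matches("John   SMITH ", "john smith")` is true. Storing the output of `prepare(normalize(...))` next to the original value would save the work at comparison time, at the price of the extra disk usage mentioned above.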

Anyway, most of the time, we get a value and we just need to store it into the 
backend after having checked its syntax. And that's the key : checking the 
syntax requires some preprocessing. Here is how we proceed when we just need to 
check the syntax :

Value --> normalization --> syntax check

There is no string preparation.

The normalization is specific to each AttributeType. The String Preparation is 
the same for all the values.
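A minimal sketch of that syntax-check path, assuming a hypothetical `AttributeType` holder that carries its own normalizer and syntax checker. The `EXAMPLE_NUMERIC` attribute and its rules are invented for the illustration and do not come from any real schema.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class SyntaxCheckPipeline {
    /** Hypothetical attribute type: a type-specific normalizer plus a syntax checker. */
    static final class AttributeType {
        final String name;
        final UnaryOperator<String> normalizer;
        final Predicate<String> syntaxChecker;

        AttributeType(String name, UnaryOperator<String> normalizer,
                      Predicate<String> syntaxChecker) {
            this.name = name;
            this.normalizer = normalizer;
            this.syntaxChecker = syntaxChecker;
        }
    }

    /** Value --> normalization --> syntax check. No string preparation is involved. */
    static boolean isValid(AttributeType type, String value) {
        return type.syntaxChecker.test(type.normalizer.apply(value));
    }

    /** Made-up numeric attribute: spaces are dropped, then only ASCII digits are accepted. */
    static final AttributeType EXAMPLE_NUMERIC = new AttributeType(
        "exampleNumeric",
        v -> v.replace(" ", ""),
        v -> !v.isEmpty() && v.chars().allMatch(c -> c >= '0' && c <= '9'));
}
```

Here `isValid(EXAMPLE_NUMERIC, "12 34 56")` accepts the value while `isValid(EXAMPLE_NUMERIC, "12a")` rejects it. The normalizer is attached to the attribute type because, as said above, normalization is specific to each AttributeType.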

Now, there are two specific use cases : filters, and DN.

Filter
------

A filter always contains a String that needs to be processed to give a tuple : 
<attributeType, value>. There are rules that must be applied to transform the 
incoming filter to this tuple. Once we have created this tuple, we can 
normalize and prepare the tuple's value : something that might be complex, 
especially when dealing with substring matches.

So for filters, the process is :

filter -> preProcessing -> Tuple<AttributeType, Value> -> normalization
-> preparation

The String preparation is required because the filter's value will be compared 
with what we fetch from the backend.
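To illustrate the preProcessing step only, here is a rough sketch that turns a simple equality filter into an <attributeType, value> tuple. It is deliberately limited : the `SimpleFilterParser` name is hypothetical, only the `(attr=value)` form is handled (no `&`, `|`, `!`, `>=`, `<=`, `~=` or substring logic), and no unescaping, normalization or preparation is done here.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

final class SimpleFilterParser {
    /** Parses "(attr=value)" into an <attributeType, value> tuple. Equality filters only. */
    static Map.Entry<String, String> parseEquality(String filter) {
        if (filter.length() < 2
                || filter.charAt(0) != '('
                || filter.charAt(filter.length() - 1) != ')') {
            throw new IllegalArgumentException("Filter must be enclosed in parentheses: " + filter);
        }
        String body = filter.substring(1, filter.length() - 1);
        int eq = body.indexOf('=');
        if (eq <= 0) {
            throw new IllegalArgumentException("Missing attribute type or '=': " + filter);
        }
        // The value is kept exactly as written; later steps would unescape,
        // normalize and prepare it.
        return new SimpleImmutableEntry<>(body.substring(0, eq).trim(), body.substring(eq + 1));
    }
}
```

For example, `parseEquality("(cn=John Smith)")` yields the key `cn` and the value `John Smith`.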

DN
--

The DN is not a String. It's a list of RDNs, where each RDN is a list of AVAs, 
where each AVA is a tuple <attributeType, Value>. However, like a filter, it is 
a String when it is received or stored, and there are some specific rules to 
follow to transform this String into RDNs. Bottom line, the DN preprocessing is 
the following :

DN String --> preProcessing -> Rdns, AVA, Tuple<AttributeType, value> [-> 
normalization -> preparation] (for each AVA)

Again, the String preparation is needed because we will store the RDN into an 
index, and that requires some comparison (note that it's not always the case, 
typically for attributeType with a DN syntax).
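The slicing part of the DN preprocessing could look like the sketch below. It is a simplification under stated assumptions : the `DnSlicer` class is hypothetical, only backslash escapes are honoured (quoted values and `#`-prefixed hex values are ignored), the empty DN is not supported, and the values are left escaped, with no normalization or preparation applied.

```java
import java.util.ArrayList;
import java.util.List;

final class DnSlicer {
    /** Splits on 'sep' unless the separator is escaped with a backslash. */
    static List<String> splitUnescaped(String s, char sep) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                cur.append(c).append(s.charAt(++i)); // keep the escape pair untouched
            } else if (c == sep) {
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    /** DN String -> list of RDNs, each RDN a list of AVAs, each AVA a {type, value} pair. */
    static List<List<String[]>> slice(String dn) {
        List<List<String[]>> rdns = new ArrayList<>();
        for (String rdn : splitUnescaped(dn, ',')) {
            List<String[]> avas = new ArrayList<>();
            for (String ava : splitUnescaped(rdn, '+')) {
                int eq = ava.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("Invalid AVA: " + ava);
                }
                avas.add(new String[] { ava.substring(0, eq).trim(), ava.substring(eq + 1) });
            }
            rdns.add(avas);
        }
        return rdns;
    }
}
```

Slicing `cn=Smith\, John+sn=Smith,ou=people,dc=example,dc=com` gives four RDNs, the first one holding two AVAs. The escaped comma stays inside the `cn` value instead of being taken for an RDN separator.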

Comparing values
----------------

We saw that we need to normalize and prepare values before being able to 
compare them. A good question would be : do we need to prepare the String 
beforehand or when we need to compare values ? That's quite irrelevant : it's a 
choice that needs to be made at some point, but it just impacts the performance 
and the storage size. We can consider that when we start comparing two values, 
they are already prepared (either because we have stored a prepared version of 
the String, or because we have just prepared the String on the fly before 
calling the compare method).

The Client
==========

I will just talk about the Ldap API here, I'm not interested in any other 
client.

We have two flavors : schema aware and schema agnostic. We also have to 
consider two aspects : when we send data to the server, and when we process the 
result.

Schema agnostic client
----------------------

There is not much we can do here. We have no idea what the value's syntax might 
be, so we can't normalize the value. Bottom line, here is the basic processing 
of a value sent to the server :

- we don't touch the values. At all. We just convert them from Unicode to UTF-8
- we pre-process filters to feed the SearchRequest. Values are unescaped (i.e. 
the escaped chars are replaced by their binary counterpart)
- we don't touch the DN

When values are received from the server, we need to process the data this way :

- we don't touch the values, we just convert them from UTF-8 to Unicode
- we don't touch the DN : it's already in String format, we just convert it 
from UTF-8 to Unicode
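In Java terms, the schema agnostic handling of a value boils down to a plain charset conversion in each direction, as in this small sketch (the `Utf8RoundTrip` class and its method names are hypothetical) :

```java
import java.nio.charset.StandardCharsets;

final class Utf8RoundTrip {
    /** Outgoing value: Java (Unicode) String to UTF-8 bytes, nothing else is changed. */
    static byte[] toWire(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    /** Incoming value: UTF-8 bytes back to a Java String, nothing else is changed. */
    static String fromWire(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

A value such as `"a   b"` comes back with its three spaces intact, since nothing is trimmed, collapsed or case-folded along the way.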

Schema aware client
-------------------

This is more complex, because now, we can process the values before sending 
them to the server. This puts some load on the client side instead of pounding 
the server with incorrect data that will get rejected anyway.

- Values : we normalize them, prepare them and check their syntax. At the end, 
we convert the original value from Unicode to UTF-8. As we can see, we lose the 
normalized and prepared value.
- Filter : we unescape them, then we convert them to UTF-8
- DN : we parse it, unescape it, normalize each value, and at the end, if the 
DN is valid, we send the original value as is, after having converted it to 
UTF-8

As we can see, all we do is check the values before sending them to the remote 
server, except for the filter.
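The handling of a plain value on a schema aware client can be sketched like this. The `SchemaAwareEncoder` class is hypothetical, and the normalizer and syntax checker are passed in as assumptions, since they depend on the attribute type.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

final class SchemaAwareEncoder {
    /**
     * Checks the value locally, then returns the ORIGINAL value as UTF-8.
     * The normalized form is only used for the check and is thrown away.
     */
    static byte[] encodeChecked(String original,
                                UnaryOperator<String> normalizer,
                                Predicate<String> syntaxChecker) {
        String normalized = normalizer.apply(original);
        if (!syntaxChecker.test(normalized)) {
            // Rejected on the client: the server never sees the bad value.
            throw new IllegalArgumentException("Invalid syntax: " + original);
        }
        return original.getBytes(StandardCharsets.UTF_8);
    }
}
```

The point of the sketch is the last line : what goes on the wire is the user's value, not the normalized one.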

For the received values, we first convert them to Unicode and that's pretty 
much it.

Escaping
--------

DN and Filters need some pre-processing called unescaping when we have to 
transform them from a String to an internal instance. For Filters, this is 
always done on the client side; for the DN, it is done on the server side. The 
idea is to transform those values from a String (human readable) form to a 
binary form.
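As an illustration of unescaping for a filter value, the sketch below replaces each backslash followed by two hex digits (the escape form used in LDAP filter strings, for instance `\2a` for `*`) with the corresponding byte, and encodes everything else as UTF-8. The `FilterUnescaper` class is hypothetical and is not the Ldap API implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class FilterUnescaper {
    /** Returns the value of an ASCII hex digit, or -1 if the char is not one. */
    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /** Turns the human readable form of a filter value into its binary form. */
    static byte[] unescape(String value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < value.length()) {
            char c = value.charAt(i);
            if (c == '\\') {
                if (i + 3 > value.length()) {
                    throw new IllegalArgumentException("Truncated escape at index " + i);
                }
                int hi = hex(value.charAt(i + 1));
                int lo = hex(value.charAt(i + 2));
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Invalid hex escape at index " + i);
                }
                out.write((hi << 4) | lo); // the escaped char becomes a raw byte
                i += 3;
            } else {
                // Unescaped text is encoded as UTF-8, one code point at a time.
                int cp = value.codePointAt(i);
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                out.write(utf8, 0, utf8.length);
                i += Character.charCount(cp);
            }
        }
        return out.toByteArray();
    }
}
```

For instance, the escaped value `a\2ab` becomes the three bytes of `a*b`.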

What we do wrong
----------------

We will only focus on the schema aware API here. This is what we use on the 
server side anyway...

* First, we are depending on the same API on both sides (client and server). 
This makes things more complex, because the context is different. For instance, 
there is no need to parse the DN on the client, but we still do it. I'm not 
sure that we could easily avoid doing so.
To some extent, we are penalizing the client.

* The most complex situation is when we have to process the DN. This is always 
done in two phases :
- slice the DN into RDNs, the RDNs into AVAs containing Values
- apply the schema on each value

We could easily imagine doing the processing in one single pass.
Actually, it is an error not to do so : this costs time, and the classes are 
therefore not immutable.

* One specific problematic point is when we process escaped chars. For 
instance, something like : 'cn=a\ \ \ b' is just a cn with a value containing 3 
spaces. This is what should be returned to the user, and not a value with only 
one space. *But* we will be able to retrieve this value using one of those 
filters : (cn=a b) or (cn=a  b) or (cn=a         b).
Actually the number of spaces is irrelevant when comparing the value, but it is 
relevant when it comes to sending the value back to the user.
Again, it has everything to do with the distinction between storing values and 
comparing values.
For filters, we must unescape the String before sending it to the server. The 
server does not handle the Filter as a String.
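The distinction between the value we give back to the user and the value we compare can be sketched with a small holder class. The `StoredValue` class is hypothetical, and collapsing runs of spaces is a simplified stand-in for the real insignificant-space handling.

```java
final class StoredValue {
    private final String userValue;   // returned to the user exactly as provided
    private final String comparable;  // only ever used for matching

    StoredValue(String userValue) {
        this.userValue = userValue;
        this.comparable = collapseSpaces(userValue);
    }

    // ASSUMPTION: simplified space handling, trim then squeeze runs of spaces into one.
    static String collapseSpaces(String s) {
        return s.trim().replaceAll(" +", " ");
    }

    String userValue() {
        return userValue;
    }

    /** The assertion value from a filter is put in the same form before comparing. */
    boolean matches(String assertionValue) {
        return comparable.equals(collapseSpaces(assertionValue));
    }
}
```

With `new StoredValue("a   b")`, both `matches("a b")` and `matches("a         b")` succeed, while `userValue()` still returns the three spaces.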

* The PrepareString class needs to be reviewed. We don't handle spaces the way 
it's supposed to be done.

Value Class
-----------

I'm not exactly proud of it. It was a way to avoid having code like :

    if ( value instanceof String )
    {
        // This is a String
    }
    else
    {
        // This is a byte[]
    }

So now, we have StringValue and BinaryValue, both of which can be used with an 
AttributeType when they are schema aware. In retrospect, I think the distinction 
between String and Binary values was an error. We should have a Value, holding 
both, with a flag in it. Changing that means we review the entire code, again...
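One possible shape for such a single class, purely as a sketch of the idea : the `UnifiedValue` name, its fields and its methods are hypothetical and do not describe the existing or the future API.

```java
import java.nio.charset.StandardCharsets;

final class UnifiedValue {
    private final byte[] bytes;          // always present: UTF-8 form or raw binary
    private final String string;         // null when the value is binary
    private final boolean humanReadable; // the flag telling the two cases apart

    UnifiedValue(String value) {
        this.string = value;
        this.bytes = value.getBytes(StandardCharsets.UTF_8);
        this.humanReadable = true;
    }

    UnifiedValue(byte[] value) {
        this.string = null;
        this.bytes = value.clone(); // defensive copy keeps the instance immutable
        this.humanReadable = false;
    }

    boolean isHumanReadable() {
        return humanReadable;
    }

    String getString() {
        if (!humanReadable) {
            throw new IllegalStateException("Binary value has no String form");
        }
        return string;
    }

    byte[] getBytes() {
        return bytes.clone();
    }
}
```

Callers would then test `isHumanReadable()` instead of using `instanceof`, and both kinds of value share one type.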

Conclusion
==========

This is not a pleasant situation. We have some cases where we don't handle 
things correctly, and this is largely due to some choices made a decade ago. 
Now, I don't think that this should be kept as is. Sometimes a big refactoring 
is better than patching this and that...

Now, feel free to express yourself, I would be very happy to have your opinion.

Many thanks !
