Re: Evaluating jackrabbit use for media content

Santiago Gala Tue, 15 Oct 2013 02:34:53 -0700

On Mon, Oct 14, 2013 at 9:33 AM, Marcel Reutegger <[email protected]>
wrote:
> Hi,
>
>> So I guess there are at least three components involved:
>>
>> * The backend JCR repository
>> * The webdav access to it via jetty
>> * The client library in my program side.
>>
>> I was thinking that webdav would enable some kind of seekable access,
>> but I think one of those components is breaking the chain. I don't
>> understand how is one supposed to get seekable access when the repository
>> is accessed via network.
>
> you are using the built-in WebDAV support of Jackrabbit. I don't know


I have also tried the RMI connection. The problem there is that when I use
RMI via http://jackrabbit-server/rmi URIs, Jackrabbit copies binaries to
java.io.tmpdir every time they are used in a session, and doesn't delete
the copy when the session is disposed.
org.apache.jackrabbit.rmi.value.SerializableBinaryValue is the place where
the temporary file is created and never deleted.

The snippet is:

> if (n.hasProperty("jcr:data")) {
>     Value v = n.getProperty("jcr:data").getValue();
>     if (v.getType() == PropertyType.BINARY) {
>         Binary b = v.getBinary();
>         try {
>             System.out.println("Binary class: " + b.getClass());
>             byte[] buff = new byte[100];
>             // InputStream ios = b.getStream();
>             // ios.skip(b.getSize() - buff.length);
>             // System.out
>             // .println("Stream class: " + ios.getClass().toString());
>             // int len = ios.read(buff);
>             // ios.close();
>             // Read the last 100 bytes (or everything if the binary is shorter).
>             int len = b.read(buff, Math.max(0, b.getSize() - buff.length));
>             // Decode only the bytes actually read; read() returns -1 at end.
>             System.out.println("Binary(" + len + "): "
>                 + new String(buff, 0, Math.max(0, len), "UTF-8"));
>         } finally {
>             b.dispose();
>         }
>     }
> }

The alternative getStream() implementation shown in comments leaves the
same temporary files in java.io.tmpdir.

This is a no-go for me: one undeleted copy of a 1G file per session will
fill any temporary space I could provision when we use it for media, and
we have no simple way to know when the file is no longer in use. The
dispose implementation is supposed to delete the file, but this is not
happening in my tests, and I'm not sure why. It is supposed to be called by
me, as you can see, and also automatically on finalize... I guess something
related to the transient nature of the file or the stubs is preventing
finalization from being called for RMI objects... This is probably a bug,
and it rules out RMI for us.

Using a /server URL does not leave the temporary files on exit.

>
> > the implementation that well, but it may well be that it doesn't support
> > the seekable access you need.
> >

It answers any request that includes the header "Range: bytes=0-200" with
a 200 status and the whole file (700 MB in my test). Further, nothing I
have seen in the implementations of jackrabbit stable, unstable or oak
hints at support for byte ranges on the DAV server side.
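
For anyone who wants to repeat the check, here is a minimal sketch of the
probe, assuming plain java.net.HttpURLConnection and no authentication. The
class name RangeProbe and the URL passed on the command line are
placeholders of mine, not part of Jackrabbit. A server that honors the
header should answer 206 with a Content-Range header; anything else means
the range was ignored.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class RangeProbe {

    /** True only if the status and Content-Range header indicate a partial response. */
    public static boolean honorsRange(int status, String contentRange) {
        return status == HttpURLConnection.HTTP_PARTIAL   // 206
                && contentRange != null
                && contentRange.startsWith("bytes ");
    }

    /** Sends a GET with a Range header and reports whether the server honored it. */
    public static boolean probe(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Range", "bytes=0-200");
            int status = conn.getResponseCode();
            return honorsRange(status, conn.getHeaderField("Content-Range"));
        } finally {
            // Do not read the body: if the range was ignored it is the whole file.
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: URL of a binary resource on the server (placeholder)
        System.out.println(probe(args[0]) ? "range honored (206)" : "range ignored");
    }
}
```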

In addition, the client side of the JCR library, for both RMI and DAV,
implements binaries by requesting the whole binary resource from the
repository. This means that, even if I implement range support in the
server, I still need to rewrite the client implementation so that it uses
it and accesses the binary piecewise...

There are claims in the documentation that binaries are not loaded fully
into memory, and I expected this to mean that they are not fully
transferred over the network in client-server setups. I'm seeing that this
is not the case: I think the client asks the server for the whole binary.
The server serves it through HTTP or RMI to the client, and the client then
writes it into a temporary file. With RMI that file is never deleted, even
after the client exits; davex does delete it.

The modifications we would require from jackrabbit are extensive enough
that we have started looking at alternative ways to get a remote media
repository, such as several cloud storage apps. I am asking here in the
hope that I have made mistakes in my jackrabbit configuration and that the
behavior I'm looking for actually exists in jackrabbit or another open
source JCR implementation.

> >> I'm not sure if it is a bug or just that I'm not using the typical
> >> setup...
> >
> > if the WebDAV functionality provided by Jackrabbit does not fit your
> > needs, you could deploy your own web application with Jackrabbit and
> > add a servlet, which supports range requests. or better yet, if it is
> > indeed missing functionality in Jackrabbit WebDAV, propose a patch :)
> >
>
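
For reference, the header handling such a servlet would need is small. Here
is a minimal sketch, assuming single-range requests only; ByteRange is a
hypothetical helper of mine, not Jackrabbit API. It returns null whenever
the header is absent, malformed, multi-range or unsatisfiable, so the
caller falls back to a plain 200 response (a fuller implementation would
answer 416 for unsatisfiable ranges). The selected bytes could then be
fetched with Binary.read(byte[], long), as in the snippet above.

```java
public class ByteRange {
    public final long start;
    public final long end; // inclusive

    ByteRange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    /**
     * Parses "bytes=0-200", "bytes=500-" or "bytes=-100" against a resource
     * of the given size. Returns null if the header cannot be used.
     */
    public static ByteRange parse(String header, long size) {
        if (header == null || !header.startsWith("bytes=") || header.indexOf(',') >= 0) {
            return null;
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            return null;
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        try {
            long start;
            long end;
            if (first.isEmpty()) {
                // Suffix form: the last N bytes of the resource.
                if (last.isEmpty()) {
                    return null;
                }
                long n = Long.parseLong(last);
                if (n <= 0) {
                    return null;
                }
                start = Math.max(0, size - n);
                end = size - 1;
            } else {
                start = Long.parseLong(first);
                // An open or oversized end is clamped to the last byte.
                end = last.isEmpty() ? size - 1 : Math.min(Long.parseLong(last), size - 1);
            }
            if (size <= 0 || start < 0 || start > end) {
                return null;
            }
            return new ByteRange(start, end);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Number of bytes covered by the range. */
    public long length() {
        return end - start + 1;
    }

    /** Value for the Content-Range header of a 206 response. */
    public String contentRange(long size) {
        return "bytes " + start + "-" + end + "/" + size;
    }
}
```

With a 1000-byte resource, "bytes=0-200" selects 201 bytes and "bytes=-100"
selects bytes 900 to 999.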

I would propose a patch if:
* The size of the patch were reasonable (see above: I'd have to write from
scratch or modify the client and server DAVex and RMI components)
* It had a reasonable chance of getting integrated relatively fast. One of
the reasons we are looking at JCR is standards compliance and vendor
independence, and having to deliver a patched jackrabbit removes a
substantial part of these advantages.

We are still in the process of deciding between several alternatives for
the media server API and the backend/media server support we'll use for our
media framework. If anyone has hints on JCR products that support access to
binaries the way I'm describing, please don't hesitate to contact me on
list or privately.

Regards
Santiago

>
> > regards
> >  marcel

