On Mon, Oct 14, 2013 at 9:33 AM, Marcel Reutegger wrote:

> Hi,
>
>> So I guess there are at least three components involved:
>>
>> * The backend JCR repository
>> * The webdav access to it via jetty
>> * The client library in my program side.
>>
>> I was thinking that webdav would enable some kind of seekable access,
>> but I think one of those components is breaking the chain. I don't
>> understand how is one supposed to get seekable access when the repository
>> is accessed via network.
>
> you are using the built-in WebDAV support of Jackrabbit. I don't know

I have also tried the RMI connection. In this case the problem is that if I use RMI via http://jackrabbit-server/rmi URIs, Jackrabbit copies binaries to java.io.tmpdir every time they are used in a session, and doesn't delete the copy when the session is disposed. org.apache.jackrabbit.rmi.value.SerializableBinaryValue is where the temporary file is created and never deleted. The snippet is:

> if (n.hasProperty("jcr:data")) {
>     Value v = n.getProperty("jcr:data").getValue();
>     if (v.getType() == PropertyType.BINARY) {
>         Binary b = v.getBinary();
>         try {
>             System.out.println("Binary class: " + b.getClass().toString());
>             byte[] buff = new byte[100];
>             // InputStream ios = b.getStream();
>             // ios.skip(b.getSize() - buff.length);
>             // System.out
>             //         .println("Stream class: " + ios.getClass().toString());
>             // int len = ios.read(buff);
>             // ios.close();
>             // Read the last (up to) 100 bytes; clamp for binaries shorter than the buffer
>             long pos = Math.max(0, b.getSize() - buff.length);
>             int len = b.read(buff, pos);
>             // Decode only the bytes actually read (read() returns -1 past the end)
>             System.out.println("Binary(" + len + "): "
>                     + new String(buff, 0, Math.max(len, 0), "UTF-8"));
>         } finally {
>             // Expected to delete the client-side temporary copy
>             b.dispose();
>         }
>     }
>     v = null;
> }

The alternative getStream() implementation shown in the comments leaves the same temporary files in java.io.tmpdir.

This is a no-no for me: one non-deleted copy of a 1 GB file per session will fill any temporary space I could provision when we use it for media, and we have no simple way to know when the file is no longer in use. The dispose() implementation is supposed to delete the file, but this is not happening in my tests, and I'm not sure why. It is supposed to be called by me, as you can see, and also automatically on finalize... I guess something related to the transient nature of the file or the RMI stubs is preventing finalization from running for RMI objects... This is probably a bug, and it rules out RMI for us. Using a /server URL does not leave the temporary files behind on exit.

> the implementation that well, but it may well be that it doesn't support
> the seekable access you need.
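Regarding the RMI temporary files above, this is the kind of deterministic cleanup I would expect from dispose(). It is only a minimal sketch: SpooledBinary is a hypothetical class, not part of Jackrabbit or the JCR API, that spools a stream to java.io.tmpdir, offers a positional read in the spirit of javax.jcr.Binary.read(byte[], long), and deletes the copy in close(), so a try-with-resources block bounds the file's lifetime without relying on finalize():

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical helper, NOT part of Jackrabbit or the JCR API: spools a binary
 * stream to java.io.tmpdir and deletes the copy deterministically on close(),
 * instead of relying on finalize().
 */
public class SpooledBinary implements AutoCloseable {
    private final Path file;

    public SpooledBinary(InputStream source) throws IOException {
        file = Files.createTempFile("spooled-binary-", ".bin");
        try {
            // createTempFile already created the (empty) file, so overwrite it
            Files.copy(source, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            // Do not leak the temporary file if the copy fails halfway
            Files.deleteIfExists(file);
            throw e;
        }
    }

    public Path path() {
        return file;
    }

    public long size() throws IOException {
        return Files.size(file);
    }

    /** Positional read in the spirit of Binary.read(byte[], long); returns -1 past the end. */
    public int read(byte[] buff, long position) throws IOException {
        try (FileChannel channel = FileChannel.open(file)) {
            return channel.read(ByteBuffer.wrap(buff), position);
        }
    }

    @Override
    public void close() throws IOException {
        // Deterministic cleanup: the copy disappears as soon as the scope ends
        Files.deleteIfExists(file);
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, jackrabbit".getBytes(StandardCharsets.UTF_8);
        Path spooled;
        try (SpooledBinary b = new SpooledBinary(new ByteArrayInputStream(data))) {
            spooled = b.path();
            byte[] tail = new byte[10];
            int len = b.read(tail, Math.max(0, b.size() - tail.length));
            System.out.println(new String(tail, 0, len, StandardCharsets.UTF_8)); // jackrabbit
        }
        System.out.println("temp file left behind: " + Files.exists(spooled)); // false
    }
}
```

The point is the scoping: the temporary copy lives exactly as long as the try block, which is what would let a client know when the file is no longer in use.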

It answers any request that includes the header "Range: bytes=0-200" with a 200 status and the whole file (700 MB in my test), instead of a 206 (Partial Content) response carrying only the requested bytes. Further, nothing in what I have peeked at in the implementations of Jackrabbit stable, unstable or Oak hints at support for byte ranges on the DAV server side. In addition, the client side of the JCR library, both RMI and DAV, implements binaries by requesting the whole binary resource from the repository, which means that, even if I implement range support in the server, I would still need to rewrite the client implementation so that it uses ranges and accesses the binary piece-wise.

There are claims in the documentation that binaries are not loaded fully into memory, and I expected this to mean that they are not fully transferred over the network in client-server setups. I'm seeing that this is not the case: I think the client asks the server for the whole binary. The server serves it through HTTP or RMI to the client, and the client then writes it into a temporary file which, furthermore, is never deleted after the client exits (at least with RMI; DAVex does delete it).

The extent of the modifications we would require from Jackrabbit is big enough that we are starting to look for alternative ways to get a remote media repository, such as several cloud storage apps. I am asking here hoping that I made mistakes in the Jackrabbit configuration and that the behavior I'm looking for actually exists in Jackrabbit or another open source JCR implementation.

>> I'm not sure if it is a bug or just that I'm not using the typical setup...
>
> if the WebDAV functionality provided by Jackrabbit does not fit your
> needs, you could deploy your own web application with Jackrabbit and
> add a servlet, which supports range requests.
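For reference, the core of such a range-aware servlet is small. A minimal sketch, assuming a hypothetical ByteRange helper (not part of Jackrabbit) that handles a single byte range only: it resolves the Range value against the binary's length, after which the servlet would reply 206 with that Content-Range and only those bytes (for example via javax.jcr.Binary.read(byte[], long)), or fall back to 200 with the whole body when resolve() returns null:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper for a range-aware servlet, NOT part of Jackrabbit.
 * Resolves a single "Range: bytes=..." value (RFC 7233 byte-range forms)
 * against the resource length.
 */
public final class ByteRange {
    // "first-last", "first-" or "-suffixLength"; a comma (multiple ranges) never matches
    private static final Pattern SINGLE = Pattern.compile("bytes=(\\d*)-(\\d*)");

    private ByteRange() {
    }

    /**
     * Returns {first, last} (both inclusive), or null when the header cannot be
     * honoured (absent, malformed, multi-range or unsatisfiable). A complete
     * servlet would answer 416 for unsatisfiable ranges instead of falling back to 200.
     */
    public static long[] resolve(String header, long length) {
        if (header == null) {
            return null;
        }
        Matcher m = SINGLE.matcher(header.trim());
        if (!m.matches()) {
            return null;
        }
        String a = m.group(1);
        String b = m.group(2);
        try {
            if (a.isEmpty()) {
                if (b.isEmpty()) {
                    return null; // "bytes=-" is malformed
                }
                // Suffix form: the last N bytes of the resource
                long suffix = Long.parseLong(b);
                if (suffix == 0 || length == 0) {
                    return null;
                }
                return new long[] {Math.max(0, length - suffix), length - 1};
            }
            long first = Long.parseLong(a);
            long last = b.isEmpty() ? length - 1 : Long.parseLong(b);
            if (last < first || first >= length) {
                return null; // invalid (last < first) or unsatisfiable
            }
            return new long[] {first, Math.min(last, length - 1)};
        } catch (NumberFormatException e) {
            return null; // more digits than fit in a long
        }
    }

    /** Value for the Content-Range header of a 206 response. */
    public static String contentRange(long[] range, long length) {
        return "bytes " + range[0] + "-" + range[1] + "/" + length;
    }

    public static void main(String[] args) {
        long length = 700L * 1024 * 1024; // hypothetical 700 MB binary
        long[] r = resolve("bytes=0-200", length);
        System.out.println(r == null ? "200 (whole body)" : "206 Content-Range: " + contentRange(r, length));
        // prints: 206 Content-Range: bytes 0-200/734003200
    }
}
```

This only covers the server side; the DAVex and RMI clients would still request the whole binary unless they are changed to send Range headers too.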
> or better yet, if it is
> indeed missing functionality in Jackrabbit WebDAV, propose a patch :)

I would propose a patch if:

* The size of the patch is reasonable (see above: I'd have to write from scratch, or modify, the client and server DAVex and RMI components).
* It would have a reasonable chance of getting integrated relatively fast. One of the reasons we are looking at JCR is standard compliance and vendor independence, and having to deliver a patched Jackrabbit removes a substantial part of these advantages.

We are still deciding between several alternatives for the media server API and the backend/media server we'll use for our media framework. If anyone has hints on JCR products that support access to binaries the way I'm describing, please don't hesitate to contact me on the list or privately.

Regards
Santiago

> regards
> marcel
