The program and the repository run on different machines. I'm not sure which side raises the exception; from the type "RepositoryException" I would guess the cause of the error is on the repository side. My program counts the amount of data retrieved when the error occurs - it is approx. 250 MB, spread over 12 files up to that point. Locally I have enough disk (DASD) space to keep the data, and on the repository side I see the same. I have already run this program against 5000 files with a total size of 4 GB, but those were all very small files, so the "garbage" could be removed fast enough.

So these questions are left to answer:
- where is the source of the error: my program or the repository?
- is it a problem of virtual storage or one of disk space?
- and finally, how can I get rid of it?
I'm running the program with -Xmx10g.

Thank you,
Ulrich

On 12.04.2013 at 16:24, Stefan Guggisberg <[email protected]> wrote:

> On Fri, Apr 12, 2013 at 3:29 PM, Ulrich <[email protected]> wrote:
>> Retrieving data is completely sequential, no concurrent processing at all. I
>> changed the code to session.logout() and session.connect() after every step,
>> but this didn't help.
>> So the code works like this:
>> for (String path : pathList) {
>>     Session session = ...
>>     Node currentNode = session.getNode(path);
>>     Node filenode = currentNode.getNode("jcr:content");
>>     Property jcrdata = filenode.getProperty("jcr:data");
>>     InputStream is = jcrdata.getBinary().getStream();
>>     is.close();
>>     session.logout();
>> }
>>
>> To be honest, this is not the exact code; the logic is spread over two
>> classes - but it shows the effective data flow.
>> Nevertheless, the problem remains.
>> But when I retry the whole sequence later on, I get the same result - this
>> means the buffer has been cleared in the meantime.
>>
>> It looks as if there is a kind of garbage collector
>
> yes, it's your jvm's garbage collector.
>
>> , running asynchronously, not fast enough to avoid the error but being
>> done after a while.
>
> yes, that's expected behaviour. the jvm's garbage collection runs
> async with a low priority (unless you're
> running out of memory of course).
>
>> I tried to track the storage space by 'df -vk' but couldn't see a problem
>> here.
>
> did you check inodes as well ('df -i /')?
>
> as i already mentioned: reading a lot of binaries will create
> a lot of temp files. those temp files will eventually be deleted
> once the gc determines that they're not used anymore (see [1]).
> but this can take some time, due to the async nature of java's gc.
>
> an example:
>
> assume you have 500mb free disk space.
> now when you're reading 1k binaries from the repository, 1mb size each,
> in a loop, you're likely going to see said exception.
>
> and the exception's message, 'no space left on device', is pretty clear:
> you're (temporarily) running out of disk space.
>
> did you try forcing gc cycles during your processing?
>
>
> [1]
> http://jackrabbit.apache.org/api/2.0/org/apache/jackrabbit/util/TransientFileFactory.html
>
>
>> On Monday (I'm not in the office right now) I will insert a Thread.sleep(20000)
>> into the workflow above to verify my theory.
>>
>> Best regards,
>> Ulrich
>>
>>
>>
>>
>>
>> On 12.04.2013 at 10:13, Stefan Guggisberg
>> <[email protected]> wrote:
>>
>>> On Fri, Apr 12, 2013 at 12:21 AM, Ulrich <[email protected]> wrote:
>>>> While retrieving lots of data in a loop from several nt:file nodes I
>>>> always get a "no space left on device" exception. The code is:
>>>> Node filenode = currentNode.getNode("jcr:content");
>>>> Property jcrdata = filenode.getProperty("jcr:data");
>>>> InputStream is = jcrdata.getBinary().getStream();
>>>> It seems that the InputStream is buffered somewhere for the current
>>>> session and that the total buffer size for a session is limited. Is this
>>>> true and if so, how can I control this size? Or is there a way to
>>>> free the space? I could probably close my session and open a new one, but I
>>>> would need to change the logic of my program.
>>>>
>>>> Any hint is very welcome.
>>>
>>> larger binaries are buffered in temp files on read (smaller ones are
>>> buffered in-mem).
>>> therefore, reading a lot of binaries concurrently will result in a lot
>>> of temp files.
>>> those temp files will go away once they're not referenced anymore.
>>> you're obviously running out of disk space.
>>>
>>> the following should help:
>>>
>>> 1. make sure you close the input stream as early as possible
>>> 2. if this is a specific job you're running (such as an export) you could
>>>    try forcing gc cycles in between
>>> 3. increase your disk space
>>>
>>> cheers
>>> stefan
>>>
>>>>
>>>> Ulrich
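Putting points 1 and 2 together, a minimal sketch of the read loop could look like the following. The BinaryReader class name, the admin credentials, and the 100-file gc interval are illustrative assumptions, not part of the original code, and System.gc() is only a hint to the JVM:

    import java.io.InputStream;

    import javax.jcr.Node;
    import javax.jcr.Property;
    import javax.jcr.Repository;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    public class BinaryReader {

        // Reads the jcr:data binary of every nt:file node in pathList,
        // closing each stream as early as possible (point 1) and forcing
        // an occasional gc cycle so Jackrabbit's transient temp files can
        // be removed (point 2, see TransientFileFactory).
        public static void readAll(Repository repository, Iterable<String> pathList)
                throws Exception {
            Session session = repository.login(
                    new SimpleCredentials("admin", "admin".toCharArray()));
            try {
                int count = 0;
                for (String path : pathList) {
                    Node filenode = session.getNode(path).getNode("jcr:content");
                    Property jcrdata = filenode.getProperty("jcr:data");
                    // try-with-resources closes the stream immediately after use
                    try (InputStream is = jcrdata.getBinary().getStream()) {
                        // ... process the stream ...
                    }
                    if (++count % 100 == 0) {
                        System.gc();    // hint only; lets the gc release unreferenced temp files
                    }
                }
            } finally {
                session.logout();
            }
        }
    }

Keeping a single session for the whole loop (instead of logging out after every file, as in the earlier attempt) avoids the reconnect overhead; the temp-file cleanup depends on the streams being closed and the gc running, not on the session lifecycle.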
