Thanks Clay. I am not trying to load that many records at once. The
application is crawling a directory. It places the files from that
directory into JackRabbit one at a time, and puts a content id onto a
queue which is picked up by consumers on different servers. Those
consumers then use the content id to retrieve the file from JackRabbit.
Each piece of content is saved in a node under the root node. The
slowdown comes from calling session.getRootNode(); from what I can gather
from the docs, I need the root node in order to add a child node. Note
that the slowdown is pretty significant, and I don't need anywhere close
to 50k nodes to start seeing it (it shows up within a few minutes of
running my app). I don't need orderable nodes. How do I disable that?
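A workaround often suggested for wide, flat hierarchies is to stop adding
every file directly under the root and to spread the nodes across
intermediate bucket nodes, so that no single parent accumulates tens of
thousands of children. Below is a minimal sketch of just the path
computation, assuming content ids are non-empty strings without slashes.
The class name BucketedPath, the "content" prefix, and the two-level,
256-way layout are hypothetical choices for illustration and are not part
of the JackRabbit API.

```java
import java.util.Locale;

/**
 * Hypothetical helper: maps a content id to a bucketed relative path so
 * that nodes are spread over up to 256 * 256 intermediate parents.
 */
public final class BucketedPath {

    private BucketedPath() {
    }

    /** Returns a relative path such as "content/61/00/a" for id "a". */
    public static String forContentId(String contentId) {
        if (contentId == null || contentId.isEmpty()
                || contentId.indexOf('/') >= 0) {
            throw new IllegalArgumentException(
                    "invalid content id: " + contentId);
        }
        // String.hashCode() is specified by the Java API, so producers
        // and consumers on different servers derive the same buckets.
        int hash = contentId.hashCode();
        int level1 = hash & 0xff;          // lowest byte of the hash
        int level2 = (hash >>> 8) & 0xff;  // second-lowest byte
        return String.format(Locale.ROOT, "content/%02x/%02x/%s",
                level1, level2, contentId);
    }

    public static void main(String[] args) {
        System.out.println(forContentId("a"));   // content/61/00/a
        System.out.println(forContentId("ab"));  // content/21/0c/ab
    }
}
```

Because the path depends only on the content id, the crawler could create
the intermediate nodes on demand (for example one level at a time with
JcrUtils.getOrAddNode), and a consumer could recompute the same path from
the id it takes off the queue. Whether this removes the slowdown seen over
the remote connection is an assumption that would need measuring.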
On 11/13/2015 03:10 PM, Clay Ferguson wrote:
Please let us know more about your use case. Why are you even "trying" to
load that many records all at once, or at least scan them one by one? In
most use cases you wouldn't need to do this kind of thing, unless it's
some kind of backup or replication. I say "most" cases... I'm not saying
you don't need to, just asking for a bit more background. BTW: if you
don't need 'orderable' nodes, try to avoid them. That type of node does
not work at 'scale'... and 50K is probably pushing it.
Best regards,
Clay Ferguson
[email protected]
On Fri, Nov 13, 2015 at 3:33 PM, <[email protected]> wrote:
Hi,
I am new to JackRabbit and using version 2.11.2. I am using JackRabbit to
store documents in a multi-threaded environment. I noticed that the time
it takes to retrieve the root node is inconsistent and slow (several
seconds or more) and degrades over time (with more than 50K child nodes,
retrieval takes ~15 seconds).
Originally, I was using code as follows to obtain a repository:
public Repository getRepository() throws ClassNotFoundException,
        RepositoryException {
    ServiceLoader.load(Class.forName("org.apache.jackrabbit.jcr2dav.Jcr2davRepositoryFactory"));
    return JcrUtils.getRepository(jackabbitServerUrl);
}
Then I came across the following thread:
http://jackrabbit.510166.n4.nabble.com/getRootNode-takes-27-seconds-td1571027.html#a1571302
This thread had some useful information (BatchReadConfig), but I am not
certain how to use the API to take advantage of it. I have changed my code
to the following, but node retrieval performance does not appear to have
improved. Is there something I am missing or doing wrong?
1) Repository Factory
public Repository getRepository(@SuppressWarnings("rawtypes") Map parameters)
        throws RepositoryException {
    String repositoryFactoryName = parameters != null
            && (parameters.containsKey(PARAM_REPOSITORY_SERVICE_FACTORY)
                || parameters.containsKey(PARAM_REPOSITORY_CONFIG))
            ? "org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory"
            : "org.apache.jackrabbit.core.RepositoryFactoryImpl";
    Object repositoryFactory;
    try {
        Class<?> repositoryFactoryClass = Class.forName(
                repositoryFactoryName, true,
                Thread.currentThread().getContextClassLoader());
        repositoryFactory = repositoryFactoryClass.newInstance();
    } catch (Exception e) {
        throw new RepositoryException(e);
    }
    if (repositoryFactory instanceof RepositoryFactory) {
        return ((RepositoryFactory) repositoryFactory).getRepository(parameters);
    } else {
        throw new RepositoryException(
                repositoryFactory + " is not a RepositoryFactory");
    }
}
2) Use the factory to get a repo:
public Repository getRepository() throws ClassNotFoundException,
        RepositoryException {
    Map<String, RepositoryConfig> parameters = Collections.singletonMap(
            "org.apache.jackrabbit.jcr2spi.RepositoryConfig",
            (RepositoryConfig) new RepositoryConfigImpl(jackabbitServerUrl));
    return getRepository(parameters);
}
3) Repository Config:
private static final class RepositoryConfigImpl implements RepositoryConfig {

    private String jackabbitServerUrl;

    private RepositoryConfigImpl(String jackabbitServerUrl) {
        super();
        this.jackabbitServerUrl = jackabbitServerUrl;
    }

    public CacheBehaviour getCacheBehaviour() {
        return CacheBehaviour.INVALIDATE;
    }

    public int getItemCacheSize() {
        return 100;
    }

    public int getPollTimeout() {
        return 5000;
    }

    public RepositoryService getRepositoryService() throws RepositoryException {
        BatchReadConfig brc = new BatchReadConfig() {
            public int getDepth(Path path, PathResolver resolver)
                    throws NamespaceException {
                return 1;
            }
        };
        return new RepositoryServiceImpl(jackabbitServerUrl, brc);
    }
}
Thanks for your time.
David