On Nov 14, 2015 2:21 AM, "Clay Ferguson" <[email protected]> wrote:
>
> In my opinion this one issue is the single most crippling Achilles' heel of the entire JCR. It is very likely to drive away many potential users of this API. It's touted as an enterprise-scale API, yet it chokes on just a few tens of thousands of nodes. This, IMO, urgently needs to be addressed. I know it's a technical limitation, and not a design decision, but to me that just means it's an 'unsolved' problem. I'm not complaining or criticizing developers, I'm just saying that as a community we need to solve this. In an ideal situation, I should be able to have 50 million nodes without it being a problem. RDBMSs solved these issues years ago with a "never load everything all at once" rule. However, somehow the "it's OK to load all children in memory" mentality caught on in the JCR, and we are now stuck with the results.
Note that this usually applies to direct child nodes, i.e. 50k nodes with the same parent. Such a number spread throughout the repository is not an issue.

Robert

>
> Best regards,
> Clay Ferguson
> [email protected]
>
> On Fri, Nov 13, 2015 at 4:47 PM, Dirk Rudolph <[email protected]> wrote:
>
> > Did I understand you correctly? You have thousands of child nodes below the root node?
> >
> > You should avoid this, because it is considered bad practice in terms of write performance and, depending on your concurrent access, it might also block read access.
> >
> > http://wiki.apache.org/jackrabbit/Performance
> >
> > Try to introduce a structure to your content using BTreeManager:
> >
> > https://jackrabbit.apache.org/api/2.10/org/apache/jackrabbit/commons/flat/BTreeManager.html
> >
> > Cheers, D
> >
> > On Friday, 13 November 2015, David Marginian <[email protected]> wrote:
> >
> > > Thanks Clay. I am not trying to load that many records at once. The application is crawling a directory. It places the files from that directory into JackRabbit one at a time, and puts a content id onto a queue which is picked up by consumers on different servers. Those consumers then use the content id to retrieve the file from JackRabbit. Each piece of content is saved in a node under the root node. The performance slowdown is coming from calling session.getRootNode(); from what I can gather from the docs, I need the root node in order to add a child node. Note that the slowdown is pretty significant, and I don't need to get close to 50k to start seeing it (I start seeing it within a few minutes of running my app). I don't need orderable nodes; how do I disable that?
> > >
> > > On 11/13/2015 03:10 PM, Clay Ferguson wrote:
> > >
> > >> Please let us know more about your use case. Why are you even "trying" to load that many records all at once?
> > >> Or at least scan them one by one, I mean. In most use cases you wouldn't need to do this kind of thing, unless it's some kind of backup or replication. I say "most" cases... I'm not saying you don't need to, just asking for a bit more background. BTW: if you don't need 'orderable' nodes, try to avoid them. That type of node does not work at 'scale'... and 50K is probably pushing it.
> > >>
> > >> Best regards,
> > >> Clay Ferguson
> > >> [email protected]
> > >>
> > >> On Fri, Nov 13, 2015 at 3:33 PM, <[email protected]> wrote:
> > >>
> > >>> Hi,
> > >>> I am new to JackRabbit and using version 2.11.2. I am using JackRabbit to store documents in a multi-threaded environment. I noticed that the time it takes to retrieve the root node is inconsistent and slow (several seconds or more) and degrades over time (after 50K+ child nodes, retrieval is taking ~15 seconds).
> > >>>
> > >>> Originally, I was using code as follows to obtain a repository:
> > >>>
> > >>> public Repository getRepository() throws ClassNotFoundException, RepositoryException {
> > >>>     ServiceLoader.load(Class.forName("org.apache.jackrabbit.jcr2dav.Jcr2davRepositoryFactory"));
> > >>>     return JcrUtils.getRepository(jackabbitServerUrl);
> > >>> }
> > >>>
> > >>> Then I came across the following thread:
> > >>>
> > >>> http://jackrabbit.510166.n4.nabble.com/getRootNode-takes-27-seconds-td1571027.html#a1571302
> > >>>
> > >>> This thread had some useful information (BatchReadConfig), but I am not certain how to use the API to take advantage of it. I have changed my code to the following, but it doesn't appear that node retrieval performance has improved. Is there something I am missing or doing wrong?
> > >>>
> > >>> 1) Repository Factory
> > >>>
> > >>> public Repository getRepository(@SuppressWarnings("rawtypes") Map parameters) throws RepositoryException {
> > >>>     // Use the JCR2SPI factory when SPI parameters are supplied,
> > >>>     // otherwise fall back to the embedded Jackrabbit core factory.
> > >>>     String repositoryFactoryName = parameters != null && (
> > >>>             parameters.containsKey(PARAM_REPOSITORY_SERVICE_FACTORY) ||
> > >>>             parameters.containsKey(PARAM_REPOSITORY_CONFIG))
> > >>>         ? "org.apache.jackrabbit.jcr2spi.Jcr2spiRepositoryFactory"
> > >>>         : "org.apache.jackrabbit.core.RepositoryFactoryImpl";
> > >>>
> > >>>     Object repositoryFactory;
> > >>>     try {
> > >>>         // Load and instantiate the factory through the context class loader.
> > >>>         Class<?> repositoryFactoryClass = Class.forName(repositoryFactoryName, true,
> > >>>                 Thread.currentThread().getContextClassLoader());
> > >>>         repositoryFactory = repositoryFactoryClass.newInstance();
> > >>>     }
> > >>>     catch (Exception e) {
> > >>>         throw new RepositoryException(e);
> > >>>     }
> > >>>
> > >>>     if (repositoryFactory instanceof RepositoryFactory) {
> > >>>         return ((RepositoryFactory) repositoryFactory).getRepository(parameters);
> > >>>     }
> > >>>     else {
> > >>>         throw new RepositoryException(repositoryFactory + " is not a RepositoryFactory");
> > >>>     }
> > >>> }
> > >>>
> > >>> 2) Use the factory to get a repo:
> > >>>
> > >>> public Repository getRepository() throws ClassNotFoundException, RepositoryException {
> > >>>     Map<String, RepositoryConfig> parameters = Collections.singletonMap(
> > >>>             "org.apache.jackrabbit.jcr2spi.RepositoryConfig",
> > >>>             (RepositoryConfig) new RepositoryConfigImpl(jackabbitServerUrl));
> > >>>
> > >>>     return getRepository(parameters);
> > >>> }
> > >>>
> > >>> 3) Repository Config:
> > >>>
> > >>> private static final class RepositoryConfigImpl implements RepositoryConfig {
> > >>>
> > >>>     private String jackabbitServerUrl;
> > >>>
> > >>>     private RepositoryConfigImpl(String jackabbitServerUrl) {
> > >>>         super();
> > >>>         this.jackabbitServerUrl = jackabbitServerUrl;
> > >>>     }
> > >>>
> > >>>     public CacheBehaviour getCacheBehaviour() {
> > >>>         return CacheBehaviour.INVALIDATE;
> > >>>     }
> > >>>
> > >>>     public int getItemCacheSize() {
> > >>>         return 100;
> > >>>     }
> > >>>
> > >>>     public int getPollTimeout() {
> > >>>         return 5000;
> > >>>     }
> > >>>
> > >>>     public RepositoryService getRepositoryService() throws RepositoryException {
> > >>>         // Same batch-read depth for every path.
> > >>>         BatchReadConfig brc = new BatchReadConfig() {
> > >>>             public int getDepth(Path path, PathResolver resolver) throws NamespaceException {
> > >>>                 return 1;
> > >>>             }
> > >>>         };
> > >>>         return new RepositoryServiceImpl(jackabbitServerUrl, brc);
> > >>>     }
> > >>> }
> > >>>
> > >>> Thanks for your time.
> > >>>
> > >>> David
> >
> > --
> >
> > Dirk Rudolph | Senior Software Engineer
> >
> > Netcentric AG
> >
> > M: +41 79 642 37 11
> > D: +49 174 966 84 34
> >
> > [email protected] | www.netcentric.biz
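Robert's and Dirk's replies place the cost in the number of direct children under one parent (here, the root node), not in the total node count. Below is a minimal sketch of one way to keep each parent small. It assumes every content id is already a valid JCR node name (for example a UUID). The idea is to derive a fixed-depth bucket path from a hash of the id, so that each intermediate node has at most 256 children. This is a simpler stand-in for the BTreeManager suggestion, not its API. The BucketPaths class, the bucketPath method, and the /content root are hypothetical. The JCR calls that would create the intermediate nodes are left out so the sketch runs on a plain JDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical helper (not part of Jackrabbit): spreads content nodes over a
 * fixed-depth tree of hash buckets so that no single parent node accumulates
 * tens of thousands of direct children.
 */
public final class BucketPaths {

    private BucketPaths() {
    }

    /**
     * Builds a path such as "/content/3f/a2/<contentId>".
     * Each bucket level uses one byte of a SHA-256 digest of the id, written
     * as two hex digits, so every intermediate node has at most 256 children.
     * Assumes contentId is already a valid JCR node name (e.g. a UUID).
     */
    public static String bucketPath(String root, String contentId, int levels) {
        if (levels < 1 || levels > 32) { // SHA-256 yields 32 bytes
            throw new IllegalArgumentException("levels must be between 1 and 32");
        }
        byte[] digest = sha256(contentId);
        StringBuilder path = new StringBuilder(root);
        for (int i = 0; i < levels; i++) {
            path.append('/').append(String.format("%02x", digest[i] & 0xff));
        }
        return path.append('/').append(contentId).toString();
    }

    private static byte[] sha256(String s) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Usage example with a hypothetical content id taken off the queue.
        String id = "3b1d0c9e-5d2f-4a77-9d0e-2f6c1a8b7e10";
        System.out.println(bucketPath("/content", id, 2));
    }
}
```

Because the path is a pure function of the content id, a consumer that reads an id off the queue can recompute the same path without any lookup. When writing, the intermediate bucket nodes could be created with JcrUtils.getOrCreateByPath from jackrabbit-jcr-commons (check which overloads the version in use provides). With two levels there are 65,536 leaf buckets, so even 50 million documents average fewer than 800 children per parent.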
