Sateesh,
just my two cents here.
I think that Jackrabbit is not really meant to be compared in this fashion.
What you are benchmarking is a sequential scan on a table, and that is
really not what Jackrabbit is meant to be doing. For this task, I wonder if a
CSV file scan would outperform MySQL, and honestly I believe it would, but
that would not be a reason to say that CSV is "faster" than an RDBMS.
An RDBMS will outperform any homegrown CSV-based data structure when it comes
to performing complex relational queries, and will make your life as a
programmer simpler. In a similar way, Jackrabbit could (or... eventually
could) outperform an RDBMS when you are using it for complex content
searches on unstructured, hierarchical repositories. And it will certainly
make your life simpler as a programmer.
It's a tradeoff between the overhead that is introduced and the flexibility
that you gain. I am not really surprised that creating all the transient
storage, node structures, and persistence manager abstraction would affect
the performance of sequential access.
With all that said, your XPath query does not match the structure that you
described: "//data/componentIds/*/*" vs. "/content/data/Ids/type1/*". Which
one is it really? And are you sure that you need a query in this case? You
can navigate the nodes directly with node.getNodes() and avoid at least one
level of overhead.
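As a rough sketch of what I mean by navigating directly: this assumes the
ids really live under /content/data/Ids/<type>/ as in your mail, and that
you pass in an already-open JCR session. The class and method names are
made up for illustration; only the javax.jcr calls are the standard API.

```java
import javax.jcr.Node;
import javax.jcr.NodeIterator;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class DirectTraversal {

    // Counts the id nodes by walking the hierarchy instead of running a query.
    // The relative path "content/data/Ids" is an assumption taken from the
    // structure described in the original mail; adjust it to the real layout.
    public static int countIds(Session session) throws RepositoryException {
        Node ids = session.getRootNode().getNode("content/data/Ids");
        int count = 0;

        NodeIterator types = ids.getNodes();            // type1 ... type5
        while (types.hasNext()) {
            Node type = types.nextNode();

            NodeIterator idNodes = type.getNodes();     // ids under one type
            while (idNodes.hasNext()) {
                Node idNode = idNodes.nextNode();
                count++;
                // idNode.getName() would give you the friendly id here
            }
        }
        return count;
    }
}
```

Timing countIds(session) the same way you timed the query should show how
much of the cost comes from the query layer and how much from loading the
nodes themselves.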
Alessandro
On Jan 31, 2008 7:13 PM, zevon <[EMAIL PROTECTED]> wrote:
>
> The project I am looking into needs to store friendly ids to a file, and
> there could be multiple types of file. I want to show the result which lists
> all the ids. Below are the JR structure and SQL table and their performance.
> As per configuration, I am using MSSqlPersistenceManager and using DataStore
> for files. Otherwise the rest of the config is defaults.
>
> JR structure:
>
> /content/data/Ids/type1/*
> /content/data/Ids/type2/*
> /content/data/Ids/type3/*
> /content/data/Ids/type4/*
> /content/data/Ids/type5/*
>
> All the ids are equally distributed under the specific type.
>
> SQL structure:
>
> A table with columns: Name, Type
>
> Performance numbers in ms (items are spread equally among types), when using
> the below query. As the numbers show, it's pretty bad. Is this expected? Any
> way to better this?
>
> Items    JR (ms)    SQL (ms)
>   150        551          15
>  1000       2969          78
>  2000       6470          94
>  4000      16816          94
>  8000      58966         125
>
> Workspace workSpace = session.getWorkspace();
> QueryManager queryManager = workSpace.getQueryManager();
>
> StringBuffer queryStr = new StringBuffer("//data/componentIds/*/*");
> Query query = queryManager.createQuery(queryStr.toString(), Query.XPATH);
>
> long begin = System.currentTimeMillis();
> QueryResult queryResult = query.execute();
> int iSize = 0;
> NodeIterator queryResultNodeIterator = queryResult.getNodes();
> while (queryResultNodeIterator.hasNext()) {
>     Node componentIdNode = queryResultNodeIterator.nextNode();
>     iSize++;
>     // System.out.println(componentIdNode.getName());
> }
> long end = System.currentTimeMillis();
> System.out.println("**** time for: " + iSize + " : " + (end - begin));
>
> For SQL, it's a simple JDBC call:
> long begin = System.currentTimeMillis();
>
> Statement stmt = con.createStatement();
> ResultSet rs = stmt.executeQuery("SELECT * FROM ArtifactFriendlyName");
> int iSize = 0;
> while (rs.next()) {
>     iSize++;
>     String s = rs.getString("FriendlyName");
> }
>
> long end = System.currentTimeMillis();
>
> System.out.println("Time taken for: " + iSize + " : " + (end - begin));
>
>
> Thanks,
> Sateesh.
>
> --
> View this message in context:
> http://www.nabble.com/Performance-as-compared-to-simple-sql-db-query-is-quite-bad-tp15218031p15218031.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
>
>