Hi,

No, not completely. Because of the problems I was having with paging in combination with SQLTemplate and prefetching, I ran a simple test to see how much memory a big query would actually require. This was an attempt to estimate the total memory I would need for my 2.5 million record query.

This test does not use paging (because of the earlier problems) but does use prefetching (because I don't want to run 2.5 million queries for the detail table).

The problem I saw is the following. The initial query, which accesses the main table and does a left join to the detail table, took about 1 minute. It then took Cayenne 2 minutes to construct the objects and related objects from the data rows produced by this query. The construction does the job correctly, but in my opinion it takes too long. If the main query can run in 1 minute and fetch all the data from the database (which is IO and would normally be seen as the bottleneck), why does it take 2 minutes to convert this into objects and relationships?

Loading the data into memory goes fairly quickly. After that, the profiler shows only 100% CPU and no change in memory usage. From stack traces I can see that all the time is spent in

   org.apache.cayenne.access.DataDomainQueryAction.runQueryInTransaction()

and

   org.apache.cayenne.access.DataDomainQueryAction.interceptObjectConversion()

which finally calls

   org.apache.cayenne.query.PrefetchTreeNode.traverse(PrefetchProcessor)

which does all the time-consuming work.

I think there is a piece of code somewhere that traverses a list of some kind inefficiently. Maybe a HashMap is used with a key object that has no hashCode/equals methods?
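
To make that suspicion concrete, here is a minimal plain-Java sketch of what such a key does to a HashMap. It is not Cayenne code, and I have not confirmed that Cayenne does anything like this; IdentityKey, ValueKey and KeyDemo are hypothetical names used only for illustration. A key without equals/hashCode is compared by object identity, so a lookup with a logically equal key misses, and any code that then falls back to scanning a list would be slow.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyDemo {

    // Hypothetical key WITHOUT equals/hashCode: HashMap compares by object identity.
    public static final class IdentityKey {
        final int id;
        public IdentityKey(int id) { this.id = id; }
    }

    // Hypothetical key WITH value-based equals/hashCode.
    public static final class ValueKey {
        final int id;
        public ValueKey(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    // Looks up a second key instance carrying the same id as the stored one.
    public static boolean foundWithIdentityKey() {
        Map<IdentityKey, String> map = new HashMap<>();
        map.put(new IdentityKey(42), "row");
        return map.containsKey(new IdentityKey(42));
    }

    // Same lookup, but the key type defines equality by value.
    public static boolean foundWithValueKey() {
        Map<ValueKey, String> map = new HashMap<>();
        map.put(new ValueKey(42), "row");
        return map.containsKey(new ValueKey(42));
    }

    public static void main(String[] args) {
        System.out.println("identity key found: " + foundWithIdentityKey());
        System.out.println("value key found:    " + foundWithValueKey());
    }
}
```

The first lookup misses and the second one hits, even though both use the same id.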

Answering your points:

1) As I'm not using paging here, prefetching does help, because I don't have to query the detail table separately.
2) Yes, I do need these 2.5 million records, because after some investigation I will forward the records I need. No, it's clearly not a user interface. I totally agree that Cayenne, at least, is not the best tool for this.

What I initially started to use is the iterated query, so that I could iterate over the data rows and construct the objects on the fly, which would then be forwarded.

The problem here is that I cannot use prefetching, nor can I manually construct relationships. The code is probably there (prefetching uses it), but the API does not give me a handle, or at least an easy way, to use it. This effectively leaves me running a separate query for each main record, which does not perform.
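
As a rough illustration of what constructing the relationships by hand while iterating could look like, here is a plain-Java sketch. It does not use any Cayenne API; Row, Main, assemble and the sink callback are hypothetical names, and it assumes the joined rows arrive ordered by the main record's id (for example via ORDER BY), so each main object can be forwarded as soon as its last detail row has been seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class JoinedRowAssembler {

    // Hypothetical flattened row of a "main LEFT JOIN detail" result.
    public static final class Row {
        final long mainId;
        final String detail; // null when the main record has no detail rows

        public Row(long mainId, String detail) {
            this.mainId = mainId;
            this.detail = detail;
        }
    }

    // Hypothetical main object with its to-many details attached.
    public static final class Main {
        public final long id;
        public final List<String> details = new ArrayList<>();

        public Main(long id) { this.id = id; }
    }

    // Single pass over the rows; ASSUMES rows are ordered by mainId.
    // Only one Main is held in memory at a time before it is handed to the sink.
    public static void assemble(Iterable<Row> rows, Consumer<Main> sink) {
        Main current = null;
        for (Row row : rows) {
            if (current == null || current.id != row.mainId) {
                if (current != null) {
                    sink.accept(current); // previous main record is complete
                }
                current = new Main(row.mainId);
            }
            if (row.detail != null) {
                current.details.add(row.detail);
            }
        }
        if (current != null) {
            sink.accept(current); // flush the last main record
        }
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row(1, "a"), new Row(1, "b"), new Row(2, null), new Row(3, "c"));
        assemble(rows, m -> System.out.println(m.id + " -> " + m.details));
    }
}
```

Because it relies on ordering rather than a lookup structure, the work is linear in the number of rows and memory use stays flat, which is the behaviour I was hoping for with the iterated query.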

Anyway, my conclusion is indeed: don't use Cayenne for large query processing.

tx

Hans





Aristedes Maniatis wrote:
On 13/11/09 10:04 PM, Hans Pikkemaat wrote:
I ran some tests using 3.0b with SQLTemplate in combination with prefetching and found a possible new problem.

It seems that when running the query in e.g. 1 minute, it takes about 2 minutes before cayenne has constructed the prefetched objects.

My query produces 2.5 million records. The query will take about 30 minutes. Construction of the objects will then take an extra hour.


Just to be clear about what you are doing:

* Cayenne 3.0 beta 1
* SQL template query
* prefetching across to-many join
* paging on

You expect the first query (which gets hollow objects) to NOT include the 
prefetch JOIN, but when fetching a page of results, it should use the prefetch. 
Cayenne is constructing the first query which includes the JOIN and that makes 
it take 30 minutes in your database to return 2.5 million records.

Is that correct?


My opinions:

1. Does prefetch really help here anyway? You are only getting (say) 100 
records at a time, so the extra queries to follow the relations may not be that 
significant.

2. Do you really want to fetch 2.5 million rows? If so, I assume this is not a 
user interface :-)  Perhaps Cayenne (or any ORM for that matter) is not the 
best way to batch process that many rows.



Ari

