I've already benchmarked the import with 500k records, once without the artists subquery and once without the join in the main query:

Without subquery: 500k in 3 min 30 sec
Without join and without subquery: 500k in 2 min 30 sec
With subquery and with left join: 320k in 6 min 30 sec

So the joins / subqueries are definitely a bottleneck.

How exactly did you implement the custom data import?

In our case, we need to de-normalize the relations of the SQL data for the index, so I fear I can't really get rid of the join / subquery.

-robert

On Dec 15, 2010, at 15:43, Tim Heckman wrote:

> 2010/12/15 Robert Gründler <rob...@dubture.com>:
>> The data-config.xml looks like this (only 1 entity):
>>
>> <entity name="track" query="select t.id as id, t.title as title,
>> l.title as label from track t left join label l on (l.id = t.label_id) where
>> t.deleted = 0" transformer="TemplateTransformer">
>> <field column="title" name="title_t" />
>> <field column="label" name="label_t" />
>> <field column="id" name="sf_meta_id" />
>> <field column="metaclass" template="Track" name="sf_meta_class"/>
>> <field column="metaid" template="${track.id}" name="sf_meta_id"/>
>> <field column="uniqueid" template="Track_${track.id}"
>> name="sf_unique_id"/>
>>
>> <entity name="artists" query="select a.name as artist from artist a
>> left join track_artist ta on (ta.artist_id = a.id) where
>> ta.track_id=${track.id}">
>> <field column="artist" name="artists_t" />
>> </entity>
>>
>> </entity>
>
> So there's one track entity with an artist sub-entity. My (admittedly
> rather limited) experience has been that sub-entities, where you have
> to run a separate query for every row in the parent entity, really
> slow down data import. For my own purposes, I wrote a custom data
> import using SolrJ to improve the performance (from 3 hours to 10
> minutes).
>
> Just as a test, how long does it take if you comment out the artists entity?
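
For illustration, one way a custom importer can keep the de-normalization while avoiding a separate artists query per track is to fetch everything with a single joined query ordered by track id and group the rows in application code. The sketch below is not the SolrJ importer mentioned in the quoted message. It assumes Python with an in-memory SQLite database standing in for the real one, a table layout inferred from the queries in data-config.xml, and made-up sample rows; `build_demo_db` and `iter_track_docs` are hypothetical names.

```python
import sqlite3
from itertools import groupby
from operator import itemgetter

# One query for all tracks and their artists, instead of one artists
# query per track row. Rows for the same track arrive consecutively
# because of ORDER BY t.id.
DENORMALIZE_SQL = """
SELECT t.id, t.title, l.title AS label, a.name AS artist
FROM track t
LEFT JOIN label l ON l.id = t.label_id
LEFT JOIN track_artist ta ON ta.track_id = t.id
LEFT JOIN artist a ON a.id = ta.artist_id
WHERE t.deleted = 0
ORDER BY t.id
"""


def build_demo_db():
    """Create a tiny in-memory database (assumed schema, sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE label (id INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE track (id INTEGER PRIMARY KEY, title TEXT,
                            label_id INTEGER, deleted INTEGER DEFAULT 0);
        CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE track_artist (track_id INTEGER, artist_id INTEGER);
        INSERT INTO label VALUES (1, 'Label A');
        INSERT INTO track VALUES (1, 'First', 1, 0);
        INSERT INTO track VALUES (2, 'Second', NULL, 0);
        INSERT INTO track VALUES (3, 'Gone', 1, 1);
        INSERT INTO artist VALUES (1, 'Ann');
        INSERT INTO artist VALUES (2, 'Bob');
        INSERT INTO track_artist VALUES (1, 1);
        INSERT INTO track_artist VALUES (1, 2);
        INSERT INTO track_artist VALUES (2, 2);
    """)
    return conn


def iter_track_docs(conn):
    """Yield one de-normalized document per non-deleted track."""
    rows = conn.execute(DENORMALIZE_SQL)
    # groupby only merges consecutive rows, which is why the query sorts by id.
    for track_id, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        _, title, label, _ = group[0]
        yield {
            "sf_unique_id": f"Track_{track_id}",
            "sf_meta_class": "Track",
            "title_t": title,
            "label_t": label,
            # A track without artists produces one row with artist = NULL.
            "artists_t": sorted(r[3] for r in group if r[3] is not None),
        }


if __name__ == "__main__":
    for doc in iter_track_docs(build_demo_db()):
        print(doc)
```

Each yielded dictionary uses the field names from the config above and could then be sent to the index in batches. Whether this is faster than the sub-entity approach on the real data would need to be measured, since the single query still performs the joins.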