Hi Shawn/Mikhail Khludnev,

I was going through the Jira issue https://issues.apache.org/jira/browse/SOLR-4799
and see that I can do what I need by specifying join="zipper".

I tried it, but I'm getting the error below:

Caused by: org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.IllegalArgumentException: expect increasing foreign keys for Relation CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782
    at org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:62)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:246)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:475)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:514)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
    ... 5 more
Caused by: java.lang.IllegalArgumentException: expect increasing foreign keys for Relation CHILD_KEY=PARENT.PARENT_KEY got: QA-HQ008880,HQ011782
    at org.apache.solr.handler.dataimport.Zipper.supplyNextChild(Zipper.java:70)
    at org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:126)
    at org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:74)
    at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:243)


Below is my DIH config:


<entity name="PARENT" pk="PQRS"
        query="SELECT PQRS,PARENT_KEY,L,M,N,O FROM DEF order by PARENT_KEY DESC">

    <field name="L" column="L" />
    <field name="M" column="M" />
    <field name="N" column="N" />

    <entity name="childentity1" pk="PQRS"
            query="SELECT A,B,C,D,E,F,CHILD_KEY,MODIFY_TS FROM ABC ORDER BY CHILD_KEY DESC"
            processor="SqlEntityProcessor" join="zipper" where="CHILD_KEY= PARENT.PARENT_KEY">

        <field name="A" column="A" />
        <field name="B" column="B" />
    </entity>


Thanks and Regards,
Srinivas Kashyap

-----Original Message-----
From: Shawn Heisey <apa...@elyograg.org>
Sent: 09 April 2019 01:27 PM
To: solr-user@lucene.apache.org
Subject: Re: Sql entity processor sortedmapbackedcache out of memory issue

On 4/8/2019 11:47 PM, Srinivas Kashyap wrote:
> I'm using DIH to index the data and the structure of the DIH is like below 
> for solr core:
>
> <entity>
> 16 child entities
> </entity>
>
> During indexing, the number of requests made to the database was high
> (17 queries to process one document) and used up most of the database
> connections, thereby blocking our web application.

If you have 17 entities, then one document will indeed take 17 queries.
That's the nature of multiple DIH entities.

> To tackle it, we implemented SortedMapBackedCache with the cacheImpl
> parameter to reduce the number of requests to the database.

When you use SortedMapBackedCache on an entity, you are asking Solr to store 
the results of the entire query in memory, even if you don't need all of the 
results.  If the database has a lot of rows, that's going to take a lot of 
memory.
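
As a rough sketch of what that looks like, assuming the table and column names from your excerpt (I am guessing at the rest of your config), a cached child entity is typically declared along these lines. The unfiltered query runs once, every row it returns is kept in the in-memory map keyed by cacheKey, and each parent row is then matched through cacheLookup:

```xml
<!-- Sketch only: names come from the excerpt, the attribute values are assumptions. -->
<entity name="childentity1"
        processor="SqlEntityProcessor"
        cacheImpl="SortedMapBackedCache"
        cacheKey="CHILD_KEY"
        cacheLookup="PARENT.PARENT_KEY"
        query="SELECT A,B,CHILD_KEY FROM ABC">
    <!-- All rows of ABC are held in memory for the duration of the import. -->
    <field name="A" column="A" />
    <field name="B" column="B" />
</entity>
```

With 16 child entities configured this way, 16 full result sets sit in the heap at the same time.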

In your excerpt from the config, your inner entity doesn't have a WHERE clause,
which means that it's going to retrieve all of the rows of the ABC table for 
*EVERY* single entry in the DEF table.  That's going to be exceptionally slow.  
Normally the SQL query on inner entities will have some kind of WHERE clause 
that limits the results to rows that match the entry from the outer entity.
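
For illustration, and again assuming the names from your excerpt, an uncached inner entity with such a WHERE clause might look like the sketch below. DIH substitutes the ${PARENT.PARENT_KEY} placeholder with the value from the current outer row before running the query:

```xml
<!-- Sketch only: assumes CHILD_KEY in ABC refers to PARENT_KEY in DEF. -->
<entity name="childentity1"
        query="SELECT A,B,CHILD_KEY FROM ABC
               WHERE CHILD_KEY = '${PARENT.PARENT_KEY}'">
    <field name="A" column="A" />
    <field name="B" column="B" />
</entity>
```

The trade-off is that this issues one query per outer row for each inner entity, so it keeps memory low but brings back the database load you were trying to avoid.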

You may need to write a custom indexing program that runs separately from Solr, 
possibly on an entirely different server.  That might be a lot more efficient 
than DIH.

Thanks,
Shawn
