Hello O,
It seems to me (though it's better to check the heap histogram) that
buffering the sub-entities in SortedMapBackedCache is what blows up the heap.
I'm aware of two directions:
- use a file-based cache instead. I don't know exactly how it works, but you
can start from https://issues.apache.org/jira/browse/SOLR-2382 and check how
to enable the BerkeleyDB cache;
- personally, I'm promoting merging result sets ordered by the RDBMS:
https://issues.apache.org/jira/browse/SOLR-4799
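
To illustrate the second direction, here is a minimal sketch in Python (not
Solr code, and not the actual SOLR-4799 patch) of merging two result sets that
the database has already sorted by EntID. The function name zipper_join and the
sample rows are made up for illustration, and it assumes both queries carry an
ORDER BY EntID clause. Only one entity's attributes are held in memory at a
time, instead of the whole child table as with an in-memory cache.

```python
from itertools import groupby
from operator import itemgetter


def zipper_join(parents, children, key="EntID"):
    """Merge two row streams that are both sorted ascending by `key`.

    Yields (parent_row, [child_rows]) pairs. Hypothetical helper for
    illustration; it assumes the database did the sorting (ORDER BY key).
    """
    child_groups = groupby(children, key=itemgetter(key))
    current = next(child_groups, None)
    for parent in parents:
        # Skip child groups whose key sorts before this parent (orphans).
        while current is not None and current[0] < parent[key]:
            current = next(child_groups, None)
        if current is not None and current[0] == parent[key]:
            # Materialize this parent's children before advancing the iterator.
            matched = list(current[1])
            current = next(child_groups, None)
        else:
            matched = []
        yield parent, matched


# Made-up sample rows standing in for the two sorted SQL result sets.
entities = [
    {"EntID": 1, "Image": "a.png"},
    {"EntID": 2, "Image": "b.png"},
    {"EntID": 3, "Image": "c.png"},
]
attributes = [
    {"EntID": 1, "AttributeValue": "red"},
    {"EntID": 1, "AttributeValue": "large"},
    {"EntID": 3, "AttributeValue": "blue"},
]

docs = [
    (parent["EntID"], [child["AttributeValue"] for child in kids])
    for parent, kids in zipper_join(entities, attributes)
]
print(docs)  # [(1, ['red', 'large']), (2, []), (3, ['blue'])]
```

Both inputs can be lazy iterators (for example, database cursors), so memory
use stays proportional to the largest single group of child rows.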




On Fri, May 9, 2014 at 7:16 PM, O. Olson <olson_...@yahoo.it> wrote:

> I have a Data Schema which is Hierarchical, i.e. I have an Entity and a
> number of attributes. For a small subset of the Data (about 300 MB), I can
> do the import with 3 GB of memory. Now with the entire 4 GB Dataset, I find
> I cannot do the import with 9 GB of memory.
> I am using the SqlEntityProcessor as below:
>
> <dataConfig>
>     <dataSource driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>                 url="jdbc:sqlserver://localhost\MSSQLSERVER;databaseName=SolrDB;user=solrusr;password=solrusr;"/>
>     <document>
>         <entity name="Entity" query="SELECT EntID, Image FROM ENTITY_TABLE">
>             <field column="EntID" name="EntID" />
>             <field column="Image" name="Image" />
>
>             <entity name="EntityAttribute1"
>                     query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=1"
>                     cacheKey="EntID"
>                     cacheLookup="Entity.EntID"
>                     processor="SqlEntityProcessor"
>                     cacheImpl="SortedMapBackedCache">
>                 <field column="AttributeValue" name="EntityAttribute1" />
>             </entity>
>             <entity name="EntityAttribute2"
>                     query="SELECT AttributeValue, EntID FROM ATTR_TABLE WHERE AttributeID=2"
>                     cacheKey="EntID"
>                     cacheLookup="Entity.EntID"
>                     processor="SqlEntityProcessor"
>                     cacheImpl="SortedMapBackedCache">
>                 <field column="AttributeValue" name="EntityAttribute2" />
>             </entity>
>
>         </entity>
>     </document>
> </dataConfig>
>
>
>
> What is the best way to import this data? Doing it without a cache results
> in many SQL queries. With the cache, I run out of memory.
>
> I’m curious why 4 GB of data cannot entirely fit in memory. One thing I need
> to mention is that I have about 400 to 500 attributes.
>
> Thanks in advance for any helpful advice.
> O. O.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DataImport-using-SqlEntityProcessor-running-Out-of-Memory-tp4135080.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
