Hi Shawn,
    This is our own data source implementation (canonical name
com.flipkart.w3.solr.MultiSPCMSProductsDataSource), which pulls the data
from our downstream service and doesn't cache it in RAM. It fetches the
data in batches of 200 and iterates over each batch as DIH asks for
documents. I will check for the possibility of a leak, but it seems unlikely.
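For context, the data source is shaped roughly like the sketch below (a
simplified sketch only, with a hypothetical DownstreamClient in place of
our real service client; it is not the actual class):

import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

public class BatchedProductsDataSource
    extends DataSource<Iterator<Map<String, Object>>> {

  /** Placeholder for the real downstream-service client. */
  interface DownstreamClient {
    List<Map<String, Object>> fetchBatch(String query, int offset, int limit);
    void close();
  }

  private static final int BATCH_SIZE = 200;
  private DownstreamClient client;

  @Override
  public void init(Context context, Properties initProps) {
    client = createClient(initProps);  // wire up the real client here
  }

  protected DownstreamClient createClient(Properties props) {
    throw new UnsupportedOperationException("plug in the downstream client");
  }

  @Override
  public Iterator<Map<String, Object>> getData(final String query) {
    // Lazily pull 200 rows at a time; only the current batch lives in RAM.
    return new Iterator<Map<String, Object>>() {
      private List<Map<String, Object>> batch = Collections.emptyList();
      private int pos = 0;
      private int offset = 0;
      private boolean finished = false;

      @Override
      public boolean hasNext() {
        if (pos < batch.size()) {
          return true;
        }
        if (finished) {
          return false;
        }
        batch = client.fetchBatch(query, offset, BATCH_SIZE);
        offset += batch.size();
        pos = 0;
        finished = batch.isEmpty();
        return !finished;
      }

      @Override
      public Map<String, Object> next() {
        return batch.get(pos++);
      }

      @Override
      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }

  @Override
  public void close() {
    client.close();
  }
}

The point of the sketch is only that getData never holds more than one
200-document batch in memory at a time.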
       Could the OOM be because, during analysis, IndexWriter finds a
document too large to fit within its 100 MB buffer and can't flush it to
disk? Our analyzer chain doesn't make things easy, especially with a field
like the one below, which effectively does a cross product of synonym
terms (a toy illustration follows the config).

        <fieldType name="textStemmed" class="solr.TextField" indexed="true"
                   stored="false" multiValued="true" positionIncrementGap="100"
                   omitNorms="true">
            <analyzer type="index">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                        words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.SynonymFilterFactory"
                        synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.KStemFilterFactory"/>
                <filter class="solr.EnglishMinimalStemFilterFactory"/>
                <filter class="solr.SynonymFilterFactory"
                        synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                        words="stopwords.txt" enablePositionIncrements="true"/>
                <filter class="solr.SynonymFilterFactory"
                        synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.KStemFilterFactory"/>
                <filter class="solr.EnglishMinimalStemFilterFactory"/>
                <filter class="solr.SynonymFilterFactory"
                        synonyms="synonyms_index.txt" ignoreCase="true" expand="true"/>
                <filter class="solr.WordDelimiterFilterFactory"
                        generateWordParts="1" generateNumberParts="1" catenateWords="1"
                        catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
            </analyzer>
        </fieldType>
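
To make that cross product concrete, here is a toy illustration (a
hypothetical synonym entry, not our real synonyms_index.txt):

    synonyms_index.txt (hypothetical):
        tv, television, telly

    Tokenizer output:                        tv                     (1 token)
    First SynonymFilter (expand="true"):     tv, television, telly  (3 tokens)
    Second SynonymFilter, applied to each:   up to 3 x 3 = 9 emitted tokens

So a single source term can fan out to roughly the square of its synonym
group size, and WordDelimiterFilterFactory then splits and catenates
multi-word synonyms on top of that. A document with many terms in this
field can therefore buffer far more tokens in IndexWriter's 100 MB RAM
buffer than the raw document size would suggest.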




On Wed, May 22, 2013 at 5:03 AM, Shawn Heisey <s...@elyograg.org> wrote:

> On 5/21/2013 5:14 PM, Umesh Prasad wrote:
>
>> We have sufficient RAM on the machine (64 GB) and we have given the JVM
>> 32 GB of memory. The machine primarily runs indexing.
>>
>> The JVM doesn't run out of memory; it is the IndexWriter of a particular
>> SolrCore which does. Maybe we have specified too low a memory for the
>> IndexWriter.
>>
>> We index mainly product data and use DIH to pull data from downstream
>> services. Autocommit is off. Commits are infrequent for legacy
>> reasons: 1 commit in 2-3 hrs. If it makes a difference, a core can
>> have more than 10 lakh documents uncommitted at a time. IndexWriter has
>> 100 MB of memory.
>>       We ran with the same config on Solr 3.5 and we never ran out of
>> memory. But then, I hadn't tried hard commits on Solr 3.5.
>>
>
> Hard commits are the only kind of commits that Solr 3.x has.  It's soft
> commits that are new with 4.x.
>
>
>  Data-Source Entry :
>> <dataConfig>
>> <dataSource name="products" type="MultiSPCMSProductsDataSource"
>>
>
> This appears to be using a custom data source, not one of the well-known
> types.  If it had been JDBC, I would be saying that your JDBC driver is
> trying to cache the entire result set in RAM.  With a MySQL data source, a
> batchSize of -1 fixes this problem, by internally changing the JDBC
> fetchSize to Integer.MIN_VALUE.  Other databases have different mechanisms.
>
> With this data source, I have no idea at all how to make sure that it
> doesn't cache all results in RAM.  It might be that the combination of the
> new Solr and this custom data source causes a memory leak, something that
> doesn't happen with the old Solr version.
>
> You said that the transaction log directory is empty.  That rules out one
> possibility, which would be solved by the autoCommit settings on this page:
>
> http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_startup
>
> Aside from the memory leak idea, or possibly having your entire source
> data cached in RAM, I have no idea what's happening here.
>
> Thanks,
> Shawn
>
>


-- 
---
Thanks & Regards
Umesh Prasad
