Hi All,
I have been using Solr for some time now, but mostly in standalone mode. My
current project uses Solr 6.5.1 hosted on Hadoop. My solrconfig.xml has the
following configuration. In the prod environment, query performance seems to
be really slow. Can anyone give me a few pointers on how to improve it?

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
    <str name="solr.hdfs.home">${solr.hdfs.home:}</str>
    <bool name="solr.hdfs.blockcache.enabled">${solr.hdfs.blockcache.enabled:true}</bool>
    <int name="solr.hdfs.blockcache.slab.count">${solr.hdfs.blockcache.slab.count:1}</int>
    <bool name="solr.hdfs.blockcache.direct.memory.allocation">${solr.hdfs.blockcache.direct.memory.allocation:false}</bool>
    <int name="solr.hdfs.blockcache.blocksperbank">${solr.hdfs.blockcache.blocksperbank:16384}</int>
    <bool name="solr.hdfs.blockcache.read.enabled">${solr.hdfs.blockcache.read.enabled:true}</bool>
    <bool name="solr.hdfs.blockcache.write.enabled">${solr.hdfs.blockcache.write.enabled:false}</bool>
    <bool name="solr.hdfs.nrtcachingdirectory.enable">${solr.hdfs.nrtcachingdirectory.enable:true}</bool>
    <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">${solr.hdfs.nrtcachingdirectory.maxmergesizemb:16}</int>
    <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">${solr.hdfs.nrtcachingdirectory.maxcachedmb:192}</int>
</directoryFactory>
<lockType>hdfs</lockType>
It has 6 collections of the following sizes:
Collection 1 --> 6.41 MB
Collection 2 --> 634.51 KB
Collection 3 --> 4.59 MB
Collection 4 --> 1,020.56 MB
Collection 5 --> 607.26 MB
Collection 6 --> 102.4 KB
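
As a rough sanity check on the block cache settings in the configuration, here is a small Python sketch comparing the configured cache size with the two largest collections. It assumes each block cache block is 8 KB, which is my understanding of the Solr HDFS block cache documentation and not something I have verified on this cluster. The function and variable names are only for illustration.

```python
# Assumption: one HDFS block cache block is 8 KB (not verified on this cluster).
BLOCK_SIZE_BYTES = 8 * 1024


def block_cache_bytes(slab_count: int, blocks_per_bank: int,
                      block_size: int = BLOCK_SIZE_BYTES) -> int:
    """Total block cache size in bytes for the given slab settings."""
    return slab_count * blocks_per_bank * block_size


# Values taken from the directoryFactory defaults in the configuration.
cache_mb = block_cache_bytes(slab_count=1, blocks_per_bank=16384) / (1024 * 1024)

# Index sizes (MB) of the two slow collections, as listed in this post.
collection_sizes_mb = {"Collection 4": 1020.56, "Collection 5": 607.26}

print(f"Block cache: {cache_mb:.0f} MB")
for name, size_mb in collection_sizes_mb.items():
    # Fraction of the collection's index that would fit in the cache at once.
    print(f"{name}: {size_mb:.2f} MB, cache covers {cache_mb / size_mb:.1%}")
```

If the 8 KB assumption holds, a single slab would hold only a fraction of the index for Collection 4 or Collection 5.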
Each collection has 5 shards. The heap allocated to the young generation is
about 8 GB and to the old generation about 24 GB, and GC analysis showed that
peak utilisation is really low compared to these values.
But queries against Collection 4 and Collection 5 respond really slowly, even
though we are not using any complex queries. The output of a debug query run
with debug=timing is given below for reference. Can anyone suggest a way to
improve the performance?

Response to query
<response>
<lst name="responseHeader">
<bool name="zkConnected">true</bool>
<int name="status">0</int>
<int name="QTime">3962</int>
<lst name="params">
<str name="q">
("hybrid electric powerplant" "hybrid electric powerplants" "Electric"
"Electrical" "Electricity" "Engine" "fuel economy" "fuel efficiency" "Hybrid
Electric Propulsion" "Power Systems" "Powerplant" "Propulsion" "hybrid"
"hybrid electric" "electric powerplant")
</str>
<str name="defType">edismax</str>
<str name="debug">true</str>
<str name="indent">on</str>
<arr name="qf">
<str>host</str>
<str>title</str>
<str>url</str>
<str>customContent</str>
<str>contentSpecificSearch</str>
</arr>
<arr name="fl">
<str>id</str>
<str>contentTagsCount</str>
</arr>
<str name="start">0</str>
<str name="bq.op">OR</str>
<str name="q.op">OR</str>
<str name="correlationID">3985d7e2-3e54-48d8-8336-229e85f5d9de</str>
<str name="rows">600</str>
<str name="bq">
("hybrid electric powerplant"^100.0 "hybrid electric powerplants"^100.0
"Electric"^50.0 "Electrical"^50.0 "Electricity"^50.0 "Engine"^50.0 "fuel
economy"^50.0 "fuel efficiency"^50.0 "Hybrid Electric Propulsion"^50.0
"Power Systems"^50.0 "Powerplant"^50.0 "Propulsion"^50.0 "hybrid"^15.0
"hybrid electric"^15.0 "electric powerplant"^15.0)
</str>
</lst>
</lst>
<result name="response" numFound="205458" start="0" maxScore="1836.806">
<lst name="timing">
<double name="time">15374.0</double>
<lst name="prepare">
<double name="time">2.0</double>
<lst name="query">
<double name="time">2.0</double>
</lst>
<lst name="facet">
<double name="time">0.0</double>
</lst>
<lst name="facet_module">
<double name="time">0.0</double>
</lst>
<lst name="mlt">
<double name="time">0.0</double>
</lst>
<lst name="highlight">
<double name="time">0.0</double>
</lst>
<lst name="stats">
<double name="time">0.0</double>
</lst>
<lst name="expand">
<double name="time">0.0</double>
</lst>
<lst name="terms">
<double name="time">0.0</double>
</lst>
<lst name="debug">
<double name="time">0.0</double>
</lst>
</lst>
<lst name="process">
<double name="time">15363.0</double>
<lst name="query">
<double name="time">1313.0</double>
</lst>
<lst name="facet">
<double name="time">0.0</double>
</lst>
<lst name="facet_module">
<double name="time">0.0</double>
</lst>
<lst name="mlt">
<double name="time">0.0</double>
</lst>
<lst name="highlight">
<double name="time">0.0</double>
</lst>
<lst name="stats">
<double name="time">0.0</double>
</lst>
<lst name="expand">
<double name="time">0.0</double>
</lst>
<lst name="terms">
<double name="time">0.0</double>
</lst>
<lst name="debug">
<double name="time">14048.0</double>
</lst>
</lst>
</lst>
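
To make the timing output easier to read, here is a small Python sketch that tallies the process-stage timings from the response. It is only arithmetic on the numbers shown there; the helper name component_shares is made up for illustration.

```python
# Process-stage timings in milliseconds, copied from the debug output.
process_total_ms = 15363.0
process_components_ms = {
    "query": 1313.0,
    "facet": 0.0,
    "facet_module": 0.0,
    "mlt": 0.0,
    "highlight": 0.0,
    "stats": 0.0,
    "expand": 0.0,
    "terms": 0.0,
    "debug": 14048.0,
}


def component_shares(components: dict, total: float) -> dict:
    """Return each component's fraction of the total time, largest first."""
    shares = {name: ms / total for name, ms in components.items()}
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


shares = component_shares(process_components_ms, process_total_ms)
for name, share in shares.items():
    if share > 0:
        # Only print components that actually consumed time.
        print(f"{name:>8}: {share:6.1%}")
```

In this run the debug component accounts for most of the process time and the query component for well under a tenth of it, so part of the slowness in this particular measurement may come from having debug enabled.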


Thanks,
Arun


