Hi Shawn and Shreejay, thanks for the response.
Here is some more information:
1) The machine is a virtual machine on an ESX server. It has 4 CPUs and 8GB of 
RAM. I don't remember the exact CPU model, but it is modern enough. It is running 
Java 7 without any special parameters, with 4GB allocated to the Java heap (-Xmx).
2) After successful indexing, I have 2.5 million documents and a 117GB index. 
This is the size after it was optimized.
3) I plan to upgrade to 4.3; I just haven't had time. 4.0 beta was what was 
available at the time we had a release deadline.
4) The setup uses master-slave replication, not SolrCloud. The server I am 
discussing is the indexing server; in these tests there were actually no slaves 
involved, and virtually zero searches were performed.
5) Attached is my configuration. I tried disabling the warm-up and the opening 
of searchers, but it didn't change anything. The commits are done by Solr, using 
autoCommit; the client sends the updates without a commit command.
6) I want to disable optimization, but when I disabled it, the OOME occurred 
even faster. The number of segments reached around a thousand within an hour or 
so. I don't know whether that is normal, but at that point, if I restarted Solr, 
it immediately took about 1GB of heap space just on start-up, instead of the 
usual 50MB or so.

If I commit less frequently, don't I increase the risk of losing data, e.g., if 
the power goes down?
If I disable optimization, is it necessary to avoid such a large number of 
segments, and is it even possible? (The kind of merge settings I have in mind 
are sketched below.)
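
For reference, this is the kind of indexConfig change I have in mind for the 
segments question. It is only a sketch with illustrative values, nothing that 
anyone here has recommended; the explicit TieredMergePolicy block is simply the 
expanded form of the mergeFactor=10 that is already in the attached config:

	<indexConfig>
		<ramBufferSizeMB>128</ramBufferSizeMB>
		<!-- Illustrative only: spelling out the merge policy instead of using
			mergeFactor. maxMergeAtOnce=10 with segmentsPerTier=10 is equivalent to
			mergeFactor=10; smaller values keep fewer segments at the cost of more
			merging I/O. -->
		<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
			<int name="maxMergeAtOnce">10</int>
			<double name="segmentsPerTier">10.0</double>
		</mergePolicy>
	</indexConfig>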

Thanks again,
Yoni



-----Original Message-----
From: Shawn Heisey [mailto:s...@elyograg.org] 
Sent: Sunday, June 02, 2013 18:05
To: solr-user@lucene.apache.org
Subject: Re: out of memory during indexing due to large incoming queue

On 6/2/2013 8:16 AM, Yoni Amir wrote:
> Hello,
> I am receiving OutOfMemoryError during indexing, and after investigating the 
> heap dump, I am still missing some information, and I thought this might be a 
> good place for help.
> 
> I am using Solr 4.0 beta, and I have 5 threads that send update requests to 
> Solr. Each request is a bulk of 100 SolrInputDocuments (using solrj), and my 
> goal is to index around 2.5 million documents.
> Solr is configured to do a hard-commit every 10 seconds, so initially I 
> thought it could only accumulate 10 seconds' worth of updates in memory, 
> but that's not the case. I can see in a profiler how it accumulates memory 
> over time, even with 4 to 6 GB of memory. It is also configured to optimize 
> with mergeFactor=10.

4.0-BETA came out several months ago.  Even at the time, support for the alpha 
and beta releases was limited.  Now it has been superseded by 4.0.0, 4.1.0, 
4.2.0, 4.2.1, and 4.3.0, all of which are full releases.
There is a 4.3.1 release currently in the works.  Please upgrade.

Ten seconds is a very short interval for hard commits, even if you have 
openSearcher=false.  Frequent hard commits can cause a whole host of problems.  
It's better to have an interval of several minutes, and I wouldn't go less than 
a minute.  Soft commits can be much more frequent, but if you are frequently 
opening new searchers, you'll probably want to disable cache warming.
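
For illustration only (these particular numbers are not from this thread and 
would need tuning for the actual workload), a less aggressive commit setup 
along those lines might look like:

	<autoCommit>
		<!-- illustrative: hard commit every 5 minutes, flushing to disk without
			opening a new searcher -->
		<maxTime>300000</maxTime>
		<openSearcher>false</openSearcher>
	</autoCommit>
	<autoSoftCommit>
		<!-- illustrative: make newly indexed documents visible to searches
			every 30 seconds -->
		<maxTime>30000</maxTime>
	</autoSoftCommit>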

On optimization: don't do it unless you absolutely must.  Most of the time, 
optimization is only needed if you delete a lot of documents and you need to 
get them removed from your index.  If you must optimize to get rid of deleted 
documents, do it on a very long interval (once a day, once a week) and pause 
indexing during optimization.
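
If optimizing to get rid of deleted documents really is required, it can be 
issued as an explicit command on that long schedule; for example (illustrative 
only), posting this XML update command to the /update handler performs the 
forced merge:

	<!-- body of a POST to /update; merges the index down to a single segment -->
	<optimize waitSearcher="false"/>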

You haven't said anything about your index size, java heap size, total RAM, 
etc.  With those numbers I could offer some guesses about what you need, but 
I'll warn you that they would only be guesses - watching a system with real 
data under load is the only way to get concrete information.  Here are some 
basic guidelines on performance problems and RAM information:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn


<?xml version="1.0" encoding="UTF-8" ?>
<config>
	<luceneMatchVersion>LUCENE_40</luceneMatchVersion>

	<dataDir>${solr.data.dir:}</dataDir>

	<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}" />

	<indexConfig>

		<!-- ramBufferSizeMB sets the amount of RAM that may be used by Lucene indexing for buffering added documents and deletions before they are 
			flushed to the Directory. maxBufferedDocs sets a limit on the number of documents buffered before flushing. If both ramBufferSizeMB and maxBufferedDocs 
			are set, then Lucene will flush based on whichever limit is hit first. -->
		<ramBufferSizeMB>128</ramBufferSizeMB>
		<!-- <maxBufferedDocs>1000</maxBufferedDocs> -->

		<!-- Merge Factor The merge factor controls how many segments will get merged at a time. For TieredMergePolicy, mergeFactor is a convenience 
			parameter which will set both MaxMergeAtOnce and SegmentsPerTier at once. For LogByteSizeMergePolicy, mergeFactor decides how many new segments 
			will be allowed before they are merged into one. Default is 10 for both merge policies. -->
		<mergeFactor>10</mergeFactor>

	</indexConfig>

	<jmx />

	<!-- The default high-performance update handler -->
	<updateHandler class="solr.DirectUpdateHandler2">
		<autoCommit> 
			<maxTime>10000</maxTime> 
			<openSearcher>true</openSearcher> 
		</autoCommit>
		<updateLog>
			<str name="dir">${solr.data.dir:}</str>
		</updateLog>
	</updateHandler>

	<query>

		<maxBooleanClauses>3072</maxBooleanClauses>

		<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0" />

		<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />

		<documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0" />

		<enableLazyFieldLoading>true</enableLazyFieldLoading>

		<queryResultWindowSize>20</queryResultWindowSize>

		<queryResultMaxDocsCached>200</queryResultMaxDocsCached>

		<listener event="newSearcher" class="solr.QuerySenderListener">
			<arr name="queries">
				<!-- <lst><str name="q">solr</str><str name="sort">price asc</str></lst> <lst><str name="q">rocks</str><str name="sort">weight asc</str></lst> -->
			</arr>
		</listener>
		<listener event="firstSearcher" class="solr.QuerySenderListener">
			<arr name="queries">
				<lst>
					<str name="q">static firstSearcher warming in solrconfig.xml</str>
				</lst>
			</arr>
		</listener>

		<useColdSearcher>false</useColdSearcher>

		<maxWarmingSearchers>2</maxWarmingSearchers>

	</query>

	<requestDispatcher handleSelect="false">

		<requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" />

		<httpCaching never304="true" />

	</requestDispatcher>

	<requestHandler name="/select" class="solr.SearchHandler" default="true">
		<lst name="defaults">
			<str name="defType">edismax</str>
			<str name="qf">all_text</str>
			<str name="mm">0%</str>
			<str name="q.alt">*:*</str>
			<str name="echoParams">explicit</str>
			<int name="timeAllowed">10000</int>

			<str name="wt">xml</str>
			<int name="rows">10</int>
			<str name="fl">id, module, identifier, type, category, business_date, description, display_name, score</str>

			<str name="facet">on</str>
			<str name="facet.field">category</str>
			<str name="facet.field">type_facet</str>
			<str name="facet.field">state</str>
			<str name="facet.mincount">1</str>

			<str name="facet.query">business_date:[NOW/DAY-7DAY TO *]</str>
			<str name="facet.query">business_date:[NOW/DAY-1MONTH TO *]</str>
			<str name="facet.query">business_date:[NOW/DAY-1YEAR TO *]</str>

			<str name="hl">on</str>
			<!-- identifier type owner owner_name display_name description step state score_field2 note has is audit attachment rfi *_custom_txt *_custom_ti2 *_custom_tl2 *_custom_td2  -->
			<str name="hl.fl">identifier|type|type_name|type_abbr_name|owner_name|display_name|description|step|state|score_field2|note|audit|attachment|rfi|*_custom_txt|*_custom_ti2|*_custom_tl2|*_custom_td2</str>
			<bool name="hl.requireFieldMatch">false</bool>
			<bool name="hl.usePhraseHighlighter">true</bool>
			<bool name="hl.highlightMultiTerm">true</bool>
			<int name="hl.snippets">10</int>
			<bool name="hl.mergeContiguous">false</bool>
			<int name="hl.maxAnalyzedChars">-1</int>
			<str name="hl.encoder">html</str>
		</lst>

	</requestHandler>

	<requestHandler name="/browse" class="solr.SearchHandler">
		<lst name="defaults">
			<str name="defType">edismax</str>
			<str name="qf">all_text</str>
			<str name="mm">0%</str>
			<str name="q.alt">*:*</str>
			<str name="echoParams">explicit</str>
			<int name="timeAllowd">100000</int>

			<!-- VelocityResponseWriter settings -->
			<str name="wt">velocity</str>
			<str name="v.template">browse</str>
			<str name="v.layout">layout</str>
			<str name="title">Solritas</str>

			<str name="rows">10</str>
			<str name="fl">*,score</str>

			<str name="facet">on</str>
			<str name="facet.field">category</str>
			<str name="facet.field">type_facet</str>
			<str name="facet.field">state</str>
			<str name="facet.mincount">1</str>

			<str name="facet.query">business_date:[NOW/DAY-7DAY TO *]</str>
			<str name="facet.query">business_date:[NOW/DAY-1MONTH TO *]</str>
			<str name="facet.query">business_date:[NOW/DAY-1YEAR TO *]</str>

			<!-- some facet examples -->
			<!-- <str name="facet.range.other">after</str> <str name="facet.range">score</str> <int name="f.score.facet.range.start">0</int> <int name="f.score.facet.range.end">100</int> 
				<int name="f.score.facet.range.gap">25</int> <str name="facet.range">business_date</str> <str name="f.business_date.facet.range.start">NOW/YEAR-5YEAR</str> 
				<str name="f.business_date.facet.range.end">NOW/DAY</str> <str name="f.business_date.facet.range.gap">+1YEAR</str> -->

			<str name="hl">on</str>
			<str name="hl.fl">*</str>
			<bool name="hl.requireFieldMatch">false</bool>
		</lst>
	</requestHandler>

	<requestHandler name="/browse2" class="solr.SearchHandler">
		<lst name="defaults">
			<str name="echoParams">explicit</str>

			<!-- VelocityResponseWriter settings -->
			<str name="wt">velocity</str>
			<str name="v.template">browse</str>
			<str name="v.layout">layout</str>
			<str name="title">Solritas</str>

			<!-- Query settings -->
			<str name="defType">edismax</str>
			<str name="qf">
				text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
				title^10.0 description^5.0 keywords^5.0 author^2.0
				resourcename^1.0
			</str>
			<str name="df">text</str>
			<str name="mm">100%</str>
			<str name="q.alt">*:*</str>
			<str name="rows">10</str>
			<str name="fl">*,score</str>

			<str name="mlt.qf">
				text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
				title^10.0 description^5.0 keywords^5.0 author^2.0
				resourcename^1.0
			</str>
			<str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
			<int name="mlt.count">3</int>

			<!-- Faceting defaults -->
			<str name="facet">on</str>
			<str name="facet.field">cat</str>
			<str name="facet.field">manu_exact</str>
			<str name="facet.field">content_type</str>
			<str name="facet.field">author_s</str>
			<str name="facet.query">ipod</str>
			<str name="facet.query">GB</str>
			<str name="facet.mincount">1</str>
			<str name="facet.pivot">cat,inStock</str>
			<str name="facet.range.other">after</str>
			<str name="facet.range">price</str>
			<int name="f.price.facet.range.start">0</int>
			<int name="f.price.facet.range.end">600</int>
			<int name="f.price.facet.range.gap">50</int>
			<str name="facet.range">popularity</str>
			<int name="f.popularity.facet.range.start">0</int>
			<int name="f.popularity.facet.range.end">10</int>
			<int name="f.popularity.facet.range.gap">3</int>
			<str name="facet.range">manufacturedate_dt</str>
			<str name="f.manufacturedate_dt.facet.range.start">NOW/YEAR-10YEARS</str>
			<str name="f.manufacturedate_dt.facet.range.end">NOW</str>
			<str name="f.manufacturedate_dt.facet.range.gap">+1YEAR</str>
			<str name="f.manufacturedate_dt.facet.range.other">before</str>
			<str name="f.manufacturedate_dt.facet.range.other">after</str>

			<!-- Highlighting defaults -->
			<str name="hl">on</str>
			<str name="hl.fl">content features title name</str>
			<str name="hl.encoder">html</str>
			<str name="hl.simple.pre">&lt;b&gt;</str>
			<str name="hl.simple.post">&lt;/b&gt;</str>
			<str name="f.title.hl.fragsize">0</str>
			<str name="f.title.hl.alternateField">title</str>
			<str name="f.name.hl.fragsize">0</str>
			<str name="f.name.hl.alternateField">name</str>
			<str name="f.content.hl.snippets">3</str>
			<str name="f.content.hl.fragsize">200</str>
			<str name="f.content.hl.alternateField">content</str>
			<str name="f.content.hl.maxAlternateFieldLength">750</str>

			<!-- Spell checking defaults -->
			<str name="spellcheck">on</str>
			<str name="spellcheck.extendedResults">false</str>
			<str name="spellcheck.count">5</str>
			<str name="spellcheck.alternativeTermCount">2</str>
			<str name="spellcheck.maxResultsForSuggest">5</str>
			<str name="spellcheck.collate">true</str>
			<str name="spellcheck.collateExtendedResults">true</str>
			<str name="spellcheck.maxCollationTries">5</str>
			<str name="spellcheck.maxCollations">3</str>
		</lst>

		<!-- append spellchecking to our list of components -->
		<arr name="last-components">
			<str>spellcheck</str>
		</arr>
	</requestHandler>

	<!-- realtime get handler, guaranteed to return the latest stored fields of any document, without the need to commit or open a new searcher. 
		The current implementation relies on the updateLog feature being enabled. -->
	<requestHandler name="/get" class="solr.RealTimeGetHandler">
		<lst name="defaults">
			<str name="omitHeader">true</str>
			<str name="wt">json</str>
			<str name="indent">true</str>
		</lst>
	</requestHandler>

	<requestHandler name="/update" class="solr.UpdateRequestHandler">
	</requestHandler>

	<requestHandler name="/analysis/field" startup="lazy" class="solr.FieldAnalysisRequestHandler" />

	<requestHandler name="/analysis/document" class="solr.DocumentAnalysisRequestHandler" startup="lazy" />

	<requestHandler name="/admin/" class="solr.admin.AdminHandlers" />
	<!-- This single handler is equivalent to the following... -->
	<!-- <requestHandler name="/admin/luke" class="solr.admin.LukeRequestHandler" /> <requestHandler name="/admin/system" class="solr.admin.SystemInfoHandler" 
		/> <requestHandler name="/admin/plugins" class="solr.admin.PluginInfoHandler" /> <requestHandler name="/admin/threads" class="solr.admin.ThreadDumpHandler" 
		/> <requestHandler name="/admin/properties" class="solr.admin.PropertiesRequestHandler" /> <requestHandler name="/admin/file" class="solr.admin.ShowFileRequestHandler" 
		> -->

	<!-- ping/healthcheck -->
	<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
		<lst name="invariants">
			<str name="q">solrpingquery</str>
		</lst>
		<lst name="defaults">
			<str name="echoParams">all</str>
		</lst>
		<!-- An optional feature of the PingRequestHandler is to configure the handler with a "healthcheckFile" which can be used to enable/disable 
			the PingRequestHandler. relative paths are resolved against the data dir -->
		<str name="healthcheckFile">server-enabled.txt</str>
	</requestHandler>

	<!-- Echo the request contents back to the client -->
	<requestHandler name="/debug/dump" class="solr.DumpRequestHandler">
		<lst name="defaults">
			<str name="echoParams">explicit</str>
			<str name="echoHandler">true</str>
		</lst>
	</requestHandler>


	<!-- Replication -->
	<requestHandler name="/replication" class="solr.ReplicationHandler">
		<lst name="master">
			<str name="enable">${solr.master.enable:false}</str>
			<str name="replicateAfter">commit</str>
			<str name="replicateAfter">optimize</str>
			<str name="replicateAfter">startup</str>
			<str name="confFiles">schema.xml</str>
		</lst>
		<lst name="slave">
			<str name="enable">${solr.slave.enable:false}</str>
			<str name="masterUrl">${solr.master.url:http://localhost:8093/solr}</str>
			<str name="pollInterval">00:02:00</str>
		</lst>
	</requestHandler>

	<!-- Solr Replication for SolrCloud Recovery This is the config need for SolrCloud's recovery replication. -->
	<!-- <requestHandler name="/replication" class="solr.ReplicationHandler" startup="lazy" /> -->

	<!-- Spell Check The spell check component can return a list of alternative spelling suggestions. http://wiki.apache.org/solr/SpellCheckComponent -->
	<searchComponent name="spellcheck" class="solr.SpellCheckComponent">

		<str name="queryAnalyzerFieldType">textSpell</str>

		<!-- Multiple "Spell Checkers" can be declared and used by this component -->

		<!-- a spellchecker built from a field of the main index -->
		<lst name="spellchecker">
			<str name="name">default</str>
			<str name="field">name</str>
			<str name="classname">solr.DirectSolrSpellChecker</str>
			<!-- the spellcheck distance measure used, the default is the internal levenshtein -->
			<str name="distanceMeasure">internal</str>
			<!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
			<float name="accuracy">0.5</float>
			<!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
			<int name="maxEdits">2</int>
			<!-- the minimum shared prefix when enumerating terms -->
			<int name="minPrefix">1</int>
			<!-- maximum number of inspections per result. -->
			<int name="maxInspections">5</int>
			<!-- minimum length of a query term to be considered for correction -->
			<int name="minQueryLength">4</int>
			<!-- maximum threshold of documents a query term can appear in to be considered for correction -->
			<float name="maxQueryFrequency">0.01</float>
			<!-- uncomment this to require suggestions to occur in 1% of the documents <float name="thresholdTokenFrequency">.01</float> -->
		</lst>

		<!-- a spellchecker that can break or combine words. See "/spell" handler below for usage -->
		<lst name="spellchecker">
			<str name="name">wordbreak</str>
			<str name="classname">solr.WordBreakSolrSpellChecker</str>
			<str name="field">name</str>
			<str name="combineWords">true</str>
			<str name="breakWords">true</str>
			<int name="maxChanges">10</int>
		</lst>

		<!-- a spellchecker that uses a different distance measure -->
		<!-- <lst name="spellchecker"> <str name="name">jarowinkler</str> <str name="field">spell</str> <str name="classname">solr.DirectSolrSpellChecker</str> 
			<str name="distanceMeasure"> org.apache.lucene.search.spell.JaroWinklerDistance </str> </lst> -->

		<!-- a spellchecker that use an alternate comparator comparatorClass be one of: 1. score (default) 2. freq (Frequency first, then score) 3. 
			A fully qualified class name -->
		<!-- <lst name="spellchecker"> <str name="name">freq</str> <str name="field">lowerfilt</str> <str name="classname">solr.DirectSolrSpellChecker</str> 
			<str name="comparatorClass">freq</str> -->

		<!-- A spellchecker that reads the list of words from a file -->
		<!-- <lst name="spellchecker"> <str name="classname">solr.FileBasedSpellChecker</str> <str name="name">file</str> <str name="sourceLocation">spellings.txt</str> 
			<str name="characterEncoding">UTF-8</str> <str name="spellcheckIndexDir">spellcheckerFile</str> </lst> -->
	</searchComponent>

	<!-- A request handler for demonstrating the spellcheck component. NOTE: This is purely an example. The whole purpose of the SpellCheckComponent 
		is to hook it into the request handler that handles your normal user queries so that a separate request is not needed to get suggestions. IN 
		OTHER WORDS, THERE IS A REALLY GOOD CHANCE THE SETUP BELOW IS NOT WHAT YOU WANT FOR YOUR PRODUCTION SYSTEM! See http://wiki.apache.org/solr/SpellCheckComponent 
		for details on the request parameters. -->
	<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
		<lst name="defaults">
			<str name="df">text</str>
			<!-- Solr will use suggestions from both the 'default' spellchecker and from the 'wordbreak' spellchecker and combine them. collations (re-written 
				queries) can include a combination of corrections from both spellcheckers -->
			<str name="spellcheck.dictionary">default</str>
			<str name="spellcheck.dictionary">wordbreak</str>
			<str name="spellcheck">on</str>
			<str name="spellcheck.extendedResults">true</str>
			<str name="spellcheck.count">10</str>
			<str name="spellcheck.alternativeTermCount">5</str>
			<str name="spellcheck.maxResultsForSuggest">5</str>
			<str name="spellcheck.collate">true</str>
			<str name="spellcheck.collateExtendedResults">true</str>
			<str name="spellcheck.maxCollationTries">10</str>
			<str name="spellcheck.maxCollations">5</str>
		</lst>
		<arr name="last-components">
			<str>spellcheck</str>
		</arr>
	</requestHandler>

	<!-- Term Vector Component http://wiki.apache.org/solr/TermVectorComponent -->
	<searchComponent name="tvComponent" class="solr.TermVectorComponent" />

	<!-- A request handler for demonstrating the term vector component This is purely as an example. In reality you will likely want to add the 
		component to your already specified request handlers. -->
	<requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
		<lst name="defaults">
			<str name="df">text</str>
			<bool name="tv">true</bool>
		</lst>
		<arr name="last-components">
			<str>tvComponent</str>
		</arr>
	</requestHandler>

	<!-- Clustering Component http://wiki.apache.org/solr/ClusteringComponent You'll need to set the solr.clustering.enabled system property when running 
		solr to run with clustering enabled: java -Dsolr.clustering.enabled=true -jar start.jar -->
	<searchComponent name="clustering" enable="${solr.clustering.enabled:false}" class="solr.clustering.ClusteringComponent">
		<!-- Declare an engine -->
		<lst name="engine">
			<!-- The name, only one can be named "default" -->
			<str name="name">default</str>

			<!-- Class name of Carrot2 clustering algorithm. Currently available algorithms are: * org.carrot2.clustering.lingo.LingoClusteringAlgorithm 
				* org.carrot2.clustering.stc.STCClusteringAlgorithm * org.carrot2.clustering.kmeans.BisectingKMeansClusteringAlgorithm See http://project.carrot2.org/algorithms.html 
				for the algorithm's characteristics. -->
			<str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

			<!-- Overriding values for Carrot2 default algorithm attributes. For a description of all available attributes, see: http://download.carrot2.org/stable/manual/#chapter.components. 
				Use attribute key as name attribute of str elements below. These can be further overridden for individual requests by specifying attribute key 
				as request parameter name and attribute value as parameter value. -->
			<str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>

			<!-- Location of Carrot2 lexical resources. A directory from which to load Carrot2-specific stop words and stop labels. Absolute or relative 
				to Solr config directory. If a specific resource (e.g. stopwords.en) is present in the specified dir, it will completely override the corresponding 
				default one that ships with Carrot2. For an overview of Carrot2 lexical resources, see: http://download.carrot2.org/head/manual/#chapter.lexical-resources -->
			<str name="carrot.lexicalResourcesDir">clustering/carrot2</str>

			<!-- The language to assume for the documents. For a list of allowed values, see: http://download.carrot2.org/stable/manual/#section.attribute.lingo.MultilingualClustering.defaultLanguage -->
			<str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
		</lst>
		<lst name="engine">
			<str name="name">stc</str>
			<str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
		</lst>
	</searchComponent>

	<!-- A request handler for demonstrating the clustering component This is purely as an example. In reality you will likely want to add the component 
		to your already specified request handlers. -->
	<requestHandler name="/clustering" startup="lazy" enable="${solr.clustering.enabled:false}" class="solr.SearchHandler">
		<lst name="defaults">
			<bool name="clustering">true</bool>
			<str name="clustering.engine">default</str>
			<bool name="clustering.results">true</bool>
			<!-- The title field -->
			<str name="carrot.title">name</str>
			<str name="carrot.url">id</str>
			<!-- The field to cluster on -->
			<str name="carrot.snippet">features</str>
			<!-- produce summaries -->
			<bool name="carrot.produceSummary">true</bool>
			<!-- the maximum number of labels per cluster -->
			<!--<int name="carrot.numDescriptions">5</int> -->
			<!-- produce sub clusters -->
			<bool name="carrot.outputSubClusters">false</bool>

			<str name="defType">edismax</str>
			<str name="qf">
				text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
			</str>
			<str name="q.alt">*:*</str>
			<str name="rows">10</str>
			<str name="fl">*,score</str>
		</lst>
		<arr name="last-components">
			<str>clustering</str>
		</arr>
	</requestHandler>

	<!-- Terms Component http://wiki.apache.org/solr/TermsComponent A component to return terms and document frequency of those terms -->
	<searchComponent name="terms" class="solr.TermsComponent" />

	<!-- A request handler for demonstrating the terms component -->
	<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
		<lst name="defaults">
			<bool name="terms">true</bool>
		</lst>
		<arr name="components">
			<str>terms</str>
		</arr>
	</requestHandler>


	<!-- Query Elevation Component http://wiki.apache.org/solr/QueryElevationComponent a search component that enables you to configure the top 
		results for a given query regardless of the normal lucene scoring. -->
	<searchComponent name="elevator" class="solr.QueryElevationComponent">
		<!-- pick a fieldType to analyze queries -->
		<str name="queryFieldType">string</str>
		<str name="config-file">elevate.xml</str>
	</searchComponent>

	<!-- A request handler for demonstrating the elevator component -->
	<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
		<lst name="defaults">
			<str name="echoParams">explicit</str>
			<str name="df">text</str>
		</lst>
		<arr name="last-components">
			<str>elevator</str>
		</arr>
	</requestHandler>

	<!-- Highlighting Component http://wiki.apache.org/solr/HighlightingParameters -->
	<searchComponent class="solr.HighlightComponent" name="highlight">
		<highlighting>
			<!-- Configure the standard fragmenter -->
			<!-- This could most likely be commented out in the "default" case -->
			<fragmenter name="gap" default="true" class="solr.highlight.GapFragmenter">
				<lst name="defaults">
					<int name="hl.fragsize">100</int>
				</lst>
			</fragmenter>

			<!-- A regular-expression-based fragmenter (for sentence extraction) -->
			<fragmenter name="regex" class="solr.highlight.RegexFragmenter">
				<lst name="defaults">
					<!-- slightly smaller fragsizes work better because of slop -->
					<int name="hl.fragsize">70</int>
					<!-- allow 50% slop on fragment sizes -->
					<float name="hl.regex.slop">0.5</float>
					<!-- a basic sentence pattern -->
					<str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
				</lst>
			</fragmenter>

			<!-- Configure the standard formatter -->
			<formatter name="html" default="true" class="solr.highlight.HtmlFormatter">
				<lst name="defaults">
					<str name="hl.simple.pre"><![CDATA[<em>]]></str>
					<str name="hl.simple.post"><![CDATA[</em>]]></str>
				</lst>
			</formatter>

			<!-- Configure the standard encoder -->
			<encoder name="html" class="solr.highlight.HtmlEncoder" />

			<!-- Configure the standard fragListBuilder -->
			<fragListBuilder name="simple" class="solr.highlight.SimpleFragListBuilder" />

			<!-- Configure the single fragListBuilder -->
			<fragListBuilder name="single" class="solr.highlight.SingleFragListBuilder" />

			<!-- Configure the weighted fragListBuilder -->
			<fragListBuilder name="weighted" default="true" class="solr.highlight.WeightedFragListBuilder" />

			<!-- default tag FragmentsBuilder -->
			<fragmentsBuilder name="default" default="true" class="solr.highlight.ScoreOrderFragmentsBuilder">
				<!-- <lst name="defaults"> <str name="hl.multiValuedSeparatorChar">/</str> </lst> -->
			</fragmentsBuilder>

			<!-- multi-colored tag FragmentsBuilder -->
			<fragmentsBuilder name="colored" class="solr.highlight.ScoreOrderFragmentsBuilder">
				<lst name="defaults">
					<str name="hl.tag.pre"><![CDATA[
               <b style="background:yellow">,<b style="background:lawgreen">,
               <b style="background:aquamarine">,<b style="background:magenta">,
               <b style="background:palegreen">,<b style="background:coral">,
               <b style="background:wheat">,<b style="background:khaki">,
               <b style="background:lime">,<b style="background:deepskyblue">]]></str>
					<str name="hl.tag.post"><![CDATA[</b>]]></str>
				</lst>
			</fragmentsBuilder>

			<boundaryScanner name="default" default="true" class="solr.highlight.SimpleBoundaryScanner">
				<lst name="defaults">
					<str name="hl.bs.maxScan">10</str>
					<str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
				</lst>
			</boundaryScanner>

			<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
				<lst name="defaults">
					<!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
					<str name="hl.bs.type">WORD</str>
					<!-- language and country are used when constructing Locale object. -->
					<!-- And the Locale object will be used when getting instance of BreakIterator -->
					<str name="hl.bs.language">en</str>
					<str name="hl.bs.country">US</str>
				</lst>
			</boundaryScanner>
		</highlighting>
	</searchComponent>

	<!-- The following response writers are implicitly configured unless overridden... -->
	<!-- <queryResponseWriter name="xml" default="true" class="solr.XMLResponseWriter" /> <queryResponseWriter name="json" class="solr.JSONResponseWriter"/> 
		<queryResponseWriter name="python" class="solr.PythonResponseWriter"/> <queryResponseWriter name="ruby" class="solr.RubyResponseWriter"/> <queryResponseWriter 
		name="php" class="solr.PHPResponseWriter"/> <queryResponseWriter name="phps" class="solr.PHPSerializedResponseWriter"/> <queryResponseWriter 
		name="csv" class="solr.CSVResponseWriter"/> -->

	<queryResponseWriter name="json" class="solr.JSONResponseWriter">
		<!-- For the purposes of the tutorial, JSON responses are written as plain text so that they are easy to read in *any* browser. If you expect 
			a MIME type of "application/json" just remove this override. -->
		<str name="content-type">text/plain; charset=UTF-8</str>
	</queryResponseWriter>

	<!-- Custom response writers can be declared as needed... -->
	<queryResponseWriter name="velocity" class="solr.VelocityResponseWriter" startup="lazy" />


	<queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">
		<int name="xsltCacheLifetimeSeconds">5</int>
	</queryResponseWriter>

	<!-- Legacy config for the admin interface -->
	<admin>
		<defaultQuery>*:*</defaultQuery>
	</admin>

</config>
