[ https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Henri Biestro updated SOLR-215:
-------------------------------

    Description: 
What
-------
As of Solr 1.2, Solr instantiates only one SolrCore, which handles one Lucene
index. This patch allows multiple cores in Solr, which in turn brings
multi-index capability.

Why
------
The current Solr practical wisdom is that one schema - thus one index - is most
likely to accommodate your indexing needs, using a filter to segregate documents
when needed. If you believe you need multiple indexes, deploy multiple web
applications.
There are, however, some use cases where having multiple indexes or multiple
cores within Solr itself may make sense.
Multiple cores:
Deployment issues within some organizations where IT will resist deploying 
multiple web applications.
Seamless schema updates where you can create a new core and switch to it without
stopping/starting servers.
Embedding Solr in your own application (instead of 'raw' Lucene) when you
functionally need to segregate schemas & collections.
Multiple indexes:
Multi-language collections where each document exists in several languages,
analysis being language dependent.
Having document types that have nothing (or very little) in common with respect 
to their schema, their lifetime/update frequencies or even collection sizes.
Some background on the 'whys':
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10082201
http://www.nabble.com/Embedding-Solr-vs-Lucene%2C-multiple-Solr-cores--tf3572324.html#a9981355

How
------
The best analogy is to consider that instead of deploying multiple
web applications, you can have one web application that hosts more than one Solr
core. The patch does not change any of the core logic (nor the core code); each
core is configured & behaves exactly as the one core in 1.2; the various caches
are per-core & so is the info-bean-registry.
What the patch does is replace the SolrCore singleton with a collection of
cores; all the code modifications are driven by the removal of the different
singletons (the config, the schema & the core).
Each core is 'named', and a static map (keyed by name) makes it easy to manage
them.
You declare one servlet filter mapping per core you want to expose in the
web.xml; this makes each core accessible through a different URL.
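As a minimal sketch (core names, file names and paths are purely illustrative,
and the creation/access calls reuse the example from the previous description
below), hosting two cores in one web application boils down to:
// create two cores from two different config/schema pairs
SolrConfig config0 = SolrConfig.createConfiguration("solrconfig0.xml");
SolrCore core0 = new SolrCore("core0", "/path/to/data0", config0, new IndexSchema(config0, "schema0.xml"));
SolrConfig config1 = SolrConfig.createConfiguration("solrconfig1.xml");
SolrCore core1 = new SolrCore("core1", "/path/to/data1", config1, new IndexSchema(config1, "schema1.xml"));
// retrieve either core by name from anywhere in the web application
SolrCore searchCore = SolrCore.getCore("core1");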

Details (per package)
-----------------------------
org.apache.solr.core:
The heaviest modifications are in SolrCore & SolrConfig.
SolrCore is the most obvious modification; instead of a singleton, there is a
static map of cores keyed by name, plus assorted methods. To retain some
compatibility, the 'null' named core replaces the singleton for the relevant
methods, for instance SolrCore.getCore(). One small constraint on core names is
that they can't contain '/' or '\', avoiding potential URL & file path problems.
SolrConfig (& SolrIndexConfig) are now used to persist all configuration
options that need to be quickly accessible to the various components. Most of
these variables used to be static, like those found in SolrIndexSearcher.
Mimicking the intent of these static variables, SolrConfig & SolrIndexConfig
expose them as public final members.
SolrConfig inherits from Config, which has been modified; Config is now more
strictly a DOM document (filled from some resource) plus methods to evaluate
XPath expressions. Config also continues to be the classloader singleton that
makes it easy to instantiate classes located in the Solr installation directory.
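To make the registry idea concrete, here is a minimal sketch of the pattern
described above (field and method names are guesses for illustration, not the
patch's actual members):
// hypothetical named-core registry; the 'null' key plays the old singleton role
private static final Map<String, SolrCore> cores = new HashMap<String, SolrCore>();
public static synchronized SolrCore getCore() { return cores.get(null); }
public static synchronized SolrCore getCore(String name) { return cores.get(name); }
private static synchronized void register(String name, SolrCore core) {
  if (name != null && (name.indexOf('/') >= 0 || name.indexOf('\\') >= 0))
    throw new IllegalArgumentException("core names may not contain '/' or '\\'");
  cores.put(name, core);
}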

org.apache.solr.analysis:
TokenizerFactory & FilterFactory now get the SolrConfig passed as a parameter
to init; a factory might need to read some resources at initialization time,
and the config directory is reachable through the config. This is partially
redundant with the argument map, though.
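For instance, a hypothetical word-list factory could resolve a file through the
config; the init signature below mirrors the change described above, while the
getConfigDir() accessor and the loadWords() helper are assumptions made for
illustration:
public class WordListFilterFactory {  // sketch only; would implement Solr's TokenFilterFactory
  private Set words;
  // init now receives the SolrConfig in addition to the usual argument map
  public void init(SolrConfig config, Map<String, String> args) {
    String file = args.get("words");                          // per-field argument, as before
    words = loadWords(new File(config.getConfigDir(), file)); // assumed config-dir accessor
  }
  public TokenStream create(TokenStream input) {
    return new StopFilter(input, words);
  }
  private Set loadWords(File f) { /* read one word per line; omitted */ return new HashSet(); }
}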

org.apache.solr.handler:
RequestHandlerBase takes the core as a constructor parameter.
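For example, a custom handler would now look roughly like this (the handler and
its body are illustrative; the other abstract methods of RequestHandlerBase are
omitted):
public class EchoRequestHandler extends RequestHandlerBase {
  public EchoRequestHandler(SolrCore core) {
    super(core);  // the core is injected instead of being fetched from a singleton
  }
  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    rsp.add("params", req.getParams().toString());
  }
}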

org.apache.solr.util:
The test harness has been modified to expose the core it instantiates.

org.apache.solr.servlet:
SolrDispatchFilter now instantiates a core configured at init time; the
web.xml must contain one filter declaration and one filter mapping per core you
want to expose. Wherever some admin page or servlet used to refer to the
SolrCore singleton or SolrConfig, it now checks the request attribute
'org.apache.solr.SolrCore' first; the filters set this attribute before
forwarding to the other parts.
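A rough sketch of that dispatch pattern (not the patch's actual code; the
init-param name is made up):
public class PerCoreDispatchSketch implements Filter {
  private SolrCore core;
  public void init(FilterConfig cfg) throws ServletException {
    // each filter declaration in web.xml names the core it serves (hypothetical init-param)
    core = SolrCore.getCore(cfg.getInitParameter("solr-core-name"));
  }
  public void doFilter(ServletRequest req, ServletResponse rsp, FilterChain chain)
      throws IOException, ServletException {
    // expose the core to downstream servlets/JSPs before forwarding
    req.setAttribute("org.apache.solr.SolrCore", core);
    chain.doFilter(req, rsp);
  }
  public void destroy() {}
}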

Admin/servlet:
Has been modified to use the core exposed through the request attribute 
'org.apache.solr.SolrCore'.
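In code, the lookup performed by those pages amounts to something like this
sketch:
SolrCore core = (SolrCore) request.getAttribute("org.apache.solr.SolrCore");
if (core == null)
  core = SolrCore.getCore();  // fall back to the legacy 'null' named core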

Replication
----------------
The feature has not been implemented yet; the starting point is that instead of
having just one index directory 'index/', the naming scheme for the index data
directories is 'index*/'. This still has to be investigated.

Future
---------
Uploading a new schema/conf would be nice, allowing Solr to create cores
dynamically; besides the upload mechanism itself, which should be easy, the
servlet filter would have to be modified.
Replication could be embedded in the Solr application itself, using an
HTTP-based version of the rsync algorithm; some of the core code of jarsync
might be handy.

Misc
-------
The patch production process (not as easy as I thought it would be with a
Windows/Netbeans/cygwin/TortoiseSVN setup):
0/ The starting point is to have the modified code running in a local patch
branch, with all tests passing.
1/ Keep one 'clean version' of the trunk alongside the local patch branch;
you'll need to verify that your patch can be applied to the latest clean trunk
version and that the various tests still pass from there. Creating the patch is
key.
2/ If you used an IDE and forgot to set the auto-indentation correctly, you
will most likely need to work around the resulting space/indentation clutter in
the patch. I could not find a way to get TortoiseSVN to create a patch with the
proper options (ignore whitespace et al.) and could not get NetbeansSVN to
generate one either. Thus I create the patch from the local trunk root through
cygwin (with svn+patchutils): svn diff --diff-cmd /usr/bin/diff -x "-w -B -b -E
-d -N -u" > ~/solr-215.patch.
Before generating the patch, it is important to issue an 'svn add ...' for each
file you might have added; a quick "svn status | grep '?'" verifies that
nothing will be missing.
3/ Apply the patch to the 'clean trunk'.
You can apply the patch through cygwin: patch -p0 -u < solr-215.patch.
Alternatively, use TortoiseSVN's 'apply patch' command, since the patch format
is 'unified diff'.


  was:
Allow multiple cores in one web application (or one class loader):
This makes it possible to have multiple cores, created from different config &
schema files, in the same application.
The side effect is that this also allows different indexes.

Implementation notes for the patch:
The patch allows multiple 'named' cores in the same application.
The current single-core behavior has been retained - the core named 'null' -
but the code could not be kept 100% compatible. (In particular,
SolrConfig.config is gone; SolrCore.getCore() is still here, though.)

A few classes existed only as singletons and have thus been refactored.
The Config class feature set has been narrowed to class loading relative to the
installation (lib) directory;
The SolrConfig class feature set has evolved towards the 'solr config' part,
caching frequently accessed parameters;
The IndexSchema class now uses a SolrConfig instance; a few indexing-related
parameters from the configuration were needed there.
The SolrCore is built from a SolrConfig & an IndexSchema.

The creation of a core has become:
// create a configuration
SolrConfig config = SolrConfig.createConfiguration("solrconfig.xml");
// create a schema
IndexSchema schema = new IndexSchema(config, "schema0.xml");
// create a core from the two others
SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
// access an existing core by name:
SolrCore core0 = SolrCore.getCore("core0");


There are a few other changes, mainly related to passing the SolrCore/SolrConfig
in use through constructors.

Some background on the 'whys':
http://www.nabble.com/Multiple-Solr-Cores-tf3608399.html#a10082201
http://www.nabble.com/Embedding-Solr-vs-Lucene%2C-multiple-Solr-cores--tf3572324.html#a9981355


The patch can now be applied to a clean trunk.

> Multiple Solr Cores
> -------------------
>
>                 Key: SOLR-215
>                 URL: https://issues.apache.org/jira/browse/SOLR-215
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Henri Biestro
>            Priority: Minor
>         Attachments: solr-215.patch, solr-215.patch, solr-trunk-533775.patch, 
> solr-trunk-538091.patch, solr-trunk-542847-1.patch, solr-trunk-542847.patch, 
> solr-trunk-src.patch
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
