[ https://issues.apache.org/jira/browse/SOLR-215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513946 ]
Ryan McKinley commented on SOLR-215:
------------------------------------

> My suggestion is that this be added in phase 2, after Henri's initial changes
> are committed.
> Does this sound reasonable?
>

Yes - perhaps getting this checked in without touching handlers or the web-app side is a good idea. It is a little weird since the multi-core aspect would only be usable programmatically, but that will make it possible to easily bat around a 'core manager' and http design.

The one big question is what to do with the TokenizerFactory API. Yonik, how do you suggest upgrading an interface? The only clean way I can think of is to extend the TokenizerFactory interface with a 'MultiCoreTokenizerFactory' that adds an additional argument. I don't like it, but I don't know the API compatibility rules well enough to know whether that is required or whether it is OK to change the API.

----

Will - as is, this patch does not let you dynamically change the core. The cores are statically defined in web.xml. This will change.


> Multiple Solr Cores
> -------------------
>
>                 Key: SOLR-215
>                 URL: https://issues.apache.org/jira/browse/SOLR-215
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Henri Biestro
>            Priority: Minor
>         Attachments: solr-215.patch, solr-215.patch, solr-215.patch, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-215.patch.zip, solr-trunk-533775.patch, solr-trunk-538091.patch, solr-trunk-542847-1.patch, solr-trunk-542847.patch, solr-trunk-src.patch
>
>
> WHAT:
> As of 1.2, Solr only instantiates one SolrCore, which handles one Lucene index.
> This patch allows multiple cores in Solr, which also brings multiple-index capability.
> The patch file to grab is solr-215.patch.zip (see the MISC section below).
>
> WHY:
> The current Solr practical wisdom is that one schema - thus one index - is most likely to accommodate your indexing needs, using a filter to segregate documents if needed. If you really need multiple indexes, deploy multiple web applications.
> There are some use cases, however, where having multiple indexes or multiple cores in Solr itself may make sense.
> Multiple cores:
> Deployment issues within some organizations where IT will resist deploying multiple web applications.
> Seamless schema updates, where you can create a new core and switch to it without starting/stopping servers.
> Embedding Solr in your own application (instead of 'raw' Lucene) when you functionally need to segregate schemas & collections.
> Multiple indexes:
> Multiple language collections where each document exists in different languages, analysis being language dependent.
> Having document types that have nothing (or very little) in common with respect to their schema, their lifetime/update frequencies or even collection sizes.
>
> HOW:
> The best analogy is to consider that instead of deploying multiple web applications, you can have one web application that hosts more than one Solr core. The patch does not change any of the core logic (nor the core code); each core is configured & behaves exactly as the one core in 1.2; the various caches are per-core & so is the info-bean-registry.
> What the patch does is replace the SolrCore singleton with a collection of cores; all the code modifications are driven by the removal of the different singletons (the config, the schema & the core).
> Each core is 'named' and a static map (keyed by name) allows them to be easily managed.
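> As a rough illustration of that idea (a hedged sketch only, not the patch's actual code: the class name CoreRegistrySketch and its method signatures are invented here, and in the patch the equivalent methods live on SolrCore itself), a name-keyed core registry is essentially:
>
>   // Sketch only: shows the shape of a static, name-keyed core registry.
>   import java.util.HashMap;
>   import java.util.Map;
>
>   public final class CoreRegistrySketch {
>     // one entry per named core; the default/unnamed core can be tracked via a reserved name
>     private static final Map<String, Object> CORES = new HashMap<String, Object>();
>
>     public static synchronized void register(String name, Object core) {
>       // names may not contain '/' or '\' to avoid url & file path problems
>       if (name.indexOf('/') >= 0 || name.indexOf('\\') >= 0) {
>         throw new IllegalArgumentException("invalid core name: " + name);
>       }
>       CORES.put(name, core);
>     }
>
>     public static synchronized Object getCore(String name) {
>       return CORES.get(name);
>     }
>   }
>
> In the patch itself, as the Java usage example below suggests, a core is effectively registered when a SolrCore is constructed with a name, and lookup goes through SolrCore.getCore(name).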
> You declare one servlet filter mapping per core you want to expose in web.xml; this allows easy access to each core through a different url.
>
> USAGE (example web deployment, patch installed):
> Step 0:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar solr.xml monitor.xml
> Will index the 2 documents in solr.xml & monitor.xml.
> Step 1:
> http://localhost:8983/solr/core0/admin/stats.jsp
> Will produce the statistics page from the admin servlet on the core0 index; 2 documents.
> Step 2:
> http://localhost:8983/solr/core1/admin/stats.jsp
> Will produce the statistics page from the admin servlet on the core1 index; no documents.
> Step 3:
> java -Durl='http://localhost:8983/solr/core0/update' -jar post.jar ipod*.xml
> java -Durl='http://localhost:8983/solr/core1/update' -jar post.jar mon*.xml
> Adds the ipod*.xml documents to the index of core0 and the mon*.xml documents to the index of core1; running queries from the admin interface, you can verify that the indexes have different content.
>
> USAGE (Java code):
> //create a configuration
> SolrConfig config = new SolrConfig("solrconfig.xml");
> //create a schema
> IndexSchema schema = new IndexSchema(config, "schema0.xml");
> //create a core from the two others
> SolrCore core = new SolrCore("core0", "/path/to/index", config, schema);
> //access a core by name (e.g., from elsewhere in the application)
> SolrCore core0 = SolrCore.getCore("core0");
>
> PATCH MODIFICATIONS DETAILS (per package):
> org.apache.solr.core:
> The heaviest modifications are in SolrCore & SolrConfig.
> SolrCore is the most obvious modification; instead of a singleton, there is a static map of cores keyed by names and assorted methods. To retain some compatibility, the 'null' named core replaces the singleton for the relevant methods, for instance SolrCore.getCore(). One small constraint on core names is that they can't contain '/' or '\', avoiding potential url & file path problems.
> SolrConfig (& SolrIndexConfig) are now used to persist all configuration options that need to be quickly accessible to the various components. Most of these variables were static, like those found in SolrIndexSearcher. Mimicking the intent of these static variables, SolrConfig & SolrIndexConfig expose them through public final members.
> SolrConfig inherits from Config, which has been modified; Config is now more strictly a dom document (filled from some resource) plus methods to evaluate xpath expressions. Config also continues to be the classloader singleton that makes it easy to instantiate classes located in the Solr installation directory.
> org.apache.solr.analysis:
> TokenizerFactory & FilterFactory now get the SolrConfig passed as a parameter to init; one might want to read some resources to initialize the factory, and the config dir is in the config. This is partially redundant with the argument map, though.
> org.apache.solr.handler:
> RequestHandlerBase takes the core as a constructor parameter.
> org.apache.solr.util:
> The test harness has been modified to expose the core it instantiates.
> org.apache.solr.servlet:
> SolrDispatchFilter now instantiates a core configured at init time; web.xml must contain one filter declaration and one filter mapping per core you want to expose. Wherever an admin servlet or page was referring to the SolrCore singleton or SolrConfig, it now checks for the request attribute 'org.apache.solr.SolrCore' first; the filters set this attribute before forwarding to the other parts.
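> To make that dispatch mechanism concrete, here is a hedged sketch (not the patch's actual SolrDispatchFilter code; the class name PerCoreFilterSketch, the 'coreName' init-param and the lookupCore helper are invented for illustration) of a filter bound to one named core that exposes it through the request attribute:
>
>   import java.io.IOException;
>   import javax.servlet.Filter;
>   import javax.servlet.FilterChain;
>   import javax.servlet.FilterConfig;
>   import javax.servlet.ServletException;
>   import javax.servlet.ServletRequest;
>   import javax.servlet.ServletResponse;
>
>   public class PerCoreFilterSketch implements Filter {
>     private Object core; // would be an org.apache.solr.core.SolrCore
>
>     public void init(FilterConfig config) throws ServletException {
>       // one filter declaration + mapping per core in web.xml; here the core name comes from an init-param
>       String coreName = config.getInitParameter("coreName");
>       core = lookupCore(coreName); // e.g. SolrCore.getCore(coreName)
>     }
>
>     public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
>         throws IOException, ServletException {
>       // downstream admin servlets/pages check this attribute first, before falling back to the old singleton
>       req.setAttribute("org.apache.solr.SolrCore", core);
>       chain.doFilter(req, res);
>     }
>
>     public void destroy() {}
>
>     private Object lookupCore(String name) {
>       return name; // placeholder for the real core lookup
>     }
>   }
>
> One filter instance per core, mapped to a url prefix like /core0/*, gives each core its own url space while the downstream code stays core-agnostic.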
> Admin/servlet:
> Has been modified to use the core exposed through the request attribute 'org.apache.solr.SolrCore'.
>
> REPLICATION:
> The feature has not been implemented yet; the starting point is that instead of having just one index directory 'index/', the naming scheme for the index data directories is 'index*/'. Have to investigate.
>
> FUTURE:
> Uploading a new schema/conf would be nice, allowing Solr to create cores dynamically; the upload mechanism itself is easy, but the servlet dispatch filter needs to be modified.
> Having replication embedded in the Solr application itself, using an http-based version of the rsync algorithm; some of the core code of jarsync might be handy.
>
> MISC:
> The patch production process (not as easy as I thought it would be with Windows/Netbeans/cygwin/TortoiseSVN):
> 0/ The starting point is to have the modified code running in a local patch branch, all tests ok.
> 1/ Keep one 'clean' version of the trunk beside the local patch branch; you'll need to verify that your patch can be applied to the latest clean trunk version and that the various tests still work from there. Creating the patch is key.
> 2/ If you used some IDE and forgot to set the auto-indentation correctly, you most likely need to work around the space/indentation clutter that results in the patch. I could not find a way to get TortoiseSVN to create a patch with the proper options (ignore spaces et al.) and could not find a way to get NetbeansSVN to generate one either. Thus I create the patch from the local trunk root through cygwin (with svn+patchutils): svn diff --diff-cmd /usr/bin/diff -x "-w -B -b -E -d -N -u" > ~/solr-215.patch
> Before generating the patch, it is important to issue an 'svn add ...' for each file you might have added; a quick "svn status | grep '?'" lets you verify nothing will be missing. Not elegant, but you can even follow with: svn status | grep '?' | awk '{print $2}' | xargs svn add
> 3/ Apply the patch to the 'clean trunk'.
> TortoiseSVN's 'apply patch' command only understands 'unified diff', thus the '-u' option.
> Alternatively, you can apply the patch through cygwin: patch -p0 -u < solr-215.patch
> I've updated the 'dev' environment to an x86 Solaris 10 box which now generates the zipped patch (solr-215.patch.zip, same patch production method).
> For Solaris 10 users, patch must be "gnu" patch: /usr/local/bin/patch is its usual location (not to be confused with /bin/patch...).
> For x86, you can find it at ftp://ftp.sunfreeware.com/pub/freeware/intel/10/patch-2.5.4-sol10-x86-local.gz; I don't know about diff, but I'm using the version located at ftp://ftp.sunfreeware.com/pub/freeware/intel/10/diffutils-2.8.1-sol10-intel-local.gz

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.