[ 
https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584298#action_12584298
 ] 

Ezra Epstein commented on SOLR-524:
-----------------------------------

I see that I didn't explain the issue very well.  The situation is that we have 
multiple indices, hence, in 1.2, multiple web-apps.  We also have replication, 
so we need to pull snapshots of the index/data files for each webapp/index.  
The snappuller script has no way to do this.  The rsyncd-start script creates a 
single rsyncd MODULE (not webapp), named "solr".  The snappuller script always 
pulls directly from this module's fixed path; there's no way to extend the path 
below that module root in the snappuller script.  Thus, if ${data_dir} is 
/opt/solr/data, snappuller will pull a snapshot from that folder.  But with 
multiple webapps we'll have:

/opt/solr/webapp1/data
/opt/solr/webapp2/data

and the current Solr scripts seem to let us start rsyncd pointing at one folder 
or the other, but not both.  So we can either:
+ start a second instance of rsyncd.  It would need a different module name, 
since [solr] is taken by the first instance, though I guess we could instead 
have it listen on a different port, which is potentially confusing (like 
running 2 instances of tomcat just to host 2 webapps)
+ not use rsyncd and just use rsync directly (via ssh)
+ change rsyncd-start to allow multiple module names: [webapp1], [webapp2], etc.  
That works, but then it's hard to add new webapps/indices
+ start rsyncd so that the [solr] module points to a root folder, e.g., 
/opt/solr in the above example, and then allow a variable in snappuller, set 
via the scripts.conf on the slaves, that specifies the path within this single 
module.  Thus the first webapp pulls from /solr/webapp1/data and the second 
webapp pulls from /solr/webapp2/data
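
A rough sketch of how the last option composes the source URL, assuming the 
[solr] module on the master is rooted at /opt/solr.  build_rsync_source is a 
hypothetical helper (snappuller has no such function), and the host, port and 
snapshot name are made-up example values:

```shell
#!/bin/bash
# Hypothetical helper: compose the rsync source URL for one snapshot.
#   $1 = master host, $2 = rsyncd port,
#   $3 = path starting at the rsyncd module name, $4 = snapshot name
build_rsync_source() {
    echo "rsync://$1:$2/$3/$4/"
}

# Slave for webapp1: its scripts.conf would carry solr/webapp1/data
build_rsync_source master.example.com 18983 solr/webapp1/data snapshot.20080401120000

# Slave for webapp2: same module, different path below the module root
build_rsync_source master.example.com 18983 solr/webapp2/data snapshot.20080401120000
```

Both slaves talk to the same rsyncd instance and the same module; only the 
path below the module root differs.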

More succinctly, I don't see how to use the scripts to support replication of 
multiple indices/webapps.  The last approach above seems to scale and work 
well, with one caveat: the data dirs for the various indices must all be under 
a common root folder (though that could be "/", so it's a minor constraint).  
So, if not the above, what is the recommended way to replicate multiple 
indices?


> snappuller has limitation w/r/t/ handling multiple web apps
> -----------------------------------------------------------
>
>                 Key: SOLR-524
>                 URL: https://issues.apache.org/jira/browse/SOLR-524
>             Project: Solr
>          Issue Type: Improvement
>          Components: replication
>    Affects Versions: 1.2
>         Environment: Linux (CentOS release 5 (Final))
> Java JDK 6
>            Reporter: Ezra Epstein
>            Priority: Minor
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The snappuller has a limitation which makes it hard to use for replicating 
> the indices for multiple webapps.  In particular, by changing:
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ \
> ${data_dir}/${name}-wip
> to: 
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/${rsync_module_path}/${name}/ \
> ${data_dir}/${name}-wip
> and adding an rsync_module_path variable to scripts.conf, plus giving it a 
> default value of "solr" before the 'unset' commands at the top of the 
> snappuller script, I've worked around the issue.  Still, it seems better to 
> not hard-code the module name ([solr]) and also to allow some flexibility in 
> the location of the data files under that module.  This is required for 
> multiple webapps since they won't share a data folder.
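
The default-then-override behaviour described in the issue could look roughly 
like this.  It is only a sketch: the temporary file stands in for a slave's 
scripts.conf, rsync_module_path is the proposed variable rather than an 
existing one, and the real snappuller loads its configuration differently:

```shell
#!/bin/bash
# Stand-in for a slave's scripts.conf (assumption: only the proposed
# variable is shown here).
conf=$(mktemp)
echo 'rsync_module_path=solr/webapp2/data' > "$conf"

# The default preserves the old behaviour: module "solr", no extra path.
rsync_module_path=solr

# Sourcing the config overrides the default when the variable is set there.
. "$conf"
rm -f "$conf"

echo "module path: ${rsync_module_path}"
```

A slave whose scripts.conf does not set the variable keeps pulling from the 
plain solr module, so existing single-webapp setups are unaffected.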

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
