[
https://issues.apache.org/jira/browse/SOLR-524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12584298#action_12584298
]
Ezra Epstein commented on SOLR-524:
-----------------------------------
I see that I didn't explain the issue very well. The situation is that we have
multiple indices, hence, in 1.2, multiple webapps. We also use replication,
so we need to pull snapshots of the index/data files for each webapp/index,
and the snappuller script has no way to do this. The rsyncd-start script creates
a single rsyncd MODULE (not webapp) named "solr", and the snappuller script
always pulls directly from that module's fixed path; it offers no way to append
a sub-path to the module root. So if ${data_dir} is /opt/solr/data, snappuller
will pull a snapshot from that folder. But with multiple webapps we'll have:
/opt/solr/webapp1/data
/opt/solr/webapp2/data
and the current Solr scripts seem to let us start rsyncd pointing at one folder
or the other, but not both. So we can either:
+ start a second rsyncd instance, though we'd need a different module name
since [solr] is taken by the first instance, or else have it listen on a
different port, which is potentially confusing (like running two Tomcat
instances just to host two webapps)
+ skip rsyncd and use rsync directly (via ssh)
+ change rsyncd-start to allow multiple module names ([webapp1], [webapp2],
etc.). That works, but it makes adding new webapps/indices harder
+ start rsyncd so that the [solr] module points to a root folder (e.g.,
/opt/solr in the example above), and then add a variable to snappuller, set
via scripts.conf on each slave, that specifies the path within this single
module. The first webapp's path would then be /solr/webapp1/data and the
second webapp's path /solr/webapp2/data
More succinctly: I don't see how to use the scripts to replicate multiple
indices/webapps. The last approach seems to scale and work well, with one
caveat: the data dirs for the various indices must all be under a common root
folder (though that could be "/", so it's a minor constraint). So, if not the
above, what is the recommended way to replicate multiple indices?
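A minimal bash sketch of the last option, under stated assumptions: write_rsyncd_conf and module_relative_path are hypothetical helpers (they are not part of the Solr 1.2 scripts), and the generated rsyncd.conf is an illustration, not what rsyncd-start actually writes. It shows a single [solr] module rooted at the common folder, and how each webapp's data dir maps to a path inside that module. The second helper also makes the common-root caveat concrete: a data dir outside the root is rejected, while a root of "/" accepts any absolute path.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers for the "single [solr] module rooted at a
# common folder" approach. Not part of the Solr 1.2 scripts.

# Print an rsyncd.conf with one [solr] module whose path is the common root.
# Parameters placed before the first module act as defaults for all modules.
write_rsyncd_conf() {
  local module_root="$1"
  cat <<EOF
use chroot = no
read only = yes

[solr]
    path = ${module_root}
    comment = Solr data dirs for all webapps
EOF
}

# Print a webapp's data dir relative to the module root, e.g.
# /opt/solr + /opt/solr/webapp1/data -> webapp1/data.
# Returns 1 if the data dir is not under the root (the caveat noted above).
module_relative_path() {
  # Strip one trailing slash; a root of "/" becomes "" and matches any absolute path.
  local module_root="${1%/}" data_dir="${2%/}"
  case "$data_dir" in
    "$module_root"/*) printf '%s\n' "${data_dir#"$module_root"/}" ;;
    *) echo "error: $data_dir is not under ${module_root:-/}" >&2; return 1 ;;
  esac
}
```

For example, module_relative_path /opt/solr /opt/solr/webapp1/data prints webapp1/data, so the slave for webapp1 would pull from the solr/webapp1/data path within the module.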
> snappuller has limitation w/r/t/ handling multiple web apps
> -----------------------------------------------------------
>
> Key: SOLR-524
> URL: https://issues.apache.org/jira/browse/SOLR-524
> Project: Solr
> Issue Type: Improvement
> Components: replication
> Affects Versions: 1.2
> Environment: Linux (CentOS release 5 (Final))
> Java JDK 6
> Reporter: Ezra Epstein
> Priority: Minor
> Original Estimate: 0.5h
> Remaining Estimate: 0.5h
>
> The snappuller script has a limitation that makes it hard to use for
> replicating the indices of multiple webapps. By changing:
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/solr/${name}/ \
> ${data_dir}/${name}-wip
> to:
> # rsync over files that have changed
> rsync -Wa${verbose}${compress} --delete ${sizeonly} \
> ${stats} rsync://${master_host}:${rsyncd_port}/${rsync_module_path}/${name}/ \
> ${data_dir}/${name}-wip
> and adding an rsync_module_path variable to scripts.conf, with a default
> value of "solr" set before the 'unset' commands at the top of the snappuller
> script, I've worked around the issue. Still, it seems better not to
> hard-code the module name ([solr]) and to allow some flexibility in the
> location of the data files under that module. This is required for multiple
> webapps, since they won't share a data folder.
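The workaround in the description can be sketched in bash as follows. master_host, rsyncd_port, name, rsync_module_path and scripts.conf come from the description; load_scripts_conf and snapshot_src_url are hypothetical names used only for illustration, and this is not the shipped snappuller. It assumes, as the description implies, that the default is assigned before scripts.conf is sourced, so a slave's scripts.conf can override it.

```shell
#!/usr/bin/env bash
# Sketch of the rsync_module_path workaround (not the shipped snappuller).
# load_scripts_conf and snapshot_src_url are hypothetical helpers.

# Default module path, assigned before scripts.conf is read so that a slave's
# scripts.conf can override it (e.g. rsync_module_path=solr/webapp1/data).
rsync_module_path=solr

# Source a scripts.conf-style file if it exists; plain assignments in it
# replace the default above (sourcing runs in the current shell).
load_scripts_conf() {
  if [ -f "$1" ]; then
    . "$1"
  fi
}

# Print the rsync source URL for a snapshot, i.e. the
# rsync://${master_host}:${rsyncd_port}/${rsync_module_path}/${name}/ part of
# the rsync command, with the module path no longer hard-coded to "solr".
snapshot_src_url() {
  local master_host="$1" rsyncd_port="$2" name="$3"
  printf 'rsync://%s:%s/%s/%s/\n' \
    "$master_host" "$rsyncd_port" "${rsync_module_path%/}" "$name"
}
```

With no override the URL keeps the old .../solr/<snapshot>/ form. A slave for webapp1 whose scripts.conf sets rsync_module_path=solr/webapp1/data would instead pull from .../solr/webapp1/data/<snapshot>/, which matches the single-module layout proposed in the comment.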
--
This message is automatically generated by JIRA.