[ https://issues.apache.org/jira/browse/SOLR-16697 ]


    Houston Putman deleted comment on SOLR-16697:
    ---------------------------------------

was (Author: jira-bot):
Commit b07d9f6af5907a5e441cdd5dd070afe1cce7d832 in lucene-solr's branch 
refs/heads/8.11.3-apple from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=b07d9f6af59 ]

SOLR-16697: Add API to install per-shard indices

Feature was added upstream by SOLR-16697, starting in 9.3, but was never
backported to the 8.x line upstream.  This commit attempts to make it
available to internal clients using 8.11.2.


> New API support to import index files generated by Embedded SOLR into SOLR 
> Cloud
> --------------------------------------------------------------------------------
>
>                 Key: SOLR-16697
>                 URL: https://issues.apache.org/jira/browse/SOLR-16697
>             Project: Solr
>          Issue Type: New Feature
>          Components: Backup/Restore
>            Reporter: Indumathy Rajagopalan
>            Assignee: Jason Gerlowski
>            Priority: Major
>             Fix For: 9.3
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Offline indexing is a popular option when really large data sets need to be
> indexed into SOLR.
> Data is loaded from a data source (e.g. c*), and index creation pipelines
> produce index files per shard using embedded SOLR.
>  
> With older versions of SOLR, we would copy these index files into SOLR Cloud
> data directories using custom tools and reload the collection to be able to
> search/update on the newly uploaded collection.
> Ideally, we should use the Restore API to import the index files from a backup
> repository. However, the file structure that the Restore API expects is
> complex enough that massaging the index files in every shard into a
> Restore-compatible format is infeasible.
>  
> It would be good for SOLR to support a 'Restore'-like API that would allow us
> to import index files generated by embedded SOLR into SOLR Cloud. This API
> should operate at the shard level and import the index files into a
> single shard per invocation.
>  
> *With the new API, offline indexing could look like this:*
>  
> 1. Generate index files per shard using embedded SOLR as part of Hadoop
> MR/Spark jobs, and copy all index files for every shard into the backup
> repository.
>  
> 2. The new API should be able to import the index from the backup repository
> location into each shard on SOLR Cloud. The API would handle things like
> marking the collection as read-only, triggering replication, etc., along the
> lines of what the 'RESTORE' API currently does.
>  
> The new API should support the relevant parameters from the Restore API
> (location & repository).
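
For illustration only, the per-shard workflow proposed above could be driven
by a small script like the one below. It is a minimal sketch, not the
committed implementation: the action name INSTALLSHARDDATA and the parameter
names are assumptions modelled on the Restore API's location and repository
parameters and should be checked against the Solr reference guide, and the
<location_root>/<shard> directory layout, the helper name, and the sample
values are hypothetical. The script only builds the request URLs; it does not
contact a Solr node.

```python
from urllib.parse import urlencode


def install_shard_urls(base_url, collection, shards, repository, location_root):
    """Build one Collections API request URL per shard.

    Assumptions (not confirmed by the issue text): the action is named
    INSTALLSHARDDATA, and each shard's offline index was copied to
    <location_root>/<shard> in the backup repository.
    """
    urls = []
    for shard in shards:
        params = {
            "action": "INSTALLSHARDDATA",  # assumed action name
            "collection": collection,
            "shard": shard,
            "repository": repository,
            "location": f"{location_root}/{shard}",  # hypothetical layout
        }
        # urlencode percent-encodes the values, including "/" in the location.
        urls.append(f"{base_url}/admin/collections?{urlencode(params)}")
    return urls


if __name__ == "__main__":
    for url in install_shard_urls(
        "http://localhost:8983/solr",
        "products",
        ["shard1", "shard2"],
        "offline_repo",
        "/backups/offline",
    ):
        print(url)
```

Each URL corresponds to one invocation of the API, matching the request that
the API operate on a single shard at a time.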



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

