Hi Loganathan,

Seven hours is indeed a long time for indexing! Here’s some related thoughts 
that may help you find a decent solution:

Optimise SQL statements
Given the amount of time involved, I’m curious as to how many records you may 
be indexing, and how complex the underlying SQL queries are. Have you confirmed 
you have database indices on foreign key columns?

Also, (and I’ve just noticed it’s not covered in the documentation) it’s 
possible to shift multi-value attributes into separate queries, which may make 
the overall speed faster (especially if they are creating joins that are only 
necessary for those attributes and not the rest of the index):

# This is only applicable for multi-value attributes:
has taggings.tag_id, :as => :tag_ids, :source => :query

# Also, you can use ranged queries, which may perform
# better depending on the data situation:
has taggings.tag_id, :as => :tag_ids, :source => :ranged_query

But yeah, I generally recommend streamlining the SQL queries as much as 
possible - it reduces the load on your database and speeds up indexing time.

Separate hosts for the database and Sphinx
Speaking of reducing load - do you have Sphinx and your database on the same 
server? If so, is having separate servers an option? This may slow down 
indexing slightly (as you’ve got the extra time for data transferring from one 
machine to the other) but should reduce the load on the database server, which 
is a good thing.

Setting the memory limit for the indexer
http://sphinxsearch.com/docs/current.html#conf-mem-limit 
<http://sphinxsearch.com/docs/current.html#conf-mem-limit>

Sphinx has the ability to cap the amount of memory used by the indexer. This 
may help limit the resources it uses and thus allow you to SSH into the machine 
while it’s running? You can set this per-environment in 
config/thinking_sphinx.yml.

Indexing data separately
This is what you’ve directly asked about. You can certainly run the indexer 
elsewhere and then copy it to the main Sphinx server, but there’s some things 
to note: Firstly, you will still want to have Sphinx reference the same 
production database.

And then, when copying, I would recommend you append .new. into the index file 
names (e.g. article_core.spa becomes article_core.new.spa), then copy the files 
over to your main machine, then send a SIGHUP signal to the daemon. It’ll then 
load the new index data in and replace the old files itself. If you’re running 
the indexer manually, then you can get the correct file names by running it 
with the --rotate flag - but otherwise: all Sphinx index files need to be 
renamed (not just spa files).

And yes, you may end up with slightly out-of-date data, but that’s going to be 
the case no matter what - especially due to the long time indexing requires.

I really hope there are possibilities for SQL optimisation, because this can 
have such a significant impact on indexing times. Otherwise, I hope the rest of 
the information is useful too.

Cheers,

— 
Pat

> On 7 Sep 2017, at 11:10 am, Loganathan Sellappa <loganathan...@gmail.com 
> <mailto:loganathan...@gmail.com>> wrote:
> 
> Hi All,
> 
> Whenever we do the full re-indexing on our database, it is taking around 7 
> hours to complete due to large data sets. Even sometimes it is making our 
> server to go unresponsive and we couldn't able to do the SSH until the 
> indexer completes. 
> 
> I'm looking out for the way to run the indexer outside of our production 
> machine and copy the indexed file to our machine afterward. Just wanted to 
> know whether it'll cause any impact or not and I need clarifications for 
> below question
> 
> During the process of Indexing & copying to the production server, there will 
> be some delta difference will be there between my index file & production 
> server's index file, will it be handled automatically by the indexer or do I 
> need to stop the indexer on production server during the process?
> 
> 
> regards,
> Loganathan
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Thinking Sphinx" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to thinking-sphinx+unsubscr...@googlegroups.com 
> <mailto:thinking-sphinx+unsubscr...@googlegroups.com>.
> To post to this group, send email to thinking-sphinx@googlegroups.com 
> <mailto:thinking-sphinx@googlegroups.com>.
> Visit this group at https://groups.google.com/group/thinking-sphinx 
> <https://groups.google.com/group/thinking-sphinx>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to thinking-sphinx+unsubscr...@googlegroups.com.
To post to this group, send email to thinking-sphinx@googlegroups.com.
Visit this group at https://groups.google.com/group/thinking-sphinx.
For more options, visit https://groups.google.com/d/optout.

Reply via email to