Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-11-29 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3591722758

   See https://issues.apache.org/jira/projects/SOLR/issues/SOLR-17507 for when 
we get this in.  Maybe break it up into two, one side for small examples, and 
then in 10.1 or later the full doc?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-10-17 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3399378334

   To avoid "forward-looking" text in the Ref Guide, we need #2391 to get 
in. I am going to take a stab at it tomorrow.





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-10-13 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3386572359

   Hi all who have contributed to this long-lived PR! With Solr 10 close to 
being released, I wanted to bend this towards something mergeable. I've edited 
the doc down, and there is only one TBD that needs editing before this can be 
merged.
   
   The doc is narrower than this PR suggests; however, I think an "Extreme 
Scale" doc (or some such) could be written that would take in a lot of the 
feedback provided.





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-09-19 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3299639990

   Some good progress. If https://github.com/apache/solr/pull/2391 happens, 
then this is good to go. If 2391 doesn't land before 10, then I'll edit this and 
then merge it.





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-08-18 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3196768222

   For those who haven't seen it, we are now generating diagrams from ASCII 
markup! I am excited to make it easier to add diagrams to Solr that don't 
require a binary image that is then hard to update.
   
   https://github.com/user-attachments/assets/88a937cf-724a-4dfb-8458-5381523a5da4
   





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-08-18 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3196722713

   @tboeghk and @ardatezcan1 I've updated this branch to run with the latest 
version of Solr. My goal is to get this doc in (in one form or another) 
before Solr 10 comes out. If either of you wants to edit the doc to factor in 
your suggestions, please feel free. Otherwise I will try to farm your 
comments and add them, but it'll be more from my own personal perspective.





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-07-22 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3104603663

   @tboeghk so I have a new 
[taking-solr-to-production.adoc](https://github.com/apache/solr/pull/2783/files#diff-98db7cee2bd0ff4601379ab3a38bc9d4fea4cd1dfa6a1c0d25a88de6ea622cdb)
 doc that tries to be an opinionated guide to scaling. I think a LOT of what 
you mentioned makes sense at the _Moving Beyond the Basic Cluster_ scaling 
point, which I listed as six to 12 nodes in your cluster. I know all the "best 
practices" could be applied earlier, but I'm trying to frame this as "When you 
get to this size, you need to do this." Thoughts? The number of nodes, while a 
simplistic measure, is also the easiest to explain, versus query load, index 
load, or data load, which would make it more complex to decide "where am I".





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-07-22 Thread via GitHub


tboeghk commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-3104512660

   In addition to the great summary by @ardatezcan1 above, here are my 
practical tips and real-world scenarios for running Solr in environments with 
high RPM (requests per minute) and low- to medium-sized datasets (like 
ecommerce applications).
   
   
   __Best practices for using Solr in high-RPM environments__
   
   Before starting to optimize your Solr setup, make sure to have strong 
observability in place. In addition to the [Solr Prometheus and 
Grafana](https://solr.apache.org/guide/solr/latest/deployment-guide/monitoring-with-prometheus-and-grafana.html)
 setup I strongly recommend setting up the [Node 
Exporter](https://github.com/prometheus/node_exporter) to gather and correlate 
machine metrics.
   
   * __Use Solr in cloud mode__: Running Solr in cloud mode with a ZooKeeper 
ensemble is a prerequisite for the following best practices. Cloud mode makes it 
easy to add and remove Solr nodes as traffic changes.
   * __Sharding__: Request processing in Solr is a single-threaded operation, 
so the larger your dataset, the more latency each request incurs. The only 
sustainable way to parallelize query processing is to shard your index. 
Depending on your workload, you could simply run multiple Solr instances on the 
same machine, though I recommend a single Solr instance per machine.
   * __Sharding strategies__: If your query processing uses 
[collapse (and expand or grouping)](https://solr.apache.org/guide/solr/latest/query-guide/collapse-and-expand-results.html),
 make sure all documents that share a grouping key land on the same shard. 
Adjust the 
[document routing](https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#document-routing)
 and set `router.field` to your grouping key.
   * __Indexing and optimization strategies__: Indexing into a live collection 
adds significant latency to your search requests. Each commit that opens a new 
searcher throws away Solr's internal caches, and those caches are what keep Solr 
fast. Avoid any unnecessary cache flushes!
     * __Optimize your index__: Manually optimizing (force-merging) your index 
is not recommended, but it delivers the best performance because deleted 
documents are purged from the index.
     * __Rotate collections__: For small to medium datasets, it can be a good 
strategy to periodically index your data into a new collection instead of 
updating an existing one. That way, request caches stay warm for the lifetime 
of a collection, and a manual optimize is possible. Use 
[collection aliases](https://solr.apache.org/guide/solr/latest/deployment-guide/aliases.html)
 to switch clients to the new collection.
   * __Use dedicated node setups__: In high-traffic environments, separation 
of concerns becomes more important. Use dedicated node types, with machine 
sizing and setup tailored to each machine's role, for optimal performance.
     * __Indexer__: Used solely for indexing products. Set up as the 
[`TLOG`](https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-shards-indexing.html#types-of-replicas)
 replica type. Must not be used for request processing. Keep `TLOG` replicas 
out of request processing by preferring `PULL` replicas with the 
[`shards.preference`](https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#shards-preference-parameter)
 parameter, configured in your request handlers.
     * __Data__: Set up as a `PULL` replica, which replicates its index from 
the indexer nodes via SolrCloud. With `TLOG` and `PULL` replicas, the data nodes 
copy finished index segments instead of indexing every document themselves, as 
`NRT` replicas do.
     * __Coordinator__: In sharded SolrCloud setups, these nodes coordinate the 
distributed request flow and assemble the final search result. This is a very 
CPU-intensive operation that is usually shared among the data nodes. Dedicated 
[coordinator nodes](https://solr.apache.org/guide/solr/latest/deployment-guide/node-roles.html#coordinator-role)
 move the compute overhead of coordinating distributed requests off the data 
nodes. Adding coordinator nodes to a SolrCloud setup will drop resource usage 
on the data nodes significantly. To make full use of coordinator nodes, direct 
all incoming request traffic to them.
   * __JVM tuning__: I highly recommend running Solr with the _G1GC garbage 
collector_. On data and indexer nodes, keep in mind the rule of thumb of leaving 
about 50% of the machine's RAM outside the JVM heap for the OS disk cache. As 
coordinator nodes are stateless, you can boost their performance significantly 
with the _ZGC garbage collector_, which cuts collection pauses from 
milliseconds to sub-millisecond.
   * __Cloud setup__: Most SolrCloud setups will run in some kind of cloud 
environment. Here are some tips for setting up an elastic SolrCloud environment.
     * __Autoscaling__: Use a dedicated autoscaling group for each node type 
and each shard. Use tags to mark which instance should replicate which shard. 
Conf
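
Several of the tips above (routing by a grouping key, rotating collections behind an alias, and steering queries to `PULL` replicas) come down to a few Collections API calls and query parameters. The Python sketch below only builds the request URLs so they can be inspected; the node address, config set name, collection names and the `group_id` field are hypothetical. Placing `TLOG` and `PULL` replicas on dedicated node types additionally needs a replica placement setup that is not shown here.

```python
# Minimal sketch (assumptions: a SolrCloud node at SOLR_BASE, a config set
# named "products_conf", and a grouping field "group_id"; all hypothetical).
# It only builds URLs and parameters; nothing is sent to Solr.
from datetime import date
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node address


def rotated_collection_name(base: str, build_date: date) -> str:
    """Name a freshly built collection after its build date, e.g. products_20250722."""
    return f"{base}_{build_date:%Y%m%d}"


def create_collection_url(name: str, config_name: str, num_shards: int,
                          group_field: str) -> str:
    """CREATE a collection whose compositeId router hashes `group_field`,
    so every document sharing a grouping key lands on the same shard."""
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": config_name,
        "numShards": num_shards,
        "router.name": "compositeId",
        "router.field": group_field,
        # Per shard: one TLOG replica to index, two PULL replicas to serve queries.
        "nrtReplicas": 0,
        "tlogReplicas": 1,
        "pullReplicas": 2,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def switch_alias_url(alias: str, collection: str) -> str:
    """CREATEALIAS also re-points an existing alias, so clients keep querying
    `alias` while it moves to the newly built collection."""
    params = {"action": "CREATEALIAS", "name": alias, "collections": collection}
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def query_params(q: str) -> dict:
    """Prefer PULL replicas. shards.preference only orders replicas, so TLOG
    replicas are still used if no PULL replica is available."""
    return {"q": q, "shards.preference": "replica.type:PULL"}


if __name__ == "__main__":
    new_collection = rotated_collection_name("products", date(2025, 7, 22))
    print(create_collection_url(new_collection, "products_conf", 4, "group_id"))
    # ...index into new_collection (and optionally optimize it), then:
    print(switch_alias_url("products", new_collection))
    print(query_params("laptop"))
```

As the tip suggests, `shards.preference` can instead be set once as a default on the request handler rather than sent with every query.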

Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-06-18 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2983900658

   @tboeghk this is what we talked about in line for lunch!!   Would really 
appreciate your perspective.  





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-06-02 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2931971378

   I am kind of waiting for the 10x release cycle to spin up to push this 
along.  There are some things I would change/update in this doc if we get some 
nicer ZK quorum stuff and role stuff done...





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-06-01 Thread via GitHub


github-actions[bot] closed pull request #2783: SOLR-17492: Introduce 
recommendations of WAYS of running Solr from small to massive
URL: https://github.com/apache/solr/pull/2783





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-06-01 Thread via GitHub


github-actions[bot] commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2928154949

   This PR is now closed due to 60 days of inactivity after being marked as 
stale.  Re-opening this PR is still possible, in which case it will be marked 
as active again.





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-04-01 Thread via GitHub


github-actions[bot] commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2770938147

   This PR has had no activity for 60 days and is now labeled as stale.  Any 
new activity will remove the stale label.  To attract more reviewers, please 
tag people who might be familiar with the code area and/or notify the 
[email protected] mailing list. To exempt this PR from being marked as 
stale, make it a draft PR or add the label "exempt-stale". If left unattended, 
this PR will be closed after another 60 days of inactivity. Thank you for your 
contribution!





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-01-30 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2624787972

   This remains on my "must do" list for Solr 10, and I will pick it up as we 
get closer ;-).





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2025-01-29 Thread via GitHub


github-actions[bot] commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2623202990

   This PR has had no activity for 60 days and is now labeled as stale.  Any 
new activity will remove the stale label.  To attract more reviewers, please 
tag people who might be familiar with the code area and/or notify the 
[email protected] mailing list. To exempt this PR from being marked as 
stale, make it a draft PR or add the label "exempt-stale". If left unattended, 
this PR will be closed after another 60 days of inactivity. Thank you for your 
contribution!





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2024-11-30 Thread via GitHub


ardatezcan1 commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2509025014

   Whether you're just getting started with Solr or looking to fine-tune an 
existing setup, these practical tips and real-world scenarios may help you get 
the most out of this powerful search platform.
   
   **Best Practices for Using Solr**
   
   **1. Run Solr as a Cluster for Better Performance**
   Solr works best when deployed as a cluster. Start with at least three nodes 
for fault tolerance and scalability, and scale horizontally as your needs grow.
   
   - **Sharding and Replication:** Break your data into shards for parallel 
processing and use replicas for redundancy. A good starting point is two 
replicas per shard, but adjust this based on your workload.
   
   - **Optimize Indexing:** Carefully plan your schema to ensure efficient 
indexing and querying. Use dynamic fields and copy fields where appropriate to 
keep things flexible without overloading your system.
   
   - **Caching for Speed:** Solr provides powerful caching options like query, 
document, and filter caches. Use these for frequently accessed data to speed up 
query times significantly.
   
   - **Tune the JVM:** Since Solr is Java-based, JVM tuning is crucial. Adjust 
heap size to balance memory usage and garbage collection. Monitor GC logs and 
experiment with collectors like G1GC for optimal performance (CMS, which older 
guides mention, was deprecated in JDK 9 and removed in JDK 14).
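
As a rough sketch of the "at least three nodes, two replicas per shard" starting point above, the Python below estimates how many replicas each node would carry and builds the matching Collections API `CREATE` request. In SolrCloud, `replicationFactor` counts every copy of a shard, leader included, so `replicationFactor=2` means two copies. The node address, collection and config set names are hypothetical, and the per-node figure is only an even-spread estimate.

```python
# Sketch under assumptions: names and the node URL are hypothetical; nothing
# is sent to Solr. replicationFactor=2 mirrors "two replicas per shard".
import math
from urllib.parse import urlencode


def replicas_per_node(num_nodes: int, num_shards: int,
                      replication_factor: int = 2) -> int:
    """Replicas each node carries if replicas are spread evenly."""
    if num_nodes < replication_factor:
        # Copies of a shard would have to share a node and would fail together.
        raise ValueError("need at least as many nodes as copies per shard")
    return math.ceil(num_shards * replication_factor / num_nodes)


def basic_create_url(base: str, name: str, config_name: str,
                     num_shards: int, replication_factor: int = 2) -> str:
    """Collections API CREATE request for num_shards x replication_factor replicas."""
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": config_name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return f"{base}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Three nodes, three shards, two copies each: six replicas, two per node.
    print(replicas_per_node(3, 3))
    print(basic_create_url("http://localhost:8983/solr", "mobile_search",
                           "mobile_conf", 3))
```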
   
   **2. Always Use Solr in Cloud Mode**
   For a robust, scalable setup, SolrCloud mode is the way to go. This setup 
requires ZooKeeper, which manages cluster coordination, leader election, and 
configuration.
   
   - **ZooKeeper’s Role:** ZooKeeper keeps your Solr cluster running smoothly 
by storing the cluster state and configuration that Solr relies on for shard 
placement, failover, and dynamic configuration changes.
   
   - **Backups and Security:**
     - Always back up your Solr and ZooKeeper data regularly. Use Solr's 
built-in backup tools or external snapshot mechanisms for safety.
     - Secure your cluster with SSL/TLS, and set up role-based access control, 
ideally with tools like Apache Ranger. If Ranger isn’t an option, manual 
permissions management works too.

   - **Monitoring is Essential:** Keeping an eye on your Solr cluster is 
crucial for ensuring smooth operations. A great place to start is the Solr Web 
UI, which provides a user-friendly interface to monitor metrics like query 
performance, index health, and cache usage. It's easy to use and perfect for 
quickly spotting any issues. For more advanced needs, you may integrate tools 
like Prometheus and Grafana for custom dashboards and alerting. However, I 
should mention that I don’t have direct experience with Prometheus or Grafana 
specifically when working with Solr.
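
For the "back up regularly" advice, Solr's Collections API provides `BACKUP` and `RESTORE` actions. Running a backup with `async` and polling `REQUESTSTATUS` avoids holding an HTTP request open while a large backup runs. The Python sketch below only builds those request URLs; the node address, collection name, backup name and `/mnt/solr-backups` location are hypothetical. With the default local-filesystem repository, the location needs to be a path that every node can reach, such as a shared mount.

```python
# Sketch: build Collections API URLs for an async backup, its status check,
# and a restore. All names and paths are hypothetical; nothing is sent.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical node address


def _collections_url(params: dict) -> str:
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def backup_url(collection: str, backup_name: str, location: str,
               request_id: str) -> str:
    return _collections_url({
        "action": "BACKUP",
        "name": backup_name,
        "collection": collection,
        "location": location,
        "async": request_id,  # run in the background under this request id
    })


def status_url(request_id: str) -> str:
    """Poll this until the async backup reports completed (or failed)."""
    return _collections_url({"action": "REQUESTSTATUS", "requestid": request_id})


def restore_url(backup_name: str, location: str, target_collection: str) -> str:
    """Restore a named backup into target_collection."""
    return _collections_url({
        "action": "RESTORE",
        "name": backup_name,
        "location": location,
        "collection": target_collection,
    })


if __name__ == "__main__":
    print(backup_url("products", "products-nightly", "/mnt/solr-backups", "backup-001"))
    print(status_url("backup-001"))
```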
   
   **Usage Scenarios: Real-World Applications of Solr**
   **1. Managing Solr for a Large Dataset**
   I used open-source Solr as a search engine for a mobile app. Instead of 
interacting with Solr directly, I managed the setup via ZooKeeper APIs. Here’s 
what that looked like:
   
   - **Cluster Configuration:**
   The cluster handled over 100 TB of data spread across 11 physical machines, 
each running 16 Solr instances.
   - **Sharding and Replication:**
   Data was stored in shards, with each shard having two replicas to ensure 
fault tolerance and load balancing.
   - **Data Storage:**
   Data was stored directly on the local file system, which was a great fit for 
this use case.
   - **Management Approach:**
   Instead of accessing Solr directly, I managed the system via ZooKeeper APIs. 
This approach, even with an embedded ZooKeeper, worked efficiently under heavy 
load.
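
For a sense of scale, the figures in this scenario work out roughly as follows. This assumes, which the comment does not state, that the roughly 100 TB (replicas included) is spread evenly across machines and instances:

```math
11 \times 16 = 176 \ \text{Solr instances}, \qquad
\frac{100\ \text{TB}}{11} \approx 9.1\ \text{TB per machine}, \qquad
\frac{100\ \text{TB}}{176} \approx 0.57\ \text{TB per instance}
```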
   
   **2. Using Solr with Cloudera and HDFS**
   Another scenario involved deploying Solr in a Cloudera ecosystem with HDFS 
for storage. Here’s what worked and what didn’t:
   - **Cluster Management:**
   ZooKeeper handled cluster coordination, while Ranger (and previously Sentry) 
managed permissions.
   - **Challenges:**
   Occasionally, node failures caused HDFS file locks, which were difficult to 
resolve without downtime. These required manual fixes and a lot of patience!
   
   If you’ve got questions or need help with something specific, just let me 
know. I’m happy to share more!





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2024-10-22 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2429785037

   First pass is done! I have put in `NOTE: ` markers in a number of places 
where more input is needed. I think this could be a good page to discuss as a 
group at a Community Meetup, to make sure we are going in a direction that the 
community supports.





Re: [PR] SOLR-17492: Introduce recommendations of WAYS of running Solr from small to massive [solr]

2024-10-19 Thread via GitHub


epugh commented on PR #2783:
URL: https://github.com/apache/solr/pull/2783#issuecomment-2424053127

   We have diagrams generated in our Markdown!
   https://github.com/user-attachments/assets/3745673f-555d-4726-973a-217d123be640
   

