Some of these questions should be directed to Hortonworks, but I'm glad you
posted them here because I noticed you asked similar questions on the IRC
channel but left before I could jump in and help. Full disclosure, I work
for Lucidworks and one of my jobs is managing the development team that
makes HDP Search.

The HDP Search package is an official release of Solr, plus connectors
and development kits that let you index content either stored in or
accessed by common Hadoop components (namely HDFS, Hive, HBase, Spark,
and Storm), along with an Ambari integration for managing Solr. It has
ALL of the features of each Solr release - when we build HDP Search we
download Solr from archive.apache.org, not from any kind of clone or
forked repo - so anything you can do with a Solr you download outside of
that package, you can do with the Solr in that package.
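
Since it's stock Solr, any standard client works unchanged against the
Solr that ships in HDP Search. Here's a minimal SolrJ sketch of indexing
a document - the ZooKeeper hosts and collection name are placeholders,
and you'd use the SolrJ version that matches your Solr release:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class IndexExample {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper connect string; use your own ensemble.
            try (SolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181")
                    .build()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-1");
                doc.addField("title_s", "Indexed with plain SolrJ");
                // "my_collection" is a placeholder collection name.
                client.add("my_collection", doc);
                client.commit("my_collection");
            }
        }
    }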

To answer your specific questions:

- There are cases where it's more performant to store indexes on the local
filesystem than in HDFS, but I think the difference would only be dramatic
if you have a high query rate (others may disagree with this assessment).
As for why someone would store indexes in HDFS: if you already have 20
half-empty servers allocated for HDFS, you probably don't want 5 more just
for Solr. You can simply use the distributed filesystem you already have.

- When the indexes are stored in HDFS, you can absolutely update documents.
In this respect it's really not all that different from a local filesystem;
see the sketch after this list. If you want a bit more information, see the
Solr Ref Guide:
https://lucene.apache.org/solr/guide/running-solr-on-hdfs.html

- You should discuss the license options with Hortonworks. I believe they
charge separately for HDP Search, but I don't know the details or numbers
(I just manage the dev, not the business ;-) ).

- The integration with Ambari doesn't care where the indexes are - they can
be in HDFS or on local disk. Deciding where the indexes will go is part of
setting up Solr via Ambari. I will say the integration with Ambari isn't
very deep - you can monitor the state of Solr on each node (whether it's
running or not, basically), but beyond a few config options you'll still
use the Solr Admin UI for most Solr-related tasks. In the most recent
releases for HDP 2.6, we added alerting in case Solr goes down on any node,
and Solr's metrics are stored in the Ambari Metrics System, with a few
Grafana dashboards for monitoring load, etc.
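
To make the update point above concrete, here's a hedged SolrJ sketch of
an atomic update. The client-side code is identical whether the
collection's indexes live in HDFS (via the HdfsDirectoryFactory described
in the Ref Guide link above) or on local disk; assume placeholder
host/collection names again, plus the usual atomic-update prerequisites
(a uniqueKey, the updateLog enabled, and stored fields):

    import java.util.Collections;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class UpdateExample {
        public static void main(String[] args) throws Exception {
            try (SolrClient client = new CloudSolrClient.Builder()
                    .withZkHost("zk1:2181,zk2:2181,zk3:2181") // placeholder
                    .build()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-1"); // uniqueKey of an existing doc
                // The "set" modifier replaces the field's value in place.
                doc.addField("title_s",
                        Collections.singletonMap("set", "New title"));
                client.add("my_collection", doc);
                client.commit("my_collection");
            }
        }
    }

Where that index directory actually lives (hdfs://... or a local path) is
purely a server-side configuration decision; no client code changes.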

Documentation for HDP Search is available at:
https://doc.lucidworks.com/lucidworks-hdpsearch/2.6/index.html if you're
interested in more detail, including screenshots of the Ambari config
options.

Hope this helps clear things up for you -

Cassandra

On Fri, Nov 10, 2017 at 10:08 AM, Greenhorn Techie <
greenhorntec...@gmail.com> wrote:

> Hi,
>
> We have an HDP production cluster and are now planning to build a search
> solution for some of our business requirements. In this regard, I have the
> following questions. Can you please answer the below questions with respect
> to Solr?
>
>    - As I understand it, it is more performant to set up SolrCloud to use
>    local storage instead of HDFS for storing the indexes. If so, what are
>    the use cases where SolrCloud would store indexes in HDFS?
>    - Also, if the indexes are stored in HDFS, will it still be possible
>    to update the documents stored in Solr?
>    - Will HDP Search be supported as part of the HDP support license
>    itself, or does it need an additional license?
>    - If SolrCloud is configured to use local storage, can it still be
>    managed through Ambari? What aspects of SolrCloud might not be available
>    through Ambari? Monitoring?
>
> Just to provide more context, our data to be indexed is not in HDP at the
> moment and would come from external sources.
>
> Thanks
>
