Re: Solr on S3FileSystem, Kosmos, GlusterFS, etc….

jay vyas Wed, 25 Jun 2014 10:21:21 -0700

Hi Paul,

I'm not using it on S3, but yes, I don't think S3 would be ideal for Solr
at all. There are several other Hadoop Compatible File Systems, however,
some of which might be ideal for certain types of SolrCloud workloads.


Anyway, I would love to see a Solr wiki page on FileSystem compatibility,
possibly with an entry linking to https://wiki.apache.org/hadoop/HCFS.

In the meantime, I will update this thread if I find anything interesting
when we increase load size.



On Wed, Jun 25, 2014 at 1:34 AM, Paul Libbrecht <p...@hoplahup.net> wrote:

> I've always been under the impression that file-system access speed is
> crucial for Lucene-based storage, and I have always advised against using
> NFS for that (with NFS we saw a slowdown of roughly a factor of 5). Has
> any performance measurement been made for such a setting? Has FS caching
> suddenly become so much better that this is no longer a problem?
>
> Also, as far as I know, S3 bills by the number of (giga)bytes transferred.
> That leaves plenty of room, but if each start needs to pull a big part
> of the index from storage to the Solr server to fill the cache,
> it looks like it won't be that cheap.
>
> Thanks for the experience report.
>
> paul
>
>
> On 25 June 2014, at 07:16, Jay Vyas <jayunit100.apa...@gmail.com> wrote:
>
> > Hi Solr!
> >
> > I got this working. Here's how:
> >
> > With the example jetty runner, you can extract the tarball and go to
> the examples/ directory, where you can launch an embedded core. Then find
> the solrconfig.xml file and edit it to contain the following XML:
> >
> > <directoryFactory name="DirectoryFactory"
> class="org.apache.solr.core.HdfsDirectoryFactory">
> > <str name="solr.hdfs.home">myhcfs:///solr</str>
> > <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
> > </directoryFactory>
> >
> > The confdir is important: that is where you will have something like a
> core-site.xml that defines all the parameters for your filesystem
> (fs.defaultFS, fs.myhcfs.impl, and so on).
> >
> >
> > This tells Solr, when launched, to use myhcfs as the underlying file
> store.
> >
> > You should also make sure that the jar for your plugin is on the class
> path (in our case glusterfs; Hadoop will find its classes by looking up
> the dynamically generated parameters that come from the base URI
> "myhcfs"), and that the hadoop-common jar is there as well (some HCFS
> shims need FilterFileSystem to run correctly, which is only in
> hadoop-common.jar).
> >
> > So, how do you modify the running Solr core's class path?
> >
> > To do so, you can update the jar directives in solrconfig.xml. There are a
> bunch of regular expression templates you can modify in the
> examples/.../solrconfig.xml file. You can also copy the jars in at runtime,
> to be really safe.
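As a rough sketch of the jar directives mentioned above: the stock solrconfig.xml uses <lib> elements with dir and regex attributes, so entries along these lines could pull in the plugin and hadoop-common jars. Both directory paths are hypothetical placeholders, and whether this alone is enough for a given HCFS shim is an assumption to verify on your own setup.

```xml
<!-- ASSUMPTION: /path/to/hcfs-plugin/lib is a placeholder for wherever
     your HCFS plugin jar lives. -->
<lib dir="/path/to/hcfs-plugin/lib" regex=".*\.jar" />

<!-- ASSUMPTION: /path/to/hadoop/lib is a placeholder; the regex picks up
     the hadoop-common jar, which provides FilterFileSystem. -->
<lib dir="/path/to/hadoop/lib" regex="hadoop-common-.*\.jar" />
```

These elements sit directly inside the top-level <config> element of solrconfig.xml, next to the existing <lib> templates.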
> >
> > Once your example core with the gluster configuration is set up, launch it
> with the following properties:
> >
> > java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs
> -Dsolr.data.dir=glusterfs:///solr -Dsolr.updatelog=glusterfs:///solr
> -Dlog4j.configuration=file:/opt/solr-4.4.0-cdh5.0.2/example/etc/logging.properties
> -jar start.jar
> >
> > This starts a basic Solr server on port 8983.
> >
> > If you are running from the simple Jetty-based examples that I've used
> to describe this above, you should see the collection1 core up and
> running, and you should see its index sitting inside the /solr directory of
> your file system.
> >
> > Hope this helps those interested in expanding the use of SolrCloud
> outside of a single FS.
> >
> >
> > On Jun 23, 2014, at 6:16 PM, Jay Vyas <jayunit100.apa...@gmail.com>
> wrote:
> >
> >> Hi folks. Does anyone regularly deploy Solr indices on other HCFS
> implementations (S3FileSystem, for example)? If so, I'm wondering:
> >>
> >> 1) Where are the docs or examples for doing this? It seems like
> everything, including the parameter names for dfs setup, is based around
> "hdfs". Maybe I should file a JIRA similar to
> https://issues.apache.org/jira/browse/FLUME-2410 (to make the generic
> deployment of Solr on any file system explicit / obvious).
> >>
> >> 2) Whether there are any interesting requirements (i.e. createNonRecursive,
> atomic mkdirs, sharing, blocking expectations, etc.) which need to be
> implemented.
> >
>
>


-- 
jay vyas
