Re: Newbie to SOLR with ridiculously simple questions

Alexandre Rafalovitch Mon, 09 Dec 2013 16:59:57 -0800

Hi Steve,

Good luck. I would start by doing the online tutorial, if you haven't
already (do it on Windows), and then read a book. There are several on the
market, including my own for beginners (
http://blog.outerthoughts.com/2013/06/my-book-on-solr-is-now-published/ ).

For SharePoint, I would look at Apache ManifoldCF (
http://manifoldcf.apache.org/en_US/ ). It seems to cover that use case
specifically and can send the information to Solr.

For the more general case, I would look at SolrNet (
https://github.com/mausch/SolrNet/blob/master/Documentation/README.md ). To
use Solr 4 with SolrNet, you would need to get the latest build or build it
yourself from source. It is not terribly complicated.
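
Whatever client you use, it ends up doing the same thing: sending documents
over HTTP to Solr's update handler. Here is a minimal Python sketch of that
idea. The host, the collection name "collection1", and the field names are
assumptions for illustration, not something from your setup, and the request
is only built here, not sent.

```python
import json
import urllib.request


def build_update_request(docs, solr_url="http://localhost:8983/solr/collection1"):
    """Build (but do not send) an HTTP POST that adds docs to a Solr index.

    solr_url and the field names inside docs are assumptions for this sketch.
    """
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url=solr_url + "/update?commit=true",  # commit so the docs become searchable
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    docs = [
        {"id": "doc-1", "title": "Quarterly report"},
        {"id": "doc-2", "title": "Meeting notes"},
    ]
    req = build_update_request(docs)
    print(req.get_method(), req.full_url)
    # Sending requires a running Solr instance:
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.status)
```

A script that loops over records in a source system would build one such
batch per group of records rather than one request per document.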

Tika is a separate Apache project that is bundled with Solr. It is used to
parse binary files (e.g. PDF, MS Word, etc.) and extract whatever is
possible, usually structured metadata and some form of the document's text.
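
In practice, Solr exposes Tika through its extracting request handler. A
rough Python sketch of building such a request follows. It assumes the
handler is mapped to /update/extract (as in the example configuration) and
that the collection is called "collection1"; both may differ on your
install, and the file name is hypothetical.

```python
from urllib.parse import urlencode


def build_extract_url(doc_id, solr_url="http://localhost:8983/solr/collection1",
                      commit=True):
    """Build the URL for posting one binary file to the extracting handler.

    literal.id supplies the document's unique key, since a raw PDF or
    Word file does not carry one itself. solr_url is an assumption.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_url + "/update/extract?" + urlencode(params)


if __name__ == "__main__":
    url = build_extract_url("report-1")
    print(url)
    # With a running Solr, the raw file bytes would be the POST body, e.g.:
    # import urllib.request
    # with open("report.pdf", "rb") as f:  # hypothetical file
    #     req = urllib.request.Request(
    #         url, data=f.read(),
    #         headers={"Content-Type": "application/pdf"}, method="POST")
    #     urllib.request.urlopen(req)
```

Solr then runs Tika on the uploaded bytes and indexes the extracted text and
metadata under the id you supplied.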

For the interface, there are a couple of options, though most people roll
their own. The main reason is that you should NOT expose Solr directly to
the web (it is not secure), so there is a need for middleware in front of
Solr. That middleware is usually custom, with project-specific enhancements.
But you could have a look at Hue for internal/intermediate usage. Hue is
built for the Hadoop ecosystem, but it includes Solr support too:
http://gethue.tumblr.com/tagged/search
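
To illustrate what such middleware does at its simplest, here is a Python
sketch that never forwards user input to Solr as-is. It builds a fresh
parameter set containing only the few values it understands and caps the
page size. The limits and defaults are assumptions, and a real
implementation would also need to sanitize the query string itself.

```python
MAX_ROWS = 50      # assumed upper bound on page size
DEFAULT_ROWS = 10  # assumed default page size


def _to_int(value, default):
    """Convert value to int, falling back to default on bad input."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def build_safe_params(user_params):
    """Translate untrusted web parameters into a restricted Solr query.

    Anything not explicitly handled here (handler overrides, update
    commands, streaming options, ...) is simply dropped.
    """
    start = max(0, _to_int(user_params.get("start"), 0))
    rows = _to_int(user_params.get("rows"), DEFAULT_ROWS)
    rows = min(MAX_ROWS, max(0, rows))
    return {
        "q": str(user_params.get("q", "*:*")),
        "start": str(start),
        "rows": str(rows),
        "wt": "json",
    }


if __name__ == "__main__":
    untrusted = {"q": "solr", "rows": "100000", "qt": "/update"}
    print(build_safe_params(untrusted))
```

The middleware would then send only these parameters to Solr's select
handler on an internal address that is not reachable from the web.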

The most important point to remember while you are learning Solr is
that it is there for _search_. You shape your data to match that purpose.
If that breaks relationships and duplicates data in Solr, that's fine. You
still have your primary data safe in relational/document storage.
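
As a small illustration of shaping data for search, the Python sketch below
flattens a document and its related rows into one flat record for Solr. The
table layout and all field names are made up for the example.

```python
def to_solr_doc(document, author, tags):
    """Flatten one primary record plus its related rows into a search doc.

    The author's name is copied into every document that author wrote, and
    tags become a multi-valued field. This duplicates data on purpose, and
    the relational store remains the source of truth.
    """
    return {
        "id": str(document["id"]),
        "title": document["title"],
        "body": document["body"],
        "author_name": author["name"],
        "tags": [tag["label"] for tag in tags],
    }


if __name__ == "__main__":
    document = {"id": 42, "title": "Budget", "body": "Figures for 2013", "author_id": 7}
    author = {"id": 7, "name": "Steve"}
    tags = [{"label": "finance"}, {"label": "internal"}]
    print(to_solr_doc(document, author, tags))
```

If an author is renamed, you simply re-index the affected documents from the
primary store.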

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)

On Tue, Dec 10, 2013 at 6:13 AM, smetzger <smetz...@msi-inc.com> wrote:

> OK...
> I'm a Windows guy who is being forced to learn Solr on Ubuntu for the whole
> organization. I fancy myself somewhat capable of following directions, but
> this Solr concept is puzzling.
>
> Here is what I think I know.
>
> Solr houses indexes. Each index record (usually based on a document) needs
> to be added to the Solr collection. This seems fairly simple, and I can run
> post.jar with various XML and JSON files FROM THE UBUNTU TERMINAL. I doubt
> you have to use the Terminal every time you want to add an index.
>
> My guess is that you have to feed Solr from third-party systems using the
> HTTP update URL on the Solr server. Is this correct? Let's say I have
> (god forbid) a SharePoint site and I want to move all the document text and
> document metadata into Solr. Do I simply run a script (say in .NET or
> ColdFusion) that loops through the SP doc records and sends an HTTP
> update request to Solr for each doc?
>
> How does Tika fit in?
>
> thanks
> steve
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Newbie-to-SOLR-with-ridiculously-simple-questions-tp4105788.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
