Another point of view you might want to consider: what kind of search do you
want? Just plain full-text search, or is there something more to those text
files? Are they grouped in folders? Do the folders imply a certain kind of
grouping/hierarchy/tagging?

I was recently trying to help somebody who had files spread across a lot of
places, grouped by date/subject/author. He wanted to make sure these became
"fields" that could also act as filters/navigators.

Just an input - ignore it if you just want plain full text search.
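
If the folders do carry meaning, here is a rough sketch (Python, standard
library only) of turning folder names into fields while walking the tree and
building a Solr XML <add> message. The layout author/subject/date and the
field names id, author, subject, date and text are assumptions made for
illustration; they would have to match your own directories and your
schema.xml. The helper names are made up as well.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical layout: <root>/<author>/<subject>/<date>/<file>.txt
# Adjust this tuple to whatever your folder levels actually mean.
FOLDER_FIELDS = ("author", "subject", "date")


def fields_from_path(root, path):
    """Map the folder names between root and the file onto field names."""
    rel_dir = os.path.dirname(os.path.relpath(path, root))
    parts = rel_dir.split(os.sep) if rel_dir else []
    return dict(zip(FOLDER_FIELDS, parts))


def build_add_xml(root):
    """Walk root and return a Solr <add> message (as a string) for .txt files."""
    add = ET.Element("add")
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".txt"):
                continue  # other file types are skipped
            path = os.path.join(dirpath, name)
            # Use the path relative to root as the unique key (an assumption).
            fields = {"id": os.path.relpath(path, root)}
            fields.update(fields_from_path(root, path))
            with open(path, encoding="utf-8", errors="replace") as fh:
                fields["text"] = fh.read()
            doc = ET.SubElement(add, "doc")
            for key, value in fields.items():
                # ElementTree escapes &, < and > in the text for us.
                ET.SubElement(doc, "field", name=key).text = value
    return ET.tostring(add, encoding="unicode")
```

The returned string can be written to a file and posted the same way as the
curl test described in the thread. For hundreds of thousands of files you
would probably want to emit one <add> message per batch rather than one huge
string, and to strip control characters that are not legal in XML.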

On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog <goks...@gmail.com> wrote:

> About web servers: Solr is a servlet war file and needs a Java web server
> "container" to run. The example/ folder in the Solr distribution uses
> 'Jetty', and this is fine for small production-quality projects.  You can
> just copy the example/ directory somewhere to set up your own running Solr;
> that's what I always do.
>
> About indexing programs: if you know Unix scripting, it may be easiest to
> walk the file system yourself with the 'find' program and create Solr input
> XML files.
>
> But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
> learning this stuff very slowly, and the book would have been great back
> then.
>
> Lance
>
>
> Erick Erickson wrote:
>
>> Think of the data import handler (DIH) as Solr pulling data to index
>> from some source based on configuration. So, once you set up
>> your DIH config to point to your file system, you issue a command
>> to solr like "OK, do your data import thing". See the
>> FileListEntityProcessor.
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> SolrJ is a client library you'd use to push data to Solr. Basically, you
>> write a Java program that uses SolrJ to walk the file system, find
>> documents, create a Solr document and send that to Solr. It's not
>> nearly as complex as it sounds <G>. See:
>> http://wiki.apache.org/solr/Solrj
>>
>> It's probably worth your while to get a copy of
>> "Solr 1.4 Enterprise Search Server" by Eric Pugh and David Smiley.
>>
>> Best
>> Erick
>>
>> On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
>>
>>> Hi Lance,
>>>
>>> Thank you very much for responding (not sure how I reply to the group, so
>>> writing to you).
>>>
>>> Can you please expand on your suggestion? I am not a web guy and so don't
>>> know where to start.
>>>
>>> What is the difference between SolrJ and DataImportHandler? Do I need to set
>>> up web servers on all my storage boxes?
>>>
>>> Apologies for the basic level of questions, but hope I can get started and
>>> implement this before the year end (you know why :o)
>>>
>>> Thanks,
>>>
>>> Sesh
>>>
>>> On 12 November 2010 13:31, Lance Norskog <goks...@gmail.com> wrote:
>>>
>>>> Using 'curl' is fine. There is a library called SolrJ for Java and
>>>> other libraries for other scripting languages that let you upload with
>>>> more control. There is a thing in Solr called the DataImportHandler
>>>> that lets you script walking a file system.
>>>>
>>>> On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Pardon me if this sounds very elementary, but I have a very basic question
>>>>> regarding Solr search. I have about 10 storage devices running Solaris with
>>>>> hundreds of thousands of text files (there are other files, as well, but my
>>>>> target is these text files). The directories on the Solaris boxes are
>>>>> exported and are available as NFS mounts.
>>>>>
>>>>> I have installed Solr 1.4 on a Linux box and have tested the installation,
>>>>> using curl to post documents. However, the manual says that curl is not the
>>>>> recommended way of posting documents to Solr. Could someone please tell me
>>>>> what is the preferred approach in such an environment? I am not a programmer
>>>>> and would appreciate some hand-holding here :o)
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Sesh
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Lance Norskog
>>>> goks...@gmail.com
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
