Another point of view you might want to think about: what kind of search do you want? Is it just plain full-text search, or is there something more to those text files? Are they grouped in folders? Do the folders imply a certain kind of grouping, hierarchy, or tagging?
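If the folders do carry meaning, a rough sketch of turning them into fields might look like the following. It walks a directory tree and builds a Solr `<add>` XML message. The layout `<root>/<author>/<subject>/<file>.txt` and the field names `author`, `subject`, and `text` are assumptions made up for illustration; the real layout will differ, and any such fields would have to be defined in schema.xml first.

```python
import os
import xml.etree.ElementTree as ET


def path_to_fields(root, file_path):
    """Map <root>/<author>/<subject>/<file> to a dict of field values.

    The author/subject layout is an assumed example, not a Solr convention.
    """
    rel = os.path.relpath(file_path, root)
    parts = rel.split(os.sep)
    # Use the relative path as the unique document id.
    fields = {"id": rel.replace(os.sep, "/")}
    if len(parts) >= 3:
        fields["author"] = parts[0]
        fields["subject"] = parts[1]
    return fields


def build_add_xml(root):
    """Return a Solr <add> message covering every .txt file under root."""
    add = ET.Element("add")
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(".txt"):
                continue
            full = os.path.join(dirpath, name)
            fields = path_to_fields(root, full)
            # Undecodable bytes are replaced rather than aborting the run.
            with open(full, encoding="utf-8", errors="replace") as fh:
                fields["text"] = fh.read()
            doc = ET.SubElement(add, "doc")
            for key, value in fields.items():
                field = ET.SubElement(doc, "field", name=key)
                field.text = value  # ElementTree escapes <, > and &
    return ET.tostring(add, encoding="unicode")
```

The string returned by `build_add_xml` could be written to a file and posted with curl, the same way the test documents were posted. For hundreds of thousands of files it would make sense to write one XML file per batch instead of one huge message.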
I was recently helping somebody who had files spread across a lot of places, grouped by date/subject/author. He wanted those to be "fields" that could also act as filters/navigators. Just an input; ignore it if all you want is plain full-text search.

On Sat, Nov 13, 2010 at 11:25 AM, Lance Norskog <goks...@gmail.com> wrote:

> About web servers: Solr is a servlet war file and needs a Java web server
> "container" to run. The example/ folder in the Solr distribution uses
> 'Jetty', and this is fine for small production-quality projects. You can
> just copy the example/ directory somewhere to set up your own running Solr;
> that's what I always do.
>
> About indexing programs: if you know Unix scripting, it may be easiest to
> walk the file system yourself with the 'find' program and create Solr input
> XML files.
>
> But yes, you definitely want the Solr 1.4 Enterprise manual. I spent months
> learning this stuff very slowly, and the book would have been great back
> then.
>
> Lance
>
> Erick Erickson wrote:
>
>> Think of the data import handler (DIH) as Solr pulling data to index
>> from some source based on configuration. So, once you set up
>> your DIH config to point to your file system, you issue a command
>> to Solr like "OK, do your data import thing". See the
>> FileListEntityProcessor.
>> http://wiki.apache.org/solr/DataImportHandler
>>
>> SolrJ is a client library you'd use to push data to Solr. Basically, you
>> write a Java program that uses SolrJ to walk the file system, find
>> documents, create a Solr document and send that to Solr. It's not
>> nearly as complex as it sounds<G>. See:
>> http://wiki.apache.org/solr/Solrj
>>
>> It's probably worth your while to get a copy of
>> "Solr 1.4, Enterprise Search Server" by Eric Pugh and David Smiley.
>>
>> Best
>> Erick
>>
>> On Fri, Nov 12, 2010 at 8:37 AM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
>>
>>> Hi Lance,
>>>
>>> Thank you very much for responding (not sure how I reply to the group,
>>> so, writing to you).
>>>
>>> Can you please expand on your suggestion? I am not a web guy and so
>>> don't know where to start.
>>>
>>> What is the difference between SolrJ and DataImportHandler? Do I need to
>>> set up web servers on all my storage boxes?
>>>
>>> Apologies for the basic level of questions, but hope I can get started
>>> and implement this before the year end (you know why :o)
>>>
>>> Thanks,
>>>
>>> Sesh
>>>
>>> On 12 November 2010 13:31, Lance Norskog <goks...@gmail.com> wrote:
>>>
>>>> Using 'curl' is fine. There is a library called SolrJ for Java and
>>>> other libraries for other scripting languages that let you upload with
>>>> more control. There is a thing in Solr called the DataImportHandler
>>>> that lets you script walking a file system.
>>>>
>>>> On Thu, Nov 11, 2010 at 8:38 PM, K. Seshadri Iyer <seshadri...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Pardon me if this sounds very elementary, but I have a very basic
>>>>> question regarding Solr search. I have about 10 storage devices
>>>>> running Solaris with hundreds of thousands of text files (there are
>>>>> other files, as well, but my target is these text files). The
>>>>> directories on the Solaris boxes are exported and are available as
>>>>> NFS mounts.
>>>>>
>>>>> I have installed Solr 1.4 on a Linux box and have tested the
>>>>> installation, using curl to post documents. However, the manual says
>>>>> that curl is not the recommended way of posting documents to Solr.
>>>>> Could someone please tell me what is the preferred approach in such
>>>>> an environment? I am not a programmer and would appreciate some
>>>>> hand-holding here :o)
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> Sesh
>>>>
>>>> --
>>>> Lance Norskog
>>>> goks...@gmail.com