Hey,

Your system sounds similar to the work done by Stu Hood at Rackspace in their Mailtrust unit. See http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data for more details and inspiration.
Regards,
Jeff

On Thu, Jun 4, 2009 at 4:58 PM, <silentsurfe...@yahoo.com> wrote:

> Hi,
> It is encouraging to know that a Solr/Lucene solution may work.
> Can anyone using Solr/Lucene for such a scenario confirm that the
> solution is in use and working fine? That would be really helpful, as I
> started looking into Solr/Lucene only a couple of days back and it
> might be difficult to be 100% confident before proposing the solution
> approach in the next couple of days.
> Thanks,
> Surfer
>
> --- On Thu, 6/4/09, Otis Gospodnetic <otis_gospodne...@yahoo.com> wrote:
>
> From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
> Subject: Re: Questions regarding IT search solution
> To: solr-user@lucene.apache.org
> Date: Thursday, June 4, 2009, 10:26 PM
>
>
> My guess is Solr/Lucene would work. Not sure how well/fast, but it would,
> esp. if you avoid range queries (or use tdate), and esp. if you
> shard/segment indices smartly, so that at query time you send (or distribute
> if you have to) the query to only those shards that have the data (if your
> query is for a limited time period).
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: Silent Surfer <silentsurfe...@yahoo.com>
> > To: solr-user@lucene.apache.org
> > Sent: Thursday, June 4, 2009 5:52:21 PM
> > Subject: Re: Questions regarding IT search solution
> >
> > Hi,
> > As Alex correctly pointed out, my main intention is to figure out whether
> > Solr/Lucene offers the functionality to replicate what Splunk is doing in
> > terms of building indexes etc. to enable search capabilities.
> > We evaluated Splunk, but it is not a very cost-effective solution for us, as
> > we may have logs running into a few GBs per day, since there can be around
> > 25-20 servers running. Splunk's licensing model is based on the size of logs
> > per day, and on top of that the license is valid for only 1 year.
> > With this background, any further inputs on this are greatly appreciated.
> > Thanks,
> > Surfer
> >
> > --- On Thu, 6/4/09, Alexandre Rafalovitch wrote:
> >
> > From: Alexandre Rafalovitch
> > Subject: Re: Questions regarding IT search solution
> > To: solr-user@lucene.apache.org
> > Date: Thursday, June 4, 2009, 9:27 PM
> >
> > I would also be interested to know what other existing solutions exist.
> >
> > Splunk's advantage is that it does extraction of the fields with
> > advanced searching functionality (it has lexers/parsers for multiple
> > content types). I believe that is the Solr functionality desired in the
> > original posting. At the time they came out (2004), I was not aware of
> > any good open source solutions to do what they did. And I would have
> > loved one, as I was analyzing multi-gigabyte logs.
> >
> > Hadoop might be a way to process the files, but what would do the
> > indexing and searching?
> >
> > Regards,
> > Alex.
> >
> > On Thu, Jun 4, 2009 at 11:56 AM, Walter Underwood wrote:
> > > Why build one? Don't those already exist?
> > >
> > > Personally, I'd start with Hadoop instead of Solr. Putting logs in a
> > > search index is guaranteed to not scale. People were already trying
> > > different approaches ten years ago.
> > >
> > > wunder
> > >
> > > On 6/4/09 8:41 AM, "Silent Surfer" wrote:
> > >
> > >> Hi,
> > >> Any help/pointers on the following message would really help me.
> > >> Thanks,
> > >> Surfer
> > >>
> > >> --- On Tue, 6/2/09, Silent Surfer wrote:
> > >>
> > >> From: Silent Surfer
> > >> Subject: Questions regarding IT search solution
> > >> To: solr-user@lucene.apache.org
> > >> Date: Tuesday, June 2, 2009, 5:45 PM
> > >>
> > >> Hi,
> > >> I am new to the Lucene forum and this is my first question. I need a
> > >> clarification from you.
> > >> Requirement:
> > >> ------------------
> > >> 1. Build an IT search tool for logs similar to that of Splunk (only
> > >> wrt searching logs, not in terms of reporting, graphs etc.) using
> > >> Solr/Lucene. The log files are mainly server logs like JBoss and
> > >> custom application server logs (may or may not be log4j logs), and
> > >> the file size can potentially go up to 100 MB
> > >> 2. The logs are spread across multiple servers (25 to 30 servers)
> > >> 3. Capability to search in almost real time
> > >> 4. Support distributed search
> > >>
> > >> Our search criteria can be based on a keyword, timestamp, IP address
> > >> etc.
> > >> Can anyone shed some light on whether Solr/Lucene is the right
> > >> solution for this?
> > >> Appreciate any quick help in this regard.
> > >> Thanks,
> > >> Surfer
> > >
> > >
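
For anyone reading this thread later, here is a rough sketch of the shard-selection idea Otis describes: if the logs are indexed into one shard per day, a query limited to a time period only needs to be sent to the shards covering those days. The one-shard-per-day layout, the host naming scheme and the helper names below are all assumptions made for illustration and are not from the thread. `shards` is the request parameter Solr's distributed search uses to list the shards to query.

```python
from datetime import date, timedelta

# Assumption: one Solr shard per day of logs, reachable at a host named by date.
# This naming scheme is hypothetical; adapt it to however your shards are laid out.
SHARD_TEMPLATE = "logs-{day}.example.com:8983/solr"


def shards_for_range(start, end):
    """Return the shard addresses covering the days start..end (inclusive)."""
    if end < start:
        raise ValueError("end must not be before start")
    day_count = (end - start).days + 1
    return [
        SHARD_TEMPLATE.format(day=(start + timedelta(days=i)).isoformat())
        for i in range(day_count)
    ]


def build_query_params(keyword, start, end):
    """Build request parameters that target only the shards holding the data."""
    return {
        "q": keyword,
        # Comma-separated shard list, as expected by Solr's "shards" parameter.
        "shards": ",".join(shards_for_range(start, end)),
    }


if __name__ == "__main__":
    params = build_query_params("OutOfMemoryError", date(2009, 6, 2), date(2009, 6, 4))
    print(params["shards"])
```

With daily shards, a three-day query touches three shards instead of the whole cluster, which is the main point of sharding by time. Whether daily granularity is the right choice depends on log volume, so treat it as a starting point only.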