To follow up ... it seems dumping logs to Solr is a common approach:

http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data

- Jon

On Apr 29, 2010, at 1:58 PM, Jon Baer wrote:

> Good question, +1 on finding an answer; my take ...
> 
> Depending on how large the log files you're talking about are, you might be 
> better off doing this w/ HDFS / Hadoop (and a scripting language like Pig) or 
> Amazon EMR:
> 
> http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873
> 
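> For example, a quick Pig sketch for "most requested pages" (untested; the 
> column layout and paths are placeholders for whatever your log format 
> actually is):
> 
> logs   = LOAD '/logs/access-*.log' USING PigStorage(' ')
>          AS (ip:chararray, ts:chararray, url:chararray, status:int);
> by_url = GROUP logs BY url;
> counts = FOREACH by_url GENERATE group AS url, COUNT(logs) AS hits;
> ranked = ORDER counts BY hits DESC;
> STORE ranked INTO '/logs/top-pages';
> 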
> Theoretically you could split the logs into fields with the DataImportHandler 
> and search / sort w/ something like LineEntityProcessor:
> 
> http://wiki.apache.org/solr/DataImportHandler#LineEntityProcessor
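> 
> A rough, untested sketch of what the data-config.xml might look like (the 
> regex and field names assume a combined-format access log, and the ip / 
> timestamp / url fields would also need to exist in your schema):
> 
> <dataConfig>
>   <dataSource type="FileDataSource" />
>   <document>
>     <entity name="logline"
>             processor="LineEntityProcessor"
>             url="/var/log/httpd/access.log"
>             transformer="RegexTransformer">
>       <!-- LineEntityProcessor emits each line of the file as "rawLine";
>            RegexTransformer splits the regex groups into separate fields -->
>       <field column="rawLine"
>              regex="^(\S+) \S+ \S+ \[([^\]]+)\] &quot;\S+ (\S+)"
>              groupNames="ip,timestamp,url" />
>     </entity>
>   </document>
> </dataConfig>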
> 
> I've tried to use Solr as a log analytics tool (before the DataImportHandler 
> existed) and it wasn't practical or worth the disk space, but I'd love to hear 
> otherwise. In general you could flush daily logs to an index, but if you ever 
> have to work with the data in another context, HDFS seems like a better fit 
> (I think).
> 
> - Jon
> 
> On Apr 29, 2010, at 1:46 PM, Stefan Maric wrote:
> 
>> 
>> I thought I remembered seeing some information about this, but have been
>> unable to find it.
>> 
>> Does anyone know if there is a configuration / module that would allow us to
>> set up Solr to take in the (large) log files generated by our web/app
>> servers, so that we can query for things like peak-time requests or the most
>> frequently requested web pages, etc.?
>> 
>> Thanks
>> Stefan Maric
>> 
> 
