Thank you, I didn't know about it.
I have been looking for benchmarks of joni vs. Java's default regex package. Do
you know of a website with results? Anyway, I'll try it myself tomorrow.

----- Original Message -----

From: "Ted Yu" <yuzhih...@gmail.com>
To: "common-u...@hadoop.apache.org" <user@hadoop.apache.org>
Sent: Sunday, October 5, 2014 22:32:27
Subject: Re: InputFormat for dealing with log files.

Regex processing is not that slow - when adopting best practices. 

This project provides better performance than Java's built-in regex engine:
https://github.com/jruby/joni 
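
One common practice with java.util.regex is to compile the pattern once and
reuse the matcher, rather than compiling for every line. The following is only a
minimal sketch of that idea: the class name `TimestampMatcher` and the timestamp
format `yyyy-MM-dd HH:mm:ss` are assumptions made for illustration, not
something taken from this thread or from joni.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the class name and the timestamp format are assumptions.
final class TimestampMatcher {
    // Compiled once and shared: Pattern instances are immutable and thread-safe.
    private static final Pattern TIMESTAMP =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    // Matcher is not thread-safe, so each instance keeps its own and reuses it.
    private final Matcher matcher = TIMESTAMP.matcher("");

    boolean startsWithTimestamp(CharSequence line) {
        // lookingAt() matches at the start of the input without requiring
        // the whole line to match.
        return matcher.reset(line).lookingAt();
    }
}
```

A single `TimestampMatcher` would be created per reader and called once per
line, so pattern compilation is not repeated in the hot path.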

Cheers 

On Sun, Oct 5, 2014 at 1:18 PM, Guillermo Ortiz < gor...@pragsis.com > wrote: 



I thought of something like that, but I guess it should be a little more complex
because it has to look for a pattern, maybe a date format. One idea: if you
know that the first 10 characters are the date, you could extract them and try
to match them against a date format, or against something more generic like a
regular expression, although that seems too expensive in processing time, and
the operations in the InputFormat should be pretty fast.

Any better idea? 
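
To make the "first 10 characters" idea concrete, here is a rough sketch that
checks the shape of the prefix without any regex. It assumes, purely for
illustration, that each record starts with a date written as `yyyy-MM-dd`; the
class and method names are hypothetical.

```java
// Hypothetical helper; assumes every record starts with a yyyy-MM-dd date.
final class DatePrefix {
    private DatePrefix() {}

    // Returns true when the first 10 characters look like dddd-dd-dd.
    // Only the shape is checked, so a value such as 2014-99-99 also passes.
    static boolean startsWithIsoDate(CharSequence line) {
        if (line.length() < 10) {
            return false;
        }
        for (int i = 0; i < 10; i++) {
            char c = line.charAt(i);
            boolean separator = (i == 4 || i == 7);
            if (separator ? c != '-' : (c < '0' || c > '9')) {
                return false;
            }
        }
        return true;
    }
}
```

This does a fixed amount of work per line (at most 10 character comparisons),
which may be cheap enough for a record reader; a stricter check could parse the
prefix as a real date only when this shape test succeeds.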


From: "Ted Yu" < yuzhih...@gmail.com >
To: " common-u...@hadoop.apache.org " < user@hadoop.apache.org >
Sent: Sunday, October 5, 2014 16:27:18
Subject: Re: InputFormat for dealing with log files.

Have you read http://blog.rguha.net/?p=293 ? 

Cheers 

On Sun, Oct 5, 2014 at 6:24 AM, Guillermo Ortiz < gor...@pragsis.com > wrote: 

<blockquote>

I'd like to know if there's an InputFormat that can deal with log files.
The problem I have is that when I read a Tomcat log, for example, exceptions
are sometimes written across several lines, but they should be processed as a
single record; I mean all the lines should be passed together to the map.
Is there something like that already implemented? I've been looking, but I
haven't found anything, and I don't want to reinvent the wheel.
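
To illustrate the behaviour being asked for, here is a rough sketch of the
grouping logic only. It is plain Java, does not use any Hadoop API, and does not
handle input split boundaries, which a real InputFormat would have to deal with.
The class name `LogRecordGrouper` and the caller-supplied `startsRecord` test
are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: merges continuation lines (e.g. stack trace lines)
// into the record that precedes them.
final class LogRecordGrouper {
    private LogRecordGrouper() {}

    // startsRecord decides whether a line begins a new record.
    // Lines seen before the first record start are kept as one leading record.
    static List<String> group(Iterable<String> lines, Predicate<String> startsRecord) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean open = false; // true while a record is being accumulated
        for (String line : lines) {
            if (open && startsRecord.test(line)) {
                // A new record begins: emit the one collected so far.
                records.add(current.toString());
                current.setLength(0);
                open = false;
            }
            if (open) {
                current.append('\n');
            }
            current.append(line);
            open = true;
        }
        if (open) {
            records.add(current.toString());
        }
        return records;
    }
}
```

With a test such as "the line starts with a date", an exception and its stack
trace lines end up in the same record as the log line that precedes them, and
that whole record is what would be handed to the map.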

CONFIDENTIALITY WARNING.
This message and the information contained in or attached to it are private and
confidential and intended exclusively for the addressee. Pragsis informs anyone
who may have received it in error that it contains privileged information and
that its use, copying, reproduction or distribution is prohibited. If you are
not an intended recipient of this e-mail, please notify the sender, delete it
and do not read, act upon, print, disclose, copy, retain or redistribute any
portion of this e-mail.







</blockquote>



