OK, so in your use case, rather than your application(s) writing directly to Kafka, you have a separate process running that tails the log files and ships them to Kafka. Is that correct?
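To make sure I understand the pattern, here is a minimal sketch of it in Python. It is only an illustration under my own assumptions, not your actual setup: `tail_lines`, `ship`, and the `send` callable are names invented for this example, and `send` stands in for whatever producer call is really used (for instance, a Kafka client's send method bound to a topic). It does not handle log rotation or delivery retries.

```python
import time
from typing import Callable, Iterable, Iterator, Optional


def tail_lines(path: str,
               from_start: bool = False,
               poll_interval: float = 0.5,
               max_idle_polls: Optional[int] = None) -> Iterator[bytes]:
    """Yield complete lines appended to the file at `path`.

    Hypothetical helper: by default it starts at the current end of the
    file, like `tail -f`. `max_idle_polls` only exists so that a demo or
    test can stop; a real shipper would poll forever (None).
    """
    with open(path, "rb") as f:
        if not from_start:
            f.seek(0, 2)  # jump to end of file; only ship new lines
        pending = b""  # holds a partially written line until its newline arrives
        idle = 0
        while max_idle_polls is None or idle < max_idle_polls:
            chunk = f.readline()
            if not chunk:
                # Nothing new yet: wait and poll again.
                idle += 1
                time.sleep(poll_interval)
                continue
            idle = 0
            pending += chunk
            if pending.endswith(b"\n"):
                yield pending.rstrip(b"\r\n")
                pending = b""


def ship(lines: Iterable[bytes], send: Callable[[bytes], object]) -> int:
    """Pass each line to `send` and return how many lines were shipped.

    `send` is an assumption standing in for the real producer call.
    """
    count = 0
    for line in lines:
        send(line)
        count += 1
    return count
```

The application itself only writes to its log file; the separate process would run something like `ship(tail_lines("/var/log/app.log"), send)`, where the path is a placeholder and `send` wraps the producer.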

On Jun 7, 2013, at 5:33 PM, Jonathan Creasy <j...@box.com> wrote:

> I recommend Kafka or Flume-NG for this.
>
> Our Analytics team is using a Kafka Producer on each server to tail logs
> and ship them to Kafka. We use Oozie to schedule a MapReduce consumer every
> few minutes to read all the Kafka topics into HDFS.
>
> We use Kafka as a buffer; we keep a few weeks of data there. Our security
> team, for example, sometimes connects up and consumes some logs for various
> purposes, usually when they want aggregate log data in real time.
>
> Most folks access them in HDFS. We have <1 minute of delay for most log
> lines getting from the server where they were written to HDFS.
>
> -Jonathan
>
>
> On Fri, Jun 7, 2013 at 5:30 PM, Mark <static.void....@gmail.com> wrote:
>
>> Like I said, I'm a bit confused. I see the terms "events", "messages" and
>> "logs" and am not quite sure what to make of them.
>>
>> We are trying to determine the best way to aggregate all of our logs for
>> processing in Hadoop. Kafka seems to fit this bill nicely; however, I want
>> to know if it's suited for other types of messages as well. Are there
>> certain determining factors for why one would choose Kafka over RabbitMQ?
>> Is it mostly scale, or is it the type of messages/events/logs being
>> produced/consumed?
>>
>> On Jun 7, 2013, at 5:21 PM, Alexis Richardson <alexis.richard...@gmail.com>
>> wrote:
>>
>>> On Sat, Jun 8, 2013 at 1:08 AM, Mark <static.void....@gmail.com> wrote:
>>>> I'm a bit confused on the concept of a "message" in Kafka. How does
>>>> this differ, if at all, from a message in RabbitMQ? It seems to me that
>>>> Kafka is better suited for very write-intensive "messages" like log data
>>>> but RabbitMQ may be a better fit for traditional "messages"… i.e. a
>>>> "Product Purchased" or "User Registered" message.
>>>
>>> I'm not sure why you think this, or how to distinguish between a 'log'
>>> message and some other kind.
>>>
>>> Messages = data, annotated with metadata. The latter is typically a
>>> protocol-specific envelope. Kafka and Rabbit certainly have different
>>> envelopes, e.g. for mapping data to subscribers/queries.
>>>
>>> alexis
>>
>>
>
>
> --
> Jonathan Creasy | Sr. Ops Engineer
> e: j...@box.com | t: 314.580.8909