Re: How should Samza be run on AWS?

2015-08-04 Thread Yi Pan
Hi, Selina, As Gian mentioned, the first step in setting up a real-time stream processing environment is to: a) set up a Kafka cluster; b) set up a YARN cluster. The following links may get you started: https://www.linkedin.com/pulse/20140813032057-89781742-deploy-kafka-cluster-on-aws http://blog.c

How should Samza be run on AWS?

2015-08-04 Thread Job-Selina Wu
Dear All: I was looking for a tutorial on how to build and run Samza on AWS and found the link below. Is there a detailed tutorial on how to build Samza on AWS? Sincerely, Selina https://cwiki.apache.org/confluence/display/SAMZA/FAQ#FAQ-HowshouldSamzaberunonAWS? How shou

Re: Review Request 37102: SAMZA-753: BrokerProxy stop should shutdown kafka consumer first

2015-08-04 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37102/#review94146 --- Ship it! This makes sense! Thanks Yan! Did you verify with some tes

Re: What happens after changelog reaches the Kafka retention

2015-08-04 Thread Yan Fang
Aha, ok, that makes sense. Thanks, Yi. Fang, Yan yanfang...@gmail.com On Tue, Aug 4, 2015 at 3:22 PM, Yi Pan wrote: > Hi, Yan, > > The changelog topic should be configured as a log-compacted topic, which > means that it will not be deleted due to time-based retention. > > -Yi > > On Tue, Aug 4, 2015 at

Review Request 37102: SAMZA-753: BrokerProxy stop should shutdown kafka consumer first

2015-08-04 Thread Yan Fang
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37102/ --- Review request for samza. Bugs: SAMZA-753 https://issues.apache.org/jira/br

Re: Review Request 37069: SAMZA-738 Samza Timer based metrics does not have enough precision

2015-08-04 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37069/#review94122 --- samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala

Re: What happens after changelog reaches the Kafka retention

2015-08-04 Thread Yi Pan
Hi, Yan, The changelog topic should be configured as a log-compacted topic, which means that it will not be deleted due to time-based retention. -Yi On Tue, Aug 4, 2015 at 3:15 PM, Yan Fang wrote: > Hi guys, > > Have a question about the changelog topic. Currently we are restoring the > kv store by re
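
[Editor's illustration, not part of the original thread.] The difference between time-based retention and log compaction can be sketched with a small in-memory model. This is a simplified sketch under stated assumptions, not Kafka's or Samza's actual implementation: the record layout and the function names (apply_time_retention, apply_compaction, restore_store) are hypothetical, and real compaction runs in the background per log segment. In Kafka itself the behavior is selected per topic with cleanup.policy=compact.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical record layout for this sketch: (timestamp_secs, key, value).
# A value of None models a tombstone, i.e. a delete marker for the key.
Record = Tuple[float, str, Optional[str]]


def apply_time_retention(log: List[Record], now: float,
                         retention_secs: float) -> List[Record]:
    """Keep only records younger than the retention window."""
    return [r for r in log if now - r[0] <= retention_secs]


def apply_compaction(log: List[Record]) -> List[Record]:
    """Keep only the most recent record for each key, whatever its age."""
    latest_index: Dict[str, int] = {}
    for i, (_, key, _) in enumerate(log):
        latest_index[key] = i
    return [r for i, r in enumerate(log) if latest_index[r[1]] == i]


def restore_store(log: List[Record]) -> Dict[str, str]:
    """Rebuild a key-value store by replaying the changelog in order."""
    store: Dict[str, str] = {}
    for _, key, value in log:
        if value is None:
            store.pop(key, None)  # tombstone removes the key
        else:
            store[key] = value
    return store


if __name__ == "__main__":
    changelog: List[Record] = [(0, "a", "1"), (10, "b", "2"), (100, "a", "3")]
    print("full replay:   ", restore_store(changelog))
    print("time retention:", restore_store(
        apply_time_retention(changelog, now=100, retention_secs=50)))
    print("compaction:    ", restore_store(apply_compaction(changelog)))
```

In this model, replaying the compacted log rebuilds the same store as replaying the full log, while the time-retained log loses any key whose last update is older than the window. That is the situation the original question was worried about.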

What happens after changelog reaches the Kafka retention

2015-08-04 Thread Yan Fang
Hi guys, I have a question about the changelog topic. Currently we are restoring the kv store by reading the whole changelog topic from Kafka. So what will happen after Kafka deletes some log segments after the retention time? Will the changelog miss some values? Thanks, Fang, Yan yanfang...@gm

Re: Review Request 36903: SAMZA-744: shutdown stores before shutdown producers

2015-08-04 Thread Yi Pan (Data Infrastructure)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/36903/ --- (Updated Aug. 4, 2015, 9:30 p.m.) Review request for samza, Yan Fang, Chinmay S

Re: log4j configuration

2015-08-04 Thread Yan Fang
Hi Jordi, This is a little tricky. :) 1. If you want to send the logs to specific locations, please use the following two properties, for example: *task.opts*=-Dsamza.log.dir=/tmp/samza-logs *yarn.am.opts*=-Dsamza.log.dir=/tmp/samza-master-logs 2. Then why doesn't export SAMZA_LOG_DIR work? Beca

Hello Samza Example Issues

2015-08-04 Thread marks1900-post01
I am currently trying to run the Hello Samza example at (http://samza.apache.org/startup/hello-samza/0.9/) and unfortunately when I try to run a Samza job I get the following error: deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --confi

Re: question on commit on changelog

2015-08-04 Thread Yi Pan
Hi, Chen, So, is your goal to improve the throughput to the changelog topic or to reduce the size of the changelog topic? If you are targeting the latter and your KV-store truly is the size of the input log, I don't see how that is possible. In a lot of use cases, users will only need to retain the *

Re: question on commit on changelog

2015-08-04 Thread Chen Song
Thanks Yan. Very good explanation of 1). For 2), I understand that users can tune the batch size for the Kafka producer. However, that doesn't change the number of messages sent to the changelog topic. In our case, we process a high-volume log (1.5MM records/second) that will update the kv store for e

Re: Review Request 37039: SAMZA-748 Coordinator URL always 127.0.0.1

2015-08-04 Thread József Márton Jung
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37039/ --- (Updated Aug. 4, 2015, 12:42 p.m.) Review request for samza. Changes ---

RE: testThreadInterruptInOperationSleep on clean installation

2015-08-04 Thread Jordi Blasi Uribarri
It has compiled perfectly now. Thanks for your help. Jordi -Original Message- From: Navina Ramesh [mailto:nram...@linkedin.com.INVALID] Sent: Tuesday, August 4, 2015 9:07 To: dev@samza.apache.org Subject: Re: testThreadInterruptInOperationSleep on clean installation Hi Jordi

Re: Review Request 37069: SAMZA-738 Samza Timer based metrics does not have enough precision

2015-08-04 Thread Aleksandar Pejakovic
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37069/ --- (Updated Aug. 4, 2015, 9:18 a.m.) Review request for samza. Changes ---

Review Request 37069: SAMZA-738 Samza Timer based metrics does not have enough precision

2015-08-04 Thread Aleksandar Pejakovic
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/37069/ --- Review request for samza. Repository: samza Description --- Changed Syst

log4j configuration

2015-08-04 Thread Jordi Blasi Uribarri
Hi, I guess this is just a howto question, but I am not able to find how it works. I am trying to trace the code of the job I want to execute in Samza. I have defined the environment variable as stated in the documentation: export SAMZA_LOG_DIR=/opt/logs I believe that this is working as I have

Re: testThreadInterruptInOperationSleep on clean installation

2015-08-04 Thread Navina Ramesh
Hi Jordi, Looks like you are hitting this error: org.apache.samza.storage.kv.TestRocksDbKeyValueStore > testTTL STANDARD_ERROR SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/c