Re: Wrong FS: hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:/// Issues deploying memcached using slider.

2014-10-29 Thread hsy...@gmail.com
@Pushkar Thanks, but it doesn't work for me. My slider-client.xml setting is <property> <name>yarn.application.classpath</name>

Re: Wrong FS: hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:/// Issues deploying memcached using slider.

2014-10-29 Thread hsy...@gmail.com
conf dir as is to the yarn classpath variable. E.g. <property> <name>yarn.application.classpath</name> <value>/etc/hadoop/conf,/usr/hdp/current/hadoop-client/*</value> On Wed, Oct 29, 2014 at 11:18 AM, hsy...@gmail.com hsy...@gmail.com wrote: @Pushkar Thanks, but it doesn't work for me

Re: Wrong FS: hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:/// Issues deploying memcached using slider.

2014-10-29 Thread hsy...@gmail.com
: hdfs://localhost:9000/user/siyuan/.slider/cluster/cl15, expected: file:/// Looks like there is no log4j log at all. How do I properly set up log4j? Best, Siyuan On Wed, Oct 29, 2014 at 11:47 AM, hsy...@gmail.com hsy...@gmail.com wrote: Hi, I installed Apache Hadoop by just unzipping

Re: Wrong FS: hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:/// Issues deploying memcached using slider.

2014-10-29 Thread hsy...@gmail.com
Sorry, my classpath should be $HADOOP_HOME/etc/hadoop/, thanks for your help, guys! On Wed, Oct 29, 2014 at 12:06 PM, hsy...@gmail.com hsy...@gmail.com wrote: And all I have in slider-err.txt is log4j:WARN No appenders could be found for logger
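For reference, a minimal slider-client.xml entry of the kind this thread converges on could look like the sketch below. The paths are placeholders taken from the two suggestions above ($HADOOP_HOME/etc/hadoop/ for a plain Apache install, /usr/hdp/current/... for HDP) and must match the actual layout, so that the directory containing core-site.xml ends up on the YARN application classpath and the default filesystem resolves to hdfs:// rather than file:///.

<property>
  <name>yarn.application.classpath</name>
  <!-- Directory holding core-site.xml/hdfs-site.xml, followed by the Hadoop client jars. -->
  <value>$HADOOP_HOME/etc/hadoop/,/usr/hdp/current/hadoop-client/*</value>
</property>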

NoClassDefFoundError ? Hadoop classpath

2014-10-28 Thread hsy...@gmail.com
Hi guys, I'm new to Slider. I tried to run a Java application from Slider and got the following error; the Hadoop classpath has been set up in client.xml. Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/client/api/async/AMRMClientAsync$CallbackHandler Do I have to include the

Re: Wrong FS: hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:/// Issues deploying memcached using slider.

2014-10-28 Thread hsy...@gmail.com
I try to run Kafka as a Slider application. On Tue, Oct 28, 2014 at 7:01 PM, hsy...@gmail.com hsy...@gmail.com wrote: I had the same problem. This is my appConfig.json: { "schema": "http://example.org/specification/v2.0.0", "metadata": { }, "global": { "application.def": "hdfs

Re: Wrong FS: hdfs://localhost:9000/user/root/.slider/cluster/c100, expected: file:/// Issues deploying memcached using slider.

2014-10-28 Thread hsy...@gmail.com
I had the same problem. This is my appConfig.json: { "schema": "http://example.org/specification/v2.0.0", "metadata": { }, "global": { "application.def": "hdfs://localhost:9000/user/siyuan/slider_kafka.zip", "java_home": "/usr/lib/jvm/java-7-oracle/", "package_list":

Create topic programmatically

2014-10-13 Thread hsy...@gmail.com
Hi guys, besides TopicCommand, which I believe is not intended for creating topics programmatically, is there any other way to automate topic creation in code? Thanks! Best, Siyuan
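A minimal sketch of programmatic topic creation, assuming Kafka 0.8.1+ where kafka.admin.AdminUtils is available; the topic name, ZooKeeper address, timeouts, and partition/replica counts below are placeholders, not values from this thread.

import java.util.Properties

import kafka.admin.AdminUtils
import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient

object CreateTopicSketch {
  def main(args: Array[String]): Unit = {
    // ZKStringSerializer is needed so the topic metadata is written in the format brokers expect.
    val zkClient = new ZkClient("localhost:2181", 10000, 10000, ZKStringSerializer)
    try {
      // Create "my-topic" with 4 partitions and replication factor 2, default topic-level config.
      AdminUtils.createTopic(zkClient, "my-topic", 4, 2, new Properties())
    } finally {
      zkClient.close()
    }
  }
}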

Is there a way to run application on certain subset of nodes?

2014-08-04 Thread hsy...@gmail.com
Hi guys, I'm new to Slider and am trying to convert an application into a YARN app. Is there a way to specify that only a subset of nodes in the cluster should run my app, and can Slider guarantee that every container (of that application) runs on a different node? Thank you very much! Best, Siyuan

Re: Is there a way to run application on certain subset of nodes?

2014-08-04 Thread hsy...@gmail.com
, 2014 at 2:37 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi guys, I'm new to slider and try to convert some application into yarn app. I would like to ask is there a way to specify only a subset of nodes in the cluster to run my app and can slider guarantee every container

Kafka on yarn

2014-07-23 Thread hsy...@gmail.com
Hi guys, Kafka is getting more and more popular, and in most cases people run Kafka as a long-running service in the cluster. Is there a discussion of running Kafka on a YARN cluster so we can utilize its convenient configuration/resource management and HA? I think there is big potential and

Re: Kafka on yarn

2014-07-23 Thread hsy...@gmail.com
, hsy...@gmail.com hsy...@gmail.com wrote: Hi guys, Kafka is getting more and more popular and in most cases people run kafka as long-term service in the cluster. Is there a discussion of running kafka on yarn cluster which we can utilize the convenient configuration/resource

Re: How to do an interactive Spark SQL

2014-07-23 Thread hsy...@gmail.com
Does anyone have any idea on this? On Tue, Jul 22, 2014 at 7:02 PM, hsy...@gmail.com hsy...@gmail.com wrote: But how do they do the interactive SQL in the demo? https://www.youtube.com/watch?v=dJQ5lV5Tldw And if it can work in local mode, I think it should be able to work in cluster mode

How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
Hi guys, I'm able to run some Spark SQL examples, but the SQL is static in the code. Is there a way to read the SQL from somewhere else (a shell, for example)? I could read the SQL statement from Kafka/ZooKeeper, but I cannot share the SQL with all workers. Broadcast seems not to work for

Re: How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
in the code? What do you mean by cannot share the sql to all workers? On Tue, Jul 22, 2014 at 4:03 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi guys, I'm able to run some Spark SQL example but the sql is static in the code. I would like to know is there a way to read sql from somewhere

Re: How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
)) }) ssc.start() ssc.awaitTermination() On Tue, Jul 22, 2014 at 5:10 PM, Zongheng Yang zonghen...@gmail.com wrote: Can you paste a small code example to illustrate your questions? On Tue, Jul 22, 2014 at 5:05 PM, hsy...@gmail.com hsy...@gmail.com wrote: Sorry, typo. What I mean

Re: How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
after the StreamingContext has started. Tobias On Wed, Jul 23, 2014 at 9:55 AM, hsy...@gmail.com hsy...@gmail.com wrote: For example, this is what I tested and it works in local mode. What it does is get both the data and the SQL query from Kafka, run the SQL on each RDD, and output the result back

Re: Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-07-21 Thread hsy...@gmail.com
I have the same problem. On Sat, Jul 19, 2014 at 12:31 AM, lihu lihu...@gmail.com wrote: Hi, everyone. I have the following piece of code. When I run it, I get the error below; it seems that the SparkContext is not serializable, but I do not try to use the SparkContext

Re: Difference among batchDuration, windowDuration, slideDuration

2014-07-17 Thread hsy...@gmail.com
Thanks Tathagata. So can I say the RDD size (from the stream) is the window size, and the overlap between 2 adjacent RDDs is the slide size? But I still don't understand what the batch size is; why do we need it, since data processing is RDD by RDD, right? And does Spark chop the data into RDDs at the very

Re: Interested in contributing to Kafka?

2014-07-16 Thread hsy...@gmail.com
Hi Jay, I would like to take a look at the code base and maybe start working on some jiras. Best, Siyuan On Wed, Jul 16, 2014 at 3:09 PM, Jay Kreps jay.kr...@gmail.com wrote: Hey All, A number of people have been submitting really nice patches recently. If you are interested in

Re: Interested in contributing to Kafka?

2014-07-16 Thread hsy...@gmail.com
Is there a scala API doc for the entire kafka library? On Wed, Jul 16, 2014 at 5:34 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi Jay, I would like to take a look at the code base and maybe start working on some jiras. Best, Siyuan On Wed, Jul 16, 2014 at 3:09 PM, Jay Kreps jay.kr

Difference among batchDuration, windowDuration, slideDuration

2014-07-16 Thread hsy...@gmail.com
When I'm reading the Spark Streaming API, I'm confused by the 3 different durations: StreamingContext(conf: SparkConf (http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkConf.html), batchDuration: Duration
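A short sketch illustrating how the three durations relate, assuming Spark 1.x Streaming APIs; the socket source and the particular second values are placeholders.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object DurationsSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("durations-sketch")
    // batchDuration: the incoming stream is chopped into one RDD every 2 seconds.
    val ssc = new StreamingContext(conf, Seconds(2))
    val lines = ssc.socketTextStream("localhost", 9999)
    // windowDuration (10s): how much history each windowed RDD covers.
    // slideDuration (4s): how often a new windowed RDD is produced.
    // Both must be multiples of the batch duration.
    val windowed = lines.window(Seconds(10), Seconds(4))
    windowed.count().print()
    ssc.start()
    ssc.awaitTermination()
  }
}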

Re: How to kill running spark yarn application

2014-07-15 Thread hsy...@gmail.com
reproduce it. On Mon, Jul 14, 2014 at 7:36 PM, hsy...@gmail.com hsy...@gmail.com wrote: Before yarn application -kill, if you do jps you'll have a list of SparkSubmit and ApplicationMaster processes. After you use yarn application -kill, you only kill the SparkSubmit. On Mon, Jul 14, 2014 at 4:29 PM

Re: SQL + streaming

2014-07-15 Thread hsy...@gmail.com
sure it works, and you see output? Also, I recommend going through the previous step-by-step approach to narrow down where the problem is. TD On Mon, Jul 14, 2014 at 9:15 PM, hsy...@gmail.com hsy...@gmail.com wrote: Actually, I deployed this on yarn cluster(spark-submit) and I couldn't find

Re: SQL + streaming

2014-07-15 Thread hsy...@gmail.com
anything in the driver logs! So try doing a collect, or take on the RDD returned by sql query and print that. TD On Tue, Jul 15, 2014 at 4:28 PM, hsy...@gmail.com hsy...@gmail.com wrote: By the way, have you ever run SQL and stream together? Do you know any example that works? Thanks! On Tue

How to kill running spark yarn application

2014-07-14 Thread hsy...@gmail.com
Hi all, a newbie question: I start a Spark YARN application through spark-submit. How do I kill this app? I can kill the YARN app with yarn application -kill appid, but the application master is still running. What's the proper way to shut down the entire app? Best, Siyuan

SQL + streaming

2014-07-14 Thread hsy...@gmail.com
Hi all, a couple of days ago I tried to integrate SQL and streaming together. My understanding is that I can transform each RDD from the DStream into a SchemaRDD and execute SQL on it, but I had no luck. Would you guys help me take a look at my code? Thank you very much! object KafkaSpark { def main(args:
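A minimal sketch of the approach described here (turn each batch RDD into a SchemaRDD and run SQL on it), assuming Spark 1.1-era APIs; a socket text stream stands in for the Kafka input, and the Event case class, table name, and query are invented for illustration.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Event(word: String, num: Int)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sql-sketch")
    val ssc = new StreamingContext(conf, Seconds(5))
    val lines = ssc.socketTextStream("localhost", 9999)

    lines.foreachRDD { rdd =>
      val sqlContext = new SQLContext(rdd.sparkContext)
      import sqlContext.createSchemaRDD // implicit RDD[Product] -> SchemaRDD

      // Parse "word num" lines into Events and register this batch as a temp table.
      val events = rdd.map(_.split(" "))
        .filter(_.length == 2)
        .map(a => Event(a(0), a(1).toInt))
      events.registerTempTable("events")

      // Run SQL over this batch and materialize a few rows on the driver.
      sqlContext.sql("SELECT word, SUM(num) FROM events GROUP BY word")
        .take(10)
        .foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}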

Re: How to kill running spark yarn application

2014-07-14 Thread hsy...@gmail.com
. This is what I did 2 hours ago. Sorry I cannot provide more help. Sent from my iPhone On 14 Jul, 2014, at 6:05 pm, hsy...@gmail.com hsy...@gmail.com wrote: yarn-cluster On Mon, Jul 14, 2014 at 2:44 PM, Jerry Lam chiling...@gmail.com wrote: Hi Siyuan, I wonder if you --master yarn-cluster

Re: SQL + streaming

2014-07-14 Thread hsy...@gmail.com
but SQL command throwing error? No errors but no output either? TD On Mon, Jul 14, 2014 at 4:06 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi All, Couple days ago, I tried to integrate SQL and streaming together. My understanding is I can transform RDD from Dstream to schemaRDD and execute

Re: SQL + streaming

2014-07-14 Thread hsy...@gmail.com
that to work, then I would test the Spark SQL stuff. TD On Mon, Jul 14, 2014 at 5:25 PM, hsy...@gmail.com hsy...@gmail.com wrote: No errors but no output either... Thanks! On Mon, Jul 14, 2014 at 4:59 PM, Tathagata Das tathagata.das1...@gmail.com wrote: Could you elaborate on what

Re: Some question about SQL and streaming

2014-07-10 Thread hsy...@gmail.com
:21 AM, hsy...@gmail.com hsy...@gmail.com wrote: Hi guys, I'm a new user to spark. I would like to know is there an example of how to use spark SQL and spark streaming together? My use case is I want to do some SQL on the input stream from kafka. Thanks! Best, Siyuan

Difference between SparkSQL and shark

2014-07-10 Thread hsy...@gmail.com
I have a newbie question. What is the difference between SparkSQL and Shark? Best, Siyuan

Re: Too Many Open Files Broker Error

2014-07-09 Thread hsy...@gmail.com
I have the same problem. I didn't dig deeper, but I saw this happen when I launch Kafka in daemon mode. I found that daemon mode just launches Kafka with nohup. Not quite clear why this happens. On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote: Yup. In fact, I just ran the test

Some question about SQL and streaming

2014-07-09 Thread hsy...@gmail.com
Hi guys, I'm a new Spark user. Is there an example of how to use Spark SQL and Spark Streaming together? My use case is that I want to run some SQL on the input stream from Kafka. Thanks! Best, Siyuan

Re: Help is processing huge data through Kafka-storm cluster

2014-06-19 Thread hsy...@gmail.com
-to-topic node on Yarn. On Jun 17, 2014, at 10:44 AM, hsy...@gmail.com wrote: Hi Shaikh, I heard some throughput bottleneck of storm. It cannot really scale up with kafka. I recommend you to try DataTorrent platform( https://www.datatorrent.com/ ) The platform itself

Re: delete topic ?

2014-06-18 Thread hsy...@gmail.com
I'm using 0.8.1.1. I use DeleteTopicCommand to delete a topic: args[0] = "--topic"; args[1] = the topic you want to delete; args[2] = "--zookeeper"; args[3] = kafkaZookeepers; DeleteTopicCommand.main(args); You can write your own script to delete the topic, I guess. And I think it only
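Assembled from the fragments above, the call being described looks roughly like the sketch below; the topic name and ZooKeeper connect string are placeholders, and as noted elsewhere in these threads topic deletion was not fully supported in these Kafka versions, so verify the behavior on your release.

import kafka.admin.DeleteTopicCommand

object DeleteTopicSketch {
  def main(cliArgs: Array[String]): Unit = {
    // Same arguments the post describes, passed straight to the command's main().
    val args = Array("--topic", "my-topic", "--zookeeper", "localhost:2181")
    DeleteTopicCommand.main(args)
  }
}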

Re: Help is processing huge data through Kafka-storm cluster

2014-06-17 Thread hsy...@gmail.com
Hi Shaikh, I have heard of throughput bottlenecks in Storm; it cannot really scale up with Kafka. I recommend you try the DataTorrent platform (https://www.datatorrent.com/). The platform itself is not open source, but it has an open-source library (https://github.com/DataTorrent/Malhar) which contains

Async producer callback?

2014-05-20 Thread hsy...@gmail.com
Hi guys, so far, is there a way to track the async producer callback? My requirement is basically: if all nodes of the topic go down, can I pause the producer and, after the broker comes back online, continue producing from the failure point? Best, Siyuan

Is there a way to delete partition at runtime?

2013-12-05 Thread hsy...@gmail.com
Hi guys, I found there is a tool to add partitions on the fly. My question is, is there a way to delete a partition at runtime? Thanks! Best, Siyuan

kafka_2.8.0/0.8.0 pom seems invalid

2013-12-04 Thread hsy...@gmail.com
Hi All, I was trying to upgrade Kafka to 0.8, but I get an empty jar file for the dependency <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka_2.8.0</artifactId> <version>0.8.0</version> </dependency> However the dependency <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka_2.8.2</artifactId>
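For reference, a dependency along the lines the thread converges on might look like the sketch below; it assumes the commonly circulated workaround of the time (depend on the kafka_2.8.2 artifact, whose jar was published correctly, and exclude the unresolvable JMX/JMS transitive dependencies), so verify the exact coordinates against Maven Central.

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.8.2</artifactId>
  <version>0.8.0</version>
  <exclusions>
    <!-- These transitive dependencies are not resolvable from Maven Central. -->
    <exclusion>
      <groupId>com.sun.jmx</groupId>
      <artifactId>jmxri</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.sun.jdmk</groupId>
      <artifactId>jmxtools</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.jms</groupId>
      <artifactId>jms</artifactId>
    </exclusion>
  </exclusions>
</dependency>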

Re: kafka_2.8.0/0.8.0 pom seems invalid

2013-12-04 Thread hsy...@gmail.com
On Wed, Dec 4, 2013 at 4:48 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi All, I was trying to upgrade Kafka to 0.8, but I get an empty jar file for the dependency <dependency> <groupId>org.apache.kafka</groupId> <artifactId>kafka_2.8.0</artifactId> <version>0.8.0</version>

Re: Consuming from a replica

2013-11-27 Thread hsy...@gmail.com
What I did for my project is have a thread send a metadata request to a random broker and monitor metadata changes periodically. The good thing is that, to my knowledge, any broker in the cluster knows the metadata for all the topics served in the cluster. Another option is you can always query

Re: Producer reaches a max of 7Mbps

2013-11-19 Thread hsy...@gmail.com
I think the max 50Mbps is almost the disk bottleneck. My guess is I/O is the bottleneck for Kafka. With the same settings (async without ack) I got throughput of about 30Mb. Try increasing these if you don't care about latency very much: log.flush.interval.messages=1 log.flush.interval.ms=3000 On

Re: will this cause message loss?

2013-11-14 Thread hsy...@gmail.com
Also, if you use HEAD, you can create more partitions at runtime; you just need a dynamic partitioner class, I think. On Thu, Nov 14, 2013 at 7:23 AM, Neha Narkhede neha.narkh...@gmail.com wrote: There is no way to delete topics in Kafka yet. You can add partitions to existing topics, but you may

High-level consumer load-balancing problem

2013-11-14 Thread hsy...@gmail.com
Hi, I have questions about the load balancing of the Kafka high-level consumer. Suppose I have 4 partitions, and the producer throughput to these 4 partitions is like this: partition 0: 10MB/s, partition 1: 10MB/s, partition 2: 1MB/s, partition 3: 1MB/s; 1kMsg/s,

Re: Kafka cluster with lots of topics

2013-11-13 Thread hsy...@gmail.com
I didn't see any automatic leader election when adding nodes. The data are still skewed onto the old nodes. Do you have to force it by running a script? On Wed, Nov 13, 2013 at 6:41 AM, Neha Narkhede neha.narkh...@gmail.com wrote: With that many topics, zookeeper will be the main bottleneck. Leader election

Re: pom warning

2013-11-13 Thread hsy...@gmail.com
LLC http://www.stealth.ly Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop / On Tue, Nov 12, 2013 at 2:56 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi guys, When I built my project using maven I got WARNING [WARNING

A problem of fault-tolerant high-level consumer group

2013-11-13 Thread hsy...@gmail.com
I'm working on a fault-tolerant consumer group. The idea is this: to maximize the throughput of Kafka, I request the metadata from a broker, create #{num of partitions} consumers for each topic, and distribute them across different nodes. Moreover, there is a mechanism to detect the failure of any node and

pom warning

2013-11-12 Thread hsy...@gmail.com
Hi guys, when I built my project using Maven I got a WARNING: [WARNING] The POM for org.apache.kafka:kafka_2.8.0:jar:0.8.0-beta1 is invalid, transitive dependencies (if any) will not be available: 1 problem was encountered while building the effective model And I looked at the

Detail description of metrcs value?

2013-11-11 Thread hsy...@gmail.com
Hi guys, is there a detailed document about the attributes and object names of the MBeans? For example, what does the MeanRate attribute of the MessagesPerSec object mean? Is it the mean value over the last 1 sec / 1 min? http://kafka.apache.org/documentation.html#monitoring only has a little information about

Is there a way to add partition to a particular topic

2013-11-08 Thread hsy...@gmail.com
Hi guys, since Kafka is able to add a new broker to the cluster at runtime, I'm wondering: is there a way to add a new partition to a specific topic at runtime? If not, what would you do if you want to add more partitions to a topic? Thanks!

Re: Is there a way to add partition to a particular topic

2013-11-08 Thread hsy...@gmail.com
-partition tool: https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-5.AddPartitionTool Guozhang On Fri, Nov 8, 2013 at 5:32 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi guys, since kafka is able to add new broker into the cluster at runtime, I'm wondering

Re: Is there a way to add partition to a particular topic

2013-11-08 Thread hsy...@gmail.com
I mean, I assume the messages not yet consumed before delete-topic will be delivered before you create the same topic, correct? On Fri, Nov 8, 2013 at 6:30 PM, hsy...@gmail.com hsy...@gmail.com wrote: It's in the branch, cool, I'll wait for its release. Actually I find I can use ./kafka-delete

Throughput Questions

2013-10-31 Thread hsy...@gmail.com
Hi guys, I have some throughput questions. I tried to test throughput using both the High Level Consumer and Simple Consumer examples from the documentation, but I get much lower throughput from the simple consumer than from the high-level consumer. I run the test in the cluster, and I'm sure I distribute the

Re: partition reassignment

2013-10-16 Thread hsy...@gmail.com
There is a ticket for auto-rebalancing; hopefully they'll do auto redistribution soon: https://issues.apache.org/jira/browse/KAFKA-930 On Wed, Oct 16, 2013 at 12:29 AM, Kane Kane kane.ist...@gmail.com wrote: Yes, thanks, looks like that's what I need, do you know why it tends to choose the

KafkaStream bug?

2013-10-14 Thread hsy...@gmail.com
I found some weird behavior. I follow the exact code example for the high-level consumer (https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example#) but add one debug line here: public void run() { ConsumerIterator<byte[], byte[]> it = m_stream.iterator(); while

Is there a programmatic way to create topic

2013-10-14 Thread hsy...@gmail.com
Hi Kafka, is there a programmatic way to create a topic? http://stackoverflow.com/questions/16946778/how-can-we-create-a-topic-in-kafka-from-the-ide-using-api/18480684#18480684 is too hacky, plus it's not a sync function. I'm asking this because I'm writing a test case which will start Kafka

Re: Is there a programmatic way to create topic

2013-10-14 Thread hsy...@gmail.com
CreateTopicCommand.createTopic(). This is probably something we can improve in the forthcoming releases. Thanks, Neha On Mon, Oct 14, 2013 at 3:02 PM, hsy...@gmail.com hsy...@gmail.com wrote: Hi kafka, Is there a programmatic way to create topic. http://stackoverflow.com/questions/16946778/how-can-we

Question about auto-rebalancing

2013-10-11 Thread hsy...@gmail.com
Hi guys, here is a case I observed. I have a single-node cluster with 3 broker instances. I created 1 topic with 2 partitions and 2 replicas for each partition. The initial distribution is like this: topic1/partition0 -> (broker0, broker2), topic1/partition1 -> (broker1, broker2). So broker0 is the leader broker

Re: Question about auto-rebalancing

2013-10-11 Thread hsy...@gmail.com
Hi Jun, thanks for your reply, but in a real cluster one broker could serve different topics and different partitions. The simple consumer only has knowledge of which brokers are available, but it has no way to decide which broker is best to pick for consuming messages. If you don't choose

Re: Is there a way to pull out kafka metadata from zookeeper?

2013-10-11 Thread hsy...@gmail.com
of a single TopicMetadataRequest roundtrip to some kafka broker. Thanks, Neha On Fri, Oct 11, 2013 at 11:30 AM, hsy...@gmail.com hsy...@gmail.com wrote: Thanks guys! But I feel weird. Assume I have 20 brokers for 10 different topics with 2 partitions and 2 replicas for each. For each consumer

Re: Question about auto-rebalancing

2013-10-11 Thread hsy...@gmail.com
-CanIpredicttheresultsoftheconsumerrebalance%3F Guozhang On Fri, Oct 11, 2013 at 11:06 AM, hsy...@gmail.com hsy...@gmail.com wrote: Hi Jun, Thanks for your reply, but in a real cluster, one broker could serve different topics and different partitions, the simple consumer only has

Is there a way to pull out kafka metadata from zookeeper?

2013-10-10 Thread hsy...@gmail.com
Hi guys, I'm trying to maintain a bunch of simple Kafka consumers to consume messages from brokers. I know there is a way to send a TopicMetadataRequest to a broker and get the response from the broker, but you have to specify the broker list to query the information, and a broker might not be available
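A minimal sketch of the metadata round trip mentioned here, using the Kafka 0.8 javaapi SimpleConsumer from Scala; the broker host/port, client id, and topic are placeholders, and in practice you would try each broker in a bootstrap list until one answers.

import kafka.javaapi.TopicMetadataRequest
import kafka.javaapi.consumer.SimpleConsumer

import scala.collection.JavaConverters._

object MetadataProbeSketch {
  def main(args: Array[String]): Unit = {
    // Any live broker can answer metadata requests for the whole cluster.
    val consumer = new SimpleConsumer("broker-host", 9092, 100000, 64 * 1024, "metadata-probe")
    try {
      val response = consumer.send(new TopicMetadataRequest(List("my-topic").asJava))
      for {
        topicMeta <- response.topicsMetadata.asScala
        partMeta  <- topicMeta.partitionsMetadata.asScala
      } println(s"topic=${topicMeta.topic} partition=${partMeta.partitionId} leader=${partMeta.leader}")
    } finally {
      consumer.close()
    }
  }
}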
