@Pushkar
Thanks, but it doesn't work for me.
My slider-client.xml setting is

<property>
  <name>yarn.application.classpath</name>

i.e. I pass the conf dir as-is in the yarn classpath variable. E.g.

<property>
  <name>yarn.application.classpath</name>
  <value>/etc/hadoop/conf,/usr/hdp/current/hadoop-client/*</value>
</property>
On Wed, Oct 29, 2014 at 11:18 AM, hsy...@gmail.com hsy...@gmail.com
wrote:
@Pushkar
Thanks, but it doesn't work for me
:
hdfs://localhost:9000/user/siyuan/.slider/cluster/cl15, expected:
file:///
Looks like there is no log4j log at all.
How do I properly set up log4j?
Best,
Siyuan
On Wed, Oct 29, 2014 at 11:47 AM, hsy...@gmail.com hsy...@gmail.com wrote:
Hi,
I installed the apache hadoop with just unzip
Sorry, my classpath should be $HADOOP_HOME/etc/hadoop/, thanks for your
help, guys!
On Wed, Oct 29, 2014 at 12:06 PM, hsy...@gmail.com hsy...@gmail.com wrote:
And all I have in slider-err.txt is
log4j:WARN No appenders could be found for logger
Hi guys,
I'm new to slider. I tried to run a java application from slider and get
the error.
Hadoop classpath has been setup in client.xml
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/yarn/client/api/async/AMRMClientAsync$CallbackHandler
Do I have to include the
I'm trying to run kafka as a slider application
On Tue, Oct 28, 2014 at 7:01 PM, hsy...@gmail.com hsy...@gmail.com wrote:
I had the same problem
This is my appConfig.json
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "application.def": "hdfs://localhost:9000/user/siyuan/slider_kafka.zip",
    "java_home": "/usr/lib/jvm/java-7-oracle/",
    "package_list":
Hi guys,
Besides TopicCommand, which I believe is not intended for creating topics
programmatically, is there any other way to automate creating a topic in
code? Thanks!
Best,
Siyuan
Hi guys,
I'm new to slider and am trying to convert some applications into yarn apps.
I would like to ask: is there a way to specify only a subset of nodes in the
cluster to run my app, and can slider guarantee that every container (of that
application) runs on a different node?
Thank you very much!
Best,
Siyuan
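For what it's worth, node placement in Slider is controlled through resources.json rather than appConfig.json. Below is a rough sketch only, with the caveat that yarn.component.placement.policy (and whether any value of it enforces one-container-per-node anti-affinity) is version-dependent and should be treated as an assumption; the component name is made up:

```json
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {},
  "components": {
    "KAFKA_BROKER": {
      "yarn.component.instances": "3",
      "yarn.memory": "1024",
      "yarn.vcores": "1",
      "yarn.component.placement.policy": "4"
    }
  }
}
```

Restricting the app to a subset of nodes would additionally need YARN node labels, support for which also depends on the Hadoop and Slider versions in use.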
, 2014 at 2:37 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi guys,
I'm new to slider and try to convert some application into yarn app. I
would like to ask is there a way to specify only a subset of nodes in
the
cluster to run my app and can slider guarantee every container
Hi guys,
Kafka is getting more and more popular, and in most cases people run kafka
as a long-term service in the cluster. Is there a discussion of running kafka
on a yarn cluster, where we can utilize the convenient configuration/resource
management and HA? I think there is a big potential and
, hsy...@gmail.com
hsy...@gmail.com wrote:
Hi guys,
Kafka is getting more and more popular and in most cases people run kafka
as long-term service in the cluster. Is there a discussion of running
kafka
on yarn cluster which we can utilize the convenient
configuration/resource
Anyone has any idea on this?
On Tue, Jul 22, 2014 at 7:02 PM, hsy...@gmail.com hsy...@gmail.com wrote:
But how do they do the interactive sql in the demo?
https://www.youtube.com/watch?v=dJQ5lV5Tldw
And if it can work in local mode, I think it should be able to work in
cluster mode.
Hi guys,
I'm able to run some Spark SQL examples, but the sql is static in the code. I
would like to know whether there is a way to read sql from somewhere else (a
shell, for example).
I could read the sql statement from kafka/zookeeper, but I cannot share the
sql with all workers. broadcast seems not working for
in the
code? What do you mean by cannot share the sql to all workers?
On Tue, Jul 22, 2014 at 4:03 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi guys,
I'm able to run some Spark SQL example but the sql is static in the
code. I
would like to know is there a way to read sql from somewhere
))
})
ssc.start()
ssc.awaitTermination()
On Tue, Jul 22, 2014 at 5:10 PM, Zongheng Yang zonghen...@gmail.com wrote:
Can you paste a small code example to illustrate your questions?
On Tue, Jul 22, 2014 at 5:05 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Sorry, typo. What I mean
after the StreamingContext has started.
Tobias
On Wed, Jul 23, 2014 at 9:55 AM, hsy...@gmail.com hsy...@gmail.com
wrote:
For example, this is what I tested, and it works in local mode. What it does
is get both the data and the sql query from kafka, run the sql on each RDD,
and output the result back.
I have the same problem
On Sat, Jul 19, 2014 at 12:31 AM, lihu lihu...@gmail.com wrote:
Hi,
Everyone. I have a piece of the following code. When I run it,
the error below occurred; it seems that the SparkContext is not
serializable, but I do not try to use the SparkContext
Thanks Tathagata. So can I say the RDD size (from the stream) is the window
size, and the overlap between 2 adjacent RDDs is the slide size?
But I still don't understand what the batch size is; why do we need it, since
data processing is RDD by RDD, right?
And does spark chop the data into RDDs at the very
Hi Jay,
I would like to take a look at the code base and maybe start working on
some jiras.
Best,
Siyuan
On Wed, Jul 16, 2014 at 3:09 PM, Jay Kreps jay.kr...@gmail.com wrote:
Hey All,
A number of people have been submitting really nice patches recently.
If you are interested in
Is there a scala API doc for the entire kafka library?
On Wed, Jul 16, 2014 at 5:34 PM, hsy...@gmail.com hsy...@gmail.com wrote:
Hi Jay,
I would like to take a look at the code base and maybe start working on
some jiras.
Best,
Siyuan
On Wed, Jul 16, 2014 at 3:09 PM, Jay Kreps jay.kr
When I'm reading the API of spark streaming, I'm confused by the 3
different durations
StreamingContext(conf: SparkConf
http://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkConf.html
, batchDuration: Duration
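For what it's worth, the relationship between the three durations can be simulated outside Spark. This is a plain-Python sketch, not the Spark API, and all names are illustrative: the batch duration chops the stream into RDDs, a window covers window/batch consecutive batches, and the slide duration controls how far the window advances each time.

```python
# Hedged sketch: simulate how Spark Streaming's three durations relate.
# Each element of `batches` is one batch-interval's worth of records.

def window_batches(batches, batch_interval, window_duration, slide_duration):
    """Return the windows produced over the batch list, each flattened."""
    per_window = window_duration // batch_interval   # batches per window
    step = slide_duration // batch_interval          # batches to advance
    windows = []
    for end in range(per_window, len(batches) + 1, step):
        window = [rec for b in batches[end - per_window:end] for rec in b]
        windows.append(window)
    return windows

# Six 1-second batches; a 3-second window sliding every 2 seconds.
batches = [[1], [2], [3], [4], [5], [6]]
print(window_batches(batches, 1, 3, 2))  # [[1, 2, 3], [3, 4, 5]]
```

So the batch interval is the unit the stream is chopped into; window and slide are expressed as multiples of it, which is why the processing is still RDD by RDD underneath.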
reproduce it.
On Mon, Jul 14, 2014 at 7:36 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Before yarn application -kill, if you do jps you'll have a list
of SparkSubmit and ApplicationMaster processes.
After you use yarn application -kill, you only kill the SparkSubmit.
On Mon, Jul 14, 2014 at 4:29 PM
sure it works, and you see output?
Also, I recommend going through the previous step-by-step approach to
narrow down where the problem is.
TD
On Mon, Jul 14, 2014 at 9:15 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Actually, I deployed this on yarn cluster(spark-submit) and I couldn't
find
anything in the driver logs!
So try doing a collect, or take on the RDD returned by sql query and print
that.
TD
On Tue, Jul 15, 2014 at 4:28 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
By the way, have you ever run SQL and stream together? Do you know any
example that works? Thanks!
On Tue
Hi all,
A newbie question: I start a spark yarn application through spark-submit.
How do I kill this app? I can kill the yarn app by yarn application -kill
appid, but the application master is still running. What's the proper way
to shut down the entire app?
Best,
Siyuan
Hi All,
Couple days ago, I tried to integrate SQL and streaming together. My
understanding is I can transform RDD from Dstream to schemaRDD and execute
SQL on each RDD. But I got no luck
Would you guys help me take a look at my code? Thank you very much!
object KafkaSpark {
def main(args:
. This is what I did 2 hours
ago.
Sorry I cannot provide more help.
Sent from my iPhone
On 14 Jul, 2014, at 6:05 pm, hsy...@gmail.com hsy...@gmail.com wrote:
yarn-cluster
On Mon, Jul 14, 2014 at 2:44 PM, Jerry Lam chiling...@gmail.com wrote:
Hi Siyuan,
I wonder if you --master yarn-cluster
but SQL command throwing error? No errors but no output
either?
TD
On Mon, Jul 14, 2014 at 4:06 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi All,
Couple days ago, I tried to integrate SQL and streaming together. My
understanding is I can transform RDD from Dstream to schemaRDD and execute
that to work, then I would test the Spark SQL
stuff.
TD
On Mon, Jul 14, 2014 at 5:25 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
No errors but no output either... Thanks!
On Mon, Jul 14, 2014 at 4:59 PM, Tathagata Das
tathagata.das1...@gmail.com wrote:
Could you elaborate on what
:21 AM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi guys,
I'm a new user to spark. I would like to know is there an example of how
to use spark SQL and spark streaming together? My use case is I want to do
some SQL on the input stream from kafka.
Thanks!
Best,
Siyuan
I have a newbie question. What is the difference between SparkSQL and Shark?
Best,
Siyuan
I have the same problem. I didn't dig deeper, but I saw this happen when I
launched kafka in daemon mode. I found the daemon mode just launches kafka
with nohup. Not quite clear why this happens.
On Wed, Jul 9, 2014 at 9:59 AM, Lung, Paul pl...@ebay.com wrote:
Yup. In fact, I just ran the test
Hi guys,
I'm a new user to spark. I would like to know is there an example of how to
use spark SQL and spark streaming together? My use case is I want to do
some SQL on the input stream from kafka.
Thanks!
Best,
Siyuan
-to-topic node on Yarn.
On Jun 17, 2014, at 10:44 AM, hsy...@gmail.com wrote:
Hi Shaikh,
I heard of some throughput bottlenecks with storm. It cannot really scale up
with
kafka.
I recommend you to try DataTorrent platform(
https://www.datatorrent.com/
)
The platform itself
I'm using 0.8.1.1
I use DeleteTopicCommand to delete topic
String[] args = new String[4];
args[0] = "--topic";
args[1] = topic;            // the topic you want to delete
args[2] = "--zookeeper";
args[3] = kafkaZookeepers;
DeleteTopicCommand.main(args);
You can write your own script to delete the topic, I guess. And I think it
only
Hi Shaikh,
I heard of some throughput bottlenecks with storm. It cannot really scale up
with kafka.
I recommend you to try DataTorrent platform(https://www.datatorrent.com/)
The platform itself is not open-source but it has a open-source library (
https://github.com/DataTorrent/Malhar) which contains
Hi guys,
So far, is there a way to track the async producer callback?
My requirement is basically: if all nodes of the topic go down, can I
pause the producer and, after the broker comes back online, continue to
produce from the failure point?
Best,
Siyuan
Hi guys,
I found there is a tool to add partition on the fly. My question is, is
there a way to delete a partition at runtime? Thanks!
Best,
Siyuan
Hi All, I was trying to upgrade the kafka to 0.8 but I get an empty jar
file for
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.8.0</artifactId>
  <version>0.8.0</version>
</dependency>
However
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.8.2</artifactId>
On Wed, Dec 4, 2013 at 4:48 PM, hsy...@gmail.com hsy...@gmail.com wrote:
Hi All, I was trying to upgrade the kafka to 0.8 but I get an empty jar
file for
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.8.0</artifactId>
  <version>0.8.0</version>
What I did for my project is I have a thread that sends a metadata request to
a random broker and monitors metadata changes periodically. The good thing
is, to my knowledge, any broker in the cluster knows the metadata for all
the topics served in this cluster. Another option is you can always query
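The periodic-monitoring thread described above can be sketched as follows. This is plain Python, not a Kafka API: fetch_metadata stands in for issuing a TopicMetadataRequest to any broker, and all names here are illustrative.

```python
# Hedged sketch of a metadata watcher: poll a broker-backed fetch
# function on an interval and report any change between snapshots.

import threading

def watch_metadata(fetch_metadata, on_change, interval=5.0, stop=None):
    """Poll fetch_metadata until `stop` is set, reporting each change."""
    stop = stop or threading.Event()
    last = None
    while not stop.is_set():
        current = fetch_metadata()
        if last is not None and current != last:
            on_change(last, current)
        last = current
        stop.wait(interval)   # sleep, but wake immediately on stop
    return last
```

In a real deployment fetch_metadata would issue the request to any live broker; since every broker can answer with cluster-wide metadata, one watcher is enough.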
I think the max 50Mbps is almost the disk bottleneck.
My guess is IO is the bottleneck for kafka if you set the same type (async
without ack); I got throughput at about 30Mb.
Try increasing these if you don't care about latency very much:
log.flush.interval.messages=1
log.flush.interval.ms=3000
On
Also, if you use HEAD, you can create more partitions at runtime; you just
need a dynamic partitioner class, I think.
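To illustrate the idea (plain Python, not Kafka's Partitioner interface; the crc32 hash is an illustrative choice): a "dynamic" partitioner derives the target partition from the partition count it is handed on every send, so newly added partitions start receiving traffic without restarting the producer.

```python
# Hedged sketch of a dynamic partitioner: the partition is recomputed
# from num_partitions on every call, so growing a topic from 4 to 8
# partitions takes effect immediately.

import zlib

def partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition id in [0, num_partitions)."""
    return zlib.crc32(key) % num_partitions

print(partition(b"user-42", 4))   # stable while the topic has 4 partitions
print(partition(b"user-42", 8))   # recomputed once partitions are added
```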
On Thu, Nov 14, 2013 at 7:23 AM, Neha Narkhede neha.narkh...@gmail.comwrote:
There is no way to delete topics in Kafka yet. You can add partitions to
existing topics, but you may
Hi,
I have questions about the load balancing of kafka high-level consumer
Suppose I have 4 partitions, and the producer throughput to these 4
partitions is like this:

partition:   0        1        2       3
throughput:  10MB/s   10MB/s   1MB/s   1MB/s
1kMsg/s,
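The skew this question is getting at can be shown with a small simulation. This is plain Python with made-up names; the chunked assignment roughly mirrors the range-style assignment the 0.8 high-level consumer uses, and the rates are the figures above.

```python
# Hedged sketch: contiguous (range-style) partition assignment can leave
# one consumer with the two hot partitions and another with the two cold
# ones, so consumer load does not balance by throughput.

def range_assign(partitions, consumers):
    """Hand each consumer a contiguous chunk of the partition list."""
    per = len(partitions) // len(consumers)
    extra = len(partitions) % len(consumers)
    out, i = {}, 0
    for c_idx, c in enumerate(consumers):
        take = per + (1 if c_idx < extra else 0)
        out[c] = partitions[i:i + take]
        i += take
    return out

rates = {0: 10.0, 1: 10.0, 2: 1.0, 3: 1.0}   # MB/s per partition
assignment = range_assign(list(rates), ["c1", "c2"])
load = {c: sum(rates[p] for p in ps) for c, ps in assignment.items()}
print(load)  # c1 carries 20 MB/s, c2 only 2 MB/s
```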
I didn't see any auto leader election after adding nodes. The data are still
skewed on the old nodes. You have to force it by running a script?
On Wed, Nov 13, 2013 at 6:41 AM, Neha Narkhede neha.narkh...@gmail.comwrote:
At those many topics, zookeeper will be the main bottleneck. Leader
election
On Tue, Nov 12, 2013 at 2:56 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi guys,
When I built my project using maven I got WARNING
[WARNING
I'm working on a fault-tolerant consumer group. The idea is this: to
maximize the throughput of kafka, I request the metadata from a broker and
create #{num of partitions} consumers for each topic and distribute them on
different nodes. Moreover, there is a mechanism to detect the failure of any
node
and
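The distribution step described above can be sketched like this (plain Python; one consumer per partition, spread round-robin across nodes; the function and node names are made up for illustration):

```python
# Hedged sketch: plan one consumer per partition, placing consumers
# round-robin across the available nodes.

def assign_consumers(topic_partitions, nodes):
    """topic_partitions: {topic: num_partitions}.
    Returns {node: [(topic, partition), ...]} for the planned consumers."""
    assignment = {node: [] for node in nodes}
    i = 0
    for topic, n in sorted(topic_partitions.items()):
        for p in range(n):
            assignment[nodes[i % len(nodes)]].append((topic, p))
            i += 1
    return assignment

plan = assign_consumers({"clicks": 4, "logs": 2}, ["node1", "node2", "node3"])
print(plan)
```

A node-failure detector would then re-run the planner over the surviving nodes and restart the displaced consumers.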
Hi guys,
When I built my project using maven I got WARNING
[WARNING] The POM for org.apache.kafka:kafka_2.8.0:jar:0.8.0-beta1 is
invalid, transitive dependencies (if any) will not be available: 1 problem
was encountered while building the effective model
And I looked at the
Hi guys,
Is there a detailed document about the attributes and object names of the
mbeans?
For example, what does the attribute MeanRate of the object MessagesPerSec
mean? Is it the mean value of the last 1 sec / 1 min?
http://kafka.apache.org/documentation.html#monitoring
only has a little information about
Hi guys, since kafka is able to add new brokers into the cluster at runtime,
I'm wondering is there a way to add new partitions for a specific topic at
runtime? If not, what will you do if you want to add more partitions to a
topic? Thanks!
-partition tool:
https://cwiki.apache.org/confluence/display/KAFKA/Replication+tools#Replicationtools-5.AddPartitionTool
Guozhang
On Fri, Nov 8, 2013 at 5:32 PM, hsy...@gmail.com hsy...@gmail.com wrote:
Hi guys, since kafka is able to add new broker into the cluster at
runtime,
I'm wondering
I mean, I assume the messages not yet consumed before delete-topic will be
delivered before you create the same topic, correct?
On Fri, Nov 8, 2013 at 6:30 PM, hsy...@gmail.com hsy...@gmail.com wrote:
It's in the branch, cool, I'll wait for it's release. actually I find I
can use ./kafka-delete
Hi guys, I have some throughput questions.
I tried to test the throughput using both the High Level Consumer and the
Simple Consumer examples from the documentation, but I get much lower
throughput from the simple consumer than from the high level consumer.
I run the test in the cluster and I'm sure I distribute the
There is a ticket for auto-rebalancing, hopefully they'll do auto
redistribution soon
https://issues.apache.org/jira/browse/KAFKA-930
On Wed, Oct 16, 2013 at 12:29 AM, Kane Kane kane.ist...@gmail.com wrote:
Yes, thanks, looks like that's what i need, do you know why it tends to
choose the
I found some weird behavior,
I follow the exact code example for HighlevelConsumer
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example#
but add one debug line here
public void run() {
    ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
    while
Hi kafka,
Is there a programmatic way to create a topic?
http://stackoverflow.com/questions/16946778/how-can-we-create-a-topic-in-kafka-from-the-ide-using-api/18480684#18480684
is too hacky, plus it's not a sync function.
I'm asking this because I'm writing a test case which will start kafka
CreateTopicCommand.createTopic(). This is probably something we can improve
in the forthcoming releases.
Thanks,
Neha
On Mon, Oct 14, 2013 at 3:02 PM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi kafka,
Is there a programmatic way to create topic.
http://stackoverflow.com/questions/16946778/how-can-we
Hi guys,
Here is a case I observed. I have a single-node cluster with 3 broker
instances. I created 1 topic with 2 partitions and 2 replicas for each
partition. The initial distribution is like this:
topic1/partition0 -> (broker0, broker2)
topic1/partition1 -> (broker1, broker2)
So broker0 is the leader broker
Hi Jun,
Thanks for your reply, but in a real cluster one broker could serve
different topics and different partitions. The simple consumer only has
knowledge of which brokers are available, but it has no knowledge to decide
which broker is best to pick to consume messages from. If you don't choose
of a single
TopicMetadataRequest roundtrip to some kafka broker.
Thanks,
Neha
On Fri, Oct 11, 2013 at 11:30 AM, hsy...@gmail.com hsy...@gmail.com
wrote:
Thanks guys!
But I feel weird. Assume I have 20 brokers for 10 different topics with 2
partitions and 2 replicas for each. For each consumer
-CanIpredicttheresultsoftheconsumerrebalance%3F
Guozhang
On Fri, Oct 11, 2013 at 11:06 AM, hsy...@gmail.com hsy...@gmail.com
wrote:
Hi Jun,
Thanks for your reply, but in a real cluster, one broker could serve
different topics and different partitions, the simple consumer only has
Hi guys,
I'm trying to maintain a bunch of simple kafka consumers to consume messages
from brokers. I know there is a way to send a TopicMetadataRequest to a
broker and get the response from the broker, but you have to specify the
broker list to query the information, and a broker might not be available