Look at your Hadoop UI and verify that both jobs get enough resources.
2017-02-13 11:07 GMT-08:00 Egor Pahomov :
> "But if i increase only executor-cores the finish time is the same". More
> experienced people can correct me if I'm wrong, but as far as I understand
> it: o
"But if i increase only executor-cores the finish time is the same". More
experienced people can correct me if I'm wrong, but as far as I understand
it: one partition is processed by one Spark task. A task always runs on
one core and is not parallelized among cores. So if you have 5 partitions and
you
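The cap described above can be sketched in plain Python (illustrative only; the function and parameter names are mine, not a Spark API):

```python
def effective_parallelism(num_partitions, num_executors, executor_cores):
    """Upper bound on how many tasks of one stage run at once.

    Each task processes exactly one partition on a single core, so
    cores beyond the partition count simply sit idle.
    """
    total_cores = num_executors * executor_cores
    return min(num_partitions, total_cores)

# With 5 partitions, adding executor-cores beyond 5 total cores changes nothing:
print(effective_parallelism(5, 1, 4))  # 4 cores -> 4 tasks in parallel
print(effective_parallelism(5, 1, 8))  # 8 cores -> still capped at 5 tasks
```

This is why increasing only executor-cores past the partition count leaves the finish time unchanged; repartitioning the data is what raises the cap.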
Interestingly, I just faced the same problem. By any chance, do you
want to process old files in the directory as well as new ones? That's my
motivation, and checkpointing is my problem as well.
2017-02-08 22:02 GMT-08:00 Amit Sela :
> Not with checkpointing.
>
> On Thu, Feb 9, 20
myself so I'm only
>> guessing having a brief look at the API.
>>
>> Pozdrawiam,
>> Jacek Laskowski
>>
>> https://medium.com/@jaceklaskowski/
>> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
>> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Thu, Feb 9, 2017 at 3:55 AM, Egor Pahomov
> wrote:
> > Jacek, you mean
> > http://spark.apache.org/docs/latest/api/scala/index.html#
> org.apache.spark.sql.ForeachWriter
> > ? I do not understand how to use it, since it passes e
Just guessing here, but have you built your Spark with "-Phive"? By the
way, which version of Zeppelin?
2017-02-08 5:13 GMT-08:00 Daniel Haviv :
> Hi,
> I'm using Spark 2.1.0 on Zeppelin.
>
> I can successfully create a table but when I try to select from it I fail:
> spark.sql("create table foo
Just guessing here, but would
http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources
"*Queue of RDDs as a Stream*" work? Basically, create a DStream from your RDD
and then union it with the other DStream.
2017-02-08 12:32 GMT-08:00 Amit Sela :
> Hi all,
>
> I'm looking to union
Laskowski :
> Hi,
>
> Have you considered foreach sink?
>
> Jacek
>
> On 6 Feb 2017 8:39 p.m., "Egor Pahomov" wrote:
>
>> Hi, I'm thinking of using Structured Streaming instead of old streaming,
>> but I need to be able to save results to H
oning information in
> its own metadata log. Is there a specific reason that you want to store the
> information in the Hive Metastore?
>
> Best,
> Burak
>
> On Mon, Feb 6, 2017 at 11:39 AM, Egor Pahomov
> wrote:
>
>> Hi, I'm thinking of using Structured Streaming i
Hi, I'm thinking of using Structured Streaming instead of the old streaming,
but I need to be able to save results to a Hive table. Documentation for the
file sink says (
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks):
"Supports writes to partitioned tables." B
Hi, I have the following issue:
I have Zeppelin, which is set up in yarn-client mode. The notebook stays in
Running state for a long period of time with 0% done, and I do not see even
an accepted application in YARN.
To be able to understand what's going on, I need the logs of the Spark
driver, which is trying to connect to had
What about yarn-cluster mode?
2016-07-01 11:24 GMT-07:00 Egor Pahomov :
> Separate bad users with bad queries from good users with good queries. Spark
> does not provide any scope separation out of the box.
>
> 2016-07-01 11:12 GMT-07:00 Jeff Zhang :
>
>> I think so, any reas
d suggest you deploy one Spark Thrift server per
>>> machine for now. If you stick to deploying multiple Spark Thrift servers on one
>>> machine, then define different SPARK_CONF_DIR, SPARK_LOG_DIR and
>>> SPARK_PID_DIR for your 2 instances of Spark Thrift server. Not sure if
>
er on one
> machine, then define different SPARK_CONF_DIR, SPARK_LOG_DIR and
> SPARK_PID_DIR for your 2 instances of Spark Thrift server. Not sure if
> there are other conflicts, but please try first.
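The workaround above might look something like this in practice (all directory paths and the port number are made-up examples for illustration, not defaults):

```shell
# Second Spark Thrift server instance on the same host: give it its own
# conf, log, and pid directories so it doesn't clash with the first one.
export SPARK_CONF_DIR=/etc/spark/thrift2/conf
export SPARK_LOG_DIR=/var/log/spark/thrift2
export SPARK_PID_DIR=/var/run/spark/thrift2

# And a distinct listening port, e.g.:
# sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001
```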
>
>
> On Fri, Jul 1, 201
rt conflict, pid file, log file, and
> so on, you can run multiple instances of the Spark Thrift server.
>
> On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov
> wrote:
>
>> Hi, I'm using the Spark Thrift JDBC server and 2 limitations really
>> bother me -
>>
>> 1)
Hi, I'm using the Spark Thrift JDBC server and 2 limitations really bother
me -
1) One instance per machine
2) Yarn-client only (not yarn-cluster)
Are there any architectural reasons for such limitations? About yarn-client
I might understand in theory - the master is the same process as the server, so
i
ar(). It's really good news, since it's hard to do
addJar() properly in an Oozie job.
2016-01-12 17:01 GMT-08:00 Egor Pahomov :
> Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a
> serious issue. I successfully updated the Spark Thrift server from 1.5.2 to
Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a
serious issue. I successfully updated the Spark Thrift server from 1.5.2 to
1.6.0. But I have a standalone application which worked fine with 1.5.2 but
fails on 1.6.0 with:
*NestedThrowables:*
*java.lang.ClassNotFoundException:
org
YARN, which could be because
> other jobs are using up all the resources.
>
> -Sandy
>
> On Fri, Nov 14, 2014 at 11:32 AM, Egor Pahomov
> wrote:
>
>> Hi.
>> I execute ipython notebook + pyspark with spark.dynamicAllocation.enabled
>> = true. Task never ends
Hi.
I execute IPython notebook + PySpark with spark.dynamicAllocation.enabled =
true. The task never ends.
Code:
import sys
from random import random
from operator import add

partitions = 10
n = 10 * partitions

def f(_):
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0
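For reference, the math that snippet distributes can be run without Spark at all. A plain-Python sketch of the same Monte Carlo pi estimate (function name and seed are mine, for illustration):

```python
from random import Random

def estimate_pi(samples, seed=7):
    """Monte Carlo pi: 4 times the fraction of random points inside the unit circle."""
    rng = Random(seed)
    inside = 0
    for _ in range(samples):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x ** 2 + y ** 2 <= 1:
            inside += 1
    return 4.0 * inside / samples

print(estimate_pi(100_000))  # close to 3.14
```

Since this finishes in well under a second locally, a job that "never ends" points at scheduling/allocation, not at the computation itself.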
During Spark Summit 2014 there was a Job Board (
http://spark-summit.org/2014/jobs) for positions related to Spark
technology. It is a great thing, because it's hard to search for positions
related to such a young technology. And such a board is good for the Spark
community, because it makes it easy for companies to
SparkContext.addJar()?
Why didn't you like the fat-jar way?
2014-09-25 16:25 GMT+04:00 rzykov :
> We build some Spark jobs with external jars. I compile jobs by including
> them in one assembly.
> But we are looking for an approach to put all external jars into HDFS.
>
> We have already put spark jar in a
I work with Spark on an unstable cluster with bad administration.
I started getting
14/09/25 15:29:56 ERROR storage.DiskBlockObjectWriter: Uncaught
exception while reverting partial writes to file
/local/hd2/yarn/local/usercache/epahomov/appcache/application_1411219858924_15501/spark-local-20140925151931
Hi, I want to use PySpark with YARN. But the documentation doesn't give me a
full understanding of what's going on, and I simply don't understand the code. So:
1) How is Python shipped to the cluster? Should machines in the cluster
already have Python?
2) What happens when I write some Python code in a "map" function -
> Sent from my iPhone
>
> On Mar 25, 2014, at 9:25 AM, Prashant Sharma wrote:
>
> I think Mahout uses FuzzyKmeans, which is a different algorithm, and it is
> not iterative.
>
> Prashant Sharma
>
>
> On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov wrote:
>
>> H
Hi, I'm running a benchmark which compares Mahout and Spark ML. For now I
have the following results for k-means:
Number of iterations = 10, number of elements = 1000, Mahout time = 602,
Spark time = 138
Number of iterations = 40, number of elements = 1000, Mahout time = 1917,
Spark time = 330
Number of ite
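For what it's worth, the figures above work out to roughly a 4-6x advantage for Spark (time units as reported in the post):

```python
# (iterations, elements) -> (mahout_time, spark_time), figures from the post
results = {
    (10, 1000): (602, 138),
    (40, 1000): (1917, 330),
}
for (iters, n), (mahout, spark) in results.items():
    print(f"iterations={iters}: Spark is {mahout / spark:.2f}x faster")
```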
Hi, the page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark says
I need to write here if I want my project to be added there.
In Yandex (www.yandex.com) we are now using Spark for the project Yandex Islands (
http://www.searchenginejournal.com/yandex-islands-markup-issues-implementation/71891/)
04:00 Egor Pahomov :
> In that same pom
>
>
> yarn
>
>
> 2
> 2.2.0
> 2.5.0
>
>
>
> yarn
>
>
>
>
>
>
> 2014-02-28 23:46 GMT+04:00 Aureliano Buendia :
>
>
>
In that same pom
yarn
2
2.2.0
2.5.0
yarn
2014-02-28 23:46 GMT+04:00 Aureliano Buendia :
>
>
>
> On Fri, Feb 28, 2014 at 7:17 PM, Egor Pahomov wrote:
>
>> Spark 0.9 uses protobuf 2.5.0
>&g
Spark 0.9 uses protobuf 2.5.0
Hadoop 2.2 uses protobuf 2.5.0
protobuf 2.5.0 can read messages serialized with protobuf 2.4.1
So there is no reason why you can't read messages from Hadoop 2.2
with protobuf 2.5.0; probably you somehow have 2.4.1 in your classpath. Of
course it's very bad,
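The compatibility claim rests on protobuf's wire format being stable across those versions. As a toy illustration, here is a minimal re-implementation of the base-128 varint encoding (my own sketch for illustration, not the protobuf library):

```python
def encode_varint(n):
    """Encode a non-negative int as a protobuf-style base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def decode_varint(data):
    """Decode a varint back to an int (assumes well-formed input)."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7

# The same bytes are produced and understood by both 2.4.1 and 2.5.0:
print(encode_varint(300))          # b'\xac\x02'
print(decode_varint(b'\xac\x02'))  # 300
```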