Re: [Spark Launcher] How to launch parallel jobs?

2017-02-13 Thread Egor Pahomov
Look at your Hadoop UI and verify that both jobs get enough resources. 2017-02-13 11:07 GMT-08:00 Egor Pahomov : > "But if I increase only executor-cores the finish time is the same". More > experienced folks can correct me if I'm wrong, but as far as I understand > it: o
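For reference, a minimal sketch of kicking off two jobs concurrently with SparkLauncher; the jar path, main class and queue names here are made up:

    import org.apache.spark.launcher.SparkLauncher

    // hypothetical jar/class; each call starts a job without blocking
    def launch(queue: String) = new SparkLauncher()
      .setAppResource("/path/to/app.jar")
      .setMainClass("com.example.MyJob")
      .setMaster("yarn")
      .setConf("spark.yarn.queue", queue)
      .startApplication()

    val job1 = launch("queue1")
    val job2 = launch("queue2")  // runs in parallel with job1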

Re: [Spark Launcher] How to launch parallel jobs?

2017-02-13 Thread Egor Pahomov
"But if i increase only executor-cores the finish time is the same". More experienced ones can correct me, if I'm wrong, but as far as I understand that: one partition processed by one spark task. Task is always running on 1 core and not parallelized among cores. So if you have 5 partitions and you

Re: Union of DStream and RDD

2017-02-11 Thread Egor Pahomov
Interestingly, I just faced the same problem. By any chance, do you want to process old files in the directory as well as new ones? That's my motivation, and checkpointing is my problem as well. 2017-02-08 22:02 GMT-08:00 Amit Sela : > Not with checkpointing. > > On Thu, Feb 9, 20
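If old files are the goal, one knob that may be relevant is the newFilesOnly flag of fileStream, which when false also picks up files already sitting in the directory. A sketch, assuming an existing StreamingContext ssc:

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // newFilesOnly = false: also process files present before the stream started
    val lines = ssc.fileStream[LongWritable, Text, TextInputFormat](
      "/data/incoming", (path: Path) => true, newFilesOnly = false
    ).map(_._2.toString)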

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-11 Thread Egor Pahomov
myself so I'm only >> guessing, having had a brief look at the API. >> >> Regards, >> Jacek Laskowski >> >> https://medium.com/@jaceklaskowski/ >> Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark >> Follow me at https://twitter

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-10 Thread Egor Pahomov
witter.com/jaceklaskowski > > > On Thu, Feb 9, 2017 at 3:55 AM, Egor Pahomov > wrote: > > Jacek, you mean > > http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.ForeachWriter > > ? I do not understand how to use it, since it passes e
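For what it's worth, the general shape of a ForeachWriter is below. This is only a skeleton, assuming a streaming DataFrame df; the actual Hive write is left as a placeholder:

    import org.apache.spark.sql.{ForeachWriter, Row}

    val writer = new ForeachWriter[Row] {
      // called once per partition per trigger; return false to skip it
      def open(partitionId: Long, version: Long): Boolean = true
      // called for every row in the trigger's output
      def process(row: Row): Unit = { /* write the row somewhere */ }
      // called at the end, with the error if one occurred
      def close(errorOrNull: Throwable): Unit = {}
    }

    val query = df.writeStream.foreach(writer).start()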

Re: [Spark-SQL] Hive support is required to select over the following tables

2017-02-08 Thread Egor Pahomov
Just guessing here, but have you built your Spark with "-Phive"? By the way, which version of Zeppelin? 2017-02-08 5:13 GMT-08:00 Daniel Haviv : > Hi, > I'm using Spark 2.1.0 on Zeppelin. > > I can successfully create a table, but when I try to select from it, it fails: > spark.sql("create table foo
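For reference, the Hive-enabled build described in the Spark docs looks roughly like this (exact profiles vary by version):

    ./build/mvn -Phive -Phive-thriftserver -DskipTests clean package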

Re: Union of DStream and RDD

2017-02-08 Thread Egor Pahomov
Just guessing here, but would http://spark.apache.org/docs/latest/streaming-programming-guide.html#basic-sources "*Queue of RDDs as a Stream*" work? Basically, create a DStream from your RDD and then union it with the other DStream. 2017-02-08 12:32 GMT-08:00 Amit Sela : > Hi all, > > I'm looking to union
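Roughly what that looks like, assuming an existing StreamingContext ssc and an RDD[String] named oldRdd:

    import scala.collection.mutable

    // wrap the existing RDD as a one-shot DStream, then union with the live one
    val oldData = ssc.queueStream(mutable.Queue(oldRdd), oneAtATime = true)
    val newData = ssc.textFileStream("/data/incoming")
    val combined = oldData.union(newData)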

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-08 Thread Egor Pahomov
Laskowski : > Hi, > > Have you considered the foreach sink? > > Jacek > > On 6 Feb 2017 8:39 p.m., "Egor Pahomov" wrote: > >> Hi, I'm thinking of using Structured Streaming instead of the old streaming >> API, but I need to be able to save results to a H

Re: [Structured Streaming] Using File Sink to store to hive table.

2017-02-06 Thread Egor Pahomov
oning information in > its own metadata log. Is there a specific reason that you want to store the > information in the Hive Metastore? > > Best, > Burak > > On Mon, Feb 6, 2017 at 11:39 AM, Egor Pahomov > wrote: > >> Hi, I'm thinking of using Structured Streaming i

[Structured Streaming] Using File Sink to store to hive table.

2017-02-06 Thread Egor Pahomov
Hi, I'm thinking of using Structured Streaming instead of the old streaming API, but I need to be able to save results to a Hive table. The documentation for the file sink (http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#output-sinks) says: "Supports writes to partitioned tables.". B
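A minimal sketch of that sink, assuming a streaming DataFrame df with a "date" column; paths are made up. An external Hive table could then be declared over the output path, with the caveat discussed in this thread that the sink tracks partitions in its own metadata log rather than the metastore:

    val query = df.writeStream
      .format("parquet")
      .partitionBy("date")
      .option("path", "/warehouse/events")
      .option("checkpointLocation", "/checkpoints/events")
      .start()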

Logs of spark driver in yarn-client mode.

2016-07-06 Thread Egor Pahomov
Hi, I have the following issue: I have Zeppelin, which is set up in yarn-client mode. A notebook has been in the Running state for a long period of time with 0% done, and I do not see even an accepted application in YARN. To understand what's going on, I need the logs of the Spark driver, which is trying to connect to had

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
What about yarn-cluster mode? 2016-07-01 11:24 GMT-07:00 Egor Pahomov : > Separate bad users with bad queries from good users with good queries. Spark > does not provide any scope separation out of the box. > > 2016-07-01 11:12 GMT-07:00 Jeff Zhang : > >> I think so, any reas

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
d suggest you deploy one Spark thrift server per >>> machine for now. If you stick to deploying multiple Spark thrift servers on one >>> machine, then define a different SPARK_CONF_DIR, SPARK_LOG_DIR and >>> SPARK_PID_DIR for your 2 instances of the Spark thrift server. Not sure if >

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
er on one > machine, then define a different SPARK_CONF_DIR, SPARK_LOG_DIR and > SPARK_PID_DIR for your 2 instances of the Spark thrift server. Not sure if > there are other conflicts, but please try first. > > > On Fri, Jul 1, 201
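A sketch of what that might look like when starting a second instance; paths and port are made up:

    export SPARK_CONF_DIR=/etc/spark/instance2/conf
    export SPARK_LOG_DIR=/var/log/spark/instance2
    export SPARK_PID_DIR=/var/run/spark/instance2
    ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001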

Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
rt conflicts, pid file, log file, > etc., you can run multiple instances of the Spark thrift server. > > On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov > wrote: > >> Hi, I'm using the Spark Thrift JDBC server and two limitations really >> bother me: >> >> 1)

Thrift JDBC server - why only one per machine and only yarn-client

2016-07-01 Thread Egor Pahomov
Hi, I'm using the Spark Thrift JDBC server and two limitations really bother me: 1) one instance per machine, and 2) yarn-client only (not yarn-cluster). Are there any architectural reasons for these limitations? About yarn-client I might understand in theory - the master is the same process as the server, so i

Re: 1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-14 Thread Egor Pahomov
ar(). It's really good news, since it's hard to do addJar() properly in an Oozie job. 2016-01-12 17:01 GMT-08:00 Egor Pahomov : > Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a > serious issue. I successfully updated the Spark thrift server from 1.5.2 to

1.6.0: Standalone application: Getting ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory

2016-01-12 Thread Egor Pahomov
Hi, I'm moving my infrastructure from 1.5.2 to 1.6.0 and experiencing a serious issue. I successfully updated the Spark thrift server from 1.5.2 to 1.6.0. But I have a standalone application which worked fine with 1.5.2 but is failing on 1.6.0 with: *NestedThrowables:* *java.lang.ClassNotFoundException: org
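One workaround often suggested for this particular ClassNotFoundException - hedged, since the exact jar versions depend on the distribution - is to pass the DataNucleus jars from Spark's lib directory explicitly at submit time; class and jar names below are hypothetical:

    spark-submit --class com.example.MyApp \
      --jars lib/datanucleus-api-jdo-3.2.6.jar,lib/datanucleus-core-3.2.10.jar,lib/datanucleus-rdbms-3.2.9.jar \
      my-app.jar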

Re: Elastic allocation(spark.dynamicAllocation.enabled) results in task never being executed.

2014-11-14 Thread Egor Pahomov
YARN, which could be because > other jobs are using up all the resources. > > -Sandy > > On Fri, Nov 14, 2014 at 11:32 AM, Egor Pahomov > wrote: > >> Hi. >> I execute ipython notebook + pyspark with spark.dynamicAllocation.enabled >> = true. Task never ends

Elastic allocation(spark.dynamicAllocation.enabled) results in task never being executed.

2014-11-14 Thread Egor Pahomov
Hi. I execute an IPython notebook + pyspark with spark.dynamicAllocation.enabled = true. The task never ends. Code:

    import sys
    from random import random
    from operator import add

    partitions = 10
    n = 10 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

Make Spark Job Board permanent.

2014-11-08 Thread Egor Pahomov
During Spark Summit 2014 there was a Job Board (http://spark-summit.org/2014/jobs) for positions related to Spark technology. It is a great thing, because it's hard to search for positions related to such a young technology. And such a board is good for the Spark community, because it makes it easy for companies to

Re: SPARK 1.1.0 on yarn-cluster and external JARs

2014-09-25 Thread Egor Pahomov
SparkContext.addJar()? Why didn't you like the fat-jar way? 2014-09-25 16:25 GMT+04:00 rzykov : > We build some SPARK jobs with external jars. I compile the jobs by including > them in one assembly. > But we are looking for an approach to put all external jars into HDFS. > > We have already put the spark jar in a

java.io.FileNotFoundException in usercache

2014-09-25 Thread Egor Pahomov
I work with Spark on an unstable cluster with poor administration. I started getting: 14/09/25 15:29:56 ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /local/hd2/yarn/local/usercache/epahomov/appcache/application_1411219858924_15501/spark-local-20140925151931

pyspark + yarn: how everything works.

2014-07-04 Thread Egor Pahomov
Hi, I want to use PySpark with YARN, but the documentation doesn't give me a full understanding of what's going on, and I simply don't understand the code. So: 1) How is Python shipped to the cluster? Should machines in the cluster already have Python? 2) What happens when I write some Python code in a "map" function -

Re: K-means faster on Mahout then on Spark

2014-03-25 Thread Egor Pahomov
> Sent from my iPhone > > On Mar 25, 2014, at 9:25 AM, Prashant Sharma wrote: > > I think Mahout uses FuzzyKMeans, which is a different algorithm, and it is > not iterative. > > Prashant Sharma > > > On Tue, Mar 25, 2014 at 6:50 PM, Egor Pahomov wrote: > >> H

K-means faster on Mahout then on Spark

2014-03-25 Thread Egor Pahomov
Hi, I'm running a benchmark which compares Mahout and Spark ML. For now I have the following results for k-means:
Number of iterations = 10, number of elements = 1000, Mahout time = 602, Spark time = 138
Number of iterations = 40, number of elements = 1000, Mahout time = 1917, Spark time = 330
Number of ite
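For context, the Spark side of such a benchmark is roughly the sketch below, written against the MLlib API (newer releases take Vector input, so adjust for 0.9); sc and the space-separated numeric input file are assumptions:

    import org.apache.spark.mllib.clustering.KMeans
    import org.apache.spark.mllib.linalg.Vectors

    // parse, cache (k-means is iterative), then train
    val data = sc.textFile("data.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()
    val model = KMeans.train(data, k = 10, maxIterations = 10)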

[Powered by] Yandex Islands powered by Spark

2014-03-16 Thread Egor Pahomov
Hi, the page https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark says I need to write here if I want my project to be added there. At Yandex (www.yandex.com) we are now using Spark for the Yandex Islands project (http://www.searchenginejournal.com/yandex-islands-markup-issues-implementation/71891/)

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-02-28 Thread Egor Pahomov
04:00 Egor Pahomov : > In that same pom: > <profile> > <id>yarn</id> > <properties> > <hadoop.major.version>2</hadoop.major.version> > <hadoop.version>2.2.0</hadoop.version> > <protobuf.version>2.5.0</protobuf.version> > </properties> > <modules> > <module>yarn</module> > </modules> > </profile> 2014-02-28 23:46 GMT+04:00 Aureliano Buendia : >

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-02-28 Thread Egor Pahomov
In that same pom:

    <profile>
      <id>yarn</id>
      <properties>
        <hadoop.major.version>2</hadoop.major.version>
        <hadoop.version>2.2.0</hadoop.version>
        <protobuf.version>2.5.0</protobuf.version>
      </properties>
      <modules>
        <module>yarn</module>
      </modules>
    </profile>

2014-02-28 23:46 GMT+04:00 Aureliano Buendia : > > > > On Fri, Feb 28, 2014 at 7:17 PM, Egor Pahomov wrote: > >> Spark 0.9 uses protobuf 2.5.0

Re: Error reading HDFS file using spark 0.9.0 / hadoop 2.2.0 - incompatible protobuf 2.5 and 2.4.1

2014-02-28 Thread Egor Pahomov
Spark 0.9 uses protobuf 2.5.0. Hadoop 2.2 uses protobuf 2.5.0. protobuf 2.5.0 can read messages serialized with protobuf 2.4.1. So there is no reason why you can't read messages from Hadoop 2.2 with protobuf 2.5.0; probably you somehow have 2.4.1 in your classpath. Of course it's very bad,
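A quick way to see where a stray 2.4.1 might be coming from, using standard Maven tooling:

    mvn dependency:tree -Dincludes=com.google.protobuf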