Re: Number of records per micro-batch in DStream vs Structured Streaming

2018-07-08 Thread subramgr
Anyone?

Re: How to avoid duplicate column names after join with multiple conditions

2018-07-08 Thread Vamshi Talla
Nirav, Spark does not create a duplicate column when you pass the join expression as an array of column name(s), like below, but that requires the column name to be the same in both data frames. Example: df1.join(df2, ['a']) Thanks. Vamshi Talla

Re: repartition

2018-07-08 Thread Vamshi Talla
Hi Ravi, RDDs are always immutable, so you cannot change them; instead you create new ones by transforming existing ones. Repartition is a transformation, so it is lazily evaluated and computed only when you call an action on the result. Thanks. Vamshi Talla

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread yohann jardin
When you run on YARN, you don't even need to start a Spark cluster (Spark master and slaves). YARN receives a job and then allocates resources for the application master and then its workers. Check the resources available in the node section of the resource manager UI (and is your node actually

repartition

2018-07-08 Thread ryandam.9
Hi, Can anyone clarify how repartition works, please?
* I have a DataFrame df which has only one partition:
  df.rdd.getNumPartitions // Returns 1
* I repartitioned it by passing "3" and assigned the result to a new DataFrame newdf:
  val newdf = df.repartition(3)
*

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
@yohann Sorry, I am assuming you meant the application master; if so, I believe Spark is the one that provides the application master. Is there any way to see how many resources are being requested and how much YARN is allowed to provide? I would assume this is a common case; if so, I am not sure why these

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
yarn.scheduler.capacity.maximum-am-resource-percent is set to 0.1 by default, and I tried changing it to 1.0 and still no luck; the same problem persists. The master here is YARN, and I am just trying to spawn spark-shell --master yarn --deploy-mode client and run a simple word count, so I am not sure why

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread yohann jardin
Following the logs from the resource manager: 2018-07-08 07:23:23,382 WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: maximum-am-resource-percent is insufficient to start a single application in queue, it is likely set too low. skipping enforcement to allow at
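The property named in that warning lives in capacity-scheduler.xml on the resource manager. A sketch of the fix (the 0.5 value is illustrative; it caps the share of queue resources that application masters may occupy):

```xml
<!-- capacity-scheduler.xml: allow AMs up to 50% of the queue's resources -->
<property>
  <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
  <value>0.5</value>
</property>
```

After changing it, the capacity scheduler queues need to be refreshed (for example via `yarn rmadmin -refreshQueues`) or the resource manager restarted.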

Re: Create an Empty dataframe

2018-07-08 Thread रविशंकर नायर
From Stackoverflow:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType
sc = SparkContext(conf=SparkConf())
spark = SparkSession(sc)  # Need to use SparkSession(sc) to createDataFrame
schema = StructType([

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread रविशंकर नायर
Are you able to run a simple MapReduce job on YARN without any issues? If you have any issues: I had this problem on Mac. Use csrutil on Mac to disable it. Then add a softlink: sudo ln -s /usr/bin/java/bin/java The new versions of macOS from El Capitan onward do not allow softlinks in

Re: Create an Empty dataframe

2018-07-08 Thread Shmuel Blitz
Hi Dimitris, Could you explain your use case in a bit more detail? What you are asking for, if I understand you correctly, is not the advised way to go about it. If you're running analytics and expect their output to be a DataFrame with the specified columns, then you should compose your queries

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
Hi, It's on a local MacBook Pro machine that has 16GB RAM, a 512GB disk and 8 vCPUs! I am not running any code, since I can't even spawn spark-shell with YARN as master, as described in my previous email. I just want to run a simple word count using YARN as master. Thanks! Below is the resource manager

Re: spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread Marco Mistroni
Are you running on EMR? Have you checked the EMR logs? I was in a similar situation where a job was stuck in ACCEPTED and then it died... it turned out to be an issue with my code when running with huge data. Perhaps try to gradually reduce the load until it works, and then start from there? Not a huge help, but I

spark-shell gets stuck in ACCEPTED state forever when ran in YARN client mode.

2018-07-08 Thread kant kodali
Hi All, I am trying to run a simple word count using YARN as the cluster manager. I am currently using Spark 2.3.1 and Apache Hadoop 2.7.3. When I spawn spark-shell like below, it gets stuck in the ACCEPTED state forever. ./bin/spark-shell --master yarn --deploy-mode client I set my