date:20200421

Re: is RosckDB backend available in 3.0 preview?

2020-04-21 Thread Jungtaek Lim

Unfortunately, the short answer is no. Please refer to the last part of the discussion on the PR https://github.com/apache/spark/pull/24922. Unless we get a native implementation of this, I guess this project is the most widely known implementation of a RocksDB-backed state store -

Re: What is the best way to take the top N entries from a hive table/data source?

2020-04-21 Thread Yeikel

Hi Zhang. Thank you for your response. While your answer clarifies my confusion with `CollectLimit`, it still does not clarify what the recommended way is to extract large amounts of data (but not all the records) from a source while maintaining a high level of parallelism. For example, at some

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-21 Thread Jungtaek Lim

No, that's not a thing to apologize for. It's just your call - less context would bring less reaction and interest. On Wed, Apr 22, 2020 at 11:50 AM Ruijing Li wrote: > I apologize, but I cannot share it, even if it is just typical spark > libraries. I definitely understand that limits

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-21 Thread Ruijing Li

I apologize, but I cannot share it, even if it is just typical spark libraries. I definitely understand that limits debugging help, but wanted to understand if anyone has encountered a similar issue. On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim wrote: > If there's no third party libraries in

is RosckDB backend available in 3.0 preview?

2020-04-21 Thread kant kodali

Hi All, 1. Is the RocksDB backend available in the 3.0 preview? 2. If RocksDB can store the intermediate results of a stream-stream join, can I run streaming join queries forever? By forever I mean until I run out of disk. Or, to put it another way, can I run the stream-stream join queries for years if necessary

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-21 Thread Jungtaek Lim

If there are no third-party libraries in the dump, then why not share the thread dump? (I mean, the output of jstack.) A stack trace would be more helpful for finding which thread acquired the lock and which other threads are waiting to acquire it, if we suspect a deadlock. On Wed, Apr 22, 2020 at 2:38 AM
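A minimal sketch of how a jstack dump could be skimmed before sharing it, assuming the dump has already been saved as text. The function name `summarize_jstack` and the sample dump are hypothetical and written by hand for illustration; the sketch relies only on the `java.lang.Thread.State:` lines and the "Java-level deadlock" marker that jstack prints.

```python
import re
from collections import Counter

# Matches the state line jstack prints under each thread header, e.g.
#   java.lang.Thread.State: BLOCKED (on object monitor)
STATE_RE = re.compile(r"java\.lang\.Thread\.State:\s+([A-Z_]+)")


def summarize_jstack(dump_text):
    """Return (Counter of thread states, whether a deadlock was reported)."""
    states = Counter(STATE_RE.findall(dump_text))
    deadlock = "Java-level deadlock" in dump_text
    return states, deadlock


# Hypothetical, hand-written sample in the shape of jstack output.
SAMPLE = '''
"worker-1" #21 prio=5 os_prio=0 tid=0x01 nid=0x11 waiting for monitor entry [0x0a]
   java.lang.Thread.State: BLOCKED (on object monitor)
"worker-2" #22 prio=5 os_prio=0 tid=0x02 nid=0x12 runnable [0x0b]
   java.lang.Thread.State: RUNNABLE
'''

states, deadlock = summarize_jstack(SAMPLE)
print(dict(states), deadlock)
```

A high count of BLOCKED threads without a reported deadlock would suggest lock contention rather than a true deadlock, which is the distinction the stack trace is meant to settle.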

Spark Mongodb connector hangs indefinitely, not working on Amazon EMR

2020-04-21 Thread Daniel Stojanov

When running a PySpark application on my local machine I am able to save to and retrieve from the MongoDB server using the MongoDB Spark connector. All works properly. When submitting the exact same application on my Amazon EMR cluster I can see that the package for the Spark driver is being properly

Re: Spark Structure Streaming | FileStreamSourceLog not deleting list of input files | Spark -2.4.0

2020-04-21 Thread Jungtaek Lim

You're hitting an existing issue, https://issues.apache.org/jira/browse/SPARK-17604. While there's no active PR to address it, I've been planning to take a look sooner rather than later. Btw, you may also want to take a look at my previous mail - the topic of that mail thread was regarding file stream

Re: Using startingOffsets latest - no data from structured streaming kafka query

2020-04-21 Thread Ruijing Li

Yes, we did. But for some reason latest does not show them. The count is always 0. On Sun, Apr 19, 2020 at 3:42 PM Jungtaek Lim wrote: > Did you provide more records to topic "after" you started the query? > That's the only one I can imagine based on such information. > > On Fri, Apr 17, 2020

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-21 Thread Ruijing Li

In thread dump, I do see this
- SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
- SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock
- SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by
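A minimal sketch of how rows like the ones quoted in this message could be turned into a "who blocks whom" map, which makes a lock cycle easier to spot across refreshes. The helper name `blocked_by` is hypothetical, the row layout is assumed from the two complete rows in the message, and the truncated third row is deliberately left out.

```python
import re

# name | STATE | detail, as in the rows quoted in the message (assumed layout)
ROW_RE = re.compile(r"^(?P<name>\S+)\s*\|\s*(?P<state>[A-Z_]+)\s*\|\s*(?P<detail>.*)$")
BLOCKER_RE = re.compile(r"Blocked by Thread\(Some\((\d+)\)\)")
ID_RE = re.compile(r"^SparkUI-(\d+)-")


def blocked_by(rows):
    """Map each blocked thread id to the id of the thread holding the lock."""
    edges = {}
    for row in rows:
        m = ROW_RE.match(row.strip())
        if not m or m.group("state") != "BLOCKED":
            continue
        tid = ID_RE.match(m.group("name"))
        blocker = BLOCKER_RE.search(m.group("detail"))
        if tid and blocker:
            edges[int(tid.group(1))] = int(blocker.group(1))
    return edges


# Only the two complete rows from the message are used here.
ROWS = [
    "SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor",
    "SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by Thread(Some(160)) Lock",
]

print(blocked_by(ROWS))
```

Running this on several consecutive snapshots and comparing the maps would show whether the lock holder rotates or whether one thread holds it permanently.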

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-21 Thread Ruijing Li

After refreshing a couple of times, I notice the lock is being swapped between these 3. The other 2 will be blocked by whoever gets this lock, in a cycle of 160 has lock -> 161 -> 159 -> 160 On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li wrote: > In thread dump, I do see this > - SparkUI-160-

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-21 Thread Ruijing Li

Strangely enough, I found an old issue that is exactly the same as mine: https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343. However, I'm using Spark 2.4.4, so the issue should have been solved by now. Like the user in the JIRA issue I am using Mesos, but I am reading from

Re: Using P4J Plugins with Spark

2020-04-21 Thread Todd Nist

You may want to make sure you include the jar of P4J and your plugins as part of the following so that both the driver and executors have access. If HDFS is out then you could make a common mount point on each of the executor nodes so they have access to the classes. - spark-submit --jars
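A minimal sketch of one way the advice could translate into a spark-submit invocation, under stated assumptions: the helper `build_spark_submit` and all paths are hypothetical, the `-Dp4j.pluginsDir` property is taken from the original question, and passing it through `spark.driver.extraJavaOptions` and `spark.executor.extraJavaOptions` is an assumption about how the property would reach both JVMs, not something the reply states.

```python
def build_spark_submit(app_jar, plugin_jars, plugins_dir):
    """Build a spark-submit argument list (illustrative only).

    --jars takes a comma-separated list and ships the jars to the driver
    and executors; plugins_dir is assumed to exist at the same path on
    every node, e.g. via a common mount point.
    """
    java_opt = f"-Dp4j.pluginsDir={plugins_dir}"
    return [
        "spark-submit",
        "--jars", ",".join(plugin_jars),
        "--conf", f"spark.driver.extraJavaOptions={java_opt}",
        "--conf", f"spark.executor.extraJavaOptions={java_opt}",
        app_jar,
    ]


# Hypothetical paths, for illustration only.
cmd = build_spark_submit(
    "/path/to/app.jar",
    ["/mnt/shared/plugins/plugin-a.jar", "/mnt/shared/plugins/plugin-b.jar"],
    "/mnt/shared/plugins",
)
print(" ".join(cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without shell quoting concerns.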

Spark Structure Streaming | FileStreamSourceLog not deleting list of input files | Spark -2.4.0

2020-04-21 Thread Pappu Yadav

Hi Team, while running Spark, below are some findings. 1. FileStreamSourceLog is responsible for maintaining the input source file list. 2. Spark Streaming deletes expired log files on the basis of *spark.sql.streaming.fileSource.log.deletion* and

Re: What is the best way to take the top N entries from a hive table/data source?

2020-04-21 Thread ZHANG Wei

https://github.com/apache/spark/pull/7334 may explain the question as below: > This patch preserves this optimization by treating logical Limit operators > specially when they appear as the terminal operator in a query plan: if a > Limit is the final operator, then we will plan a special

Using P4J Plugins with Spark

2020-04-21 Thread Shashanka Balakuntala

Hi users, I'm a bit of a newbie to Spark infrastructure, and I have a small question. I have a Maven project with plugins generated separately in a folder, and the normal java command to run it is as follows: `java -Dp4j.pluginsDir=./plugins -jar /path/to/jar` Now when I run this program in local with
