Unfortunately, the short answer is no. Please refer to the last part of the
discussion on the PR https://github.com/apache/spark/pull/24922
Unless we get a native implementation of this, I guess this project is the
most widely known implementation of a RocksDB-backed state store -
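For anyone wondering how such a third-party backend gets wired in: Spark exposes the state store implementation through the `spark.sql.streaming.stateStore.providerClass` setting. A minimal sketch, assuming a hypothetical provider class name (substitute whatever class the library you pick actually ships):

```python
from pyspark.sql import SparkSession

# Point Structured Streaming at a custom state store provider.
# The class name below is a placeholder; substitute the provider
# class shipped by the RocksDB state store library you depend on.
spark = (
    SparkSession.builder
    .appName("rocksdb-state-store")
    .config(
        "spark.sql.streaming.stateStore.providerClass",
        "com.example.RocksDbStateStoreProvider",  # hypothetical
    )
    .getOrCreate()
)
```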
Hi Zhang. Thank you for your response.
While your answer clarifies my confusion with `CollectLimit`, it still does
not explain the recommended way to extract a large amount of data
(but not all of the records) from a source while maintaining a high level of
parallelism.
For example, at some
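A workaround that often comes up for this question: `limit(n)` plans a single-partition global limit, so everything downstream of it runs in one task; repartitioning immediately after the limit restores parallelism for the rest of the job. A minimal PySpark sketch, with a hypothetical table name and output path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("limit-parallelism").getOrCreate()

# Hypothetical source table; substitute your own.
df = spark.table("events")

# limit() collapses to a single partition, so repartition right after
# it to spread the limited rows back across the cluster before any
# expensive downstream work.
subset = df.limit(1000000).repartition(200)

subset.write.mode("overwrite").parquet("/tmp/events_subset")  # hypothetical path
```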
No, that's not a thing to apologize for. It's just your call - less context
tends to bring less reaction and interest.
On Wed, Apr 22, 2020 at 11:50 AM Ruijing Li wrote:
> I apologize, but I cannot share it, even if it is just typical Spark
> libraries. I definitely understand that this limits
I apologize, but I cannot share it, even if it is just typical Spark
libraries. I definitely understand that this limits debugging help, but I
wanted to understand whether anyone has encountered a similar issue.
On Tue, Apr 21, 2020 at 7:12 PM Jungtaek Lim
wrote:
> If there are no third-party libraries in
Hi All,
1. Is the RocksDB backend available in the 3.0 preview?
2. If RocksDB can store the intermediate results of a stream-stream join, can
I run streaming join queries forever? By forever I mean until I run out of
disk. Or, put another way, can I run the stream-stream join queries for years
if necessary?
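For what it's worth, whether the state grows without bound depends less on the backend and more on whether the join has watermarks and a time-bounded condition, which let Spark evict old state instead of keeping it forever. A minimal PySpark sketch of such a bounded stream-stream join, with hypothetical broker and topic names (the Kafka source also needs the spark-sql-kafka package on the classpath):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bounded-join").getOrCreate()

def read_stream(topic):
    # Hypothetical Kafka broker and topics; substitute your own.
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", topic)
        .load()
        .select(
            F.col("key").cast("string").alias(topic + "_key"),
            F.col("timestamp").alias(topic + "_ts"),
        )
    )

impressions = read_stream("impressions").withWatermark("impressions_ts", "1 hour")
clicks = read_stream("clicks").withWatermark("clicks_ts", "2 hours")

# The watermarks plus the time-range join condition let Spark drop
# state older than the bound, so state does not accumulate forever.
joined = impressions.join(
    clicks,
    F.expr("""
        impressions_key = clicks_key AND
        clicks_ts BETWEEN impressions_ts AND impressions_ts + interval 1 hour
    """),
)

query = joined.writeStream.format("console").start()
```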
If there are no third-party libraries in the dump, then why not share the
thread dump? (I mean, the output of jstack)
A stack trace would be more helpful for finding which thread acquired the
lock and which other threads are waiting to acquire it, if we suspect a deadlock.
On Wed, Apr 22, 2020 at 2:38 AM
When running a PySpark application on my local machine, I am able to save to
and retrieve from the MongoDB server using the MongoDB Spark connector. All
works properly. When submitting the exact same application on my Amazon EMR
cluster, I can see that the package for the Spark driver is being properly
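A frequent cause of this on a cluster is that the connector is available to the driver but never shipped to the executors; setting `spark.jars.packages` on the session (or passing `--packages` to spark-submit) distributes it to both. A minimal sketch; the connector coordinates and URI below are assumptions, so check the artifact matching your Spark and Scala versions:

```python
from pyspark.sql import SparkSession

# spark.jars.packages resolves the connector from Maven and ships it
# to the driver and the executors. Coordinates here are an example.
spark = (
    SparkSession.builder
    .appName("mongo-on-emr")
    .config(
        "spark.jars.packages",
        "org.mongodb.spark:mongo-spark-connector_2.11:2.4.1",  # assumed version
    )
    .config("spark.mongodb.input.uri", "mongodb://host:27017/db.collection")  # hypothetical
    .getOrCreate()
)
```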
You're hitting an existing issue:
https://issues.apache.org/jira/browse/SPARK-17604. While there's no active
PR to address it, I've been planning to take a look sooner rather than later.
Btw, you may also want to take a look at my previous mail - the topic of
that mail thread was regarding file stream
Yes, we did. But for some reason "latest" does not show them. The count is
always 0.
On Sun, Apr 19, 2020 at 3:42 PM Jungtaek Lim
wrote:
> Did you provide more records to topic "after" you started the query?
> That's the only one I can imagine based on such information.
>
> On Fri, Apr 17, 2020
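The usual explanation here is the Kafka source's `startingOffsets` option: for streaming queries it defaults to `latest`, so records produced before the query started are never read. A minimal PySpark sketch, with hypothetical broker and topic names, reading from the beginning instead:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-offsets").getOrCreate()

# For streaming queries, startingOffsets defaults to "latest", which
# skips anything already in the topic when the query starts;
# "earliest" re-reads the topic from the beginning.
stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical
    .option("subscribe", "events")                     # hypothetical
    .option("startingOffsets", "earliest")
    .load()
)

query = stream.writeStream.format("console").start()
```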
In the thread dump, I do see this:
- SparkUI-160-acceptor-id-ServerConnector@id(HTTP/1.1) | RUNNABLE | Monitor
- SparkUI-161-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by
Thread(Some(160)) Lock
- SparkUI-159-acceptor-id-ServerConnector@id(HTTP/1.1) | BLOCKED | Blocked by
After refreshing a couple of times, I noticed the lock being swapped among
these three. The other two are blocked by whichever thread holds the lock,
in a cycle: 160 holds the lock -> 161 -> 159 -> 160.
On Tue, Apr 21, 2020 at 10:33 AM Ruijing Li wrote:
> In the thread dump, I do see this:
> - SparkUI-160-
Strangely enough, I found an old issue that is exactly the same as mine:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-18343
However, I'm using Spark 2.4.4, so the issue should have been solved by now.
Like the user in the JIRA issue I am using Mesos, but I am reading from
You may want to make sure you include the jar of P4J and your plugins as
part of the following so that both the driver and the executors have access.
If HDFS is out, then you could
make a common mount point on each of the executor nodes so they have access
to the classes.
- spark-submit --jars
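For completeness, the same classpath can be set programmatically: `spark.jars` (the configuration behind `--jars`) takes a comma-separated list of jars that Spark distributes to the driver and executors. A minimal sketch with placeholder jar paths:

```python
from pyspark.sql import SparkSession

# spark.jars is the configuration behind spark-submit's --jars flag.
# The paths below are placeholders for the P4J jar and your plugin jars.
spark = (
    SparkSession.builder
    .appName("plugin-classpath")
    .config("spark.jars", "/opt/libs/p4j.jar,/opt/libs/my-plugin.jar")
    .getOrCreate()
)
```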
Hi Team,
While running Spark, below are some findings.
1. FileStreamSourceLog is responsible for maintaining the input source file
list.
2. Spark Streaming deletes expired log files on the basis of
*spark.sql.streaming.fileSource.log.deletion* and
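For reference, these are internal settings of the file stream source log; a sketch of how they could be set when building the session. The names other than the deletion flag mentioned above (the compaction interval and cleanup delay) are companion settings I believe exist in Spark's SQLConf, so treat them as assumptions:

```python
from pyspark.sql import SparkSession

# Internal file stream source log settings; values shown are illustrative.
spark = (
    SparkSession.builder
    .appName("file-source-log")
    .config("spark.sql.streaming.fileSource.log.deletion", "true")
    .config("spark.sql.streaming.fileSource.log.compactInterval", "10")   # assumed name
    .config("spark.sql.streaming.fileSource.log.cleanupDelay", "600000")  # ms, assumed name
    .getOrCreate()
)
```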
https://github.com/apache/spark/pull/7334 may explain the question; quoting the PR description:
> This patch preserves this optimization by treating logical Limit operators
> specially when they appear as the terminal operator in a query plan: if a
> Limit is the final operator, then we will plan a special
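You can see this special-casing directly in the physical plan: when the limit is the terminal operator, it shows up as a `CollectLimit` node. A quick check (output abbreviated; exact formatting varies by Spark version):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("limit-plan").getOrCreate()

# With the limit as the final operator, the plan uses CollectLimit
# rather than a shuffle-based global limit.
spark.range(100).limit(5).explain()
# == Physical Plan ==
# CollectLimit 5
# +- *(1) Range (0, 100, step=1, splits=...)
```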
Hi users,
I'm a bit of a newbie to the Spark infrastructure, and I have a small question.
I have a Maven project with plugins generated separately in a folder, and the
normal java command to run it is as follows:
`java -Dp4j.pluginsDir=./plugins -jar /path/to/jar`
Now when I run this program locally with