AFAICT, `FileScanRDD` invokes the `FilePartition::preferredLocations()`
method, which orders hosts by data size, to get the partition's
preferred locations. If there are other criteria to sort by, I'm wondering
if here[1] could be a place to add them. Or subclassing `FilePartition`
with an overridden `preferredL
Could you share a sample scenario to help me understand the requirement?
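For reference, the size-based ordering described above can be sketched roughly as follows. This is a simplified re-creation written from memory, not Spark's actual source: `SplitInfo` and `PreferredLocations.byDataSize` are hypothetical names, and the cap of 3 hosts and the `localhost` filter are assumptions about the current behavior. It sums, per host, the bytes that host can serve locally and returns the hosts with the most data first.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a file split: its byte length plus the hosts holding its blocks.
final case class SplitInfo(length: Long, locations: Seq[String])

object PreferredLocations {
  // Sums the bytes each host can serve locally (ignoring "localhost"),
  // then returns up to `maxHosts` hosts, largest byte count first.
  def byDataSize(files: Seq[SplitInfo], maxHosts: Int = 3): Array[String] = {
    val hostToNumBytes = mutable.HashMap.empty[String, Long]
    for (file <- files; host <- file.locations if host != "localhost") {
      hostToNumBytes(host) = hostToNumBytes.getOrElse(host, 0L) + file.length
    }
    hostToNumBytes.toSeq
      .sortBy { case (_, numBytes) => -numBytes } // the sort key: data size only
      .take(maxHosts)
      .map { case (host, _) => host }
      .toArray
  }
}
```

With splits of 100 bytes on hosts `a` and `b`, and 50 bytes on `b` and `c`, host `b` (150 bytes) ranks first. Any additional ranking criterion would have to change the `sortBy` key, which is why the question of where to plug it in matters.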
--
Cheers,
-z
On Sat, 16 May 2020 11:45:03 +0530
rahul c wrote:
> Hi dev,
>
> Currently I have a scenario where I am reading data from Kafka using a
> Spark DataFrame.
>
> Multiple data sources ingest data into the same Kafka topic
Hi,
I'm considering improving the experience of hitting exceptions whose
stack traces are omitted in long-running applications[1], which is
caused by a JVM HotSpot optimization, as Shixiong (Ryan) commented[2].
There might be two options:
1. Add `-XX:-OmitStackTraceInFastThrow` as a common executor JVM
option.
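Option 1 amounts to making sure the flag ends up in the executor JVM options. A minimal sketch, assuming the flag is delivered through the existing `spark.executor.extraJavaOptions` setting; the `ExecutorJvmOptions.withStackTracesKept` helper is hypothetical, not a Spark API. It appends the flag to whatever options the user already set and leaves an explicit user choice (either `+` or `-` form) untouched.

```scala
object ExecutorJvmOptions {
  // HotSpot flag that keeps full stack traces for "fast-thrown" implicit exceptions.
  val KeepFastThrowTraces = "-XX:-OmitStackTraceInFastThrow"

  // Hypothetical helper: returns the new value for spark.executor.extraJavaOptions.
  // If the user already set either form of the flag, their value is kept as-is.
  def withStackTracesKept(existing: Option[String]): String =
    existing.map(_.trim).filter(_.nonEmpty) match {
      case Some(opts) if opts.split("\\s+").exists(_.contains("OmitStackTraceInFastThrow")) =>
        opts
      case Some(opts) => s"$opts $KeepFastThrowTraces"
      case None       => KeepFastThrowTraces
    }
}
```

On a `SparkConf` this could be applied as `conf.set("spark.executor.extraJavaOptions", ExecutorJvmOptions.withStackTracesKept(conf.getOption("spark.executor.extraJavaOptions")))`. Respecting an explicit `-XX:+OmitStackTraceInFastThrow` keeps the default overridable for users who prefer the optimization.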
> >>
> >> Spark has aimed to have a unified API set rather than separate
> >> Java classes, to reduce the maintenance cost,
> >> e.g., JavaRDD <> RDD vs. DataFrame. These JavaXXX classes are more of a
> >> legacy.
> >>
> >>
> discouraged in
> general, to the best of my knowledge.
> A Java user likely won't know about asJava in Scala, but a Scala user will likely
> know both asScala and asJava.
>
>
> On Tue, Apr 28, 2020 at 11:35 AM, ZHANG Wei wrote:
>
How about making a small change to option 4:
keep the Scala API returning Scala type instances, while providing an
`asJava` method that returns a Java type instance.
Scala 2.13 provides CollectionConverters [1][2][3], which Spark can
support naturally after the upcoming dependency upgrade. For
cu
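The modified option 4 could look roughly like the sketch below. It assumes Scala 2.13+ (for `scala.jdk.CollectionConverters`), and `HostLocations` is a hypothetical result type used only for illustration, not an existing Spark class. Scala callers keep working with a plain `Seq`, while Java callers call `asJava` and never need to learn Scala's converters.

```scala
import scala.jdk.CollectionConverters._

// Hypothetical API result type: the Scala-facing field stays a Scala collection.
final case class HostLocations(hosts: Seq[String]) {
  // Java-facing accessor: converts the Scala Seq into a java.util.List
  // using the Scala 2.13 converters, so no separate JavaXXX class is needed.
  def asJava: java.util.List[String] = hosts.asJava
}
```

From Java this reads as `result.asJava().get(0)`, and from Scala simply `result.hosts.head`, so only one class needs to be maintained.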
AFAICT, `pendingPartitions` is not a must-have: `mapOutputTrackerMaster` was
added by a later change, so `pendingPartitions` can be cleaned up.
--
Cheers,
-z
On Sun, 26 Apr 2020 11:53:09 +0200
Jacek Laskowski wrote:
> Hi,
>
> I found that ShuffleMapStage has this (apparently superfluous)
> pendingPart