Hi all,
Could someone please help me understand the broadcast life cycle in detail,
especially with regard to memory management?
After reading through the TorrentBroadcast implementation, it seems that
for every broadcast object, the driver holds a strong reference to a
shallow copy (in MEMORY_AN
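For context on the driver-side part of the lifecycle: Spark's ContextCleaner tracks broadcast variables through weak references, so the cleaner itself never keeps a broadcast alive; once user code drops its last strong reference, the JVM enqueues the reference and the cleaner can remove the associated blocks by id. Below is a minimal plain-Scala sketch of that weak-reference pattern, with no Spark dependency; the names `MiniCleaner` and `BroadcastHandle` are invented for illustration, not Spark's actual classes.

```scala
import java.lang.ref.{ReferenceQueue, WeakReference}
import scala.collection.mutable

// Hypothetical stand-in for a user-facing broadcast handle.
final class BroadcastHandle(val id: Long)

// Miniature of the weak-reference bookkeeping pattern: the cleaner
// holds only WeakReferences to handles, tagged with the id it will
// need at cleanup time, so it never prevents garbage collection.
final class MiniCleaner {
  private val queue = new ReferenceQueue[BroadcastHandle]
  // The WeakReference objects themselves must stay strongly reachable,
  // otherwise the queue notification would be lost.
  private val refs = mutable.Map.empty[WeakReference[BroadcastHandle], Long]

  def register(h: BroadcastHandle): Unit =
    refs.put(new WeakReference(h, queue), h.id)

  def tracked: Int = refs.size

  // Drain the ids of handles the GC has collected; in Spark the
  // analogous step would trigger removal of the broadcast's blocks.
  def drainCollected(): Seq[Long] = {
    val collected = mutable.Buffer.empty[Long]
    var ref = queue.poll()
    while (ref != null) {
      refs.remove(ref.asInstanceOf[WeakReference[BroadcastHandle]])
        .foreach(collected += _)
      ref = queue.poll()
    }
    collected.toSeq
  }
}
```

In Spark itself the equivalent cleanup path ends in the block manager unpersisting the broadcast's blocks on the driver and executors; this sketch only shows the reference-tracking half of that story.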
Thanks. This is an important direction to explore and my apologies for the
late reply.
One thing that is really hard about this is that with different layers of
abstraction, we often use other libraries that might allocate large amounts
of memory (e.g. the snappy library, Parquet itself), which makes
On Wed, Sep 20, 2017 at 3:10 AM, Wenchen Fan wrote:
> Hi all,
>
> I want to have some discussion about Data Source V2 write path before
> starting a vote.
>
> The Data Source V1 write path asks implementations to write a DataFrame
> directly, which is painful:
> 1. Exposing upper-level API like DataFrame to Data Source API is not good
> for maintenance.
I'm not exactly clear on what you're proposing, but this sounds like
something that would live as a Spark package - a framework for anomaly
detection built on Spark. If there is some specific algorithm you have in
mind, it would be good to propose it on JIRA and discuss why you think it
needs to be
Hi all,
I want to have some discussion about Data Source V2 write path before
starting a vote.
The Data Source V1 write path asks implementations to write a DataFrame
directly, which is painful:
1. Exposing upper-level API like DataFrame to Data Source API is not good
for maintenance.
2. Data s
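The concern in point 1 suggests handing sources a narrower, row-level contract instead of a whole DataFrame. A hedged sketch of what such a task-side write contract could look like, with a toy in-memory implementation; all names here are illustrative, not a proposed or final Data Source V2 API:

```scala
import scala.collection.mutable

// A per-task commit message, sent back to the driver after each
// task finishes writing (illustrative name).
trait WriterCommitMessage extends Serializable

// Row-level writer contract: the engine pushes records one at a
// time, then asks the task to commit or abort. The source never
// sees a DataFrame or any other upper-level API.
trait DataWriter[T] {
  def write(record: T): Unit
  def commit(): WriterCommitMessage // per-task commit
  def abort(): Unit                 // per-task rollback
}

final case class Committed(records: List[String]) extends WriterCommitMessage

// Toy implementation that buffers records in memory, to show the
// task-side protocol: write* -> (commit | abort).
final class BufferingWriter extends DataWriter[String] {
  private val buf = mutable.Buffer.empty[String]
  override def write(record: String): Unit = buf += record
  override def commit(): WriterCommitMessage = Committed(buf.toList)
  override def abort(): Unit = buf.clear()
}
```

The design point this illustrates is that the engine, not the source, owns iteration and distribution; the source only implements the narrow per-record and per-task callbacks.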