Hi, Folks,
I have filed a JIRA ticket
(https://issues.apache.org/jira/browse/SPARK-26247) for
an SPIP on improving the model load latency and serving interfaces for
MLlib model online serving, as discussed with Joseph Bradley and with
Felix Cheung as the SPIP Shepherd.
The associated SPIP doc is l
Thank you Ryan and Xiao – sharing all this info really gives very good
insight!
From: Ryan Blue
Reply-To: "rb...@netflix.com"
Date: Monday, December 3, 2018 at 12:05 PM
To: "Thakrar, Jayesh"
Cc: Xiao Li, Spark Dev List
Subject: Re: DataSourceV2 community sync #3
Jayesh,
I don’t think this need is very narrow.
To have reliable behavior for CTAS, you need to:
1. Check whether the table exists and fail if it does. Right now, it is up to the
source whether to continue with the write if the table already exists or to
throw an exception, which is unreliable acros
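
To make the first requirement concrete, here is a minimal sketch of an existence check that the engine performs itself instead of leaving the decision to each source. It covers only step 1, since the rest of the list is cut off above. Every name in it (InMemoryCatalog, TableAlreadyExistsError, create_table_as_select) is hypothetical and chosen for illustration; none of this is Spark's API.

```python
class TableAlreadyExistsError(Exception):
    """Raised when a CTAS targets a table that already exists."""


class InMemoryCatalog:
    """Hypothetical stand-in for a catalog; not a Spark interface."""

    def __init__(self):
        self._tables = {}

    def table_exists(self, name):
        return name in self._tables

    def create_table(self, name, rows):
        self._tables[name] = list(rows)

    def load_table(self, name):
        return self._tables[name]


def create_table_as_select(catalog, name, rows):
    # The check happens here, before any source-specific write, so the
    # behavior is the same no matter which source backs the table.
    if catalog.table_exists(name):
        raise TableAlreadyExistsError(name)
    catalog.create_table(name, rows)
```

With the check in one place, a second CTAS against the same name fails the same way for every source and leaves the existing table untouched.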
Thank you Xiao – I was wondering what the motivation for the catalog was.
If CTAS is the only candidate, would it suffice to have that as part of the
data source interface only?
If we look at BI, ETL and reporting tools which interface with many tables from
different data sources at the same tim
Jayesh,
The current catalog in Spark is a little weird. It uses a Hive catalog and
adds metadata that only Spark understands to track tables, in addition to
regular Hive tables. Some of those tables are actually just pointers to
tables that exist in some other source of truth. This is what makes t
Do you agree with my definition of catalog in Spark SQL?
I think we agree on what a catalog is: A service that can manage the
metadata and definitions of databases, views, tables, functions, roles, etc.
external objects accessed through our data source APIs are called “tables”.
I do not think we wi
Hi, Jayesh,
This is a good question. Spark is a unified analytics engine for various
data sources. We are able to get the table schema from the underlying data
sources via our data source APIs. Thus, it resolves most of the user
requirements. Spark does not need the other info (like database, func