Thanks everyone for the input. Yes, it makes sense that metadata
backup/restore should be done outside Spark. We will provide customers
with documentation on how that can be done and leave the
implementation to them.
Thanks,
Tianchen
On Tue, May 11, 2021 at 1:14 AM Mich Talebzadeh
wrote:
From my experience of dealing with metadata for other applications like
Hive, an external database for Spark metadata would be useful if needed.
However, the maintenance and upgrade of that database should be external to
Spark (left to the user), with, as usual, some form of reliable API or JDBC
connection.
That's my expectation as well. Spark needs a reliable catalog;
backup/restore is just an implementation detail of how you make your
catalog reliable, and it should be transparent to Spark.
On Sat, May 8, 2021 at 6:54 AM ayan guha wrote:
Just a consideration:
Is there value in backup/restore of metadata within Spark? I would strongly
argue that if the metadata is valuable enough and persistent enough, why not
just use an external metastore? It is a fairly straightforward process. Also,
regardless of whether you are in the cloud or not, database backup is a ro
For now we are thinking about adding two methods to the Catalog API, not SQL
commands:
1. spark.catalog.backup, which backs up the current catalog.
2. spark.catalog.restore(file), which reads the DFS file and recreates the
entities described in that file.
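Since the two proposed methods don't exist in Spark today, here is a minimal stand-in sketch of the intended behaviour. The `backup`/`restore` names come from the proposal above; the in-memory catalog class, the JSON file format, and every field name are assumptions for illustration only:

```python
import json
import os
import tempfile

class InMemoryCatalog:
    """Toy stand-in for a Spark catalog: a dict of databases -> table definitions."""

    def __init__(self, entities=None):
        self.entities = entities or {}

    def backup(self, path):
        # Sketch of the proposed spark.catalog.backup:
        # serialize every catalog entity to a single file on DFS.
        with open(path, "w") as f:
            json.dump(self.entities, f)

    def restore(self, path):
        # Sketch of the proposed spark.catalog.restore(file):
        # read the file and recreate the entities it describes.
        with open(path) as f:
            self.entities = json.load(f)

# Usage sketch: back up one catalog and restore it into an empty one.
src = InMemoryCatalog(
    {"default": {"events": {"provider": "parquet", "path": "/data/events"}}}
)
with tempfile.TemporaryDirectory() as d:
    dump = os.path.join(d, "catalog_backup.json")
    src.backup(dump)
    dst = InMemoryCatalog()
    dst.restore(dump)
print(dst.entities == src.entities)
```

A real implementation would walk the actual catalog entities (databases, tables, functions) rather than a dict, but the round-trip shape is the same.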
Can you please give an example of exposing client APIs?
If a catalog implements backup/restore, it can easily expose client
APIs (e.g. a REST API) to end-users; I don't see a strong reason to
expose these APIs through Spark. Do you plan to add new SQL commands in
Spark to backup/restore a catalog?
On Tue, May 4, 2021 at 2:39 AM Tianchen Zhang
wrote:
Hi all,
Currently the user-facing Catalog API doesn't support backing up or restoring
metadata, and our customers are asking for such functionality. Here is a
usage example:
1. Read all metadata of one Spark cluster.
2. Save them into a Parquet file on DFS.
3. Read the Parquet file and restore all metadata in another cluster.
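The steps above can be sketched end-to-end. A real implementation would gather the records by walking `spark.catalog.listDatabases()` and `spark.catalog.listTables()` and would write actual Parquet; to keep this sketch self-contained it models the metadata as plain records and uses JSON in place of Parquet, and all record fields are illustrative assumptions:

```python
import json
import os
import tempfile

# Step 1: read all metadata of the source "cluster"
# (modeled here as a list of plain records).
source_metadata = [
    {"database": "default", "table": "events", "provider": "parquet"},
    {"database": "sales", "table": "orders", "provider": "orc"},
]

with tempfile.TemporaryDirectory() as dfs_dir:
    # Step 2: save the records into a file on DFS
    # (JSON stands in for Parquet here).
    dump = os.path.join(dfs_dir, "catalog_backup.json")
    with open(dump, "w") as f:
        json.dump(source_metadata, f)

    # Step 3: read the file back and recreate each entity
    # in another cluster.
    with open(dump) as f:
        restored = json.load(f)
    for entity in restored:
        # A real restore would issue CREATE DATABASE / CREATE TABLE
        # statements against the target cluster here.
        pass

print(restored == source_metadata)
```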