(+user, -dev, as this is more appropriate for the users list)

The Kudu master currently keeps a record of all tables and partitions,
including those that have been deleted. With a high enough rate of
table deletion, it's theoretically possible for those records to consume
a lot of the master's disk space or memory. In practice (and since you
mentioned you'd do it once an hour), I wouldn't expect it to be a problem.
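To put "once an hour" in perspective, here's a quick back-of-the-envelope
count of how many deleted-table records would pile up on the master. The
four tablets per table is an arbitrary assumption for illustration, and
the bytes each record actually occupies on the master aren't modeled.

```python
# Rough count of deleted-table and deleted-tablet records the master would
# keep tracking if one table is dropped every hour. tablets_per_table is a
# hypothetical parameter; per-record size is deliberately not modeled.


def retained_records(drops_per_day: int, days: int, tablets_per_table: int) -> tuple[int, int]:
    """Return (deleted table records, deleted tablet records) after `days`."""
    tables = drops_per_day * days
    return tables, tables * tablets_per_table


if __name__ == "__main__":
    tables, tablets = retained_records(drops_per_day=24, days=365, tablets_per_table=4)
    # Prints: after one year: 8760 table records, 35040 tablet records
    print(f"after one year: {tables} table records, {tablets} tablet records")
```

So a year of hourly drops leaves on the order of thousands of table
records, which is why the hourly cadence isn't expected to be a concern.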

There shouldn't be any long-lasting impact on the tablet servers
though; tablets belonging to deleted tables are completely expunged
from disk.

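Alternatively, you may find it more intuitive to model the "create
new, wait, then drop old" data motion via range partitions in a single
table: add a new range partition for each load, then drop the old range
partition once it's no longer needed.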
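Here's a toy, in-memory sketch of that range-partition pattern. It does
not call the Kudu client at all; it only illustrates the sequence of
operations. In a real table you'd add and drop ranges with an ALTER TABLE
(for example, `AlterTableOptions.addRangePartition` and
`dropRangePartition` in the Java client). The `load_id` column and the
`RangePartitionedTable` / `refresh` names are hypothetical, and a real
`load_id` column would have to be part of the primary key to be used for
range partitioning.

```python
# Toy model of "create new, wait, then drop old" using one range partition
# per load, keyed on a hypothetical load_id column. No Kudu APIs are used.
from dataclasses import dataclass, field


@dataclass
class RangePartitionedTable:
    # Maps each range's lower bound (a load_id) to the rows stored in it.
    # Each range is assumed to cover exactly one load: [load_id, load_id + 1).
    partitions: dict[int, list[dict]] = field(default_factory=dict)

    def add_range_partition(self, load_id: int) -> None:
        if load_id in self.partitions:
            raise ValueError(f"range for load {load_id} already exists")
        self.partitions[load_id] = []

    def insert(self, load_id: int, row: dict) -> None:
        # Like Kudu, reject rows that fall outside every existing range.
        if load_id not in self.partitions:
            raise KeyError(f"no range partition covers load {load_id}")
        self.partitions[load_id].append(row)

    def drop_range_partition(self, load_id: int) -> None:
        # Dropping a range discards all of its rows at once.
        del self.partitions[load_id]

    def current_rows(self) -> list[dict]:
        # Readers look only at the newest load (assumed to be fully written).
        if not self.partitions:
            return []
        return list(self.partitions[max(self.partitions)])


def refresh(table: RangePartitionedTable, new_load_id: int, rows: list[dict]) -> None:
    """Add a range for the new load, fill it, then drop every older range."""
    table.add_range_partition(new_load_id)
    for row in rows:
        table.insert(new_load_id, row)
    # In practice you'd wait (e.g. an hour) here before dropping old loads.
    for old in [k for k in table.partitions if k < new_load_id]:
        table.drop_range_partition(old)


if __name__ == "__main__":
    table = RangePartitionedTable()
    refresh(table, 1, [{"id": "a"}, {"id": "b"}])
    refresh(table, 2, [{"id": "a"}])  # "b" was not found by the second job
    print(table.current_rows())       # [{'id': 'a'}]
    print(sorted(table.partitions))   # [2]
```

Rows the latest job didn't produce (like "b" above) disappear when the
old range is dropped, without renaming or recreating any tables.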

On Wed, Aug 14, 2019 at 9:42 AM Scott Reynolds <sdrreyno...@gmail.com> wrote:
>
> Hi developers,
>
> I have a dimension table that is generated by a spark job and written to
> kudu. I would like to remove the rows in the table that were not found by
> the spark job.
>
> To do this, I was thinking of renaming the existing table (so it keeps
> the UUID for existing queries), creating the table again, and loading the
> rows into it. An hour later, I'd come back through and delete the old table.
>
> If I were to do that, what would your three highest concerns be? How would
> this affect the Kudu master process?
