What you're seeing is the expected behavior in both cases. One way to get the semantics you want in both situations is to read the Kudu table into a DataFrame, filter it in Spark SQL down to just the rows you want to delete, and then use that DataFrame to perform the deletion. That method should produce no primary-key errors unless the table is concurrently being deleted from.
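To make the filtering step concrete, here is a minimal sketch in plain Python (not the actual kudu-spark API; the table contents and key names are hypothetical). It models the workaround above: read the table's existing keys, intersect the requested deletes with them, and only then issue deletes, so no "key not found" error can occur.

```python
# Simulated Kudu table keyed by a (k1, k2) primary key -- hypothetical data,
# standing in for the DataFrame you'd read via kudu-spark.
kudu_table = {
    ("user1", 1): {"value": "a"},
    ("user2", 2): {"value": "b"},
}

# Keys the caller wants to delete; ("user3", 3) does not exist in the table.
requested_deletes = [("user1", 1), ("user3", 3)]

# Step 1: "read the Kudu table into a DataFrame" -- here, just its key set.
existing_keys = set(kudu_table)

# Step 2: filter the delete set down to keys that actually exist.
safe_deletes = [k for k in requested_deletes if k in existing_keys]

# Step 3: issue the deletes; every remaining key is guaranteed present,
# so no NotFound error is possible.
for key in safe_deletes:
    del kudu_table[key]

print(sorted(kudu_table))  # [('user2', 2)]
```

In real kudu-spark code, steps 1 and 2 would be a join or filter between the delete-candidate DataFrame and the DataFrame read from the Kudu table, and step 3 the delete call against the result.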
For 2), what I describe above is what Impala does: it reads in the Kudu table, finds the full primary keys of the rows matching your partial specification of the key, and then issues deletes for those rows. Note that deletes of multiple rows aren't transactional.

I think having a way to issue deletes that ignores primary key errors is reasonable, a "delete ignore" analog to "insert ignore". I filed KUDU-2482 for it.

-Will

On Wed, Jun 20, 2018 at 8:31 AM, Pietro Gentile <pietro.gentile89.develo...@gmail.com> wrote:

> Hi all,
>
> I am currently evaluating using Spark with Kudu, and I am facing the
> following issues:
>
> 1) If you try to DELETE a row with a key that is not present in the table,
> you will get an exception like this:
>
> java.lang.RuntimeException: failed to write N rows from DataFrame to Kudu;
> sample errors: Not found: key not found (error 0)
>
> 2) If you try to DELETE a row using a subset of the table's key, you will
> see the following:
>
> Caused by: java.lang.RuntimeException: failed to write N rows from
> DataFrame to Kudu; sample errors: Invalid argument: No value provided for
> key column:
>
> Both use cases above work correctly when interacting with Kudu through
> Impala.
>
> Any suggestions to overcome these limitations?
>
> Thanks.
> Best Regards
>
> Pietro
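The partial-key expansion the reply attributes to Impala can be sketched the same way, again in plain Python rather than Impala or kudu-spark, with hypothetical data: scan the table, expand the partial key into every full primary key it matches, then issue one delete per full key.

```python
# Simulated Kudu table with a two-column (k1, k2) primary key.
kudu_table = {
    ("user1", 1): {"value": "a"},
    ("user1", 2): {"value": "b"},
    ("user2", 3): {"value": "c"},
}

# Partial key: only the first column of the (k1, k2) key is specified,
# as in "DELETE ... WHERE k1 = 'user1'".
partial_key = ("user1",)

# Expand: find every full key whose leading columns match the partial key.
full_keys = [k for k in kudu_table if k[:len(partial_key)] == partial_key]

# Delete each matched row individually. As noted above, these deletes are
# not transactional: each one succeeds or fails independently.
for key in full_keys:
    del kudu_table[key]

print(sorted(kudu_table))  # [('user2', 3)]
```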