What you're seeing is the expected behavior in both cases. One way to get the semantics you want in both situations is to read the Kudu table into a DataFrame, filter it in Spark SQL down to just the rows you want to delete, and then use that DataFrame to perform the deletion. That method should produce no primary-key errors unless the table is concurrently being deleted from.
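To make the filtering step concrete, here is a minimal sketch in plain Python (not the actual kudu-spark API; the table contents and key names are hypothetical). It models the workaround above: read the table's existing keys, intersect the requested deletes with them, and only then issue deletes, so no "key not found" error can occur.

```python
# Simulated Kudu table keyed by a (k1, k2) primary key -- hypothetical data,
# standing in for the DataFrame you'd read via kudu-spark.
kudu_table = {
    ("user1", 1): {"value": "a"},
    ("user2", 2): {"value": "b"},
}

# Keys the caller wants to delete; ("user3", 3) does not exist in the table.
requested_deletes = [("user1", 1), ("user3", 3)]

# Step 1: "read the Kudu table into a DataFrame" -- here, just its key set.
existing_keys = set(kudu_table)

# Step 2: filter the delete set down to keys that actually exist.
safe_deletes = [k for k in requested_deletes if k in existing_keys]

# Step 3: issue the deletes; every remaining key is guaranteed present,
# so no NotFound error is possible.
for key in safe_deletes:
    del kudu_table[key]

print(sorted(kudu_table))  # [('user2', 2)]
```

In real kudu-spark code, steps 1 and 2 would be a join or filter between the delete-candidate DataFrame and the DataFrame read from the Kudu table, and step 3 the delete call against the result.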
For 2), what I describe above is what Impala does: it reads in the Kudu table, finds the full primary keys of the rows matching your partial specification of the key, and then issues deletes for those rows. Note that deletes of multiple rows aren't transactional.

I think having a way to issue deletes that ignores primary key errors is reasonable, a "delete ignore" analog to "insert ignore". I filed KUDU-2482 for it.

-Will

On Wed, Jun 20, 2018 at 8:31 AM, Pietro Gentile <pietro.gentile89.develo...@gmail.com> wrote:

> Hi all,
>
> I am currently evaluating using Spark with Kudu, and I am facing the
> following issues:
>
> 1) If you try to DELETE a row with a key that is not present in the table,
> you will get an exception like this:
>
> java.lang.RuntimeException: failed to write N rows from DataFrame to Kudu;
> sample errors: Not found: key not found (error 0)
>
> 2) If you try to DELETE a row using a subset of the table's key, you will
> see the following:
>
> Caused by: java.lang.RuntimeException: failed to write N rows from
> DataFrame to Kudu; sample errors: Invalid argument: No value provided for
> key column:
>
> Both use cases above work correctly when interacting with Kudu through
> Impala.
>
> Any suggestions to overcome these limitations?
>
> Thanks.
> Best Regards
>
> Pietro
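The partial-key expansion the reply attributes to Impala can be sketched the same way, again in plain Python rather than Impala or kudu-spark, with hypothetical data: scan the table, expand the partial key into every full primary key it matches, then issue one delete per full key.

```python
# Simulated Kudu table with a two-column (k1, k2) primary key.
kudu_table = {
    ("user1", 1): {"value": "a"},
    ("user1", 2): {"value": "b"},
    ("user2", 3): {"value": "c"},
}

# Partial key: only the first column of the (k1, k2) key is specified,
# as in "DELETE ... WHERE k1 = 'user1'".
partial_key = ("user1",)

# Expand: find every full key whose leading columns match the partial key.
full_keys = [k for k in kudu_table if k[:len(partial_key)] == partial_key]

# Delete each matched row individually. As noted above, these deletes are
# not transactional: each one succeeds or fails independently.
for key in full_keys:
    del kudu_table[key]

print(sorted(kudu_table))  # [('user2', 3)]
```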