Re: Regarding PARQUET-1155

2017-12-05 Thread Atri Sharma
Agreed. I have come up with a patch that adds metadata to the page header marking tuples as deleted. The visibility checks will need to consult the page header before returning read results. The pruning still needs to be implemented. On Tue, Dec 5, 2017 at 3:20 AM, Eric Owhadi wrote: > May b
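
A minimal sketch of the delete-marker idea described in this message, assuming the page header carries a set of deleted row positions. The names PageHeader, deleted_rows, mark_deleted and read_visible are hypothetical and are not taken from the patch or from any Parquet API; pruning is deliberately left out, as in the message.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PageHeader:
    num_values: int
    # Hypothetical field: positions of rows that are logically deleted.
    deleted_rows: Set[int] = field(default_factory=set)


@dataclass
class Page:
    header: PageHeader
    rows: List[str]


def mark_deleted(page: Page, row_index: int) -> None:
    """Record a logical delete in the page header without touching the data."""
    if not 0 <= row_index < page.header.num_values:
        raise IndexError(f"row {row_index} is outside the page")
    page.header.deleted_rows.add(row_index)


def read_visible(page: Page) -> List[str]:
    """Visibility check: consult the header before returning read results."""
    return [
        row
        for index, row in enumerate(page.rows)
        if index not in page.header.deleted_rows
    ]
```

In this sketch the row data stays in place after a delete; only the header changes, so a reader that ignores the header would still see the deleted row.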

Re: Regarding PARQUET-1155

2017-12-05 Thread lukas nalezenec
Hi, I think that a delete marker is a good idea. I attended a basic GDPR training and I think that it meets EU law requirements. Lukas 2017-12-05 11:37 GMT+01:00 Atri Sharma : > Agreed. > > I have come up with a patch to add metadata to the page header marking > the tuples deleted. The visibility checks

Re: Regarding PARQUET-1155

2017-12-05 Thread Atri Sharma
Thanks. I think a configurable purger that replaces pages (like HBase compaction, as mentioned above) should suffice, with the compaction frequency being configurable. Do we use the full page replacement technique for replacing records today in any scenario? Regards, Atri On Tue, Dec 5, 2017 a

RE: Regarding PARQUET-1155

2017-12-05 Thread Eric Owhadi
One thing to account for is the row count stats spread across the various levels of stats. If a record is logically deleted, then rowcount = rowcount - 1. So when using any level of stats to compute the row count, how do we account for logical deletes? Eric -Original Message- From: Atri Sharma
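
One possible way to frame the row count question raised here, as a sketch only: assume each page-level stat also records how many of its rows are logically deleted, and derive the live count at every higher level by summing. PageStats, num_deleted and the helper functions are hypothetical names and do not reflect the actual Parquet metadata layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    num_rows: int          # physical rows stored in the page
    num_deleted: int = 0   # hypothetical counter of logical deletes

    @property
    def live_rows(self) -> int:
        # rowcount = rowcount - 1 for every logically deleted record
        return self.num_rows - self.num_deleted


def row_group_live_rows(pages: List[PageStats]) -> int:
    """Live row count of a row group, derived from its page stats."""
    return sum(page.live_rows for page in pages)


def file_live_rows(row_groups: List[List[PageStats]]) -> int:
    """Live row count of a file, derived from its row groups."""
    return sum(row_group_live_rows(pages) for pages in row_groups)
```

Deriving the higher levels from the page level keeps a single place to update on delete; stats that are stored independently at each level would each need the same adjustment.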

[jira] [Created] (PARQUET-1169) Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp

2017-12-05 Thread Fang Jian (JIRA)
Fang Jian created PARQUET-1169: -- Summary: Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp Key: PARQUET-1169 URL: https://issues.apache.org/jira/browse/PARQUET-1169 Proje

[jira] [Commented] (PARQUET-1169) Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp

2017-12-05 Thread Wes McKinney (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279578#comment-16279578 ] Wes McKinney commented on PARQUET-1169: --- You aren't checking the status codes for

[jira] [Updated] (PARQUET-1169) Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp

2017-12-05 Thread Fang Jian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang Jian updated PARQUET-1169: --- Description: When I run the code below, I consistently get a segmentation fault, not sure whether this

[jira] [Commented] (PARQUET-1169) Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp

2017-12-05 Thread Fang Jian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16279596#comment-16279596 ] Fang Jian commented on PARQUET-1169: [~wesmckinn] I updated the above code with ABOR

[jira] [Updated] (PARQUET-1169) Segment fault when using NextBatch of parquet::arrow::ColumnReader in parquet-cpp

2017-12-05 Thread Fang Jian (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang Jian updated PARQUET-1169: --- Description: When I run the code below, I consistently get a segmentation fault, not sure whether this