[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-17 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-729033449 @HeartSaVioR @maropu Unfortunately, I don't have time to work further on this right now. If one of you two would like to pick up remaining work in a subsequent PR, please

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-11 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-725771277 > The problem is feedback cycle, not whether @cchighman is busy or not. We are requiring contributors for multiple months to keep on focus, whereas reviewers don't promise

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-11 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-725767515 @maropu Thank you for your feedback. I will finish the merge pieces this evening. If one of you two would like to pick up any remaining effort, please feel free to do

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-11 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-725766878 My ti This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-11-03 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-720961219 > Sigh. Now it has conflicts so I can't go with giving +1 and merging. > > @cchighman Could you please fix the conflicts? It would be totally OK if you'd like to let me

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-10-14 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-708213623 > ping @cchighman Thank you for checking in. Working on this further tonight. This is an automated

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-10-06 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-704426096 > @cchighman Gentle ping in case notice wasn't sent for my review comments. I just noticed this. Thank you. I will resolve this evening.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-23 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-697913023 @gengliangwang @cloud-fan @zsxwing @maropu @HyukjinKwon @HeartSaVioR @Dooyoung-Hwang Gentle ping. I summarized key feedback above from the review on this PR to help

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695165544 > BTW, welcome to the Apache Spark community and thank you for your first contribution, @cchighman . @Dooyoung-Hwang Thank you very much! Hopefully after compiling

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695164039 > I roughly checked the code and I think you need to brush up more by referring the other code and the style doc. Could you? The proposed idea itself looks good and I think

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695161492 > Shall we use `modificationTimeFilter`? the word `date` is not accurate since it can be referred to date without any hour/minute/seconds Semantics for the new options

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695161242 > @cchighman > Thanks for understanding. I still think even in SS case we want to provide timestamp to effectively filter out files - the point is whether the timestamp is

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695160233 > These timestamps should be inclusive or exclusive? e.g., `(modifiedAfter, modifiedBefore)`, `[modifiedAfter, modifiedBefore]`... Looks like the current code follows

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695159898 > Should we specify the timezone here? What should the default timezone be? > @cloud-fan @MaxGekk @gengliangwang's comment here drew quick attention to the fact

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695158666 > > I haven't looked into the detail yet, but Hadoop FS and a PathFilter API, does Spark file source support the path filter option? > > Hadoop's PathFilter only gives

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695158320 > > The need arises since we only have one filtering strategy where pathGlobFilter is used in PartitioningAwareFileIndex and handling cleanly if we now have more than one

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695158061 > `f` here is `FileStatus`. It looks inefficient to get the `FileStatus` again in https://github.com/apache/spark/pull/28841/files#diff-3c816e65d0bbdf16d46c7ea4e0b8cbaeR55

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695157152 > @cchighman I like the idea of such filtering. > How about we follow the approach of the option `pathGlobFilter`:

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-18 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-695155611 @gengliangwang @cloud-fan @maropu @Dooyoung-Hwang @HeartSaVioR @HyukjinKwon Following up on this PR for next steps. We've seen a strong amount of constructive feedback

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-15 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-692617087 @HeartSaVioR @maropu @Dooyoung-Hwang @cloud-fan @gengliangwang Following up as last feedback was 15 days ago, anything else remaining for this one?

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-10 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-690007273 @maropu @HeartSaVioR @gengliangwang @cloud-fan @Dooyoung-Hwang Gentle ping. Thank you for your time!

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-09-05 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-687671088 Are we looking good ? This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-29 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-683336845 @maropu @HeartSaVioR @cloud-fan @gengliangwang @dongjoon-hyun Gentle ping to confirm whether all looks good. Thanks!

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-27 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-681769930 @maropu I took some time this evening to add the additional tests and refactoring you requested. @HeartSaVioR Gentle ping for any further consideration as well.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-25 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-679991891 @maropu All requested improvements have been completed. Thank you for the opportunity to follow and learn from your analytical process.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-23 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-678908616 I intend to update the PR based on comments, I'll try to swing around to it this evening. This is an

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-12 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-673165268 @HeartSaVioR @maropu @gengliangwang @cloud-fan There was some great feedback provided by maropu that's now completed with all tests passing. At this point, nothing

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-11 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-672566232 @maropu Everything should be updated as requested. I appreciate your thorough review and believe we arrived at a better result in the process.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-11 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-672520195 @maropu You will be very pleased to hear that I have discovered the maven task for scalafmt! This is an

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-11 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-671990105 > Ah, one more; could you add fine-grained tests for the path filter implementations? Probably, you can create a dedicated test suite like

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-03 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-668096119 > Anyway, thanks for the valable work, @cchighman ! I appreciate all your feedback and will address each comment.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-08-01 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-667501680 @HeartSaVioR @gengliangwang @cloud-fan Gentle ping to see if everything looks good. I feel like the final design after incorporating feedback worked out really well here.

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-30 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-85340 There are quite a bit of commits here. Would it be preferred if I re-opened this PR for a cleaner merge to master?

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-30 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-666133472 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-29 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664699010 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-27 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664140348 sparkr passed this time too. This is an automated message from the Apache Git Service. To respond to the

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-27 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664140210 Hmm...doubt this is related. Doesn't happen locally either. ``` == FAIL: test_train_prediction

[GitHub] [spark] cchighman commented on pull request #28841: [SPARK-31962][SQL] Provide modifiedAfter and modifiedBefore options when filtering from a batch-based file data source

2020-07-26 Thread GitBox
cchighman commented on pull request #28841: URL: https://github.com/apache/spark/pull/28841#issuecomment-664115981 > Test failure looks related. I'm a little baffled on the failures from SparkR for SparkSQL Arrow optimization failing. Hoping they clear up with the sql tests