Arrow Dataset API on Ceph

2020-08-26 Thread Ivo Jimenez
Dear Arrow community, We are writing to share our thoughts about designing an Apache Arrow-native storage system leveraging Ceph’s extensibility mechanism as part of the SkyhookDM project and aim for a design that leverages Arrow as much as possible, both on the client API a

Re: Arrow Dataset API on Ceph

2020-08-27 Thread Ivo Jimenez
Hi Antoine, > Our main concern is that this new arrow::dataset::RadosFormat class will be deriving from the arrow::dataset::FileFormat class, which seems to raise a conceptual mismatch as there isn’t really a RADOS format but rather a formatting/serialization deferral that will be

Re: Arrow Dataset API on Ceph

2020-08-28 Thread Ivo Jimenez
Hi Antoine > > Yes, that is our plan. Since this is going to be done on the storage-, server-side, this would be transparent to the client. So our main concern is whether this be OK from the design perspective, and could this eventually be merged upstream? > Arrow datasets have no noti

Re: Arrow Dataset API on Ceph

2020-09-02 Thread Ivo Jimenez
Hi Ben, > > Our main concern is that this new arrow::dataset::RadosFormat class will be deriving from the arrow::dataset::FileFormat class, which seems to raise a conceptual mismatch as there isn’t really a RADOS format > IIUC RADOS doesn't interact with a filesystem directly, so Ra

Re: Arrow Dataset API on Ceph

2020-09-15 Thread Ivo Jimenez
hat we can integrate it in CI tests. Would it be OK to include gmock as a dependency? thanks! On 2020/09/02 22:05:51, Ivo Jimenez wrote: > Hi Ben, > > > Our main concern is that this new arrow::dataset::RadosFormat class will be deriving from

Re: Arrow Dataset API on Ceph

2020-09-16 Thread Ivo Jimenez
15, 2020 at 10:16 AM Antoine Pitrou wrote: > > Hi Ivo, > > You can open a JIRA once you've got a PR ready. No need to do it before you think you're ready for submission. > > AFAIK, gmock is already a dependency.