[ https://issues.apache.org/jira/browse/ARROW-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Philipp Moritz reassigned ARROW-4294:
-------------------------------------

    Assignee: Anurag Khandelwal

> [Plasma] Add support for evicting objects to external store
> -----------------------------------------------------------
>
>                 Key: ARROW-4294
>                 URL: https://issues.apache.org/jira/browse/ARROW-4294
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++, C++ - Plasma
>    Affects Versions: 0.11.1
>            Reporter: Anurag Khandelwal
>            Assignee: Anurag Khandelwal
>            Priority: Minor
>              Labels: features, pull-request-available
>             Fix For: 0.13.0
>
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h
>
> Currently, when Plasma needs storage space for additional objects, it
> evicts objects by deleting them from the Plasma store. This is a problem
> when it isn't possible to reconstruct the object or reconstructing it is
> expensive. Adding support for a pluggable external store that Plasma can
> evict objects to will address this issue.
>
> My proposal is described below.
>
> *Requirements*
> * Objects in Plasma should be evicted to an external store rather than
>   being removed altogether.
> * Communication with the external storage service should go through a
>   very thin shim interface. At the same time, the interface should be
>   general enough to support arbitrary remote services (e.g., S3,
>   DynamoDB, Redis, etc.).
> * Should be pluggable (e.g., it should be simple to add in or remove the
>   external storage service for eviction, switch between different remote
>   services, etc.) and easy to implement.
>
> *Assumptions/Non-Requirements*
> * The external store has practically infinite storage.
> * The external store's write operation is idempotent and atomic; this is
>   needed to ensure there are no race conditions due to multiple
>   concurrent evictions of the same object.
>
> *Proposed Implementation*
> * Define an ExternalStore interface with a Connect call. The call returns
>   an ExternalStoreHandle that exposes Put and Get calls. Any external
>   store that needs to be supported has to implement this interface.
> * In order to read or write data to the external store in a thread-safe
>   manner, one ExternalStoreHandle should be created per thread. While the
>   ExternalStoreHandle itself is not required to be thread-safe, multiple
>   ExternalStoreHandles across multiple threads should be able to modify
>   the external store in a thread-safe manner. These handles are most
>   likely going to be wrappers around the external store client
>   interfaces.
> * Replace the DeleteObjects method in the Plasma Store with an
>   EvictObjects method. If an external store is specified for the Plasma
>   Store, the EvictObjects method would mark the object state as
>   PLASMA_EVICTED, write the object data to the external store (via the
>   ExternalStoreHandle) and reclaim the memory associated with the object
>   data/metadata rather than remove the entry from the Object Table
>   altogether. If there is no valid external store, the eviction path
>   would remain the same (i.e., the object entry is still deleted from the
>   Object Table).
> * The Get method in the Plasma Store now tries to fetch the object from
>   the external store if it is not found locally and there is an external
>   store associated with the Plasma Store. The method tries to offload
>   this to an external worker thread pool with a fire-and-forget model,
>   but may need to do this synchronously if there are too many requests
>   already enqueued.
> * The CMake build system can expose a variable, EXTERNAL_STORE_SOURCES,
>   to which implementations of the ExternalStore and ExternalStoreHandle
>   interfaces can be appended; these will then be compiled into the
>   plasma_store_server executable.
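
To make the proposed interface and eviction path concrete, here is a minimal C++ sketch. It is an illustration under stated assumptions, not the implementation that was merged. The names ExternalStore, ExternalStoreHandle, Connect, Put, Get and PLASMA_EVICTED come from the proposal; every signature, along with InMemoryStore, InMemoryHandle, SharedMap, ObjectEntry and ToyPlasmaStore, is hypothetical. The worker thread pool is left out, so Get always fetches synchronously, and the object is marked PLASMA_EVICTED only after a successful Put.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// All signatures below are assumptions made for illustration.
enum class ObjectState { PLASMA_SEALED, PLASMA_EVICTED };

// Per-thread handle exposing Put and Get; not required to be thread-safe itself.
class ExternalStoreHandle {
 public:
  virtual ~ExternalStoreHandle() = default;
  virtual bool Put(const std::string& id, const std::string& data) = 0;
  virtual bool Get(const std::string& id, std::string* data) = 0;
};

class ExternalStore {
 public:
  virtual ~ExternalStore() = default;
  virtual std::unique_ptr<ExternalStoreHandle> Connect(const std::string& endpoint) = 0;
};

// Toy backend: a mutex-guarded map shared by every handle.
struct SharedMap {
  std::mutex mutex;
  std::unordered_map<std::string, std::string> table;
};

class InMemoryHandle : public ExternalStoreHandle {
 public:
  explicit InMemoryHandle(std::shared_ptr<SharedMap> shared) : shared_(std::move(shared)) {}
  bool Put(const std::string& id, const std::string& data) override {
    std::lock_guard<std::mutex> lock(shared_->mutex);
    shared_->table[id] = data;  // overwriting makes repeated Puts idempotent
    return true;
  }
  bool Get(const std::string& id, std::string* data) override {
    std::lock_guard<std::mutex> lock(shared_->mutex);
    auto it = shared_->table.find(id);
    if (it == shared_->table.end()) return false;
    *data = it->second;
    return true;
  }
 private:
  std::shared_ptr<SharedMap> shared_;
};

class InMemoryStore : public ExternalStore {
 public:
  std::unique_ptr<ExternalStoreHandle> Connect(const std::string&) override {
    return std::unique_ptr<ExternalStoreHandle>(new InMemoryHandle(shared_));
  }
 private:
  std::shared_ptr<SharedMap> shared_ = std::make_shared<SharedMap>();
};

struct ObjectEntry {
  ObjectState state = ObjectState::PLASMA_SEALED;
  std::string data;
};

// Stand-in for the Plasma Store's object table; handle may be null (no external store).
class ToyPlasmaStore {
 public:
  explicit ToyPlasmaStore(ExternalStoreHandle* handle) : handle_(handle) {}
  void Create(const std::string& id, const std::string& data) { table_[id].data = data; }
  bool IsEvicted(const std::string& id) const {
    auto it = table_.find(id);
    return it != table_.end() && it->second.state == ObjectState::PLASMA_EVICTED;
  }
  // Returns true if the object's memory was reclaimed.
  bool EvictObject(const std::string& id) {
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (handle_ == nullptr) {  // no external store: old behaviour, delete the entry
      table_.erase(it);
      return true;
    }
    if (!handle_->Put(id, it->second.data)) return false;  // keep the object on failure
    it->second.state = ObjectState::PLASMA_EVICTED;
    it->second.data = std::string();  // drop the in-memory copy, keep the table entry
    return true;
  }
  // Synchronous fetch; the proposal would normally offload this to a thread pool.
  bool Get(const std::string& id, std::string* out) {
    assert(out != nullptr);
    auto it = table_.find(id);
    if (it == table_.end()) return false;
    if (it->second.state == ObjectState::PLASMA_EVICTED) {
      if (handle_ == nullptr || !handle_->Get(id, &it->second.data)) return false;
      it->second.state = ObjectState::PLASMA_SEALED;
    }
    *out = it->second.data;
    return true;
  }
 private:
  ExternalStoreHandle* handle_;
  std::unordered_map<std::string, ObjectEntry> table_;
};
```

In this sketch a store built with a null handle behaves like the current delete-on-evict path, while a store built with a handle from InMemoryStore::Connect keeps the table entry and restores the data on the next Get. A real backend (S3, Redis, etc.) would replace InMemoryHandle with a wrapper around its client library.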