Hi Eric,

Currently the HDFS store writes data in SequenceFile and HFile formats. Each value is a serialized event which contains metadata and the value provided by the user. The value can be deserialized using Geode classes. Each file can be deserialized independently and does not depend on a live Geode cluster. A user-level API to construct this data will be added soon (see GFInputFormat as an example).
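To make that concrete, here is a minimal sketch of scanning one of these sequence files with the plain Hadoop API. The class name SequenceFileScan and the command-line path argument are hypothetical; the key/value classes are discovered from the file header rather than assumed, and the Geode-specific event deserialization is left as a comment since the user-level API is still in progress:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Standalone scan of one HDFS sequence file; no live Geode cluster needed.
public class SequenceFileScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    try (SequenceFile.Reader reader = new SequenceFile.Reader(
        conf, SequenceFile.Reader.file(new Path(args[0])))) {
      // Instantiate the key/value types recorded in the file header.
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        // Each value is a serialized event (metadata + user-provided value).
        // Deserializing it into the original user object would use Geode
        // classes, as GFInputFormat does.
        System.out.println(key + " -> " + value);
      }
    }
  }
}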
HDFS can be used as an archive by means of write-only regions. These regions do not follow the LSM-tree structure; the LSM structure is used for read-write regions. I am planning to create a jira and provide more details. Meanwhile, can you help us understand your use case? In your opinion, what could this interface look like? What about old versions of a key? Do you care about accessing HDFS files directly, or is the HDFS region interface better? Any other information that could be relevant to the HDFS region data access pattern would help.

Thanks
Ashvin

On Mon, Jul 20, 2015 at 12:57 PM, Eric Pederson <[email protected]> wrote:

> In the spec for HDFS integration it says that data events are archived on
> HDFS for offline analysis. How do you do offline analysis? Is there an
> API for the file format so third party tools can read it? Or do you go
> through an HDFS region?
>
> Also, just curious, are you using an LSM-tree to structure the data?
>
> Thanks,
>
> -- Eric
>
