RE: [EXTERNAL] Announcing Hyperspace v0.3.0 - an indexing subsystem for Apache Spark™

Both Terry and I will be at the upcoming Hyperspace talk at Spark+AI Europe 
Summit 2020<https://databricks.com/dataaisummit/europe-2020/agenda> (in less 
than 7 hrs @ 3:35 AM PST!). Please say hi if you happen to drop by and/or ask 
us anything! 😊

Thank you!
Rahul Potharaju
From: Terry Kim <yumin...@gmail.com>
Sent: Tuesday, November 17, 2020 4:46 PM
To: User <user@spark.apache.org>
Subject: [EXTERNAL] Announcing Hyperspace v0.3.0 - an indexing subsystem for 
Apache Spark™

Hi,

We are happy to announce that Hyperspace v0.3.0 - an indexing subsystem for 
Apache Spark™ - has just been 
released<https://github.com/microsoft/hyperspace/releases/tag/v0.3.0>!

Here are some of the highlights:

  *   Mutable dataset support: Hyperspace v0.3.0 supports mutable datasets, 
where users can append or delete source data.

     *   Hybrid scan: Prior to v0.3.0, any change to the content of the 
original dataset required a full refresh to make the index usable again, which 
could be a costly operation. With Hybrid scan, the existing index can be used 
along with newly appended and/or deleted source files, without an explicit 
refresh operation. Please check out the Hybrid Scan 
doc<https://microsoft.github.io/hyperspace/docs/ug-mutable-dataset/#hybrid-scan>
 for more detail.
     *   Incremental refresh: v0.3.0 introduces an "incremental" mode to 
refresh indexes. In this mode, index files are created only for the newly 
appended source files; deleted source files are handled by removing their 
entries from the existing index files. Please check out the Incremental Refresh 
doc<https://microsoft.github.io/hyperspace/docs/ug-mutable-dataset/#refresh-index---incremental-mode>
 for more detail.

  *   Optimize index: The number of index files can grow with repeated 
incremental refreshes, possibly degrading performance. The new 
"optimizeIndex" API optimizes existing indexes by merging index files to 
create an optimal number of files. Please check out the Optimize Index 
doc<https://microsoft.github.io/hyperspace/docs/ug-optimize-index/>
 for more detail.
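
To make the three highlights above concrete, here is a minimal toy model in 
plain Python. It is only a sketch of the ideas and is not Hyperspace's 
implementation or API: the names ToyIndex, full_refresh, incremental_refresh, 
hybrid_scan and optimize are hypothetical, the "dataset" is just a dict from 
source file name to the keys it holds, and merging everything into a single 
index file stands in for whatever "optimal number of files" means in practice.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    # Each index file is a sorted list of (key, source_file) entries.
    files: list = field(default_factory=list)
    # Source files that were covered by the index at the last refresh.
    covered: set = field(default_factory=set)


def build_index_file(sources, names):
    """Build one index file from the given source files only."""
    return sorted((key, name) for name in names for key in sources[name])


def full_refresh(sources):
    """Rebuild the whole index from scratch (toy stand-in for a full refresh)."""
    return ToyIndex([build_index_file(sources, sources)], set(sources))


def incremental_refresh(index, sources):
    """Index appended files only; drop entries that came from deleted files."""
    appended = set(sources) - index.covered
    deleted = index.covered - set(sources)
    kept = [[e for e in f if e[1] not in deleted] for f in index.files]
    files = [f for f in kept if f]
    if appended:
        files.append(build_index_file(sources, appended))
    return ToyIndex(files, set(sources))


def hybrid_scan(index, sources, key):
    """Answer a lookup from a stale index, without refreshing it first."""
    appended = set(sources) - index.covered
    deleted = index.covered - set(sources)
    # Use the index, ignoring entries that point at deleted source files...
    hits = {src for f in index.files for k, src in f
            if k == key and src not in deleted}
    # ...and scan the appended source files directly.
    hits |= {name for name in appended if key in sources[name]}
    return sorted(hits)


def optimize(index):
    """Merge all index files into a single file (toy notion of 'optimal')."""
    merged = sorted(e for f in index.files for e in f)
    return ToyIndex([merged] if merged else [], set(index.covered))


# Usage: index a dataset, then mutate it by deleting and appending files.
sources = {"a.parquet": [1, 2], "b.parquet": [2, 3]}
index = full_refresh(sources)
del sources["a.parquet"]
sources["c.parquet"] = [2, 4]

stale_lookup = hybrid_scan(index, sources, 2)   # index not refreshed yet
index = incremental_refresh(index, sources)     # now two index files
index = optimize(index)                         # merged back into one
```

In this sketch the stale lookup still returns the right source files because 
the deleted file is filtered out and the appended file is scanned directly, 
which is the trade-off the Hybrid scan bullet describes: no refresh cost up 
front, some extra work at query time.
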
We would like to thank the community for the great feedback and all those who 
contributed to this release.

Thanks,
Terry Kim on behalf of the Hyperspace team
