pdeva opened a new issue #8264: permanently deleting old segments should be part of datasource lifecycle URL: https://github.com/apache/incubator-druid/issues/8264

### Description

Currently, datasources have lifecycle rules that simply govern whether or not segments are loaded from deep storage. However, not all data needs to be preserved for eternity; there is a real need to delete very old data. For example, for a datasource A, you may want to:

1. load 90 days of data on historicals
2. keep 1 year in deep storage
3. completely delete data older than 1 year

You can partially accomplish step 3 by setting lifecycle rules in S3 or GCP, no doubt. But the Druid metadata storage will still retain segment information about those very old segments. Over time, this makes the metadata storage grow indefinitely and makes operations on that table slower due to its size. The problem is worse when you have lots of small segments, since each one contributes another row to the metadata storage table.

Druid should allow adding a lifecycle rule to a datasource that deletes segments from metadata storage once they are too old. Currently this is a manual and error-prone operation. Given the importance of getting it right (you are permanently deleting data), it should be as easy as setting load/drop rules for a datasource.
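Until such a rule exists, the manual cleanup described above can be scripted. Below is a minimal, illustrative sketch (the datasource name, retention window, and Overlord URL are placeholders, not from the issue): it builds a Druid `kill` task spec covering everything older than the retention window. A kill task permanently deletes segments that have already been marked unused, removing both the deep-storage files and their metadata rows.

```python
from datetime import datetime, timedelta, timezone
import json

def build_kill_task(datasource: str, retention_days: int, now=None) -> dict:
    """Build a Druid 'kill' task spec that permanently deletes
    unused segments older than the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    # Kill everything from the epoch up to the cutoff date.
    interval = "1970-01-01/" + cutoff.strftime("%Y-%m-%d")
    return {
        "type": "kill",
        "dataSource": datasource,
        "interval": interval,
    }

# Example: delete segments of datasource "A" older than 1 year.
task = build_kill_task("A", retention_days=365,
                       now=datetime(2019, 8, 8, tzinfo=timezone.utc))
print(json.dumps(task))
# The spec would then be submitted to the Overlord, e.g.:
#   curl -X POST -H 'Content-Type: application/json' \
#     -d @task.json http://OVERLORD_HOST:8090/druid/indexer/v1/task
```

Running such a script on a schedule approximates rule 3 above, but it is exactly the kind of manual, error-prone glue the issue argues Druid should replace with a first-class lifecycle rule.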