[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-25 Thread Anthony D'Atri
You bet, glad to help. Zillions of small files indeed present a relatively higher metadata overhead, and can be problematic in multiple ways. When using RGW, indexless buckets may be advantageous. Another phenomenon is space amplification — with say a 1 GB file/object, a partially full

[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-25 Thread Bobby
Thanks Anthony and Janne, exactly what I have been looking for!

On Fri, Feb 25, 2022 at 9:25 AM Janne Johansson wrote:
> On Fri, Feb 25, 2022 at 08:49, Anthony D'Atri <anthony.da...@gmail.com> wrote:
> > There was a similar discussion last year around Software Heritage’s archive project,

[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-25 Thread Janne Johansson
On Fri, Feb 25, 2022 at 08:49, Anthony D'Atri wrote:
> There was a similar discussion last year around Software Heritage’s archive project, suggest digging up that thread.
> Some ideas:
>
> * Pack them into (optionally compressed) tarballs - from a quick search it sorta looks like HAR uses a

[ceph-users] Re: Archive in Ceph similar to Hadoop Archive Utility (HAR)

2022-02-24 Thread Anthony D'Atri
There was a similar discussion last year around Software Heritage’s archive project, suggest digging up that thread.

Some ideas:

* Pack them into (optionally compressed) tarballs - from a quick search it sorta looks like HAR uses a similar model. Store the tarballs as RGW objects, or as RBD
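
For anyone wanting to try the tarball idea, here is a minimal sketch of the packing step using only Python's standard `tarfile` module. The function names `pack_small_files` and `read_member` are made up for illustration and are not part of Ceph or HAR. Uploading the resulting tarball to RGW (for example through any S3-compatible client) is deliberately left out, since that depends on your setup.

```python
import tarfile
from pathlib import Path


def pack_small_files(src_dir, archive_path):
    """Pack every regular file under src_dir into one gzip-compressed tarball.

    Returns the member names written, which can double as a simple index.
    Assumption: archive_path lives outside src_dir, otherwise the archive
    would try to include itself.
    """
    src = Path(src_dir)
    members = []
    with tarfile.open(archive_path, "w:gz") as tar:
        # Sort for a deterministic member order.
        for path in sorted(src.rglob("*")):
            if path.is_file():
                name = path.relative_to(src).as_posix()
                tar.add(path, arcname=name)
                members.append(name)
    return members


def read_member(archive_path, name):
    """Return the bytes of a single file stored in the tarball."""
    with tarfile.open(archive_path, "r:gz") as tar:
        extracted = tar.extractfile(name)
        if extracted is None:
            raise ValueError(f"{name} is not a regular file in the archive")
        return extracted.read()
```

Note that reading one member out of a gzip-compressed tarball means decompressing from the start of the stream, so an uncompressed tarball (mode `"w"`) may be the better trade-off if single-file reads are frequent.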