Re: Disabling CRUSH for erasure code and doing custom placement

2014-07-23 Thread Gregory Farnum
On Tue, Jul 22, 2014 at 2:48 PM, Shayan Saeed shayansaee...@gmail.com wrote: Another question along the same lines: for erasure code, just as with replicated files, the request goes through the primary member. Isn't it possible to send the request to any of the members and get the file? While this
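(A toy sketch, not Ceph code, of why a single member cannot simply return the object in an erasure-coded pool: each OSD in the PG holds only one shard, so some party, in Ceph the PG primary, has to gather at least k shards and decode before the client gets the object back. The k=2 plus XOR-parity scheme below is purely illustrative; real EC pools use plugins such as jerasure.)

  # Toy sketch: k=2 data shards plus one XOR parity shard. Shard i lives on
  # the i-th OSD of the PG; no single OSD holds the whole object.

  def encode(obj: bytes, k: int = 2):
      """Split obj into k data shards and one XOR parity shard."""
      size = -(-len(obj) // k)                      # ceil(len/k)
      shards = [obj[i * size:(i + 1) * size].ljust(size, b'\0') for i in range(k)]
      parity = bytes(a ^ b for a, b in zip(shards[0], shards[1]))
      return shards + [parity]

  def decode(shards, orig_len):
      """Reconstruct when the second data shard is missing, using parity."""
      d0, d1, p = shards
      if d1 is None:                                # second data shard unavailable
          d1 = bytes(a ^ b for a, b in zip(d0, p))
      return (d0 + d1)[:orig_len]

  data = b"hello erasure coding"
  shards = encode(data)
  # Reading from a single member only yields a fragment; decoding needs k shards.
  assert decode([shards[0], None, shards[2]], len(data)) == data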

Re: Disabling CRUSH for erasure code and doing custom placement

2014-07-22 Thread Shayan Saeed
Another question along the same lines: for erasure code, just as with replicated files, the request goes through the primary member. Isn't it possible to send the request to any of the members and get the file? While this might have kept things neater on the development side and might have made some

Re: Disabling CRUSH for erasure code and doing custom placement

2014-07-18 Thread Kaifeng Yao
I was thinking about 'PG preferred' to allow binding a PG's placement to arbitrary OSDs. My angle is to make PGs more evenly distributed across OSDs, and thus potentially save ~20% cost. I am searching for the 'pg preferred' implementation in Ceph to get more context. For the PG-to-OSD distribution
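(To make the "~20% cost" angle concrete, here is a small sketch with a made-up input mapping, not tied to any particular `ceph pg dump` format: it measures how uneven a PG-to-OSD mapping is. If the fullest OSD carries about 1.2x the average PG count, roughly 20% of raw capacity is stranded once you have to provision for that fullest OSD.)

  # Sketch: quantify PG-to-OSD imbalance from a {pgid: [osds in up set]} map.
  # The example mapping is invented; in practice you would parse it out of
  # `ceph pg dump` (whose exact format varies by release).
  from collections import Counter
  from statistics import mean, pstdev

  def pg_imbalance(pg_to_osds):
      per_osd = Counter(osd for osds in pg_to_osds.values() for osd in osds)
      counts = list(per_osd.values())
      avg = mean(counts)
      # max/mean is the interesting number: provisioning for the fullest OSD
      # wastes (max/mean - 1) of the average OSD's capacity.
      return {"mean": avg, "stdev": pstdev(counts), "max_over_mean": max(counts) / avg}

  example = {"1.0": [0, 3, 5], "1.1": [1, 3, 4], "1.2": [0, 2, 5], "1.3": [0, 1, 3]}
  print(pg_imbalance(example))   # max_over_mean = 1.5 for this toy mapping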

Re: Disabling CRUSH for erasure code and doing custom placement

2014-07-18 Thread Gregory Farnum
On Fri, Jul 18, 2014 at 2:12 AM, Kaifeng Yao kaif...@yahoo-inc.com wrote: I was thinking about 'PG preferred' to allow binding a PG's placement to arbitrary OSDs. My angle is to make PGs more evenly distributed across OSDs, and thus potentially save ~20% cost. I am searching for the 'pg

Re: Disabling CRUSH for erasure code and doing custom placement

2014-07-15 Thread Shayan Saeed
Well, I did end up putting the data in different pools for custom placement. However, I run into trouble during retrieval. The messy way is to query every pool to check where the data is stored, which requires many round trips to machines in far-off racks. Is it possible this information is

Re: Disabling CRUSH for erasure code and doing custom placement

2014-07-15 Thread Gregory Farnum
One of Ceph's design tentpoles is *avoiding* a central metadata lookup table. The Ceph MDS maintains a filesystem hierarchy but doesn't really handle the sort of thing you're talking about, either. If you want some kind of lookup, you'll need to build it yourself — although you could make use of
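(One way to build that lookup yourself, sketched with the python-rados bindings. The pool and object names are made up, and a flat JSON index object is only the simplest option; omap key/value pairs on an index object would scale better for large indexes.)

  # Sketch of a self-managed index (not a built-in Ceph feature): one small
  # JSON object in a well-known pool maps object name -> pool, so retrieval
  # costs one extra read instead of probing every pool.
  import json
  import rados

  INDEX_POOL = 'placement-index'    # hypothetical names
  INDEX_OBJ = 'object-to-pool'

  cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
  cluster.connect()

  def _read_whole(ioctx, name):
      size, _mtime = ioctx.stat(name)          # object size, then a full read
      return ioctx.read(name, length=size)

  def lookup_pool(obj_name):
      """Return the pool that holds obj_name, or None if it is not indexed."""
      ioctx = cluster.open_ioctx(INDEX_POOL)
      try:
          index = json.loads(_read_whole(ioctx, INDEX_OBJ))
      finally:
          ioctx.close()
      return index.get(obj_name)

  def fetch(obj_name):
      """Read obj_name from whichever pool the index says it lives in."""
      pool = lookup_pool(obj_name)
      if pool is None:
          raise KeyError(obj_name)
      ioctx = cluster.open_ioctx(pool)
      try:
          return _read_whole(ioctx, obj_name)
      finally:
          ioctx.close()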

Disabling CRUSH for erasure code and doing custom placement

2014-06-24 Thread Shayan Saeed
Hi, the CRUSH placement algorithm works really nicely with replication. However, with erasure code, my cluster has some issues which require making placement changes that I cannot specify with CRUSH maps. Sometimes, depending on the type of data, I would like to place it on different OSDs but in the same pool.

Re: Disabling CRUSH for erasure code and doing custom placement

2014-06-24 Thread Gregory Farnum
On Tue, Jun 24, 2014 at 8:29 AM, Shayan Saeed shayansaee...@gmail.com wrote: Hi, the CRUSH placement algorithm works really nicely with replication. However, with erasure code, my cluster has some issues which require making placement changes that I cannot specify with CRUSH maps. Sometimes, depending on

Re: Disabling CRUSH for erasure code and doing custom placement

2014-06-24 Thread Shayan Saeed
I assumed that creating a large number of pools might not be scalable. If there is no overhead in creating as many pools as I want within an OSD, I would probably choose this option. I just want to specify that systematic chunks should go to 'a' racks while the others are distributed among 'b' racks.
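(For what it's worth, a split like that can sometimes be expressed in a single pool's CRUSH rule, since an erasure-coded pool stores shard i on the i-th OSD the rule emits. A hedged sketch in decompiled crushmap syntax, assuming a hypothetical k=4, m=2 profile and rack buckets named rack-a and rack-b; all names and numbers are illustrative only.)

  # Hypothetical rule: the first k=4 (systematic) shards land on hosts under
  # rack-a, the remaining m=2 (coding) shards on hosts under rack-b.
  rule ec_split_racks {
      ruleset 2
      type erasure
      min_size 6
      max_size 6
      step set_chooseleaf_tries 5
      step take rack-a
      step chooseleaf indep 4 type host
      step emit
      step take rack-b
      step chooseleaf indep 2 type host
      step emit
  }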

Re: Disabling CRUSH for erasure code and doing custom placement

2014-06-24 Thread Gregory Farnum
On Tue, Jun 24, 2014 at 9:12 AM, Shayan Saeed shayansaee...@gmail.com wrote: I assumed that creating a large number of pools might not be scalable. If there is no overhead in creating as many pools as I want within an OSD, I would probably choose this option. There is a per-PG overhead, and

Re: Disabling CRUSH for erasure code and doing custom placement

2014-06-24 Thread Sage Weil
I wonder if what *would* make some sense here would be to add an exception map to OSDMap, similar to pg_temp but called pg_force (or similar), that is a persistent, forced mapping of a PG to a value. This would, in principle, let you force a mapping for every PG and have no (or an empty) CRUSH
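(A conceptual sketch of that lookup order; the names are hypothetical and the real OSDMap code is C++, so this is not it. The idea is that a persistent exception table is consulted first, and CRUSH is only asked for PGs that have no forced entry, analogous to how pg_temp overrides mappings today.)

  # Hypothetical illustration of the proposed pg_force exception map.
  def map_pg_to_osds(pgid, pg_force, crush):
      """Return the OSD set for a PG: a forced mapping wins, else ask CRUSH."""
      forced = pg_force.get(pgid)     # persistent, user-supplied exceptions
      if forced is not None:
          return forced
      return crush(pgid)              # normal CRUSH calculation

  # With an entry for every PG the CRUSH output is never used (fully custom
  # placement); with an empty table, behaviour is unchanged.
  pg_force = {"7.1a": [4, 9, 12]}
  print(map_pg_to_osds("7.1a", pg_force, crush=lambda pg: [0, 1, 2]))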