RE: newstore direction

2015-10-20 Thread Dałek, Piotr
> -----Original Message----- > From: ceph-devel-ow...@vger.kernel.org [mailto:ceph-devel-ow...@vger.kernel.org] On Behalf Of Sage Weil > Sent: Monday, October 19, 2015 9:49 PM > > The current design is based on two simple ideas: > > 1) a key/value interface is a better way to manage all of our

Write performance issue under rocksdb kvstore

2015-10-20 Thread Z Zhang
Hi Guys, I am trying latest ceph-9.1.0 with rocksdb 4.1 and ceph-9.0.3 with rocksdb 3.11 as OSD backend. I use rbd to test performance and following is my cluster info. [ceph@xxx ~]$ ceph -s     cluster b74f3944-d77f-4401-a531-fa5282995808      health HEALTH_OK      monmap e1: 1 mons at

Re: newstore direction

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Haomai Wang wrote: > On Tue, Oct 20, 2015 at 3:49 AM, Sage Weil wrote: > > The current design is based on two simple ideas: > > > > 1) a key/value interface is a better way to manage all of our internal > > metadata (object metadata, attrs, layout, collection

RE: newstore direction

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Chen, Xiaoxi wrote: > +1, nowadays K-V DBs care more about very small key-value pairs, say > several bytes to a few KB, but in the SSD case we only care about 4KB or > 8KB. In this way, NVMKV is a good design and it seems some of the SSD > vendors are also trying to build this kind

RE: newstore direction

2015-10-20 Thread Sage Weil
On Mon, 19 Oct 2015, James (Fei) Liu-SSI wrote: > Hi Sage and Somnath, > In my humble opinion, there is another, more aggressive solution than > a raw block device based keyvalue store as the backend for objectstore. The new > key value SSD device with transaction support would be ideal to solve >

Re: Write performance issue under rocksdb kvstore

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Z Zhang wrote: > Hi Guys, > > I am trying latest ceph-9.1.0 with rocksdb 4.1 and ceph-9.0.3 with > rocksdb 3.11 as OSD backend. I use rbd to test performance and following > is my cluster info. > > [ceph@xxx ~]$ ceph -s >     cluster b74f3944-d77f-4401-a531-fa5282995808 >  

[no subject]

2015-10-20 Thread maillist_linux
subscribe ceph-devel

Re: a problem about ReplicatedBackend::start_pushes

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, yangruifeng.09...@h3c.com wrote: > Hi Sage, > > In the following function, can we ensure that is_missing(soid) is equal to > is_missing(soid, primary_have_version)? > > int ReplicatedBackend::start_pushes( > const hobject_t &soid, > ObjectContextRef obc, > RPGHandle *h) > { >

Re: newstore direction

2015-10-20 Thread Ric Wheeler
On 10/19/2015 03:49 PM, Sage Weil wrote: The current design is based on two simple ideas: 1) a key/value interface is a better way to manage all of our internal metadata (object metadata, attrs, layout, collection membership, write-ahead logging, overlay data, etc.) 2) a file system is well

Re: newstore direction

2015-10-20 Thread kernel neophyte
On Tue, Oct 20, 2015 at 6:19 AM, Mark Nelson wrote: > On 10/20/2015 07:30 AM, Sage Weil wrote: >> >> On Tue, 20 Oct 2015, Chen, Xiaoxi wrote: >>> >>> +1, nowadays K-V DBs care more about very small key-value pairs, say >>> several bytes to a few KB, but in the SSD case we only care

Re: newstore direction

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Ric Wheeler wrote: > On 10/19/2015 03:49 PM, Sage Weil wrote: > > The current design is based on two simple ideas: > > > > 1) a key/value interface is a better way to manage all of our internal > > metadata (object metadata, attrs, layout, collection membership, > >

Re: newstore direction

2015-10-20 Thread Martin Millnert
Adding to this, On Tue, 2015-10-20 at 05:34 -0700, Sage Weil wrote: > On Mon, 19 Oct 2015, James (Fei) Liu-SSI wrote: > > Hi Sage and Somnath, > > In my humble opinion, there is another, more aggressive solution than > > a raw block device based keyvalue store as the backend for objectstore. The new

Re: newstore direction

2015-10-20 Thread Gregory Farnum
On Tue, Oct 20, 2015 at 12:44 PM, Sage Weil wrote: > On Tue, 20 Oct 2015, Ric Wheeler wrote: >> The big problem with consuming block devices directly is that you ultimately >> end up recreating most of the features that you had in the file system. Even >> enterprise databases

Re: newstore direction

2015-10-20 Thread Ric Wheeler
On 10/20/2015 03:44 PM, Sage Weil wrote: On Tue, 20 Oct 2015, Ric Wheeler wrote: On 10/19/2015 03:49 PM, Sage Weil wrote: The current design is based on two simple ideas: 1) a key/value interface is a better way to manage all of our internal metadata (object metadata, attrs, layout,

Re: newstore direction

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Gregory Farnum wrote: > On Tue, Oct 20, 2015 at 12:44 PM, Sage Weil wrote: > > On Tue, 20 Oct 2015, Ric Wheeler wrote: > >> The big problem with consuming block devices directly is that you > >> ultimately > >> end up recreating most of the features that

Re: newstore direction

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, John Spray wrote: > On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil wrote: > > - We have to size the kv backend storage (probably still an XFS > > partition) vs the block storage. Maybe we do this anyway (put metadata on > > SSD!) so it won't matter. But what

Re: newstore direction

2015-10-20 Thread Yehuda Sadeh-Weinraub
On Tue, Oct 20, 2015 at 11:31 AM, Ric Wheeler wrote: > On 10/19/2015 03:49 PM, Sage Weil wrote: >> >> The current design is based on two simple ideas: >> >> 1) a key/value interface is a better way to manage all of our internal >> metadata (object metadata, attrs, layout,

RE: newstore direction

2015-10-20 Thread James (Fei) Liu-SSI
Varada, Hopefully it will answer your question too. It is going to be a new type of key value device rather than a traditional hard drive based OSD device. It will have its own storage stack rather than the traditional block based storage stack. I have to admit it is a little bit more aggressive than block based

RE: newstore direction

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, James (Fei) Liu-SSI wrote: > Hi Sage, > Sorry for confusing you. SSDs with key value interfaces are still > under development by several vendors. They have a totally different design > approach than Open Channel SSD. I met Matias several months ago and > discussed about

Re: newstore direction

2015-10-20 Thread Matt Benjamin
We mostly assumed that sort-of transactional file systems, perhaps hosted in user space, were the most tractable trajectory. I have seen newstore and keyvalue store as essentially congruent approaches using database primitives (and I am interested in what you make of Russell Sears). I'm

Re: newstore direction

2015-10-20 Thread Ric Wheeler
On 10/20/2015 05:47 PM, Sage Weil wrote: On Tue, 20 Oct 2015, Gregory Farnum wrote: On Tue, Oct 20, 2015 at 12:44 PM, Sage Weil wrote: On Tue, 20 Oct 2015, Ric Wheeler wrote: The big problem with consuming block devices directly is that you ultimately end up recreating most

Package requirements for upcoming Infernalis Repository

2015-10-20 Thread Alfredo Deza
Hi all, I am in the process of getting the new Infernalis repository, and the notes I had from when the Hammer repository was created include a few packages that didn't come from a Ceph build and had to be added manually. I don't know which of these packages are still needed for

Re: [ceph-users] Write performance issue under rocksdb kvstore

2015-10-20 Thread Haomai Wang
On Tue, Oct 20, 2015 at 8:47 PM, Sage Weil wrote: > On Tue, 20 Oct 2015, Z Zhang wrote: >> Hi Guys, >> >> I am trying latest ceph-9.1.0 with rocksdb 4.1 and ceph-9.0.3 with >> rocksdb 3.11 as OSD backend. I use rbd to test performance and following >> is my cluster info. >> >>

Re: newstore direction

2015-10-20 Thread Mark Nelson
On 10/20/2015 07:30 AM, Sage Weil wrote: On Tue, 20 Oct 2015, Chen, Xiaoxi wrote: +1, nowadays K-V DBs care more about very small key-value pairs, say several bytes to a few KB, but in the SSD case we only care about 4KB or 8KB. In this way, NVMKV is a good design and it seems some of the SSD vendors

Re: Package requirements for upcoming Infernalis Repository

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Alfredo Deza wrote: > Hi all, > > I am in the process of getting the new Infernalis repository, and the > notes I had from when > the Hammer repository was created include a few packages that didn't > come from a Ceph build and had to be added manually. > > I don't

RE: [ceph-users] Write performance issue under rocksdb kvstore

2015-10-20 Thread Sage Weil
On Tue, 20 Oct 2015, Z Zhang wrote: > Thanks, Sage, for pointing out the PR and ceph branch. I will take a > closer look. > > Yes, I am trying the KVStore backend. The reason we are trying it is that > a few users don't have such strict requirements about occasional data loss. > It seems KVStore

Re: [ceph-users] Write performance issue under rocksdb kvstore

2015-10-20 Thread Haomai Wang
Actually keyvaluestore would submit transactions with the sync flag too (relying on the keyvaluedb implementation's journal/logfile). Yes, if we disable the sync flag, keyvaluestore's performance will increase a lot. But we don't provide this option now. On Tue, Oct 20, 2015 at 9:22 PM, Z Zhang