I've mentioned this a couple times in discussions recently as well. We were discussing the concept of infinite retention for a certain type of service, and how it might be accomplished. My suggestion was to have a combination of storage types and the ability for Kafka to look for segments in two different directory structures. This way you could expand the backend storage as needed (which could be on an external storage appliance) while still maintaining performance for recent segments.
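As a rough illustration of the two-directory lookup (a sketch only, not broker code), something like the following Python would do. It assumes segment files are named by their zero-padded base offset with a `.log` extension, which is how Kafka names them on disk. The function names `list_segments` and `find_segment`, and the idea of passing an ordered list of tier directories, are hypothetical and only here for illustration.

```python
import bisect
from pathlib import Path


def list_segments(tier_dirs):
    """Map base offset -> segment path, preferring earlier (faster) tiers."""
    segments = {}
    for tier in tier_dirs:
        for path in Path(tier).glob("*.log"):
            try:
                base_offset = int(path.stem)
            except ValueError:
                continue  # skip files that are not named by offset
            # setdefault keeps the copy from the first tier that has it,
            # so a segment present in both tiers is served from the fast one.
            segments.setdefault(base_offset, path)
    return segments


def find_segment(tier_dirs, offset):
    """Return the path of the segment that would hold `offset`, or None."""
    segments = list_segments(tier_dirs)
    bases = sorted(segments)
    # The owning segment is the one with the largest base offset <= offset.
    idx = bisect.bisect_right(bases, offset) - 1
    if idx < 0:
        return None
    return segments[bases[idx]]
```

Calling `find_segment([fast_dir, slow_dir], offset)` checks the fast tier first and falls back to the larger backend directory only for segments that no longer live on the fast disks. A real implementation would cache the listing rather than rescan the directories on every lookup.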
I still think this is something worth pursuing at some point, and it should be relatively easy to implement within the broker.

-Todd

On Wed, Oct 22, 2014 at 11:53 PM, Neil Harkins <nhark...@gmail.com> wrote:

> I've been thinking about this recently.
> If kafka provided cmdline hooks to be executed on segment rotation,
> similar to postgres' wal 'archive_command', configurations could store
> only the current segments and all their random i/o on flash, then once
> rotated, copy them sequentially onto larger/slower spinning disks,
> or even S3.
>
> -neil
>
> On Wed, Oct 22, 2014 at 10:09 PM, Xiaobin She <xiaobin...@gmail.com> wrote:
>
> > Todd,
> >
> > Thank you for the information.
> >
> > With 28,000+ files and 14 disks, that makes on average about 4000
> > open files on every two disks (which are treated as one single disk), am I right?
> >
> > How do you manage to make all the write operations to these 4000 open
> > files sequential on the disk?
> >
> > As far as I know, write operations to different files on the same disk will
> > cause random writes, which is not good for performance.
> >
> > xiaobinshe
> >
> > 2014-10-23 1:00 GMT+08:00 Todd Palino <tpal...@gmail.com>:
> >
> >> In fact there are many more than 4000 open files. Many of our brokers run
> >> with 28,000+ open files (regular file handles, not network connections). In
> >> our case, we're beefing up the disk performance as much as we can by
> >> running in a RAID-10 configuration with 14 disks.
> >>
> >> -Todd
> >>
> >> On Tue, Oct 21, 2014 at 7:58 PM, Xiaobin She <xiaobin...@gmail.com> wrote:
> >>
> >> > Todd,
> >> >
> >> > Actually I'm wondering how Kafka handles so many partitions. With one
> >> > partition there is at least one file on disk, and with 4000 partitions
> >> > there will be at least 4000 files.
> >> >
> >> > When all these partitions have write requests, how does Kafka make the write
> >> > operations on the disk sequential (which is emphasized in the design
> >> > document of Kafka) and make sure the disk access is efficient?
> >> >
> >> > Thank you for your reply.
> >> >
> >> > xiaobinshe
> >> >
> >> > 2014-10-22 5:10 GMT+08:00 Todd Palino <tpal...@gmail.com>:
> >> >
> >> > > As far as the number of partitions a single broker can handle, we've set
> >> > > our cap at 4000 partitions (including replicas). Above that we've seen some
> >> > > performance and stability issues.
> >> > >
> >> > > -Todd
> >> > >
> >> > > On Tue, Oct 21, 2014 at 12:15 AM, Xiaobin She <xiaobin...@gmail.com> wrote:
> >> > >
> >> > > > hello, everyone
> >> > > >
> >> > > > I'm new to kafka, I'm wondering what's the max number of partitions
> >> > > > one single machine can handle in Kafka?
> >> > > >
> >> > > > Is there a suggested number?
> >> > > >
> >> > > > Thanks.
> >> > > >
> >> > > > xiaobinshe