Hey Scott,

There's not much you can do about this other than increasing log.segment.bytes (max 2GB) or lowering the partition count on your mirrored cluster. The latter is probably not the best strategy unless you're dealing with 100,000+ small topics, at which point you should consider a single aggregate topic with key-based partitioning instead. Kafka keeps one file handle open per log segment file and another per associated log segment index file. Your best bet is to permanently raise the allowed number of open file descriptors and to run your log directory on XFS or another appropriate filesystem.
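To put rough numbers on the segment-size trade-off, here is a back-of-the-envelope sketch in Python. It assumes only the two handles per segment described above; the function name and the sample broker figures are made up for illustration and are not anything Kafka provides.

```python
import math

# Assumption from the explanation above: one handle for the segment file
# and one for its index file.
HANDLES_PER_SEGMENT = 2

def estimate_open_handles(partitions, retained_bytes_per_partition, segment_bytes):
    """Rough count of file handles a broker holds open for its log segments."""
    # Every partition has at least one (active) segment, even when empty.
    segments = max(1, math.ceil(retained_bytes_per_partition / segment_bytes))
    return partitions * segments * HANDLES_PER_SEGMENT

MiB = 1024 ** 2
GiB = 1024 ** 3

# Hypothetical broker: 500 partitions, 20 GiB retained per partition.
small = estimate_open_handles(500, 20 * GiB, 512 * MiB)  # 512 MiB segments
large = estimate_open_handles(500, 20 * GiB, 1 * GiB)    # 1 GiB segments
print(small, large)
```

Doubling the segment size halves the estimate, which is why a larger log.segment.bytes helps, but the count still grows with both retention and partition count.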
Linux uses ~1KB of memory per open file descriptor, so 100,000 handles come to ~100MB. 100-200k open descriptors is not unusual for certain enterprisey apps, and 10-30k seems to be the norm if each of your brokers has a decent amount of disk space. There is a general recommendation to keep file descriptor memory usage at or below 10% of total system memory, but this may be a bit arbitrary. The general thinking is that under normal use, an application should run into other resource constraints before file descriptors become an issue. If that is not the case, consider re-evaluating your current design so that your partition count scales with your hardware resources and number of consumers rather than with your external software design. As mentioned above, aggregate topics using key-based partitioning can help with this.

Regards,

On Wed, Jun 3, 2015 at 7:47 AM, Scott Thibault <[email protected]> wrote:
> Hi,
>
> I'm running into the common issue of too many files open by the broker.
> While increasing the open file limit is a short-term work around, I need a
> long-term solution. I have a Kafka mirror that is keeping log segments for
> long periods of time and the number of files is potentially unbounded.
>
> Is there some way to prevent the broker from holding an open descriptor for
> every file?
>
> --Scott Thibault
