Hey Jay,

Thank you for your answer!
Based on my understanding, Kafka's fsync is issued periodically by a
dedicated helper thread; it is not issued based on message semantics, and
producers are unable to issue a COMMIT that triggers an fsync. I am not sure
whether this requirement is highly desirable to others too?

Night,

Xiao Li

On Mar 4, 2015, at 9:00 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey Xiao,
>
> Yeah, I agree that without fsync you will not get durability in the case
> of a power outage or other correlated failure, and likewise without
> replication you won't get durability in the case of disk failure.
>
> If each batch is fsync'd it will definitely be slower, depending on the
> capability of the disk subsystem. Either way, that feature is there now.
>
> -Jay
>
> On Wed, Mar 4, 2015 at 8:50 AM, Xiao <lixiao1...@gmail.com> wrote:
>
>> Hey Jay,
>>
>> Yeah, I understood that the advantage of Kafka is one-to-many. That is
>> why I am reading the source code of Kafka. You guys built a good
>> product! : )
>>
>> Our major concern is its message persistence. Zero data loss is a must
>> in our applications. Below is what I copied from the Kafka
>> documentation:
>>
>> "The log takes two configuration parameters: M, which gives the number
>> of messages to write before forcing the OS to flush the file to disk,
>> and S, which gives a number of seconds after which a flush is forced.
>> This gives a durability guarantee of losing at most M messages or S
>> seconds of data in the event of a system crash."
>>
>> Basically, our producers need to know whether the data has been
>> flushed/fsynced to disk. Our model is disconnected: producers and
>> consumers do not talk to each other. The only medium is a Kafka-like
>> persistent message queue.
>>
>> Unplanned power outages are not rare in 24/7 usage. Any data loss could
>> cause a very expensive full refresh. That is not acceptable for many
>> financial companies.
>>
>> If we fsync for each transaction or each batch, the throughput could be
>> low. Another option is to let our producers check recovery points very
>> frequently, but then the performance bottleneck will be on
>> reading/copying the recovery-point file. Any other ideas?
>>
>> I have not read the source code for synchronous disk replication; that
>> will be my next focus. I am not sure whether it can resolve the concern
>> above.
>>
>> BTW, do you have any plans to support mainframes?
>>
>> Thanks,
>>
>> Xiao Li
>>
>>
>> On Mar 4, 2015, at 8:01 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>
>>> Hey Xiao,
>>>
>>> 1. Nothing prevents applying transactions transactionally on the
>>> destination side, though that is obviously more work. But I think the
>>> key point here is that much of the time the replication is not
>>> Oracle=>Oracle, but Oracle=>{W, X, Y, Z}, where W/X/Y/Z are totally
>>> heterogeneous systems that aren't necessarily RDBMSs.
>>>
>>> 2. I don't think fsync is really relevant. You can fsync on every
>>> message if you like, but Kafka's durability guarantees don't depend on
>>> this, as it allows synchronous commit across replicas. This changes
>>> the guarantee from "won't be lost unless the disk dies" to "won't be
>>> lost unless all replicas die", but the latter is generally a stronger
>>> guarantee in practice given the empirical reliability of disks (the #1
>>> reason for server failure in my experience was disk failure).
>>>
>>> -Jay
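A minimal sketch of the replica-level durability Jay describes above, using
the Kafka Java producer client; the broker address, topic name, and the
broker-side settings shown in the comments are illustrative assumptions, not
values from this thread. The flush settings correspond to the "M" and "S"
parameters quoted from the documentation.

    // Broker side (server.properties) -- illustrative values:
    //   min.insync.replicas=2           (a write needs 2 in-sync replicas)
    //   log.flush.interval.messages=... (the "M" from the quoted docs)
    //   log.flush.interval.ms=...       (the "S" from the quoted docs, in ms)

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ReplicatedWrite {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // assumed address
            // Wait for all in-sync replicas to acknowledge, not just the leader.
            props.put("acks", "all");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // get() blocks until the in-sync replica set has the write, giving
            // "won't be lost unless all replicas die" without any fsync.
            producer.send(new ProducerRecord<>("changes", "key", "value")).get();
            producer.close();
        }
    }

With acks=all, a successful send() means the message survives any failure
short of losing every replica, which is the tradeoff Jay contrasts with
per-message fsync.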
>>> On Tue, Mar 3, 2015 at 4:23 PM, Xiao <lixiao1...@gmail.com> wrote:
>>>
>>>> Hey Josh,
>>>>
>>>> If you put different tables into different partitions or topics, it
>>>> might break transactional ACID at the target side. This is risky for
>>>> some use cases. Besides unit-of-work issues, you also need to think
>>>> about load balancing too.
>>>>
>>>> For failover, you have to find the timestamp for point-in-time
>>>> consistency. This part is tricky. You have to ensure all the changes
>>>> before a specific timestamp have been flushed to disk. Normally, you
>>>> can maintain a bookmark for each partition at the target side to know
>>>> the oldest transaction that has been flushed to disk. Unfortunately,
>>>> based on my understanding, Kafka is unable to do this because it does
>>>> not fsync regularly, in order to achieve better throughput.
>>>>
>>>> Best wishes,
>>>>
>>>> Xiao Li
>>>>
>>>>
>>>> On Mar 3, 2015, at 3:45 PM, Xiao <lixiao1...@gmail.com> wrote:
>>>>
>>>>> Hey Josh,
>>>>>
>>>>> Transactions can be applied in parallel on the consumer side based on
>>>>> transaction dependency checking.
>>>>>
>>>>> http://www.google.com.ar/patents/US20080163222
>>>>>
>>>>> This patent documents how it works. It is easy to understand;
>>>>> however, you also need to consider hash-collision issues. This has
>>>>> been implemented in IBM Q Replication since 2001.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Xiao Li
>>>>>
>>>>>
>>>>> On Mar 3, 2015, at 3:36 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>
>>>>>> Hey Josh,
>>>>>>
>>>>>> As you say, ordering is per partition. Technically it is generally
>>>>>> possible to publish all changes to a database to a single
>>>>>> partition--generally the Kafka partition should be high-throughput
>>>>>> enough to keep up. However, there are a couple of downsides to this:
>>>>>> 1. Consumer parallelism is limited to one. If you want a total order
>>>>>> on the consumption of messages you need to have just one process,
>>>>>> but often you would want to parallelize.
>>>>>> 2. Often what people want is not a full stream of all changes in all
>>>>>> tables in a database but rather the changes to a particular table.
>>>>>>
>>>>>> To some extent the best way to do this depends on what you will do
>>>>>> with the data. However if you intend to have lots
>>>>>>
>>>>>> I have seen pretty much every variation on this in the wild, and
>>>>>> here is what I would recommend (a code sketch follows this message):
>>>>>> 1. Have a single publisher process that publishes events into Kafka.
>>>>>> 2. If possible, use the database log to get these changes (e.g.
>>>>>> MySQL binlog, Oracle XStream, GoldenGate, etc.). This will be more
>>>>>> complete and more efficient than polling for changes, though that
>>>>>> can work too.
>>>>>> 3. Publish each table to its own topic.
>>>>>> 4. Partition each topic by the primary key of the table.
>>>>>> 5. Include in each message the database's transaction id, SCN, or
>>>>>> other identifier that gives the total order within the record
>>>>>> stream. Since there is a single publisher, this id will be monotonic
>>>>>> within each partition.
>>>>>>
>>>>>> This seems to be the best set of tradeoffs for most use cases:
>>>>>> - You can have parallel consumers, up to the number of partitions
>>>>>> you chose, that still get messages in order per ID'd entity.
>>>>>> - You can subscribe to just one table if you like, or to multiple
>>>>>> tables.
>>>>>> - Consumers who need a total order over all updates can do a "merge"
>>>>>> across the partitions to reassemble the fully ordered set of changes
>>>>>> across all tables/partitions.
>>>>>>
>>>>>> One thing to note is that the requirement of having a single
>>>>>> consumer process/thread to get the total order isn't really so much
>>>>>> a Kafka restriction as it is a restriction about the world, since if
>>>>>> you had multiple threads, even if you delivered messages to them in
>>>>>> order, their processing might happen out of order (just due to the
>>>>>> random timing of the processing).
>>>>>>
>>>>>> -Jay
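A minimal sketch of points 2-5 above with the Java producer client; the topic
naming scheme, table name, row key, and the toy payload encoding of the SCN
are assumptions for illustration, not anything prescribed in this thread.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ChangePublisher {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // assumed address
            props.put("acks", "all");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            // One topic per table (point 3). Keying the record by the row's
            // primary key makes the default partitioner hash all changes for
            // one row to the same partition, preserving their order (point 4).
            String table = "accounts";      // hypothetical table
            String primaryKey = "42";       // hypothetical row id
            long scn = 1001L;               // transaction id / SCN (point 5)
            String payload = scn + "|UPDATE|balance=99.50"; // toy encoding

            producer.send(new ProducerRecord<>(
                    "db.changes." + table, primaryKey, payload)).get();
            producer.close();
        }
    }

Because the single publisher (point 1) stamps a monotonic SCN on every
record, each partition's stream is internally ordered, which is what makes
the cross-partition merge mentioned above possible.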
>>>>>> On Tue, Mar 3, 2015 at 3:15 PM, Josh Rader <jrader...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Kafka Experts,
>>>>>>>
>>>>>>> We have a use case around RDBMS replication where we are
>>>>>>> investigating Kafka. In this case ordering is very important. Our
>>>>>>> understanding is that ordering is only preserved within a single
>>>>>>> partition. This makes sense, as a single thread will consume these
>>>>>>> messages, but our question is: can we somehow parallelize this for
>>>>>>> better performance? Is there maybe some partition-key strategy
>>>>>>> trick to have your cake and eat it too, in terms of keeping
>>>>>>> ordering but also being able to parallelize the processing?
>>>>>>>
>>>>>>> I am sorry if this has already been asked, but we tried to search
>>>>>>> through the archives and couldn't find an answer.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Josh
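For Josh's question, the "have your cake and eat it too" trick discussed
above amounts to: key by primary key for per-entity order, and merge
partitions by the publisher's monotonic SCN when a total order is needed. A
minimal sketch of that merge over already-fetched, per-partition batches;
the Change type and the SCN values are assumptions for illustration,
independent of any particular consumer API.

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.Iterator;
    import java.util.List;
    import java.util.PriorityQueue;

    public class TotalOrderMerge {

        // A change record carrying the monotonic transaction id (SCN)
        // stamped by the single publisher.
        static final class Change {
            final long scn;
            final String payload;
            Change(long scn, String payload) {
                this.scn = scn;
                this.payload = payload;
            }
        }

        // The current head of one partition's stream plus the rest of it.
        static final class Head {
            final Change change;
            final Iterator<Change> rest;
            Head(Change change, Iterator<Change> rest) {
                this.change = change;
                this.rest = rest;
            }
        }

        // Each partition is already ordered by SCN, so a k-way merge over
        // the partition heads reassembles the total order across partitions.
        static List<Change> merge(List<Iterator<Change>> partitions) {
            PriorityQueue<Head> heap = new PriorityQueue<>(
                    Comparator.comparingLong((Head h) -> h.change.scn));
            for (Iterator<Change> it : partitions) {
                if (it.hasNext()) heap.add(new Head(it.next(), it));
            }
            List<Change> ordered = new ArrayList<>();
            while (!heap.isEmpty()) {
                Head h = heap.poll();
                ordered.add(h.change);
                if (h.rest.hasNext()) heap.add(new Head(h.rest.next(), h.rest));
            }
            return ordered;
        }

        public static void main(String[] args) {
            // Two toy partitions, each internally ordered by SCN.
            List<Change> p0 = Arrays.asList(new Change(1, "a"), new Change(4, "d"));
            List<Change> p1 = Arrays.asList(new Change(2, "b"), new Change(3, "c"));
            for (Change c : merge(Arrays.asList(p0.iterator(), p1.iterator()))) {
                System.out.println(c.scn + " -> " + c.payload);
            }
        }
    }

As Jay notes, this only recovers a total order within a single consumer
process; the merge itself is the serialization point.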