Hey Xiao,

That's not quite right. Fsync is controlled either by a time-based criterion (flush every 30 seconds) or by a message-count criterion. So if you set the number of messages to 1, the flush is synchronous with the write, which I think is what you are looking for.
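For concreteness, the broker settings being described look like this (a sketch of a 2015-era `server.properties`; the key names are from the Kafka broker configuration docs, but verify them against your version):

```properties
# Flush the log to disk after every N messages.
# Setting this to 1 makes the flush synchronous with each write.
log.flush.interval.messages=1

# Alternatively (or additionally), flush on a time criterion,
# e.g. at most 30 seconds of unflushed data.
log.flush.interval.ms=30000
```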
-Jay

On Thu, Mar 5, 2015 at 1:17 AM, Xiao <lixiao1...@gmail.com> wrote:

> Hey Jay,
>
> Thank you for your answer!
>
> Based on my understanding, Kafka's fsync is issued regularly by a dedicated helper thread. It is not issued based on message semantics; the producers are unable to issue a COMMIT to trigger an fsync.
>
> I am not sure whether this requirement is highly desirable to others too?
>
> Night,
>
> Xiao Li
>
> On Mar 4, 2015, at 9:00 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Hey Xiao,
>>
>> Yeah, I agree that without fsync you will not get durability in the case of a power outage or other correlated failure, and likewise without replication you won't get durability in the case of disk failure.
>>
>> If each batch is fsync'd it will definitely be slower, depending on the capability of the disk subsystem. Either way, that feature is there now.
>>
>> -Jay
>>
>> On Wed, Mar 4, 2015 at 8:50 AM, Xiao <lixiao1...@gmail.com> wrote:
>>
>>> Hey Jay,
>>>
>>> Yeah, I understood that the advantage of Kafka is one-to-many. That is why I am reading the source code of Kafka. You guys built a good product! :)
>>>
>>> Our major concern is its message persistency. Zero data loss is a must in our applications. Below is what I copied from the Kafka documentation:
>>>
>>> "The log takes two configuration parameters: M, which gives the number of messages to write before forcing the OS to flush the file to disk, and S, which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most M messages or S seconds of data in the event of a system crash."
>>>
>>> Basically, our producers need to know whether the data have been flushed/fsynced to disk. Our model is disconnected: producers and consumers do not talk with each other. The only medium is a Kafka-like persistent message queue.
>>>
>>> Unplanned power outages are not rare in 24/7 usage.
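The "at most M messages or S seconds" flush policy quoted from the docs above can be illustrated with a toy model (illustrative only; the `Log` class and its fields are made-up names, not Kafka code):

```python
import time

class Log:
    """Toy model of a log that flushes every M messages or every S seconds."""
    def __init__(self, m, s):
        self.m, self.s = m, s
        self.unflushed = 0
        self.last_flush = time.monotonic()
        self.flushes = 0

    def append(self, msg):
        self.unflushed += 1
        now = time.monotonic()
        # Flush when either criterion is met; at most M messages
        # (or S seconds of data) can be lost in a crash.
        if self.unflushed >= self.m or (now - self.last_flush) >= self.s:
            self.flush(now)

    def flush(self, now):
        # A real log would call os.fsync() on the segment file here.
        self.flushes += 1
        self.unflushed = 0
        self.last_flush = now

# With m=1 every append triggers a flush, i.e. the write is synchronous.
log = Log(m=1, s=30)
for i in range(5):
    log.append(i)
print(log.flushes)  # 5 flushes, one per message
```

With `m=1` this models the "number of messages = 1" setting Jay describes; larger `m` trades durability for throughput.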
>>> Any data loss could cause a very expensive full refresh. That is not acceptable for many financial companies.
>>>
>>> If we do an fsync for each transaction or each batch, could the throughput be low? Another way is to let our producers check recovery points very frequently, but then the performance bottleneck will be reading/copying the recovery-point file. Any other ideas?
>>>
>>> I have not read the source code for synchronous disk replication. That will be my next focus. I am not sure whether it can resolve our above concern.
>>>
>>> BTW, do you have any plan to support mainframe?
>>>
>>> Thanks,
>>>
>>> Xiao Li
>>>
>>> On Mar 4, 2015, at 8:01 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>
>>>> Hey Xiao,
>>>>
>>>> 1. Nothing prevents applying transactions transactionally on the destination side, though that is obviously more work. But I think the key point here is that much of the time the replication is not Oracle=>Oracle, but Oracle=>{W, X, Y, Z} where W/X/Y/Z are totally heterogeneous systems that aren't necessarily RDBMSs.
>>>>
>>>> 2. I don't think fsync is really relevant. You can fsync on every message if you like, but Kafka's durability guarantees don't depend on this, as it allows synchronous commit across replicas. This changes the guarantee from "won't be lost unless the disk dies" to "won't be lost unless all replicas die", but the latter is generally a stronger guarantee in practice given the empirical reliability of disks (the #1 reason for server failure in my experience was disk failure).
>>>>
>>>> -Jay
>>>>
>>>> On Tue, Mar 3, 2015 at 4:23 PM, Xiao <lixiao1...@gmail.com> wrote:
>>>>
>>>>> Hey Josh,
>>>>>
>>>>> If you put different tables into different partitions or topics, it might break transactional ACID on the target side. This is risky for some use cases.
>>>>> Besides unit-of-work issues, you also need to think about load balancing.
>>>>>
>>>>> For failover, you have to find the timestamp for point-in-time consistency. This part is tricky: you have to ensure all the changes before a specific timestamp have been flushed to disk. Normally, you maintain a bookmark for each partition on the target side to know the oldest transactions that have been flushed to disk. Unfortunately, based on my understanding, Kafka is unable to do this because it does not fsync regularly, in order to achieve better throughput.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Xiao Li
>>>>>
>>>>> On Mar 3, 2015, at 3:45 PM, Xiao <lixiao1...@gmail.com> wrote:
>>>>>
>>>>>> Hey Josh,
>>>>>>
>>>>>> Transactions can be applied in parallel on the consumer side based on transaction dependency checking.
>>>>>>
>>>>>> http://www.google.com.ar/patents/US20080163222
>>>>>>
>>>>>> This patent documents how it works. It is easy to understand; however, you also need to consider hash collision issues. This has been implemented in IBM Q Replication since 2001.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Xiao Li
>>>>>>
>>>>>> On Mar 3, 2015, at 3:36 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey Josh,
>>>>>>>
>>>>>>> As you say, ordering is per partition. Technically it is generally possible to publish all changes to a database to a single partition--generally the Kafka partition should be high-throughput enough to keep up. However, there are a couple of downsides to this:
>>>>>>> 1. Consumer parallelism is limited to one. If you want a total order to the consumption of messages you need to have just one process, but often you would want to parallelize.
>>>>>>> 2.
>>>>>>> Often what people want is not a full stream of all changes in all tables in a database but rather the changes to a particular table.
>>>>>>>
>>>>>>> To some extent the best way to do this depends on what you will do with the data. However if you intend to have lots
>>>>>>>
>>>>>>> I have seen pretty much every variation on this in the wild, and here is what I would recommend:
>>>>>>> 1. Have a single publisher process that publishes events into Kafka.
>>>>>>> 2. If possible, use the database log to get these changes (e.g. MySQL binlog, Oracle XStreams, GoldenGate, etc.). This will be more complete and more efficient than polling for changes, though that can work too.
>>>>>>> 3. Publish each table to its own topic.
>>>>>>> 4. Partition each topic by the primary key of the table.
>>>>>>> 5. Include in each message the database's transaction id, SCN, or other identifier that gives the total order within the record stream. Since there is a single publisher, this id will be monotonic within each partition.
>>>>>>>
>>>>>>> This seems to be the best set of tradeoffs for most use cases:
>>>>>>> - You can have parallel consumers, up to the number of partitions you chose, that still get messages in order per ID'd entity.
>>>>>>> - You can subscribe to just one table if you like, or to multiple tables.
>>>>>>> - Consumers who need a total order over all updates can do a "merge" across the partitions to reassemble the fully ordered set of changes across all tables/partitions.
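The "merge across the partitions" consumer described above can be sketched as a k-way merge on the commit identifier carried in each message (a sketch under the assumption of point 5, i.e. each message carries a globally ordered id such as an SCN; the data here is invented):

```python
import heapq

# Each partition delivers its messages in order, and every message
# carries the database's commit identifier (here called scn), which
# is monotonically increasing within each partition.
partition_0 = [(1, "insert A"), (4, "update A"), (6, "delete A")]
partition_1 = [(2, "insert B"), (3, "update B"), (5, "update B")]

# Reassemble the total order across partitions by merging on scn.
# heapq.merge assumes each input is already sorted, which holds here
# because the publisher writes ids monotonically per partition.
total_order = list(heapq.merge(partition_0, partition_1))
print([scn for scn, _ in total_order])  # [1, 2, 3, 4, 5, 6]
```

In a live consumer the merge must block until every partition has delivered at least one message past the merge point, otherwise a lagging partition could cause out-of-order emission.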
>>>>>>> One thing to note is that the requirement of having a single consumer process/thread to get the total order isn't really so much a Kafka restriction as a restriction about the world, since if you had multiple threads, even if you delivered messages to them in order, their processing might happen out of order (just due to the random timing of the processing).
>>>>>>>
>>>>>>> -Jay
>>>>>>>
>>>>>>> On Tue, Mar 3, 2015 at 3:15 PM, Josh Rader <jrader...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Kafka Experts,
>>>>>>>>
>>>>>>>> We have a use case around RDBMS replication where we are investigating Kafka. In this case ordering is very important. Our understanding is that ordering is only preserved within a single partition. This makes sense, as a single thread will consume these messages, but our question is: can we somehow parallelize this for better performance? Is there maybe some partition-key strategy trick to have your cake and eat it too, in terms of keeping ordering but also being able to parallelize the processing?
>>>>>>>>
>>>>>>>> I am sorry if this has already been asked, but we tried to search through the archives and couldn't find this response.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Josh
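To Josh's original question, the "partition by primary key" strategy Jay recommends keeps per-row ordering while allowing one consumer per partition. A minimal sketch of the idea (the hash-mod partitioner below is illustrative, not Kafka's actual partitioner):

```python
from collections import defaultdict
import zlib

NUM_PARTITIONS = 4

def partition_for(primary_key: str) -> int:
    # Deterministic hash, so all changes to one row land in one partition
    # and therefore stay in order relative to each other.
    return zlib.crc32(primary_key.encode()) % NUM_PARTITIONS

# A change stream for two rows, in commit order.
changes = [("row42", "insert"), ("row7", "insert"),
           ("row42", "update"), ("row42", "delete"), ("row7", "update")]

partitions = defaultdict(list)
for pk, op in changes:
    partitions[partition_for(pk)].append((pk, op))

# Each partition preserves the relative order of operations per key,
# so one consumer per partition can apply changes in parallel safely.
for p, msgs in sorted(partitions.items()):
    print(p, msgs)
```

Consumers are parallel up to `NUM_PARTITIONS`, and ordering is still total per row, which is the "have your cake and eat it too" tradeoff being asked about.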