Hey, Jay, 

Thank you for your answer! 

Based on my understanding, Kafka issues fsync periodically from a dedicated 
background flusher thread; the flush is not driven by message semantics, and 
producers cannot issue a COMMIT to trigger an fsync. 
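
For reference, these are the broker-side flush knobs as I understand them (a 
sketch of a server.properties fragment; exact defaults vary by version): 

    # force a flush after this many messages are appended to a log
    log.flush.interval.messages=10000
    # force a flush after this many milliseconds
    log.flush.interval.ms=1000
    # how often the background flusher thread checks whether a flush is due
    log.flush.scheduler.interval.ms=2000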

I am not sure whether this requirement is desirable to others as well? 

Night, 

Xiao Li

On Mar 4, 2015, at 9:00 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey Xiao,
> 
> Yeah I agree that without fsync you will not get durability in the case of
> a power outage or other correlated failure, and likewise without
> replication you won't get durability in the case of disk failure.
> 
> If each batch is fsync'd it will definitely be slower; how much slower
> depends on the capability of the disk subsystem. Either way, that feature
> is there now.
> 
> -Jay
> 
> On Wed, Mar 4, 2015 at 8:50 AM, Xiao <lixiao1...@gmail.com> wrote:
> 
>> Hey Jay,
>> 
>> Yeah. I understand that the advantage of Kafka is one-to-many. That is why
>> I am reading the Kafka source code. You guys built a good product! : )
>> 
>> Our major concern is message persistence. Zero data loss is a must in
>> our applications. Below is what I copied from the Kafka documentation.
>> 
>> "The log takes two configuration parameter M which gives the number of
>> messages to write before forcing the OS to flush the file to disk, and S
>> which gives a number of seconds after which a flush is forced. This gives a
>> durability guarantee of losing at most M messages or S seconds of data in
>> the event of a system crash."
>> 
>> Basically, our producers need to know whether the data have been
>> flushed/fsynced to disk. Our model is disconnected: producers and
>> consumers do not talk to each other. The only medium between them is a
>> Kafka-like persistent message queue.
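>> 
>> With the Java producer, the closest we can get is to block on the
>> acknowledgement, something like the sketch below (note the ack only means
>> the brokers accepted the write according to the acks setting, not that it
>> reached the disk, unless the broker flushes per message):
>> 
>>    import java.util.Properties;
>>    import org.apache.kafka.clients.producer.KafkaProducer;
>>    import org.apache.kafka.clients.producer.ProducerRecord;
>> 
>>    public class BlockingSend {
>>        public static void main(String[] args) throws Exception {
>>            Properties props = new Properties();
>>            props.put("bootstrap.servers", "broker1:9092");
>>            props.put("acks", "all");  // wait for all in-sync replicas
>>            props.put("key.serializer",
>>                "org.apache.kafka.common.serialization.StringSerializer");
>>            props.put("value.serializer",
>>                "org.apache.kafka.common.serialization.StringSerializer");
>>            try (KafkaProducer<String, String> producer =
>>                     new KafkaProducer<>(props)) {
>>                // send() returns a Future; get() blocks until acknowledged
>>                producer.send(
>>                    new ProducerRecord<>("changes", "row-key", "payload")).get();
>>            }
>>        }
>>    }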
>> 
>> Unplanned power outages are not rare in 24/7 operation. Any data loss
>> could force a very expensive full refresh, which is not acceptable for
>> many financial companies.
>> 
>> If we fsync for each transaction or each batch, will the throughput be
>> low? Another option is to let our producers check recovery points very
>> frequently, but then the performance bottleneck moves to reading/copying
>> the recovery-point file. Any other ideas?
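>> 
>> To make the second option concrete, below is a rough sketch of what I mean
>> by polling the recovery-point file. The file name and the "topic partition
>> offset" line format are my assumptions from reading the broker code; this
>> is internal and could change:
>> 
>>    import java.io.IOException;
>>    import java.nio.file.Files;
>>    import java.nio.file.Path;
>>    import java.util.HashMap;
>>    import java.util.List;
>>    import java.util.Map;
>> 
>>    public class RecoveryPoints {
>>        // Reads <log.dir>/recovery-point-offset-checkpoint: line 1 is a
>>        // format version, line 2 the entry count, then one
>>        // "topic partition offset" entry per line (assumed layout).
>>        static Map<String, Long> read(Path logDir) throws IOException {
>>            List<String> lines = Files.readAllLines(
>>                logDir.resolve("recovery-point-offset-checkpoint"));
>>            Map<String, Long> points = new HashMap<>();
>>            for (String line : lines.subList(2, lines.size())) {
>>                String[] parts = line.trim().split(" ");
>>                points.put(parts[0] + "-" + parts[1], Long.parseLong(parts[2]));
>>            }
>>            return points;
>>        }
>>    }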
>> 
>> I have not read the source code for synchronous disk replication yet. That
>> will be my next focus. I am not sure whether it can resolve the concern
>> above.
>> 
>> BTW, do you have any plans to support mainframes?
>> 
>> Thanks,
>> 
>> Xiao Li
>> 
>> 
>> On Mar 4, 2015, at 8:01 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>> 
>>> Hey Xiao,
>>> 
>>> 1. Nothing prevents applying transactions transactionally on the
>>> destination side, though that is obviously more work. But I think the key
>>> point here is that much of the time the replication is not Oracle=>Oracle,
>>> but Oracle=>{W, X, Y, Z} where W/X/Y/Z are totally heterogeneous systems
>>> that aren't necessarily RDBMSs.
>>> 
>>> 2. I don't think fsync is really relevant. You can fsync on every message
>>> if you like, but Kafka's durability guarantees don't depend on this, as it
>>> allows synchronous commit across replicas. This changes the guarantee from
>>> "won't be lost unless the disk dies" to "won't be lost unless all replicas
>>> die", but the latter is generally a stronger guarantee in practice given
>>> the empirical reliability of disks (the #1 reason for server failure in my
>>> experience was disk failure).
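>>> 
>>> Concretely, the replicated-commit guarantee comes from settings along
>>> these lines (a sketch; exact names and availability depend on the version
>>> you run):
>>> 
>>>    # broker/topic side: three copies, and a write is committed only
>>>    # once at least two in-sync replicas have it
>>>    default.replication.factor=3
>>>    min.insync.replicas=2
>>>    unclean.leader.election.enable=false
>>>    # producer side: don't treat a send as successful until all
>>>    # in-sync replicas have acknowledged it
>>>    acks=all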
>>> 
>>> -Jay
>>> 
>>> On Tue, Mar 3, 2015 at 4:23 PM, Xiao <lixiao1...@gmail.com> wrote:
>>> 
>>>> Hey Josh,
>>>> 
>>>> If you put different tables into different partitions or topics, it
>>>> might break transaction ACID on the target side. This is risky for some
>>>> use cases. Besides unit-of-work issues, you also need to think about
>>>> load balancing.
>>>> 
>>>> For failover, you have to find the timestamp for point-in-time
>>>> consistency. This part is tricky. You have to ensure that all the changes
>>>> before a specific timestamp have been flushed to disk. Normally, you
>>>> maintain a bookmark for each partition on the target side to know the
>>>> oldest transaction that has been flushed to disk. Unfortunately, based on
>>>> my understanding, Kafka cannot provide this because it does not fsync
>>>> regularly, in order to achieve better throughput.
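>>>> 
>>>> To illustrate the bookmark idea with a generic sketch (not tied to any
>>>> product): each apply thread records the newest position it has hard-
>>>> flushed, and the safe restart point is the minimum across partitions:
>>>> 
>>>>    import java.util.concurrent.ConcurrentHashMap;
>>>>    import java.util.concurrent.ConcurrentMap;
>>>> 
>>>>    class Bookmarks {
>>>>        private final ConcurrentMap<Integer, Long> flushed =
>>>>            new ConcurrentHashMap<>();
>>>> 
>>>>        // called by an apply thread once its writes are fsync'd
>>>>        void recordFlushed(int partition, long position) {
>>>>            flushed.put(partition, position);
>>>>        }
>>>> 
>>>>        // everything at or below this position is durable on every partition
>>>>        long safeRestartPoint() {
>>>>            return flushed.values().stream()
>>>>                .mapToLong(Long::longValue).min().orElse(-1L);
>>>>        }
>>>>    }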
>>>> 
>>>> Best wishes,
>>>> 
>>>> Xiao Li
>>>> 
>>>> 
>>>> On Mar 3, 2015, at 3:45 PM, Xiao <lixiao1...@gmail.com> wrote:
>>>> 
>>>>> Hey Josh,
>>>>> 
>>>>> Transactions can be applied in parallel on the consumer side based on
>>>>> transaction dependency checking.
>>>>> 
>>>>> http://www.google.com.ar/patents/US20080163222
>>>>> 
>>>>> This patent documents how it works. It is easy to understand; however,
>>>>> you also need to consider hash-collision issues. This has been
>>>>> implemented in IBM Q Replication since 2001.
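>>>>> 
>>>>> The core idea is roughly the following (my own simplified sketch, not
>>>>> the patented algorithm itself): hash every row a transaction touches,
>>>>> and dispatch the transaction in parallel only when none of its hashes
>>>>> are in flight. A hash collision just causes a false conflict, which is
>>>>> safe but slower:
>>>>> 
>>>>>    import java.util.Collection;
>>>>>    import java.util.HashSet;
>>>>>    import java.util.Set;
>>>>> 
>>>>>    class DependencyGate {
>>>>>        private final Set<Integer> inFlight = new HashSet<>();
>>>>> 
>>>>>        // true if the transaction may run now; false means it must wait
>>>>>        synchronized boolean tryDispatch(Collection<Integer> rowHashes) {
>>>>>            for (int h : rowHashes)
>>>>>                if (inFlight.contains(h)) return false;
>>>>>            inFlight.addAll(rowHashes);
>>>>>            return true;
>>>>>        }
>>>>> 
>>>>>        // release the hashes once the transaction has been applied
>>>>>        synchronized void done(Collection<Integer> rowHashes) {
>>>>>            inFlight.removeAll(rowHashes);
>>>>>        }
>>>>>    }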
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Xiao Li
>>>>> 
>>>>> 
>>>>> On Mar 3, 2015, at 3:36 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>> 
>>>>>> Hey Josh,
>>>>>> 
>>>>>> As you say, ordering is per partition. Technically it is generally
>>>>>> possible to publish all changes to a database to a single
>>>>>> partition--generally the Kafka partition should be high-throughput
>>>>>> enough to keep up. However there are a couple of downsides to this:
>>>>>> 1. Consumer parallelism is limited to one. If you want a total order to
>>>>>> the consumption of messages you need to have just one process, but
>>>>>> often you would want to parallelize.
>>>>>> 2. Often what people want is not a full stream of all changes in all
>>>>>> tables in a database but rather the changes to a particular table.
>>>>>> 
>>>>>> To some extent the best way to do this depends on what you will do with
>>>>>> the data. However if you intend to have lots
>>>>>> 
>>>>>> I have seen pretty much every variation on this in the wild, and here
>>>>>> is what I would recommend:
>>>>>> 1. Have a single publisher process that publishes events into Kafka.
>>>>>> 2. If possible use the database log to get these changes (e.g. MySQL
>>>>>> binlog, Oracle XStream, GoldenGate, etc). This will be more complete
>>>>>> and more efficient than polling for changes, though that can work too.
>>>>>> 3. Publish each table to its own topic.
>>>>>> 4. Partition each topic by the primary key of the table.
>>>>>> 5. Include in each message the database's transaction id, SCN, or other
>>>>>> identifier that gives the total order within the record stream. Since
>>>>>> there is a single publisher this id will be monotonic within each
>>>>>> partition. (A sketch of 3-5 follows below.)
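>>>>>> 
>>>>>> For illustration, points 3-5 with the Java producer might look like
>>>>>> this (the topic name and the row accessors are made up for the example,
>>>>>> and a configured producer is assumed):
>>>>>> 
>>>>>>    // one topic per table, partitioned by primary key, SCN in the payload
>>>>>>    ProducerRecord<String, String> record = new ProducerRecord<>(
>>>>>>        "db.orders",              // topic = table
>>>>>>        row.primaryKey(),         // key: all changes to a row hit one partition
>>>>>>        scn + ":" + row.toJson()  // carry the total-order id with the change
>>>>>>    );
>>>>>>    producer.send(record);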
>>>>>> 
>>>>>> This seems to be the best set of tradeoffs for most use cases:
>>>>>> - You can have parallel consumers, up to the number of partitions you
>>>>>> chose, that still get messages in order per ID'd entity.
>>>>>> - You can subscribe to just one table if you like, or to multiple
>>>>>> tables.
>>>>>> - Consumers who need a total order over all updates can do a "merge"
>>>>>> across the partitions to reassemble the fully ordered set of changes
>>>>>> across all tables/partitions (sketched below).
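>>>>>> 
>>>>>> That merge can be a simple priority queue keyed on the id from point 5.
>>>>>> A sketch (PartitionCursor and process are hypothetical; real code must
>>>>>> also handle partitions with nothing buffered):
>>>>>> 
>>>>>>    import java.util.Comparator;
>>>>>>    import java.util.PriorityQueue;
>>>>>> 
>>>>>>    // one buffered cursor per partition, heap-ordered by transaction id
>>>>>>    PriorityQueue<PartitionCursor> heap =
>>>>>>        new PriorityQueue<>(Comparator.comparingLong(PartitionCursor::headId));
>>>>>>    while (!heap.isEmpty()) {
>>>>>>        PartitionCursor c = heap.poll();   // smallest id across partitions
>>>>>>        process(c.take());                 // emit in global order
>>>>>>        if (c.hasNext()) heap.add(c);      // re-insert with its next message
>>>>>>    }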
>>>>>> 
>>>>>> One thing to note is that the requirement of having a single consumer
>>>>>> process/thread to get the total order isn't really so much a Kafka
>>>>>> restriction as it is a restriction about the world, since if you had
>>>>>> multiple threads, even if you delivered messages to them in order,
>>>>>> their processing might happen out of order (just due to the random
>>>>>> timing of the processing).
>>>>>> 
>>>>>> -Jay
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Tue, Mar 3, 2015 at 3:15 PM, Josh Rader <jrader...@gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Kafka Experts,
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> We have a use case around RDBMS replication where we are investigating
>>>>>>> Kafka.  In this case ordering is very important.  Our understanding is
>>>>>>> that ordering is only preserved within a single partition.  This makes
>>>>>>> sense, as a single thread will consume these messages, but our question
>>>>>>> is: can we somehow parallelize this for better performance?  Is there
>>>>>>> maybe some partition key strategy trick to have your cake and eat it
>>>>>>> too, in terms of keeping ordering but also being able to parallelize
>>>>>>> the processing?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> I am sorry if this has already been asked, but we tried to search
>>>>>>> through the archives and couldn't find a response.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> Josh
>>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
>> 
