Hey Xiao,

That's not quite right. Fsync is controlled either by a time-based criterion (flush every 30 seconds) or by a message-count criterion. So if you set the number of messages to 1, the flush is synchronous with the write, which I think is what you are looking for.
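For concreteness, the broker settings being described look like this (a sketch of a 2015-era `server.properties`; the key names are from the Kafka broker configuration docs, but verify them against your version):

```properties
# Flush the log to disk after every N messages.
# Setting this to 1 makes the flush synchronous with each write.
log.flush.interval.messages=1

# Alternatively (or additionally), flush on a time criterion,
# e.g. at most 30 seconds of unflushed data.
log.flush.interval.ms=30000
```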
-Jay

On Thu, Mar 5, 2015 at 1:17 AM, Xiao <lixiao1...@gmail.com> wrote:

> Hey Jay,
>
> Thank you for your answer!
>
> Based on my understanding, Kafka's fsync is issued regularly by a dedicated helper thread. It is not issued based on message semantics; the producers are unable to issue a COMMIT to trigger an fsync.
>
> I am not sure whether this requirement is highly desirable to others too?
>
> Night,
>
> Xiao Li
>
> On Mar 4, 2015, at 9:00 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
>> Hey Xiao,
>>
>> Yeah, I agree that without fsync you will not get durability in the case of a power outage or other correlated failure, and likewise without replication you won't get durability in the case of disk failure.
>>
>> If each batch is fsync'd it will definitely be slower, depending on the capability of the disk subsystem. Either way, that feature is there now.
>>
>> -Jay
>>
>> On Wed, Mar 4, 2015 at 8:50 AM, Xiao <lixiao1...@gmail.com> wrote:
>>
>>> Hey Jay,
>>>
>>> Yeah, I understood that the advantage of Kafka is one-to-many. That is why I am reading the source code of Kafka. You guys built a good product! :)
>>>
>>> Our major concern is its message persistency. Zero data loss is a must in our applications. Below is what I copied from the Kafka documentation:
>>>
>>> "The log takes two configuration parameters: M, which gives the number of messages to write before forcing the OS to flush the file to disk, and S, which gives a number of seconds after which a flush is forced. This gives a durability guarantee of losing at most M messages or S seconds of data in the event of a system crash."
>>>
>>> Basically, our producers need to know whether the data have been flushed/fsynced to disk. Our model is disconnected: producers and consumers do not talk with each other. The only medium is a Kafka-like persistent message queue.
>>>
>>> Unplanned power outages are not rare in 24/7 usage.
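The "at most M messages or S seconds" flush policy quoted from the docs above can be illustrated with a toy model (illustrative only; the `Log` class and its fields are made-up names, not Kafka code):

```python
import time

class Log:
    """Toy model of a log that flushes every M messages or every S seconds."""
    def __init__(self, m, s):
        self.m, self.s = m, s
        self.unflushed = 0
        self.last_flush = time.monotonic()
        self.flushes = 0

    def append(self, msg):
        self.unflushed += 1
        now = time.monotonic()
        # Flush when either criterion is met; at most M messages
        # (or S seconds of data) can be lost in a crash.
        if self.unflushed >= self.m or (now - self.last_flush) >= self.s:
            self.flush(now)

    def flush(self, now):
        # A real log would call os.fsync() on the segment file here.
        self.flushes += 1
        self.unflushed = 0
        self.last_flush = now

# With m=1 every append triggers a flush, i.e. the write is synchronous.
log = Log(m=1, s=30)
for i in range(5):
    log.append(i)
print(log.flushes)  # 5 flushes, one per message
```

With `m=1` this models the "number of messages = 1" setting Jay describes; larger `m` trades durability for throughput.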
>>> Any data loss could cause a very expensive full refresh. That is not acceptable for many financial companies.
>>>
>>> If we do an fsync for each transaction or each batch, could the throughput be low? Another way is to let our producers check recovery points very frequently, but then the performance bottleneck will be reading/copying the recovery-point file. Any other ideas?
>>>
>>> I have not read the source code for synchronous disk replication. That will be my next focus. I am not sure whether it can resolve our above concern.
>>>
>>> BTW, do you have any plan to support mainframe?
>>>
>>> Thanks,
>>>
>>> Xiao Li
>>>
>>> On Mar 4, 2015, at 8:01 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>
>>>> Hey Xiao,
>>>>
>>>> 1. Nothing prevents applying transactions transactionally on the destination side, though that is obviously more work. But I think the key point here is that much of the time the replication is not Oracle=>Oracle, but Oracle=>{W, X, Y, Z} where W/X/Y/Z are totally heterogeneous systems that aren't necessarily RDBMSs.
>>>>
>>>> 2. I don't think fsync is really relevant. You can fsync on every message if you like, but Kafka's durability guarantees don't depend on this, as it allows synchronous commit across replicas. This changes the guarantee from "won't be lost unless the disk dies" to "won't be lost unless all replicas die", but the latter is generally a stronger guarantee in practice given the empirical reliability of disks (the #1 reason for server failure in my experience was disk failure).
>>>>
>>>> -Jay
>>>>
>>>> On Tue, Mar 3, 2015 at 4:23 PM, Xiao <lixiao1...@gmail.com> wrote:
>>>>
>>>>> Hey Josh,
>>>>>
>>>>> If you put different tables into different partitions or topics, it might break transactional ACID on the target side. This is risky for some use cases.
>>>>> Besides unit-of-work issues, you also need to think about load balancing.
>>>>>
>>>>> For failover, you have to find the timestamp for point-in-time consistency. This part is tricky: you have to ensure all the changes before a specific timestamp have been flushed to disk. Normally, you maintain a bookmark for each partition on the target side to know the oldest transactions that have been flushed to disk. Unfortunately, based on my understanding, Kafka is unable to do this because it does not fsync regularly, in order to achieve better throughput.
>>>>>
>>>>> Best wishes,
>>>>>
>>>>> Xiao Li
>>>>>
>>>>> On Mar 3, 2015, at 3:45 PM, Xiao <lixiao1...@gmail.com> wrote:
>>>>>
>>>>>> Hey Josh,
>>>>>>
>>>>>> Transactions can be applied in parallel on the consumer side based on transaction dependency checking.
>>>>>>
>>>>>> http://www.google.com.ar/patents/US20080163222
>>>>>>
>>>>>> This patent documents how it works. It is easy to understand; however, you also need to consider hash collision issues. This has been implemented in IBM Q Replication since 2001.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Xiao Li
>>>>>>
>>>>>> On Mar 3, 2015, at 3:36 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hey Josh,
>>>>>>>
>>>>>>> As you say, ordering is per partition. Technically it is generally possible to publish all changes to a database to a single partition--generally the Kafka partition should be high-throughput enough to keep up. However, there are a couple of downsides to this:
>>>>>>> 1. Consumer parallelism is limited to one. If you want a total order to the consumption of messages you need to have just one process, but often you would want to parallelize.
>>>>>>> 2.
>>>>>>> Often what people want is not a full stream of all changes in all tables in a database but rather the changes to a particular table.
>>>>>>>
>>>>>>> To some extent the best way to do this depends on what you will do with the data. However if you intend to have lots
>>>>>>>
>>>>>>> I have seen pretty much every variation on this in the wild, and here is what I would recommend:
>>>>>>> 1. Have a single publisher process that publishes events into Kafka.
>>>>>>> 2. If possible, use the database log to get these changes (e.g. MySQL binlog, Oracle XStreams, GoldenGate, etc.). This will be more complete and more efficient than polling for changes, though that can work too.
>>>>>>> 3. Publish each table to its own topic.
>>>>>>> 4. Partition each topic by the primary key of the table.
>>>>>>> 5. Include in each message the database's transaction id, SCN, or other identifier that gives the total order within the record stream. Since there is a single publisher, this id will be monotonic within each partition.
>>>>>>>
>>>>>>> This seems to be the best set of tradeoffs for most use cases:
>>>>>>> - You can have parallel consumers, up to the number of partitions you chose, that still get messages in order per ID'd entity.
>>>>>>> - You can subscribe to just one table if you like, or to multiple tables.
>>>>>>> - Consumers who need a total order over all updates can do a "merge" across the partitions to reassemble the fully ordered set of changes across all tables/partitions.
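The "merge across the partitions" consumer described above can be sketched as a k-way merge on the commit identifier carried in each message (a sketch under the assumption of point 5, i.e. each message carries a globally ordered id such as an SCN; the data here is invented):

```python
import heapq

# Each partition delivers its messages in order, and every message
# carries the database's commit identifier (here called scn), which
# is monotonically increasing within each partition.
partition_0 = [(1, "insert A"), (4, "update A"), (6, "delete A")]
partition_1 = [(2, "insert B"), (3, "update B"), (5, "update B")]

# Reassemble the total order across partitions by merging on scn.
# heapq.merge assumes each input is already sorted, which holds here
# because the publisher writes ids monotonically per partition.
total_order = list(heapq.merge(partition_0, partition_1))
print([scn for scn, _ in total_order])  # [1, 2, 3, 4, 5, 6]
```

In a live consumer the merge must block until every partition has delivered at least one message past the merge point, otherwise a lagging partition could cause out-of-order emission.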
>>>>>>> One thing to note is that the requirement of having a single consumer process/thread to get the total order isn't really so much a Kafka restriction as a restriction about the world, since if you had multiple threads, even if you delivered messages to them in order, their processing might happen out of order (just due to the random timing of the processing).
>>>>>>>
>>>>>>> -Jay
>>>>>>>
>>>>>>> On Tue, Mar 3, 2015 at 3:15 PM, Josh Rader <jrader...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Kafka Experts,
>>>>>>>>
>>>>>>>> We have a use case around RDBMS replication where we are investigating Kafka. In this case ordering is very important. Our understanding is that ordering is only preserved within a single partition. This makes sense, as a single thread will consume these messages, but our question is: can we somehow parallelize this for better performance? Is there maybe some partition-key strategy trick to have your cake and eat it too, in terms of keeping ordering but also being able to parallelize the processing?
>>>>>>>>
>>>>>>>> I am sorry if this has already been asked, but we tried to search through the archives and couldn't find this response.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Josh
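To Josh's original question, the "partition by primary key" strategy Jay recommends keeps per-row ordering while allowing one consumer per partition. A minimal sketch of the idea (the hash-mod partitioner below is illustrative, not Kafka's actual partitioner):

```python
from collections import defaultdict
import zlib

NUM_PARTITIONS = 4

def partition_for(primary_key: str) -> int:
    # Deterministic hash, so all changes to one row land in one partition
    # and therefore stay in order relative to each other.
    return zlib.crc32(primary_key.encode()) % NUM_PARTITIONS

# A change stream for two rows, in commit order.
changes = [("row42", "insert"), ("row7", "insert"),
           ("row42", "update"), ("row42", "delete"), ("row7", "update")]

partitions = defaultdict(list)
for pk, op in changes:
    partitions[partition_for(pk)].append((pk, op))

# Each partition preserves the relative order of operations per key,
# so one consumer per partition can apply changes in parallel safely.
for p, msgs in sorted(partitions.items()):
    print(p, msgs)
```

Consumers are parallel up to `NUM_PARTITIONS`, and ordering is still total per row, which is the "have your cake and eat it too" tradeoff being asked about.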