Apologies, I must not have made myself clear.

I meant the values in the records coming from the input topic (which, in the example at hand, come from Kafka Connect), not the records coming out of the join.

My intention was to warn against sending null values from Kafka Connect to the topic that is then meant to be read in as a KTable to filter against.


Am I clearer now?


Cheers,

Michał


On 30/04/17 18:14, Matthias J. Sax wrote:
Your observation is correct.

If you use an inner KStream-KTable join, the join implements the
filter automatically, as it returns no result for stream records whose
key has no match in the KTable.
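
To make the filtering behaviour concrete, here is a toy, in-memory Java
model of inner KStream-KTable join semantics. It is not the Kafka Streams
API: the class, the Event type and the joiner are hypothetical, and the
KTable is modelled as a plain Map of tracked ids. It only shows that
stream records whose key has no entry in the table produce no output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of inner KStream-KTable join semantics (NOT the Kafka Streams API).
// The "table" is a Map of tracked ids; stream records without a match are dropped.
public class InnerJoinFilter {

    // A stream record: the id used as the key, plus a payload.
    public static final class Event {
        final String id;
        final String payload;

        public Event(String id, String payload) {
            this.id = id;
            this.payload = payload;
        }
    }

    // Emit one joined result per event whose id is present in the table.
    public static List<String> join(List<Event> stream, Map<String, String> table) {
        List<String> out = new ArrayList<>();
        for (Event e : stream) {
            String tableValue = table.get(e.id);
            if (tableValue != null) {
                // Hypothetical joiner: keep the id and the event payload.
                out.add(e.id + ":" + e.payload);
            }
            // No match: nothing is emitted, so the event is filtered out.
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> tracked = Map.of("42", "tracked");
        List<Event> events = List.of(
                new Event("1", "a"), new Event("42", "b"), new Event("7", "c"));
        System.out.println(join(events, tracked)); // prints [42:b]
    }
}
```

No explicit filter step is needed: the lookup miss itself discards the
untracked ids.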


-Matthias



On 4/30/17 7:23 AM, Michal Borowiecki wrote:
I have something working on the same principle (except not using
Connect): I put the ids to filter on into a KTable and then (inner)
join a KStream with that KTable.

I don't believe the value can be null, though. In a changelog, a null
value is interpreted as a delete, so it won't be put into a KTable.
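
A toy Java model of how a changelog is materialized into a table may
help here: a non-null value is an upsert and a null value is a tombstone
that removes the key. The class below is hypothetical and only mirrors
the behaviour described above; it is not Kafka's implementation.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a KTable materializing a changelog topic (NOT Kafka's code):
// non-null value = upsert, null value = tombstone (delete).
public class ChangelogTable {
    private final Map<String, String> store = new HashMap<>();

    // Apply one changelog record (key, value) to the table.
    public void apply(String key, String value) {
        if (value == null) {
            store.remove(key);      // tombstone: the key is deleted, nothing is stored
        } else {
            store.put(key, value);  // upsert
        }
    }

    public boolean contains(String key) {
        return store.containsKey(key);
    }

    public static void main(String[] args) {
        ChangelogTable table = new ChangelogTable();
        table.apply("42", "");    // empty but non-null marker value: kept
        table.apply("7", null);   // null value: treated as a delete
        System.out.println(table.contains("42")); // true
        System.out.println(table.contains("7"));  // false
    }
}
```

An id imported with a null value therefore never becomes joinable, while
any non-null marker (even an empty string) keeps it in the table.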

The RocksDB store, for one, does this:

private void putInternal(byte[] rawKey, byte[] rawValue) {
    if (rawValue == null) {
        // a null value is treated as a delete (tombstone)
        try {
            db.delete(wOptions, rawKey);
            // ... (excerpt ends here)

But any non-null value would do.
Please correct me if I misunderstood.

Cheers,
Michał

On 27/04/17 22:44, Matthias J. Sax wrote:
> I'd like to avoid repeated trips to the db, and caching a large amount of
> data in memory.
Lookups to the DB would be hard to get done anyway, i.e., they would not
perform well, as all your calls would need to be synchronous...


> Is it possible to send a message w/ the id as the partition key to a topic,
> and then use the same id as the key, so the same node which will receive
> the data for an id is the one which will process it?
That is what I proposed (maybe it was not clear). If you use Connect,
you can just import the ID into Kafka and leave the value empty (i.e.,
null). This reduces your cached data to a minimum. And the KStream-KTable
join works as you described it :)
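
The join can run locally because of key-based partitioning. Below is a
simplified Java model, assuming both topics have the same number of
partitions and use the same partitioner. Kafka's default partitioner
hashes the serialized key with murmur2; this sketch substitutes
Arrays.hashCode, so its partition numbers will not match Kafka's. The
property it illustrates is the one that matters: partition =
hash(key) mod partitionCount is a pure function of the key, so an id
lands on the same partition number in both topics, and Kafka Streams
processes the same partition number of co-partitioned topics in the
same task. Class and topic names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Simplified model of key-based partitioning. Arrays.hashCode stands in for
// Kafka's murmur2 hash, so the numbers differ from Kafka's, but equal keys
// still map to equal partition numbers when the partition count is equal.
public class CoPartitioning {

    public static int partitionFor(String key, int numPartitions) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        // Mask the sign bit so the result is non-negative before taking modulo.
        return (Arrays.hashCode(bytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 8; // assumption: both topics have 8 partitions
        for (String id : new String[] {"42", "1001", "abc"}) {
            int eventPartition = partitionFor(id, partitions);   // hypothetical "events" topic
            int trackedPartition = partitionFor(id, partitions); // hypothetical "tracked-ids" topic
            System.out.println(id + " -> " + eventPartition + " / " + trackedPartition);
        }
    }
}
```

If the two topics had different partition counts, the modulo would differ
and the guarantee would no longer hold, which is why co-partitioning
requires equal partition counts.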


-Matthias

On 4/27/17 2:37 PM, Ali Akhtar wrote:
I'd like to avoid repeated trips to the db, and caching a large amount of
data in memory.

Is it possible to send a message w/ the id as the partition key to a topic,
and then use the same id as the key, so the same node which will receive
the data for an id is the one which will process it?


On Fri, Apr 28, 2017 at 2:32 AM, Matthias J. Sax <matth...@confluent.io>
wrote:

The recommended solution would be to use Kafka Connect to load your DB
data into a Kafka topic.

With Kafka Streams, you read your db-topic as a KTable and do an (inner)
KStream-KTable join to look up the IDs.


-Matthias

On 4/27/17 2:22 PM, Ali Akhtar wrote:
I have a Kafka topic which will receive a large amount of data.

This data has an 'id' field. I need to look up the id in an external db
to see if we are tracking that id: if yes, we process that message; if
not, we ignore it.

99% of the data will be for ids which are not being tracked; 1% or so
will be for ids which are tracked.

My concern is that there would be a lot of round trips to the db just
to check the id, and whether it would be better to cache the ids being
tracked somewhere, so that other ids are ignored.

I was considering sending a message to another topic (or the same one)
whenever a new id is added to the track list; that id should then get
processed on the node which will process the messages.

Should I just cache all ids on all nodes (which may be a large amount),
or is there a way to only cache the id on the same Kafka Streams node
which will receive data for that id?

--
Michal Borowiecki
Senior Software Engineer L4
T: +44 208 742 1600
   +44 203 249 8448
E: michal.borowie...@openbet.com
W: www.openbet.com <http://www.openbet.com/>

OpenBet Ltd
Chiswick Park Building 9
566 Chiswick High Rd
London
W4 5XT
UK

This message is confidential and intended only for the addressee. If you
have received this message in error, please immediately notify the
postmas...@openbet.com <mailto:postmas...@openbet.com> and delete it
from your system as well as any copies. The content of e-mails as well
as traffic data may be monitored by OpenBet for employment and security
purposes. To protect the environment please do not print this e-mail
unless necessary. OpenBet Ltd. Registered Office: Chiswick Park Building
9, 566 Chiswick High Road, London, W4 5XT, United Kingdom. A company
registered in England and Wales. Registered no. 3134634. VAT no.
GB927523612

