KafkaUnboundedReader

2020-07-29 Thread wang Wu
Hi, I am curious about this comment: if (offset < expected) { // -- (a) // this can happen when compression is enabled in Kafka (seems to be fixed in 0.10) // should we check if the offset is way off from consumedOffset (say > 1M)? LOG.warn( "{}: ignor

Re: Partition unbounded collection like Kafka source

2020-07-28 Thread wang Wu
s/java/core/src/main/java/org/apache/beam/sdk/transforms/Partition.java > > On Thu, Jul 23, 2020 at 5:01 AM wang Wu wrote: > >> Hi, >> Supposed that a Kafka topic has 3 partitions only. Now we want to >> partition it into 20 partition, each one will produce an output coll

Re: Custom Cassandra IO

2020-07-23 Thread wang Wu
/io/cassandra/CassandraIO.java#L1139> > >> On 20 Jul 2020, at 12:49, wang Wu > <mailto:faran...@gmail.com>> wrote: >> >> But unfortunately that way will not work as Session and/or Cluster is not >> serialisable. >> >> Regards >> Dinh >

Partition unbounded collection like Kafka source

2020-07-23 Thread wang Wu
Hi, Supposed that a Kafka topic has 3 partitions only. Now we want to partition it into 20 partition, each one will produce an output collection. The purpose is to write to the sink in parallel from all 20 output collections. Will this code achieve that purpose? KafkaIO.Read reader = Kaf

Re: Custom Cassandra IO

2020-07-20 Thread wang Wu
But unfortunately that way will not work as Session and/or Cluster is not serialisable. Regards Dinh > On 20 Jul BE 2563, at 17:42, wang Wu wrote: > > Hi, > > We are thinking of tuning connection pooling like this: > https://docs.datastax.com/en/developer/java-driver/

Re: Custom Cassandra IO

2020-07-20 Thread wang Wu
tom Cluster > instance with all required configuration. > > In both cases, it will require CassandraIO modification. > > >> On 18 Jul 2020, at 13:11, wang Wu wrote: >> >> I notice that the standard Cassandra IO setup Cluster with basics settings. >> Is

Custom Cassandra IO

2020-07-18 Thread wang Wu
I notice that the standard Cassandra IO setup Cluster with basics settings. Is it possible to implement custom Cassandra IO in which I can customise Datastax driver? Any sample code will be helpful. Thanks

Re: Concurrency issue with KafkaIO

2020-07-03 Thread wang Wu
On 30 Jun BE 2563, at 23:53, wang Wu wrote: > > We encountered similar exception with KafkaUnboundedReader. By similarity I > mean it start from > org.apache.spark.rdd.RDD.computeOrReadCheckpoint > > And it ends at org.apache.beam.sdk.io.kafka.KafkaUnboundedReader.advance &

Re: Concurrency issue with KafkaIO

2020-07-02 Thread wang Wu
different threads. So we need to check it there is no issue under > high load with that. > > Btw, which Kafka client version do you use? > >> On 30 Jun 2020, at 18:53, wang Wu > <mailto:faran...@gmail.com>> wrote: >> >> We encountered

Re: Concurrency issue with KafkaIO

2020-06-30 Thread wang Wu
6, Alexey Romanenko > wrote: > > I don’t think it’s a known issue. Could you tell with version of Beam you use? > >> On 28 Jun 2020, at 14:43, wang Wu wrote: >> >> Hi, >> We run Beam pipeline on Spark in the streaming mode. We subscribe to >> multiple Kaf

Re: Concurrency issue with KafkaIO

2020-06-29 Thread wang Wu
it’s a known issue. Could you tell with version of Beam you use? > >> On 28 Jun 2020, at 14:43, wang Wu wrote: >> >> Hi, >> We run Beam pipeline on Spark in the streaming mode. We subscribe to >> multiple Kafka topics. Our job run fine until it is under heavy lo

Concurrency issue with KafkaIO

2020-06-28 Thread wang Wu
Hi, We run Beam pipeline on Spark in the streaming mode. We subscribe to multiple Kafka topics. Our job run fine until it is under heavy load: millions of Kafka messages coming per seconds. The exception look like concurrency issue. Is it a known bug in Beam or some Spark configuration we could