[ https://issues.apache.org/jira/browse/BEAM-6883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16892900#comment-16892900 ]

Ryan Skraba commented on BEAM-6883:
-----------------------------------

More bad news -- it looks like StreamingSourceMetricsTest is not testing 
an UnboundedSource :(

By adding a breakpoint, you can see that it's producing a very-long-lived 
PCollection, but it is IsBounded.BOUNDED -- if we were to let it run, it would 
be a batch pipeline in the end.  Watermarks are never advanced or calculated, 
and nothing signals that the test should stop until the TestPipelineOptions 
timeout occurs.

A couple of options:

1) Use GenerateSequence in a truly unbounded mode by removing the `to()` and 
`withMaxReadTime()` configuration, and test that the PCollection is UNBOUNDED.  
Add a time function to make sure that the watermark advances to the end after 
1000 elements.  Because the `to()` limit is gone, GenerateSequence will 
probably emit more than 1000 elements rather than exactly 1000 -- the 
assertion could be adjusted for that, I suppose.

2) Use CreateStream or implement TestStream in SparkRunner, and have them 
generate Read metrics.
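
For option 1, the boundedness check could be sketched roughly as below. This is 
not the actual test code: the class and method names are hypothetical, and the 
1000-element limit and 10-second read time are assumed values standing in for 
whatever the test configures today. The pipeline is only constructed, never run.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.GenerateSequence;
import org.apache.beam.sdk.values.PCollection;
import org.joda.time.Duration;

/** Hypothetical helper, not part of Beam or of the existing test. */
public class BoundednessSketch {

  // Assumed shape of the current test input: a range limit plus a max read time.
  // Both settings cap the read, so the resulting PCollection reports BOUNDED.
  static PCollection<Long> configuredLikeTest(Pipeline p) {
    return p.apply(
        "LimitedSequence",
        GenerateSequence.from(0).to(1000).withMaxReadTime(Duration.standardSeconds(10)));
  }

  // Option 1: no to() and no withMaxReadTime(), so the source stays unbounded.
  static PCollection<Long> trulyUnbounded(Pipeline p) {
    return p.apply("UnboundedSequence", GenerateSequence.from(0));
  }

  public static void main(String[] args) {
    // Building the graph is enough to inspect boundedness; no runner is invoked.
    Pipeline p = Pipeline.create();
    System.out.println(configuredLikeTest(p).isBounded());
    System.out.println(trulyUnbounded(p).isBounded());
  }
}
```

An assertion on `isBounded()` in the test itself would catch this kind of 
misconfiguration early, before any metrics are checked.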

Both of those would finish in under 10 seconds and test the intended 
functionality -- as it is, this test isn't doing anything useful.

> StreamingSourceMetricsTest takes too long to finish
> ---------------------------------------------------
>
>                 Key: BEAM-6883
>                 URL: https://issues.apache.org/jira/browse/BEAM-6883
>             Project: Beam
>          Issue Type: Test
>          Components: runner-spark
>    Affects Versions: 2.11.0
>            Reporter: Ismaël Mejía
>            Assignee: Alexey Romanenko
>            Priority: Minor
>
> This test is part of Spark's ValidatesRunner suite and it takes more than 10 
> minutes to end.



