[jira] [Resolved] (BEAM-3247) Sample.any memory constraint

2018-09-24 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li resolved BEAM-3247. -- Resolution: Fixed Fix Version/s: 2.2.0 > Sample.any memory constraint >

[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596565#comment-16596565 ] Neville Li commented on BEAM-5036: -- Yeah that's what I figured. So there's no way to reduce this overhead

[jira] [Comment Edited] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596565#comment-16596565 ] Neville Li edited comment on BEAM-5036 at 8/29/18 4:23 PM: --- Yeah that's what I

[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596513#comment-16596513 ] Neville Li commented on BEAM-5036: -- {{copy+delete}} is still expensive on GCS, especially when running

[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596410#comment-16596410 ] Neville Li commented on BEAM-5036: -- Yeah that's my main concern. We use GCS almost exclusively so all our

[jira] [Commented] (BEAM-5036) Optimize FileBasedSink's WriteOperation.moveToOutput()

2018-08-29 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-5036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596392#comment-16596392 ] Neville Li commented on BEAM-5036: -- If I understand this correctly, this issue affects all file-based

[jira] [Commented] (BEAM-3234) PubsubIO batch size should be configurable

2017-12-01 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16275383#comment-16275383 ] Neville Li commented on BEAM-3234: -- Affects 2.2.0 as well.

[jira] [Updated] (BEAM-3234) PubsubIO batch size should be configurable

2017-12-01 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-3234: - Affects Version/s: 2.2.0 > PubsubIO batch size should be configurable >

[jira] [Updated] (BEAM-991) DatastoreIO Write should flush early for large batches

2017-11-28 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-991: Fix Version/s: 2.1.0 > DatastoreIO Write should flush early for large batches >

[jira] [Created] (BEAM-3247) Sample.any memory constraint

2017-11-24 Thread Neville Li (JIRA)
Neville Li created BEAM-3247: Summary: Sample.any memory constraint Key: BEAM-3247 URL: https://issues.apache.org/jira/browse/BEAM-3247 Project: Beam Issue Type: Improvement

[jira] [Assigned] (BEAM-3247) Sample.any memory constraint

2017-11-24 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li reassigned BEAM-3247: Assignee: Neville Li (was: Kenneth Knowles) > Sample.any memory constraint >

[jira] [Updated] (BEAM-3234) PubsubIO batch size should be configurable

2017-11-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-3234: - Description: Looks like there's a payload size limit in Pubsub, and PubsubIO has a hard-coded batch size

[jira] [Updated] (BEAM-3234) PubsubIO batch size should be configurable

2017-11-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-3234: - Description: Looks like there's a payload size limit in Pubsub, and PubsubIO has a hard-coded batch size

[jira] [Updated] (BEAM-3234) PubsubIO batch size should be configurable

2017-11-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-3234: - Description: Looks like there's a payload size limit in Pubsub, and PubsubIO has a hard-coded batch size

[jira] [Created] (BEAM-3234) PubsubIO batch size should be configurable

2017-11-21 Thread Neville Li (JIRA)
Neville Li created BEAM-3234: Summary: PubsubIO batch size should be configurable Key: BEAM-3234 URL: https://issues.apache.org/jira/browse/BEAM-3234 Project: Beam Issue Type: Bug

[jira] [Assigned] (BEAM-2960) Missing type parameter in some AvroIO.Write API

2017-09-15 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li reassigned BEAM-2960: Assignee: Neville Li (was: Kenneth Knowles) > Missing type parameter in some AvroIO.Write API >

[jira] [Created] (BEAM-2766) HadoopInputFormatIO should support Void/null key/values

2017-08-14 Thread Neville Li (JIRA)
Neville Li created BEAM-2766: Summary: HadoopInputFormatIO should support Void/null key/values Key: BEAM-2766 URL: https://issues.apache.org/jira/browse/BEAM-2766 Project: Beam Issue Type: Bug

[jira] [Created] (BEAM-2765) HadoopInputFormatIO should support custom key/value coder

2017-08-14 Thread Neville Li (JIRA)
Neville Li created BEAM-2765: Summary: HadoopInputFormatIO should support custom key/value coder Key: BEAM-2765 URL: https://issues.apache.org/jira/browse/BEAM-2765 Project: Beam Issue Type:

[jira] [Commented] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-23 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097699#comment-16097699 ] Neville Li commented on BEAM-2658: -- However I'd still argue that {{DefaultCoder}} and

[jira] [Commented] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-23 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097693#comment-16097693 ] Neville Li commented on BEAM-2658: -- Types covered by each {{CoderProvider}} may overlap and we might want

[jira] [Updated] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-22 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-2658: - Description: {code} import com.google.protobuf.Timestamp; import org.apache.beam.sdk.Pipeline; import

[jira] [Updated] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-22 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-2658: - Description: {code} public class CoderTest { public static void main(String[] args) throws

[jira] [Updated] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-22 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-2658: - Description: {code} import com.google.protobuf.Timestamp; import org.apache.beam.sdk.Pipeline; import

[jira] [Updated] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-22 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-2658: - Description: {code} public class CoderTest { public static void main(String[] args) throws

[jira] [Created] (BEAM-2658) SerializableCoder has high precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-22 Thread Neville Li (JIRA)
Neville Li created BEAM-2658: Summary: SerializableCoder has high precedence over ProtoCoder in CoderRegistry#getCoder Key: BEAM-2658 URL: https://issues.apache.org/jira/browse/BEAM-2658 Project: Beam

[jira] [Updated] (BEAM-2658) SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder

2017-07-22 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-2658: - Summary: SerializableCoder has higher precedence over ProtoCoder in CoderRegistry#getCoder (was:

[jira] [Commented] (BEAM-2453) The Java DirectRunner should exercise all parts of a CombineFn

2017-07-19 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16093359#comment-16093359 ] Neville Li commented on BEAM-2453: -- Here's an example of incorrect use of {{Combine.perKey}} that could be

[jira] [Commented] (BEAM-2532) BigQueryIO source should avoid expensive JSON schema parsing for every record

2017-07-17 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16090599#comment-16090599 ] Neville Li commented on BEAM-2532: -- Would love to see a fix in the next release. This is a big performance

[jira] [Commented] (BEAM-302) Add Scio Scala DSL to Beam

2017-05-04 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996831#comment-15996831 ] Neville Li commented on BEAM-302: - Yes, that ecosystem has too many build params: Scala version, Spark

[jira] [Commented] (BEAM-302) Add Scio Scala DSL to Beam

2017-05-03 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15995075#comment-15995075 ] Neville Li commented on BEAM-302: - Looks like the Spark runner still depends on 1.6.3. Can you give Spark 1.6 a

[jira] [Commented] (BEAM-302) Add Scio Scala DSL to Beam

2017-05-02 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15994005#comment-15994005 ] Neville Li commented on BEAM-302: - You need the Spark runner dependency, which is not included by default. >

[jira] [Commented] (BEAM-302) Add Scio Scala DSL to Beam

2017-04-03 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953584#comment-15953584 ] Neville Li commented on BEAM-302: - We prefer to keep it separate for now mainly for logistics reasons: - we

[jira] [Assigned] (BEAM-302) Add Scio Scala DSL to Beam

2017-04-01 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li reassigned BEAM-302: --- Assignee: (was: Neville Li) > Add Scio Scala DSL to Beam > -- > >

[jira] [Closed] (BEAM-1518) Support deflate (zlib) in CompressedSource and FileBasedSink

2017-03-20 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li closed BEAM-1518. Resolution: Fixed > Support deflate (zlib) in CompressedSource and FileBasedSink >

[jira] [Updated] (BEAM-1518) Support deflate (zlib) in CompressedSource and FileBasedSink

2017-03-20 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-1518: - Fix Version/s: 0.6.0 > Support deflate (zlib) in CompressedSource and FileBasedSink >

[jira] [Created] (BEAM-1520) Implement TFRecordIO (Reading/writing Tensorflow Standard format)

2017-02-21 Thread Neville Li (JIRA)
Neville Li created BEAM-1520: Summary: Implement TFRecordIO (Reading/writing Tensorflow Standard format) Key: BEAM-1520 URL: https://issues.apache.org/jira/browse/BEAM-1520 Project: Beam Issue

[jira] [Assigned] (BEAM-1520) Implement TFRecordIO (Reading/writing Tensorflow Standard format)

2017-02-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li reassigned BEAM-1520: Assignee: Neville Li (was: Davor Bonaci) > Implement TFRecordIO (Reading/writing Tensorflow

[jira] [Assigned] (BEAM-1519) Support snappy in CompressedSource and FileBasedSink

2017-02-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li reassigned BEAM-1519: Assignee: (was: Neville Li) > Support snappy in CompressedSource and FileBasedSink >

[jira] [Updated] (BEAM-1518) Support deflate (zlib) in CompressedSource and FileBasedSink

2017-02-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-1518: - Summary: Support deflate (zlib) in CompressedSource and FileBasedSink (was: Support ZLIB (deflate) in

[jira] [Updated] (BEAM-1518) Support ZLIB (deflate) in CompressedSource and FileBasedSink

2017-02-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-1518: - Description: `.deflate` files are quite common in Hadoop and also supported by TensorFlow in TFRecord file

[jira] [Created] (BEAM-1519) CLONE - Support snappy in CompressedSource and FileBasedSink

2017-02-21 Thread Neville Li (JIRA)
Neville Li created BEAM-1519: Summary: CLONE - Support snappy in CompressedSource and FileBasedSink Key: BEAM-1519 URL: https://issues.apache.org/jira/browse/BEAM-1519 Project: Beam Issue Type:

[jira] [Updated] (BEAM-1519) Support snappy in CompressedSource and FileBasedSink

2017-02-21 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Neville Li updated BEAM-1519: - Summary: Support snappy in CompressedSource and FileBasedSink (was: CLONE - Support snappy in

[jira] [Created] (BEAM-1518) Support ZLIB (deflate) in CompressedSource and FileBasedSink

2017-02-21 Thread Neville Li (JIRA)
Neville Li created BEAM-1518: Summary: Support ZLIB (deflate) in CompressedSource and FileBasedSink Key: BEAM-1518 URL: https://issues.apache.org/jira/browse/BEAM-1518 Project: Beam Issue Type:

[jira] [Comment Edited] (BEAM-298) Make TestPipeline implement the TestRule interface

2017-02-10 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862201#comment-15862201 ] Neville Li edited comment on BEAM-298 at 2/11/17 4:12 AM: -- That didn't work for me.

[jira] [Commented] (BEAM-298) Make TestPipeline implement the TestRule interface

2017-02-10 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862201#comment-15862201 ] Neville Li commented on BEAM-298: - That didn't work for me. I had to add it as a {{compile}} scope. > Make

[jira] [Comment Edited] (BEAM-298) Make TestPipeline implement the TestRule interface

2017-02-10 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861795#comment-15861795 ] Neville Li edited comment on BEAM-298 at 2/10/17 8:08 PM: -- As a result of this

[jira] [Commented] (BEAM-298) Make TestPipeline implement the TestRule interface

2017-02-10 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861795#comment-15861795 ] Neville Li commented on BEAM-298: - As a result of this change I need to include {{junit}} in my dependencies

[jira] [Commented] (BEAM-302) Add Scio Scala DSL to Beam

2017-01-24 Thread Neville Li (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836553#comment-15836553 ] Neville Li commented on BEAM-302: - WIP branch here using 0.4.0