interface to add such a UDF?
Thanks for your help!
On Mon, Feb 26, 2018, 3:50 PM David Capwell wrote:
> I have a row that looks like the following pojo
>
> case class Wrapper(var id: String, var bytes: Array[Byte])
>
> Those bytes are a serialized pojo that looks like this
>
>
n a different stage). Make sure the number of executors is small (for example,
only one), else you are reducing the amount of memory for each executor.
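
A config-fragment sketch of what that advice looks like at submit time (the memory value is a placeholder, not from the thread):

```shell
spark-submit \
  --conf spark.executor.instances=1 \
  --conf spark.executor.memory=8g \
  ...
```

Fewer, larger executors concentrate the available memory, which matters when you are tuning spill behavior per executor.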
On Mon, Feb 26, 2018, 10:04 PM 叶先进 wrote:
> What type is the buffer you mentioned?
>
>
> On 27 Feb 2018, at 11:46 AM, David Capwell wrote:
s to convert the protobuf object to a byte
> array; it does speed up SizeEstimator
>
> On Mon, Feb 26, 2018 at 5:34 PM, David Capwell wrote:
>
>> This is used to predict the current cost of memory so Spark knows whether to
>> flush or not. This is very costly for us, so we use a flag marked in the
>> code as private to lower the cost.
This is used to predict the current cost of memory so Spark knows whether to
flush or not. This is very costly for us, so we use a flag marked in the code
as private to lower the cost:
spark.shuffle.spill.numElementsForceSpillThreshold (on phone, hope no typo)
- how many records before a flush.
This lowers the
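
For reference, a config-fragment sketch of setting that threshold (the value is an arbitrary placeholder, and since the property is marked private in the code, it may change between Spark versions without notice):

```shell
spark-submit \
  --conf spark.shuffle.spill.numElementsForceSpillThreshold=5000000 \
  ...
```

Lowering the threshold forces spills by record count, sidestepping the expensive SizeEstimator-based memory prediction.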
I have a row that looks like the following pojo
case class Wrapper(var id: String, var bytes: Array[Byte])
Those bytes are a serialized pojo that looks like this
case class Inner(var stuff: String, var moreStuff: String)
I right now have encoders for both types, but I don't see how to merge
:: Nil)
This causes the Java code to see a byte[], which uses a different code path
than the one linked. Since I did ArrayType(ByteType), I had to wrap the data
in an ArrayData class.
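
A minimal sketch of the byte-array round trip being described, assuming plain Java serialization for the inner pojo (the `serialize`/`deserialize` helpers are hypothetical, not from the thread); the comment at the end shows how such a function could be exposed as a UDF:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

case class Inner(var stuff: String, var moreStuff: String)

// Hypothetical helper: how Inner might become the bytes held by Wrapper.
def serialize(inner: Inner): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(inner)
  oos.close()
  bos.toByteArray
}

// Hypothetical helper: recover Inner from the wrapped byte array.
def deserialize(bytes: Array[Byte]): Inner = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  try ois.readObject().asInstanceOf[Inner] finally ois.close()
}

val restored = deserialize(serialize(Inner("a", "b")))
// With a SparkSession in scope, this could be registered as a UDF, e.g.:
// spark.udf.register("toInner", (bytes: Array[Byte]) => deserialize(bytes))
```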
On Wed, Feb 21, 2018 at 9:55 PM, David Capwell wrote:
> I am trying to create an Encoder for protobuf data and noticed something
> rather weird.
I am trying to create an Encoder for protobuf data and noticed something
rather weird. When we have an empty ByteString (not null, just empty) and we
deserialize, we get back an array of length 8. I took the generated code and
see something weird going on.
UnsafeRowWriter
I wrote a Spark instrumentation tool that instruments RDDs to give more
fine-grained details on what is going on within a task. This is working
right now, but uses volatiles and CAS to pass around this state (which
slows down the task). We want to lower the overhead of this and make the
main call p
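
One standard way to lower that kind of write-side contention, sketched here with plain JDK classes rather than anything from the tool itself: replace a CAS-contended AtomicLong with a LongAdder, which stripes updates across cells and only pays the aggregation cost when the value is read:

```scala
import java.util.concurrent.atomic.{AtomicLong, LongAdder}

// CAS-based counter: every add contends on a single memory location,
// so hot tasks updating it concurrently pay for retry loops.
val casCounter = new AtomicLong(0L)
casCounter.addAndGet(3L)

// LongAdder: writers update striped cells with little contention;
// sum() walks the cells only when the metric is actually collected.
val adder = new LongAdder()
adder.add(3L)
val total = adder.sum()
```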
We want to emit the metrics out of spark into our own custom store. To do
this we built our own sink and tried to add it to spark by doing --jars
path/to/jar and defining the class in metrics.properties which is supplied
with the job. We noticed that Spark kept crashing with the below exception
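
For context, a config-fragment sketch of the metrics.properties wiring being described (`com.example.MySink` is a hypothetical class name; a custom sink has to implement Spark's internal `org.apache.spark.metrics.sink.Sink` trait, and classes loaded via --jars may not be visible to the classloader that instantiates sinks):

```properties
# metrics.properties supplied with the job (hypothetical sink class)
*.sink.custom.class=com.example.MySink
```

If the metrics system cannot see the jar, putting it on spark.driver.extraClassPath / spark.executor.extraClassPath instead of --jars is a common workaround.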