interface to add such a UDF?
Thanks for your help!
On Mon, Feb 26, 2018, 3:50 PM David Capwell wrote:
> I have a row that looks like the following pojo
>
> case class Wrapper(var id: String, var bytes: Array[Byte])
>
> Those bytes are a serialized pojo that looks like this
>
>
n a different stage). Make sure the number of executors is small (for example,
only one), else you are reducing the amount of memory for each executor.
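
A config-fragment sketch of what that advice looks like at submit time (the memory value is a placeholder, not from the thread):

```shell
spark-submit \
  --conf spark.executor.instances=1 \
  --conf spark.executor.memory=8g \
  ...
```

Fewer, larger executors concentrate the available memory, which matters when you are tuning spill behavior per executor.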
On Mon, Feb 26, 2018, 10:04 PM 叶先进 wrote:
> What type is the buffer you mentioned?
>
>
> On 27 Feb 2018, at 11:46 AM, David Capwell wrote:
s to convert the protobuf object to a byte
> array; it does speed up SizeEstimator
>
> On Mon, Feb 26, 2018 at 5:34 PM, David Capwell wrote:
>
>> This is used to predict the current cost of memory so Spark knows whether to
>> flush or not. This is very costly for us, so we use a flag marked in the
>> code as private to lower the cost.
This is used to predict the current cost of memory so Spark knows whether to
flush or not. This is very costly for us, so we use a flag marked in the code
as private to lower the cost:
spark.shuffle.spill.numElementsForceSpillThreshold (on phone, hope no typo)
- how many records before a flush.
This lowers the
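
For reference, a config-fragment sketch of setting that threshold (the value is an arbitrary placeholder, and since the property is marked private in the code, it may change between Spark versions without notice):

```shell
spark-submit \
  --conf spark.shuffle.spill.numElementsForceSpillThreshold=5000000 \
  ...
```

Lowering the threshold forces spills by record count, sidestepping the expensive SizeEstimator-based memory prediction.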
I have a row that looks like the following pojo
case class Wrapper(var id: String, var bytes: Array[Byte])
Those bytes are a serialized pojo that looks like this
case class Inner(var stuff: String, var moreStuff: String)
I right now have encoders for both types, but I don't see how to merge
:: Nil)
This causes the Java code to see a byte[], which uses a different code path
than the one linked. Since I did ArrayType(ByteType), I had to wrap the data
in an ArrayData class.
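
A minimal sketch of the byte-array round trip being described, assuming plain Java serialization for the inner pojo (the `serialize`/`deserialize` helpers are hypothetical, not from the thread); the comment at the end shows how such a function could be exposed as a UDF:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

case class Inner(var stuff: String, var moreStuff: String)

// Hypothetical helper: how Inner might become the bytes held by Wrapper.
def serialize(inner: Inner): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val oos = new ObjectOutputStream(bos)
  oos.writeObject(inner)
  oos.close()
  bos.toByteArray
}

// Hypothetical helper: recover Inner from the wrapped byte array.
def deserialize(bytes: Array[Byte]): Inner = {
  val ois = new ObjectInputStream(new ByteArrayInputStream(bytes))
  try ois.readObject().asInstanceOf[Inner] finally ois.close()
}

val restored = deserialize(serialize(Inner("a", "b")))
// With a SparkSession in scope, this could be registered as a UDF, e.g.:
// spark.udf.register("toInner", (bytes: Array[Byte]) => deserialize(bytes))
```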
On Wed, Feb 21, 2018 at 9:55 PM, David Capwell wrote:
> I am trying to create an Encoder for protobuf data and noticed something
> rather weird.
I am trying to create an Encoder for protobuf data and noticed something
rather weird. When we have an empty ByteString (not null, just empty) and we
deserialize, we get back an array of length 8. I took the generated code and
see something weird going on.
UnsafeRowWriter
I wrote a Spark instrumentation tool that instruments RDDs to give more
fine-grained details on what is going on within a task. This is working
right now, but uses volatiles and CAS to pass around this state (which
slows down the task). We want to lower the overhead of this and make the
main call p
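
One standard way to lower that kind of write-side contention, sketched here with plain JDK classes rather than anything from the tool itself: replace a CAS-contended AtomicLong with a LongAdder, which stripes updates across cells and only pays the aggregation cost when the value is read:

```scala
import java.util.concurrent.atomic.{AtomicLong, LongAdder}

// CAS-based counter: every add contends on a single memory location,
// so hot tasks updating it concurrently pay for retry loops.
val casCounter = new AtomicLong(0L)
casCounter.addAndGet(3L)

// LongAdder: writers update striped cells with little contention;
// sum() walks the cells only when the metric is actually collected.
val adder = new LongAdder()
adder.add(3L)
val total = adder.sum()
```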
We want to emit the metrics out of spark into our own custom store. To do
this we built our own sink and tried to add it to spark by doing --jars
path/to/jar and defining the class in metrics.properties which is supplied
with the job. We noticed that Spark kept crashing with the below exception
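
For context, a config-fragment sketch of the metrics.properties wiring being described (`com.example.MySink` is a hypothetical class name; a custom sink has to implement Spark's internal `org.apache.spark.metrics.sink.Sink` trait, and classes loaded via --jars may not be visible to the classloader that instantiates sinks):

```properties
# metrics.properties supplied with the job (hypothetical sink class)
*.sink.custom.class=com.example.MySink
```

If the metrics system cannot see the jar, putting it on spark.driver.extraClassPath / spark.executor.extraClassPath instead of --jars is a common workaround.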