Hi Andrey,
Below is the description of MEMORY_ONLY from
https://spark.apache.org/docs/latest/rdd-programming-guide.html
"Store RDD as deserialized Java objects in the JVM. If the RDD does not fit
in memory, some partitions will not be cached and will be recomputed on the
fly each time they're nee
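Selecting this storage level is a one-liner. A minimal sketch, assuming an existing SparkContext `sc` (as in spark-shell) and a hypothetical input file:

```scala
import org.apache.spark.storage.StorageLevel

// Hypothetical input path; any RDD works the same way.
val rdd = sc.textFile("data.txt").map(_.length)

// MEMORY_ONLY is also what the shorthand rdd.cache() selects.
rdd.persist(StorageLevel.MEMORY_ONLY)

// The first action materializes the cache; partitions that did not fit
// in memory are recomputed from lineage on subsequent actions.
rdd.count()
```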
|-- name: string (nullable = true)
> |-- age: integer (nullable = false)
>
>
> On Wed, Feb 27, 2019 at 6:51 PM Hien Luu wrote:
>
>> Thanks for looking into this. Does this mean string fields should always
>> be nullable?
>>
>> You are right that t
"null"]},{"name":"age","type":"int"}]}
>
> scala> dfKV.select(from_avro('value, avroTypeStruct)).show
> +---------+
> |from_avro(value, struct)|
> +---------+
> | [Mary
Hi,
I ran into a pretty weird issue with to_avro and from_avro, where it was not
able to parse the data in a struct correctly. Please see the simple and
self-contained example below. I am using Spark 2.4. I am not sure if I
missed something.
This is how I start the spark-shell on my Mac:
./bin/
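For anyone trying to reproduce, a minimal to_avro/from_avro round trip in Spark 2.4 looks roughly like the sketch below. It assumes spark-shell was started with the spark-avro package (e.g. `--packages org.apache.spark:spark-avro_2.11:2.4.0`) and mirrors the name/age struct from this thread; the sample rows are made up:

```scala
import org.apache.spark.sql.avro.{from_avro, to_avro}
import org.apache.spark.sql.functions.struct
import spark.implicits._

// Avro schema matching the struct discussed above.
val avroTypeStruct = """{"type":"record","name":"struct","fields":[
  {"name":"name","type":["string","null"]},
  {"name":"age","type":"int"}]}"""

// Encode the struct to Avro bytes, then decode it back.
val df = Seq(("Mary", 30), ("John", 25)).toDF("name", "age")
val dfKV = df.select(to_avro(struct($"name", $"age")).as("value"))
dfKV.select(from_avro($"value", avroTypeStruct).as("decoded")).show(false)
```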
Hi Farshid,
Take a look at this example on github -
https://github.com/hienluu/structured-streaming-sources.
Cheers,
Hien
On Thu, Jul 12, 2018 at 12:52 AM Farshid Zavareh
wrote:
> Hello.
>
> I need to create a custom streaming source by extending *FileStreamSource*.
> The idea is to override
Finally got a toy version of a Structured Streaming DataSource V2 source
working with Apache Spark 2.3. Tested locally and on Databricks community
edition.
Source code is here - https://github.com/hienluu/wikiedit-streaming
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
-
I finally got around to implementing a custom structured streaming receiver
(source) to read Wikipedia edit events from the IRC server.
It works fine locally as well as in spark-shell on my laptop. However, it
failed with the following exception when running in Databricks community
edition.
It see
Hi Kant,
I am not sure whether you have come up with a solution yet, but the following
works for me (in Scala):
val emp_info = """
[
{"name": "foo", "address": {"state": "CA", "country": "USA"},
"docs":[{"subject": "english", "year": 2016}]},
{"name": "bar", "address": {"state": "OH", "c
Cool. Thanks nezhazheng. I will give it a shot.
-
Hi TD,
I looked at DataStreamReader class and looks like we can specify an FQCN as
a source (provided that it implements trait Source). The
DataSource.lookupDataSource function will try to load this FQCN during the
creation of a DataSource object instance inside the DataStreamReader.load().
Will
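In practice that means `format()` can take the FQCN directly. A sketch with a hypothetical provider class (`com.example.MyStreamSourceProvider` is a placeholder, not a real package):

```scala
// The placeholder class is assumed to implement the StreamSourceProvider
// trait and to be on the driver classpath.
val events = spark.readStream
  .format("com.example.MyStreamSourceProvider")
  .option("host", "localhost")
  .load()

events.writeStream.format("console").start()
```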
r local repo?
>
> Best, RS
> On May 2, 2016 8:51 PM, "Hien Luu" wrote:
>
>> Hi all,
>>
>> I am running into a build problem with com.oracle:ojdbc6:jar:11.2.0.1.0.
>> It kept getting "Operation timed out" while building Spark Project Docker
>>
Hi all,
I am running into a build problem with com.oracle:ojdbc6:jar:11.2.0.1.0.
It kept getting "Operation timed out" while building Spark Project Docker
Integration Tests module (see the error below).
Has anyone run into this problem before? If so, how did you work around this
problem?
[INFO] Re
eam.scala, there is everything you need to know.
>
> On Fri, Nov 6, 2015 at 6:25 PM Hien Luu wrote:
>
>> Hi,
>>
>> I am interested in learning about the implementation of
>> updateStateByKey. Does anyone know of a jira or design doc I can read?
>>
>> I
Hi,
I am interested in learning about the implementation of updateStateByKey.
Does anyone know of a jira or design doc I can read?
I did a quick search and couldn't find much info on the implementation.
Thanks in advance,
Hien
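For grounding while digging into the internals, usage itself is compact. A minimal running word count sketch, assuming an existing StreamingContext `ssc` with a checkpoint directory configured (updateStateByKey requires checkpointing) and a hypothetical socket source:

```scala
// State update function: fold the batch's new counts into the running total.
def updateCount(newValues: Seq[Int], state: Option[Int]): Option[Int] =
  Some(newValues.sum + state.getOrElse(0))

// Any DStream of words works the same way; the socket source is illustrative.
val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))
val runningCounts = words.map((_, 1)).updateStateByKey(updateCount)
runningCounts.print()
```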
Ted Yu wrote:
>>
>>> In my opinion, choosing some particular project among its peers should
>>> leave enough room for future growth (which may come faster than you
>>> initially think).
>>>
>>> Cheers
>>>
>>> On Fri, Aug 7, 2015 a
may come faster than you
>> initially think).
>>
>> Cheers
>>
>> On Fri, Aug 7, 2015 at 11:23 AM, Hien Luu wrote:
>>
>>> Scalability is a known issue due to the current architecture. However,
>>> this will only be applicable if you run more than 20K jobs per day.
&
ling
>> and DAG type workflows (independent of spark-defined job flows).
>>
>>
>> On Fri, Aug 7, 2015 at 7:00 PM, Jörn Franke wrote:
>>
>>> Check also falcon in combination with oozie
>>>
>>> On Fri, Aug 7, 2015 at 5:51 PM, Hien Luu wrote
>>
This blog outlines a few things that make Spark faster than MapReduce -
https://databricks.com/blog/2014/10/10/spark-petabyte-sort.html
On Fri, Aug 7, 2015 at 9:13 AM, Muler wrote:
> Consider the classic word count application over a 4 node cluster with a
> sizable working data. What makes Spark
Looks like Oozie can satisfy most of your requirements.
On Fri, Aug 7, 2015 at 8:43 AM, Vikram Kone wrote:
> Hi,
> I'm looking for open source workflow tools/engines that allow us to
> schedule spark jobs on a datastax cassandra cluster. Since there are tonnes
> of alternatives out there like