Hello,
I'm trying to compile Google's timestamp.proto protobuf definition to a
Scala case class and use it as a field in another proto-derived case class
as part of a larger dataset schema.
(Although the SQL date type might be preferred in a schema, I encountered
this problem when I attempted to use
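The question is cut off above, but a minimal sketch of the fallback the
parenthetical hints at might look like this (Event and the sample row are
hypothetical, invented for illustration; Spark ships no built-in Encoder for
protobuf-generated message classes, while java.sql.Timestamp is encoded
natively):

import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

// Hypothetical schema: a plain case class with java.sql.Timestamp in
// place of the protobuf-generated Timestamp message.
case class Event(id: Long, createdAt: Timestamp)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds = spark.createDataset(Seq(
  Event(1L, Timestamp.valueOf("2018-09-13 19:47:00"))
))
ds.printSchema()  // createdAt shows up as a native timestamp column

If the protobuf type has to stay in the schema, an explicit encoder (e.g.
Encoders.kryo) would be needed, at the cost of a single opaque binary column
instead of a real columnar schema.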
Hi all,
I am having some trouble doing a count distinct over multiple columns.
This is an example of my data:
+----+----+----+---+
|a   |b   |c   |d  |
+----+----+----+---+
|null|null|null|1  |
|null|null|null|2  |
|null|null|null|3  |
|null|null|null|4  |
|null|null|null|5  |
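If it helps, here is a minimal sketch in Scala (the rows are made up to
match the sample above). Note that COUNT(DISTINCT a, b, c, d) only counts
rows in which all of the listed columns are non-null, so on data like the
above it returns 0, which may well be the surprise here:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.countDistinct

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Made-up rows shaped like the sample: a, b, c are always null.
val df = Seq(
  (Option.empty[String], Option.empty[String], Option.empty[String], 1),
  (Option.empty[String], Option.empty[String], Option.empty[String], 2),
  (Option.empty[String], Option.empty[String], Option.empty[String], 3)
).toDF("a", "b", "c", "d")

// Rows where any listed column is null are skipped: this prints 0.
df.select(countDistinct($"a", $"b", $"c", $"d")).show()

// Null-tolerant alternative: treat null as a regular value.
println(df.distinct().count())  // prints 3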
Just wanted to add a comment to the JIRA ticket, but I don't think I have
permission to do so, so I'm answering here instead. I am encountering the
same issue, a StackOverflowError.
I would like to point out that there is a localCheckpoint method on
Dataset/RDD.
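To illustrate (a sketch, assuming the StackOverflowError comes from a long
lineage built up iteratively, which is the usual cause): localCheckpoint
materializes the data on the executors and truncates the logical plan, and
unlike checkpoint() it needs no checkpoint directory:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Hypothetical iterative job: every pass grows the plan, and a deep
// enough plan can overflow the stack during analysis.
var df: DataFrame = spark.range(0, 1000).toDF("id")
for (i <- 1 to 500) {
  df = df.withColumn("id", col("id") + 1)
  if (i % 50 == 0) {
    // Truncate the lineage every 50 iterations; the data stays on the
    // executors, no reliable storage required (unlike checkpoint()).
    df = df.localCheckpoint()
  }
}

The trade-off is that localCheckpoint keeps the blocks on executors only,
so it is less fault-tolerant than checkpointing to reliable storage.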
You didn't say how you're zipping the dependencies, but I'm guessing you
either include .egg files or zipped up a virtualenv. In either case, the
extra C stuff that scipy and pandas rely upon doesn't get included.
An approach like this solved a similar-looking problem for me last time -
Hi,
Is there any Spark connector for HDF5?
The following link does not work anymore:
https://www.hdfgroup.org/downloads/spark-connector/
Thanks,
Kathleen
On Thu, Sep 13, 2018 at 7:47 PM Pekka Lehtonen wrote:
Hi,
We're starting to use Spark 2 with use cases for dynamic allocation.
However, we noticed it doesn't work as expected when the dataset is
cached (persisted).
The cluster runs:
CDH 5.15.0
Spark 2.3.0
Oracle Java 8u131
The following configs are passed to Spark (and are also set up at the cluster level):
#
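The configs themselves were cut off above; for reference, a hypothetical
setup of the relevant knobs (the property names below are standard Spark
settings, the values are made up). One thing worth checking: in Spark 2.3,
executors holding cached blocks are never released by dynamic allocation
unless spark.dynamicAllocation.cachedExecutorIdleTimeout is set, since it
defaults to infinity, which matches the symptom described:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true") // required for dynamic allocation
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  // Without this, executors with cached data are held forever (default: infinity).
  .config("spark.dynamicAllocation.cachedExecutorIdleTimeout", "3600s")
  .getOrCreate()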
A question: if you use Spark Streaming, is the DAG computed for each
microbatch? Is it possible to compute it only the first time?
Hi All,
Is there any open source framework that converts Cypher to SparkSQL?
Thanks!
Hi Aakash,
on the cluster you need to consider the total number of executors you
are using. Please take a look at the following link
for an introduction.
https://spoddutur.github.io/spark-notes/distribution_of_executors_cores_and_memory_for_spark_application.html
regards,
Apostolos
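To make that concrete, a sketch (the numbers are invented): in cluster mode
the rough counterpart of local[N] is the total number of task slots, i.e.
executors times cores per executor:

import org.apache.spark.sql.SparkSession

// 5 executors * 4 cores each = 20 concurrent task slots,
// roughly the counterpart of local[20] on a single machine.
val spark = SparkSession.builder()
  .config("spark.executor.instances", "5") // same as --num-executors
  .config("spark.executor.cores", "4")     // same as --executor-cores
  .getOrCreate()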
Local: only one JVM, running on the host where you submitted the job:
${SPARK_HOME}/bin/spark-submit \
--master local[N] \
Standalone: using Spark's own scheduler:
${SPARK_HOME}/bin/spark-submit \
--master spark://IP_ADDRESS:7077 \
where IP_ADDRESS is the host running your Spark master (7077 is the
default master port).
Hi,
What is the Spark cluster equivalent of local[N]? I mean, which parameter
in cluster mode plays the role of the N we pass as the parameter of local?
Thanks,
Aakash.