Hello,
I noticed that some of the (Big Data / Cloud Managed) Hadoop
distributions are starting to (phase out / deprecate) Spark 1.x, and I
was wondering if the Spark community has already decided when it will
end support for Spark 1.x. I ask this also considering that the
latest release in the
All:
This may be off topic for Spark, but I'm sure several of you might have
used some form of this as part of your Big Data implementations, so I
wanted to reach out.
As part of the Data Lake and Data Processing (by Spark as an example), we
might end up with different form-factors for the files (via,
Hi Ismael,
It depends on what you mean by “support”. In general, there won’t be new
feature releases for 1.X (e.g. Spark 1.7) because all the new features are
being added to the master branch. However, there is always room for bug fix
releases if there is a catastrophic bug, and committers can
Hi All,
I have a requirement to pivot on multiple columns at once, but the pivot
API doesn't support doing that, hence I have been doing the pivot for
the two columns separately and then trying to merge the datasets, but
the result is an empty dataset. Below is the pseudo code:
Main dataset => 33 columns (30
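A minimal sketch of the two-pivots-then-join approach described above
(df is the main dataset from the thread; the column names key, cat1,
cat2, and val are hypothetical placeholders, since the real schema is
not shown):

    from pyspark.sql import functions as F

    # Pivot each categorical column separately over the same grouping key.
    p1 = df.groupBy("key").pivot("cat1").agg(F.sum("val"))
    p2 = df.groupBy("key").pivot("cat2").agg(F.sum("val"))

    # Inner-join the pivoted results back together on the key. An empty
    # result here usually means the key values (or their types) don't
    # actually match on both sides.
    merged = p1.join(p2, on="key", how="inner")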
Is there any limit on the number of columns used in an inner join?
Thank you
Anil Langote
Sent from my iPhone
From: Anil Langote
Sent: Thursday, October 19, 2017 5:01 PM
Subject: Spark Inner Join on pivoted
I.e., if my JDBC table has an index on it, will the optimizer consider
that when pushing predicates down?
I noticed in a query like this:
df = spark.hiveContext.read.jdbc(
    url=jdbc_url,
    table="schema.table",
    column="id",                   # column used to split the read into partitions
    lowerBound=lower_bound_id,
    upperBound=upper_bound_id,
    numPartitions=num_partitions,  # assumed here; the original snippet was cut off,
)                                  # and jdbc() requires this with column/lowerBound/upperBound
OK, so when Spark is forming queries it's ignorant of the underlying
storage-layer index.
If there is an index on a table, Spark doesn't take that into account
when doing the predicate pushdown in optimization. In that case, why
does Spark push two of my conditions (where fieldx = 'action') to the
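One way to check what was actually pushed is the physical plan: for a
JDBC source, pushed predicates appear under "PushedFilters". A minimal
sketch, reusing df and the fieldx column from the thread (the plan line
below is illustrative, not verbatim output):

    # Print the physical plan and look for the PushedFilters entry.
    df.filter(df.fieldx == 'action').explain()

    # Expect something like:
    #   PushedFilters: [IsNotNull(fieldx), EqualTo(fieldx,action)]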
A simple way is to have a network volume mounted under the same path on
every node to make things easy.
On Thu, 19 Oct 2017 at 8:24 PM Uğur Sopaoğlu wrote:
> Hello,
>
> I have a very easy problem. Whenever I run a Spark job, I must copy the
> jar file to all worker nodes. Is there any simpler way to do this?
Hello,
I have a very easy problem. Whenever I run a Spark job, I must copy the
jar file to all worker nodes. Is there any simpler way to do this?
--
Uğur Sopaoğlu
This is a good place to start from:
https://spark.apache.org/docs/latest/submitting-applications.html
Best,
On Thu, Oct 19, 2017 at 5:24 PM, Uğur Sopaoğlu wrote:
> Hello,
>
> I have a very easy problem. Whenever I run a Spark job, I must copy the
> jar file to all worker nodes. Is
Use the `bin/spark-submit --jars` option.
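For example, a typical invocation might look like this (the class name,
master URL, and jar paths are placeholders):

    bin/spark-submit \
      --class com.example.MyApp \
      --master spark://master-host:7077 \
      --jars /path/to/dep1.jar,/path/to/dep2.jar \
      my-app.jar

Spark ships the jars listed in --jars to the executors for you, so there
is no need to copy them to each worker node by hand.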
On Thu, Oct 19, 2017 at 11:54 PM, 郭鹏飞 wrote:
> You can use the bin/spark-submit tool to submit your jar to the cluster.
>
> > On Oct 19, 2017, at 11:24 PM, Uğur Sopaoğlu wrote:
> >
> > Hello,
> >
> > I have a very easy
You can use the bin/spark-submit tool to submit your jar to the cluster.
> On Oct 19, 2017, at 11:24 PM, Uğur Sopaoğlu wrote:
>
> Hello,
>
> I have a very easy problem. Whenever I run a Spark job, I must copy the
> jar file to all worker nodes. Is there any simpler way to do this?
>
> --
> Uğur
Sorry, what do you mean by "my JDBC table has an index on it"? Where are
you reading the data from?
I assume you are referring to the "id" column on the table that you are
reading through the JDBC connection.
Then you are creating a temp table called "df". That temp table is
created in temporary
If the underlying table(s) have indexes on them, does Spark use those
indexes to optimize the query?
I.e., if a table in my JDBC data source (MySQL in this case) had several
indexes and my query was filtering on one of the fields with an index,
would Spark know to push that predicate to the
Remember, your indexes are in the RDBMS, in this case MySQL. When you
are reading from that table you have an "id" column, which I assume is
an integer, and you are making parallel threads through the JDBC
connection to that table. You can see the threads in MySQL if you query
it. You can see multiple
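For instance, while a partitioned read is running you can list those
threads from a MySQL client (each Spark partition shows up as its own
connection):

    -- Run on the MySQL side during the Spark read; one row per connection.
    SHOW FULL PROCESSLIST;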