What are the alternatives to nested DataFrames?
Two options I can think of:
1- Can you perform a union of the DataFrames returned by the Elasticsearch
queries? It would still be distributed, but I don't know whether you would hit
a limit on how many union operations you can perform at a time.
2- Can you use
m...@yeikel.com
> *Cc:* Shahab Yunus ; user
> *Subject:* Re: What are the alternatives to nested DataFrames?
>
>
>
> Could you join() the DFs on a common key?
>
>
>
> On Fri, Dec 28, 2018 at 18:35 wrote:
>
> Shahab, I am not sure what you are trying to say. Could
... original DF and returns a new DataFrame including all the
matching terms
From: Andrew Melo
Sent: Friday, December 28, 2018 8:48 PM
To: em...@yeikel.com
Cc: Shahab Yunus ; user
Subject: Re: What are the alternatives to nested DataFrames?
Could you join() the DFs on a common key?
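
A minimal sketch of the join suggestion, assuming the Elasticsearch documents can be loaded once into a single DataFrame that shares a `name` column with the list of cities. The object name, the `docId` column, the sample rows, and the local SparkSession are all made up for illustration and do not come from the thread. Note that an equality join only approximates an Elasticsearch match query, which works on analyzed text.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinOnCommonKey {
  // Left join on the shared "name" column: every city is kept,
  // and cities without a matching document get nulls.
  def matchCities(cities: DataFrame, docs: DataFrame): DataFrame =
    cities.join(docs, Seq("name"), "left")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for dataFrame1 (the list of cities).
    val cities = Seq("Boston", "Miami").toDF("name")

    // Hypothetical stand-in for documents loaded from Elasticsearch in one read.
    val docs = Seq(
      ("Boston", "doc-1"),
      ("Boston", "doc-2"),
      ("Austin", "doc-3")
    ).toDF("name", "docId")

    matchCities(cities, docs).show()
    spark.stop()
  }
}
```

The point of this shape is that no DataFrame is created inside a per-row closure; both sides are built on the driver and Spark distributes the join.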
tString(0)
>
>   val qb = QueryBuilders.matchQuery("name", city).operator(Operator.AND)
>   print(qb.toString)
>   val dfs = sqlContext.esDF("cities/docs", qb.toString) // null pointer
>   dfs.show()
>
val qb = QueryBuilders.matchQuery("name", city).operator(Operator.AND)
print(qb.toString)
val dfs = sqlContext.esDF("cities/docs", qb.toString) // null pointer
dfs.show()
})
From: Shahab Yunus
Sent: Friday, December 28, 2018 12:34 PM
To: em...@yeikel.com
Cc: user
Subject: Re: What are the alternatives to nested DataFrames?
Can you have a DataFrame with a column that stores JSON (type string)? Or
you could have an array-type column in which you store all the cities
matching your query.
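
A rough sketch of the array-column idea, assuming the matches are already available as a flat DataFrame with one row per (city, match) pair. The object name, the column names `city` and `matchedName`, and the sample rows are hypothetical. `collect_list` gathers the matches into an array column, and `to_json` over a `struct` shows the JSON-string variant.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct, to_json}

object NestedAsArray {
  // Collapse one row per match into one row per city with an array column.
  def collectMatches(matches: DataFrame): DataFrame =
    matches.groupBy("city").agg(collect_list("matchedName").as("matches"))

  // Same data, with the array wrapped in a struct and serialized to a JSON string.
  def withJson(nested: DataFrame): DataFrame =
    nested.withColumn("matchesJson", to_json(struct(col("matches"))))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-column-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical flat search results: one row per (city, matched name).
    val matches = Seq(
      ("Boston", "Boston"),
      ("Boston", "East Boston"),
      ("Miami", "Miami")
    ).toDF("city", "matchedName")

    withJson(collectMatches(matches)).show(false)
    spark.stop()
  }
}
```

The order of elements inside the collected array is not guaranteed, so downstream code should not rely on it.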
On Fri, Dec 28, 2018 at 2:48 AM wrote:
> Hi community ,
>
>
>
> As shown in other answers online , Spark does not support the n
Hi community,
As shown in other answers online, Spark does not support the nesting of
DataFrames, but what are the options?
I have the following scenario:
dataFrame1 = List of Cities
dataFrame2 = Created after searching in ElasticSearch for each city in
dataFrame1
I've tri
Maybe you could try something like this, using Spark SQL 1.4 and DataFrames:
import org.apache.spark.sql.functions.{avg, sum}
student.join(grade, grade("student_id") === student("student_id"), "left")
  .groupBy("id")
  .agg(sum(grade("Marks")), avg(grade("Marks")))
You could refer to the following document:
https://spark.apache.o
I am looking to do something similar to this Postgres query in HiveQL. If
I have a DataFrame student and a DataFrame grade, is this possible?
I read in Learning Spark: Lightning-Fast Big Data Analysis that it should
be possible. It says in Chapter 9
"SchemaRDDs can store several basic types, as