On Sun, Dec 18, 2016 at 2:26 AM, vaquar khan wrote:
> select * from indexInfo;
>
Hi Vaquar
I do not see a CF with the name indexInfo in any of the Cassandra databases.
Thanks
Deepak
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
I am not a PySpark person, but from the errors I could figure out that your
Spark application is having memory issues.
Are you collecting the results to the driver at any point, or have you
configured too little memory for the nodes?
And if you are using DataFrames, then there is an issue raised in Jira
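For illustration, a minimal Java sketch of the difference (the thread is PySpark, but the idea is the same; the output path is hypothetical):
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
// Assuming df is a Dataset<Row>.
// Collecting everything to the driver can exhaust driver memory:
List<Row> everything = df.collectAsList();
// Writing out from the executors keeps results off the driver:
df.write().mode("overwrite").parquet("hdfs:///tmp/results"); // hypothetical path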
Anyone? This is for a book, so I need to figure this out.
On Fri, Dec 16, 2016 at 12:53 AM Russell Jurney
wrote:
> I have created a PySpark Streaming application that uses Spark ML to
> classify flight delays into three categories: on-time, slightly late, very
> late. After an hour or so something
Super, that works! Thanks
Sent from Yahoo Mail for iPhone
On Sunday, December 18, 2016, 11:28 AM, Yong Zhang wrote:
Why don't you just return the struct you defined, instead of an array?
@Override
public Row call(Double x, Double y) throws Exception {
    Row row = RowFactory.create(x, y);
    return row;
}
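A hedged sketch of registering that UDF with a struct return type, assuming Spark 2.x (the UDF name and column names are hypothetical):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

// Declare the struct the UDF returns:
StructType location = DataTypes.createStructType(new StructField[] {
    DataTypes.createStructField("longitude", DataTypes.DoubleType, true),
    DataTypes.createStructField("latitude", DataTypes.DoubleType, true)
});
spark.udf().register("toLocation",
    (UDF2<Double, Double, Row>) (x, y) -> RowFactory.create(x, y),
    location);
Dataset<Row> withLocation = df.withColumn("location",
    callUDF("toLocation", col("longitude"), col("latitude")));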
From: Richa
Hi,
The goal of my benchmark is to achieve end-to-end latency below 100 ms and
sustain it over time, by consuming from a Kafka topic and writing back to
another Kafka topic using Spark. Since the job does no aggregation and does
constant-time processing on each message, it appeared to me
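A minimal Java sketch of such a Kafka-in/Kafka-out job, assuming Structured Streaming (Spark 2.2+) with the spark-sql-kafka-0-10 package; the broker, topics, and checkpoint path are hypothetical:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.Trigger;

SparkSession spark = SparkSession.builder().appName("kafka-roundtrip").getOrCreate();
Dataset<Row> in = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // hypothetical broker
    .option("subscribe", "in-topic")                  // hypothetical source topic
    .load();
in.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    .writeStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("topic", "out-topic")                     // hypothetical sink topic
    .option("checkpointLocation", "/tmp/checkpoint")  // hypothetical path
    .trigger(Trigger.ProcessingTime("0 seconds"))     // fire micro-batches as fast as possible
    .start();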
There are 8 worker nodes in the cluster.
Thanks
Deepak
On Dec 18, 2016 2:15 AM, "Holden Karau" wrote:
> How many workers are in the cluster?
>
> On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma
> wrote:
>
>> Hi All,
>> I am iterating over data frame's partitions using df.foreachPartition.
>> Up
I tried to transform
root
|-- latitude: double (nullable = false)
|-- longitude: double (nullable = false)
|-- name: string (nullable = true)
to:
root
|-- name: string (nullable = true)
|-- location: struct (nullable = true)
| |-- longitude: double (nullable = true)
| |-- latitude: double (nullable = true)
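One way to build the nested schema, sketched in Java with the built-in struct() function (assuming df holds the flat schema above):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.struct;

// Pack the two coordinate columns into a single struct column:
Dataset<Row> nested = df.select(
    col("name"),
    struct(col("longitude"), col("latitude")).as("location"));
nested.printSchema();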
spark only needs to be present on the machine that launches it using
spark-submit
On Sat, Dec 17, 2016 at 3:59 PM, Jorge Machado wrote:
> Hi Tiago,
>
> thx for the update. Last question: does this spark-submit that you are
> using need to be the same version on all YARN hosts?
> Regards
>
> J
"[D" type means a double array type. So this error simple means you have
double[] data, but Spark needs to cast it to Double, as your schema defined.
The error message clearly indicates the data doesn't match with the type
specified in the schema.
I wonder how you are so sure about your data
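For illustration, a hedged sketch of declaring the field as an array of doubles rather than a scalar (the column name is hypothetical):
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// double[] data needs ArrayType(DoubleType), not a scalar DoubleType:
StructType schema = new StructType()
    .add("values", DataTypes.createArrayType(DataTypes.DoubleType, false), true);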
On Fri, Dec 16, 2016 at 7:01 PM, Chintan Bhatt <
chintanbhatt...@charusat.ac.in> wrote:
Hi
> I want to give continuous output (avg. temperature) generated from node.js
> to store on Hadoop and then retrieve it for visualization.
> Please guide me on how to feed the continuous output of node.js to Kafka
Hi Tiago,
thx for the update. Last question: does this spark-submit that you are using
need to be the same version on all YARN hosts?
Regards
Jorge Machado
> On 17 Dec 2016, at 16:46, Tiago Albineli Motta wrote:
>
> Hi Jorge,
>
> Here we are using an Apache Hadoop installation, and
Hi Deepak,
Could you share the index information in your database?
select * from indexInfo;
Regards,
Vaquar khan
On Sat, Dec 17, 2016 at 2:45 PM, Holden Karau wrote:
> How many workers are in the cluster?
>
> On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma
> wrote:
>
>> Hi All,
>> I am iterating
How many workers are in the cluster?
On Sat, Dec 17, 2016 at 12:23 PM Deepak Sharma
wrote:
> Hi All,
> I am iterating over data frame's partitions using df.foreachPartition.
> Upon each iteration over a row, I am initializing a DAO to insert the row
> into Cassandra.
> Each of these iterations takes a
Hi All,
I am iterating over data frame's partitions using df.foreachPartition.
Upon each iteration over a row, I am initializing a DAO to insert the row into
Cassandra.
Each of these iterations takes almost a minute and a half to finish.
In my workflow, this is part of an action and 100 partitions are being
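One common fix, sketched in Java, is to open the DAO once per partition instead of once per row (CassandraDao and its methods are hypothetical stand-ins for your DAO):
import org.apache.spark.api.java.function.ForeachPartitionFunction;
import org.apache.spark.sql.Row;

df.foreachPartition((ForeachPartitionFunction<Row>) rows -> {
    CassandraDao dao = new CassandraDao(); // hypothetical: one connection per partition
    while (rows.hasNext()) {
        dao.insert(rows.next());           // hypothetical insert of a single Row
    }
    dao.close();
});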
data is good
On Saturday, December 17, 2016 11:50 PM, "zjp_j...@163.com"
wrote:
I think the cause is your invalid Double data. Have you checked your data?
zjp_j...@163.com
From: Richard Xin
Date: 2016-12-17 23:28
To: User
Subject: Java to show struct field from a Dataframe
let's say I have a DataFrame with a schema like the following:
root
|-- name: string (nullable = true)
Hi Jorge,
Here we are using an Apache Hadoop installation, and to run multiple
versions we just need to change the submit command in the client, using the
Spark version you need.
$SPARK_HOME/bin/spark-submit
and pass the correct Spark libs in the conf.
For spark 2.0.0
--conf spark.yarn.archive=
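A fuller invocation might look like this (archive path, class, and jar are hypothetical):
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --conf spark.yarn.archive=hdfs:///apps/spark/spark-2.0.0-jars.zip \
  --class com.example.Main \
  app.jar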
let's say I have a DataFrame with a schema like the following:
root
|-- name: string (nullable = true)
|-- location: struct (nullable = true)
| |-- longitude: double (nullable = true)
| |-- latitude: double (nullable = true)
df.show() throws the following exception:
java.lang.ClassCastException: [D
Can a Spark Hive UDF read broadcast variables?
Given a set of transformations, does Spark create multiple DAGs and pick
one by some metric, such as a higher degree of concurrency or something
else, as the typical task-graph model in parallel computing suggests? Or
does it simply build one DAG by going through the transformations/task
I am unable to retrieve the state and ID of a submitted application on a
standalone cluster. The job executes successfully on the cluster.
The state was checked using:
while (!handle.getState().isFinal()) {
    // print handle.getState()
}
When run as local, state gets reported correctly.
Regards
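For comparison, a minimal SparkLauncher polling sketch (app jar, main class, and master URL are hypothetical):
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

// inside a method declared to throw Exception
SparkAppHandle handle = new SparkLauncher()
    .setAppResource("/path/to/app.jar")   // hypothetical jar
    .setMainClass("com.example.Main")     // hypothetical main class
    .setMaster("spark://master:7077")     // hypothetical standalone master
    .startApplication();
while (!handle.getState().isFinal()) {
    System.out.println("state=" + handle.getState() + " appId=" + handle.getAppId());
    Thread.sleep(1000);
}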
I actually already made a pull request adding support for arbitrary
sequence types.
https://github.com/apache/spark/pull/16240
There is still a small problem: Seq.toDS does not work for those types
(I couldn't get implicits with multiple type parameters to resolve
correctly), but createDataset
I tried like this:
CrashData_1.csv:
CRASH_KEY    CRASH_NUMBER   CRASH_DATE                 CRASH_MONTH
2016899114   2016899114     01/02/2016 12:00:00 AM +
CrashData_2.csv:
CITY_NAME   ZIPCODE   CITY   STATE
1945 704
Thanks for pointing me in the right direction; I have figured out the way.
On Saturday, December 17, 2016 5:23 PM, Igor Berman
wrote:
Do you mind showing what you have in Java?
In general, $"bla" is col("bla") as soon as you import the appropriate function:
import static org.apache.spark.sql.fu
Do you mind showing what you have in Java?
In general, $"bla" is col("bla") as soon as you import the appropriate function:
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;
udf should be callUDF, e.g.
ds.withColumn("localMonth", callUDF("toLocalMonth"
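Completing the pattern as a hedged example (the timestamp column "ts" is hypothetical):
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;

// Call the registered UDF on a column and add the result as "localMonth":
Dataset<Row> withMonth = ds.withColumn("localMonth", callUDF("toLocalMonth", col("ts")));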