Thanks Judy, this is exactly what I'm looking for. However, and please forgive
me if it's a dumb question: it seems to me that the Thrift server is the same as
the HiveServer2 JDBC endpoint. Does this mean that starting the Thrift server
will start Hive on the server as well?
On Mon, Dec 8, 2014 at 9:11 PM, Judy Nash
Can this assembly build get faster if we do not need Spark SQL or some other
Spark component, for example if we only need Spark core?
On Wed, Nov 26, 2014 at 3:37 PM, lihu lihu...@gmail.com wrote:
Matei, sorry for the typo in my last message. The tip saves about 30s on
my machine.
On
Essentially, the Spark SQL JDBC Thrift server is just a Spark port of
HiveServer2. You don't need to run Hive, but you do need a working
Metastore.
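Not from the original reply, but a minimal sketch of what "a Spark port of HiveServer2" means in practice: any HiveServer2-compatible JDBC client can talk to it. The host, port, and database below are assumptions (the usual defaults).

import java.sql.DriverManager

object ThriftServerQuery {
  def main(args: Array[String]): Unit = {
    // The Spark SQL Thrift server speaks the HiveServer2 protocol,
    // so the ordinary Hive JDBC driver works against it.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    val rs = conn.createStatement().executeQuery("SHOW TABLES")
    while (rs.next()) println(rs.getString(1))
    conn.close()
  }
}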
On 12/9/14 3:59 PM, Anas Mosaad wrote:
Thanks Judy, this is exactly what I'm looking for. However, and please
forgive me if it's a dumb question:
Ah... I think you're right about the flatMap then :). Or you could use
mapPartitions. (I'm not sure if it makes a difference.)
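Purely for illustration (not from the thread), a hedged sketch of the two alternatives mentioned above, with a made-up transformation:

val nums = sc.parallelize(1 to 10)

// flatMap: emit zero or more output records per input element.
val viaFlatMap = nums.flatMap(x => if (x % 2 == 0) Seq(x, x * 10) else Seq.empty)

// mapPartitions: receive each partition as an iterator and return a new one;
// handy when per-partition setup (e.g. opening a connection) is expensive.
val viaMapPartitions = nums.mapPartitions { iter =>
  iter.flatMap(x => if (x % 2 == 0) Seq(x, x * 10) else Seq.empty)
}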
On Mon, Dec 8, 2014 at 10:09 PM, Steve Lewis lordjoe2...@gmail.com wrote:
Looks good, but how do I say that in Java?
As far as I can see, sc.parallelize (in Java) has
Hello Experts,
I'm working on a Spark app which reads data from Kafka and persists it in
HBase.
The Spark documentation states [1] that in case of worker failure we can lose some
data. If so, how can I make my Kafka stream more reliable?
I have seen there is a simple consumer [2], but I'm
Thanks Cheng,
I thought spark-sql uses the exact same metastore, right? However, it
didn't work as expected. Here's what I did:
In spark-shell, I loaded a CSV file and registered the table, say
countries.
Started the Thrift server.
Connected using beeline. When I run show tables or !tables,
How did you register the table under spark-shell? Two things to notice:
1. To interact with Hive, HiveContext must be used instead of SQLContext.
2. `registerTempTable` doesn't persist the table into the Hive metastore,
and the table is lost after quitting spark-shell. Instead, you must use
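A hedged sketch of the two points above using the Spark 1.1/1.2-era API; the file name and case class are invented, and the final call assumes the truncated sentence was heading towards something like saveAsTable:

import org.apache.spark.sql.hive.HiveContext

case class Country(name: String, code: String)

val hc = new HiveContext(sc)        // point 1: HiveContext, not SQLContext
import hc.createSchemaRDD           // implicit RDD -> SchemaRDD conversion

val countries = sc.textFile("countries.csv")
  .map(_.split(","))
  .map(f => Country(f(0), f(1)))

countries.registerTempTable("countries_tmp") // point 2: session-only, not in the metastore
countries.saveAsTable("countries")           // persisted through the Hive metastore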
Dear All,
I am using Spark Streaming. It was working fine when I was using Spark 1.0.2,
but now I repeatedly get a few issues,
like class not found. I am using the same pom.xml with the updated version
for all Spark modules;
I am using the spark-core, streaming, and streaming-with-Kafka modules.
Its
I checked my code again and located the issue: if we do the `load data
inpath` before the select statement, the application gets stuck; if we don't, it
doesn't get stuck. Log info: 14/12/09 17:29:33 ERROR actor.ActorSystemImpl:
Uncaught fatal error from thread
I checked my code again and located the issue: if we do the `load data
inpath` before the select statement, the application gets stuck; if we don't, it
doesn't get stuck. The code that gets stuck:
val sqlLoadData = s"LOAD DATA INPATH '$currentFile' OVERWRITE INTO TABLE $tableName"
Hi,
I am intending to save the streaming data from kafka into Cassandra, using
spark-streaming:
But there seems to be problem with line
javaFunctions(data).writerBuilder("testkeyspace", "test_table",
mapToRow(TestTable.class)).saveToCassandra();
I am getting NoSuchMethodError.
The code, the
Back to the first question: does this mandate that Hive is up and running?
When I try it, I get the following exception. The documentation says that
this method works only on SchemaRDD. I thought that countries.saveAsTable
did not work for that reason, so I created a temporary table that contains the results
+1 with 1.3-SNAPSHOT.
On Mon, Dec 1, 2014 at 5:49 PM, agg212 alexander_galaka...@brown.edu
wrote:
Thanks for your reply, but I'm still running into issues
installing/configuring the native libraries for MLlib. Here are the steps
I've taken, please let me know if anything is incorrect.
-
Hi,
I'm trying to use S3 Block file system in spark, i.e. s3:// urls
(*not* s3n://). And I always get the following error:
Py4JJavaError: An error occurred while calling o3188.saveAsParquetFile.
: org.apache.hadoop.fs.s3.S3FileSystemException: Not a Hadoop S3 file.
at
You're using two conflicting versions of the connector: the Scala version
at 1.1.0 and the Java version at 1.0.4.
I don't use Java, but I guess you only need the java dependency for your
job - and with the version fixed.
<dependency>
  <groupId>com.datastax.spark</groupId>
Hi
I am getting a stack overflow error:
Exception in thread "main" java.lang.StackOverflowError
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at
Are you using the Commons libs in your app? If so, you need to depend on
them directly, and not rely on them accidentally being provided by
Spark. There is no trial and error; you must declare all the
dependencies you use in your own code.
Otherwise, perhaps you are not actually running with the
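Not from the original message: a hedged build.sbt sketch of declaring a Commons library directly; the artifact and version are only examples.

// build.sbt -- declare the Commons library you actually use yourself,
// instead of relying on whatever happens to be inside Spark's assembly.
libraryDependencies += "org.apache.commons" % "commons-lang3" % "3.3.2"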
I don't believe you can do this unless you implement the save to HDFS
logic yourself. To keep the semantics consistent, these saveAs*
methods will always output a file per partition.
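Not part of the reply above, but a commonly used workaround, hedged: if a single output file is really required and the data fits in one partition, coalesce before saving (the path is made up).

rdd.coalesce(1).saveAsTextFile("hdfs:///tmp/single-file-output")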
On Mon, Dec 8, 2014 at 11:53 PM, Hafiz Mujadid hafizmujadi...@gmail.com wrote:
Hi Experts!
I want to save
Hi All,
I am new to Spark. I tried to connect to MySQL using Spark. I want to write
the code in Java, but I am
getting a runtime exception. I guess that the issue is with the Function0
and Function1 objects being passed to JdbcRDD.
I tried my level best and attached the code; can you please help us to
Hi,
I am using Spark SQL on 1.2.1-snapshot.
Here is a problem I encountered. Basically, I want to save a SchemaRDD to Hive
through HiveContext:
val scm = StructType(
  Seq(
    StructField("name", StringType, nullable = false),
    StructField("cnt", IntegerType, nullable = false)
  ))
val schRdd
Hi ,
I am running Spark Streaming in standalone mode with a Twitter source on a single
machine (using an HDP virtual box). I receive statuses from the streaming context and I
can print them, but when I try to save those statuses as an RDD into Hadoop
using rdd.saveAsTextFiles or
Looks like this issue has been fixed very recently and should be available in
the next RC:
http://apache-spark-developers-list.1001551.n3.nabble.com/CREATE-TABLE-AS-SELECT-does-not-work-with-temp-tables-in-1-2-0-td9662.html
Hi,
We have a number of Spark Streaming / Kafka jobs that would benefit from an even
spread of consumers over physical hosts in order to maximize network usage.
As far as I can see, the Spark Mesos scheduler accepts resource offers
until all required Mem + CPU allocation has been satisfied.
This
Thank you.
According to the stacktrace, you were still using SQLContext rather than
HiveContext. To interact with Hive, HiveContext *must* be used.
Please refer to this page
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
On 12/9/14 6:26 PM, Anas Mosaad wrote:
Back to the
Hi,
I've built Spark 1.3 with Hadoop 2.6, but when I start up the spark-shell I
get the following exception:
14/12/09 06:54:24 INFO server.AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
14/12/09 06:54:24 INFO util.Utils: Successfully started service 'SparkUI'
on port 4040.
14/12/09
Hi,
I have a little problem submitting our application to a Mesos cluster.
Basically the Mesos cluster is configured and I'm able to get spark-shell
working correctly. Then I tried to launch our application jar (an uber, sbt
assembly jar with all deps):
bin/spark-submit --master
Thanks Akhil. I was wondering why it isn't able to find the class even
though it existed in the same class loader as SparkContext. As a
workaround, I used the following code to add all dependent jars in a
Play Framework application to the Spark context.
@tailrec
private def
We have a similar case in which we don't want to save data to Cassandra if
the data is empty.
In our case, we filter the initial DStream to process messages that go to a
given table.
To do so, we're using something like this:
dstream.foreachRDD { (rdd, time) =>
  tables.foreach { table =>
    val
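A hedged completion of the truncated snippet above; the table list, the message field, and the save step are invented for illustration.

val tables = Seq("events", "clicks")

dstream.foreachRDD { (rdd, time) =>
  tables.foreach { table =>
    // keep only the messages destined for this table
    val forTable = rdd.filter(msg => msg.targetTable == table)
    // skip the write entirely when there is no data for this table
    if (forTable.take(1).nonEmpty) {
      // forTable.saveToCassandra("keyspace", table)   // connector call elided
    }
  }
}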
Thanks Sandy!
On Mon, Dec 8, 2014 at 23:15 Sandy Ryza sandy.r...@cloudera.com wrote:
Another thing to be aware of is that YARN will round up containers to the
nearest increment of yarn.scheduler.minimum-allocation-mb, which defaults
to 1024.
-Sandy
On Sat, Dec 6, 2014 at 3:48 PM, Denny Lee
Thanks! Will try it out.
From: Debasish Das [mailto:debasish.da...@gmail.com]
Sent: Monday, December 08, 2014 5:13 PM
To: Bui, Tri
Cc: user@spark.apache.org
Subject: Re: Learning rate or stepsize automation
Hi Bui,
Please use BFGS-based solvers. For BFGS you don't have to specify a step size.
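For illustration only (not from the original reply): a hedged sketch of an LBFGS-based MLlib solver, which needs no step size; loading the training data is elided.

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

val training: RDD[LabeledPoint] = ???   // your labeled training data
val model = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(training)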
Hi Cristovao,
I have seen a very similar issue that I have posted about in this thread:
http://apache-spark-user-list.1001560.n3.nabble.com/Kryo-NPE-with-Array-td19797.html
I think your main issue here is somewhat similar, in that the MapWrapper
Scala class is not registered. This gets registered
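A hedged sketch (not from the linked thread) of registering such a class with Kryo; registerKryoClasses assumes the Spark 1.2+ SparkConf API.

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // the wrapper class is not public, so look it up by name
  .registerKryoClasses(Array(Class.forName("scala.collection.convert.Wrappers$MapWrapper")))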
Hi Deepa,
In Scala, you would do something like
https://gist.github.com/akhld/ccafb27f098163bea622
With the Java API it will be something like
https://gist.github.com/akhld/0d9299aafc981553bc34
Thanks
Best Regards
On Tue, Dec 9, 2014 at 6:39 PM, Deepa Jayaveer deepa.jayav...@tcs.com
wrote:
Hi
You can use this Maven dependency:
<dependency>
  <groupId>com.twitter</groupId>
  <artifactId>chill-avro</artifactId>
  <version>0.4.0</version>
</dependency>
Simone Franzini, PhD
http://www.linkedin.com/in/simonefranzini
On Tue, Dec 9, 2014 at 9:53 AM, Cristovao Jose Domingues Cordeiro
Thanks Nick... still no luck.
If I use ?q=somerandomchars&fields=title,_source
I get an exception about empty collection, which seems to indicate it is
actually using the supplied es.query, but somehow when I do rdd.take(1) or
take(10), all I get is the id and an empty dict, apparently... maybe
found a format that worked, kind of accidentally:
es.query: {"query":{"match_all":{}},"fields":["title","_source"]}
Thanks,
Mohamed.
On Tue, Dec 9, 2014 at 11:27 AM, Mohamed Lrhazi
mohamed.lrh...@georgetown.edu wrote:
Thanks Nick... still no luck.
If I use ?q=somerandomchars&fields=title,_source
Hi,
@Gerard - Thanks, I added one more dependency and set
conf.set("spark.cassandra.connection.host", "localhost").
But now, I am able to create a connection, but the data is not getting inserted
into the Cassandra table,
and the logs show it is getting connected and the next second getting
I am having an issue with pyspark launched in ec2 (using spark-ec2) with 5
r3.4xlarge machines where each has 32 threads and 240GB of RAM. When I do
sc.textFile to load data from a number of gz files, it does not progress as
fast as expected. When I log in to a child node and run top, I see only 4
Not yet unfortunately. You could cast the timestamp to a long if you don't
need nanosecond precision.
Here is a related JIRA: https://issues.apache.org/jira/browse/SPARK-4768
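A hedged illustration of the workaround described above; the table and column names are invented.

// Casting drops sub-second precision, but avoids the unsupported type for now.
val result = sqlContext.sql(
  "SELECT CAST(event_time AS BIGINT) AS event_time_secs FROM events")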
On Mon, Dec 8, 2014 at 11:27 PM, ZHENG, Xu-dong dong...@gmail.com wrote:
I meet the same issue. Any solution?
On
While trying simple examples of PySpark code, I systematically get these
failures when I try this. I don't see any prior exceptions in the output.
How can I debug further to find the root cause?
es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
val newSchemaRDD = sqlContext.applySchema(existingSchemaRDD,
existingSchemaRDD.schema)
This line is throwing away the logical information about existingSchemaRDD
and thus Spark SQL can't know how to push down projections or predicates
past this operator.
Can you describe in more detail the problems
You might also try out the recently added support for views.
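A hedged sketch of what that could look like, with invented table, view, and column names (HiveQL allows a column list on CREATE VIEW):

hiveContext.sql(
  """CREATE VIEW countries_renamed (country_name, country_code)
    |AS SELECT name, code FROM countries""".stripMargin)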
On Mon, Dec 8, 2014 at 9:31 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Ah... I see. Thanks for pointing it out.
Then it means we cannot mount an external table using customized column
names. Hmm...
Then the only option left is
Dear all,
I would like to run a simple spark job on EMR with yarn.
My job is as follows:
public void EMRRun() {
    SparkConf sparkConf = new SparkConf().setAppName("RunEMR").setMaster("yarn-cluster");
    sparkConf.set("spark.executor.memory", "13000m");
    JavaSparkContext ctx
If all RDD elements within a partition contain pointers to a single shared
object, Spark persists as expected when the RDD is small. However, if the
RDD is more than *200 elements* then Spark reports requiring much more
memory than it actually does. This becomes a problem for large RDDs, as
Spark
Hi I'm working on Spark that comes with CDH 5.2.0
I'm trying to get a hive context in the shell and I'm running into problems
that I don't understand.
I have added hive-site.xml to the conf folder under /usr/lib/spark/conf as
indicated elsewhere
Here is what I see. Please help.
Hello,
In CDH 5.2 you need to manually add Hive classes to the classpath of
your Spark job if you want to use the Hive integration. Also, be aware
that since Spark 1.1 doesn't really support the version of Hive
shipped with CDH 5.2, this combination is to be considered extremely
experimental.
On
Hi I'm working on Spark that comes with CDH 5.2.0
I'm trying to get a hive context in the shell and I'm running into problems
that I don't understand.
I have added hive-site.xml to the conf folder under /usr/lib/spark/conf as
indicated elsewhere
Here is what I see. Please help.
I know quite a lot about machine learning, but I am new to Scala and Spark. I got
stuck due to the Spark API, so please advise.
I have a txt file with each line in this format:
#label \t #query, a string of words, delimited by spaces
1 wireless amazon kindle
2 apple iPhone 5
1 kindle fire 8G
2
I have an RDD I want to filter, and for a single term all works well,
i.e.:
dataRDD.filter(x => x._2 == "apple")
How can I use multiple values? For example, if I wanted to filter my RDD to
take out apples and oranges and pears without using . This could
get long-winded as there may be quite a few. Can
This is more of a Scala-specific question. I would look at the List contains
implementation.
The format is bad; the question link is here:
http://stackoverflow.com/questions/27370170/query-classification-using-apache-spark-mlib
To report back how I ultimately solved this issue, so someone else can do the same:
1) Check each jar's position on the classpath and make sure the jars are listed in order of
Guava class version (i.e. spark-assembly needs to be listed before Hadoop 2.4,
because spark-assembly has Guava 14 and Hadoop 2.4 has Guava 11).
Hi,
My name is Abhik Majumdar and I am a co-founder of Vidora Corp. We use
Spark at Vidora to power our machine learning stack and we are requesting
to be included on your Powered by Spark page:
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark
Here is the information you
Option 1:
dataRDD.filter(x => (x._2 == "apple") || (x._2 == "orange"))
Option 2:
val fruits = Set("apple", "orange", "pear")
dataRDD.filter(x => fruits.contains(x._2))
Mohammed
-Original Message-
From: dizzy5112 [mailto:dave.zee...@gmail.com]
Sent: Tuesday, December 9, 2014 2:16 PM
To:
We have dockerized Spark Master and worker(s) separately and are using it in
our dev environment. We don't use Mesos though, running it in Standalone
mode, but adding Mesos should not be that difficult I think.
Regards
Venkat
I have set up a cluster on AWS and am trying a really simple hello world
program as a test. The cluster was built using the ec2 scripts that come
with Spark. Anyway, I have output the error message (using --verbose)
below. The source code is further below that.
Any help would be greatly
Dear all,
I would like to run a simple spark job on EMR with yarn.
My job is as follows:
public void EMRRun() {
    SparkConf sparkConf = new SparkConf().setAppName("RunEMR").setMaster("yarn-cluster");
    sparkConf.set("spark.executor.memory", "13000m");
    JavaSparkContext ctx
https://issues.apache.org/jira/browse/YARN-321
There is no generic application history server yet. The current one
works only for MR.
On Tue, Dec 9, 2014 at 4:48 PM, Nagy István tyson...@gmail.com wrote:
Dear all,
I would like to run a simple spark job on EMR with yarn.
My job is the
Hi Spark users,
I've been attempting to get flambo
https://github.com/yieldbot/flambo/blob/develop/README.md, a Clojure
library for Spark, working with my codebase. After getting things to build
with this very simple interface:
(ns sharknado.core
(:require [flambo.conf :as conf]
From the 1.1.1 documentation, it seems one can use HiveContext instead of
SQLContext without having a Hive installation. The benefit is a richer SQL
dialect.
Is my understanding correct?
Thanks
That is correct. The HiveContext will create an embedded metastore in
the current directory if you have not configured Hive.
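A hedged illustration of the behaviour described above: run in spark-shell with no hive-site.xml on the classpath, this creates an embedded Derby metastore (a metastore_db directory) in the current working directory.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
hiveContext.sql("CREATE TABLE IF NOT EXISTS demo (key INT, value STRING)")
hiveContext.sql("SHOW TABLES").collect().foreach(println)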
On Tue, Dec 9, 2014 at 5:51 PM, Manoj Samel manojsamelt...@gmail.com
wrote:
From 1.1.1 documentation, it seems one can use HiveContext instead of
SQLContext without
In that case, what should be the behavior of saveAsTable?
On Dec 10, 2014 4:03 AM, Michael Armbrust mich...@databricks.com wrote:
That is correct. The HiveContext will create an embedded metastore in
the current directory if you have not configured Hive.
On Tue, Dec 9, 2014 at 5:51 PM,
It works exactly like Create Table As Select (CTAS) in Hive.
Cheng Hao
From: Anas Mosaad [mailto:anas.mos...@incorta.com]
Sent: Wednesday, December 10, 2014 11:59 AM
To: Michael Armbrust
Cc: Manoj Samel; user@spark.apache.org
Subject: Re: Can HiveContext be used without using Hive?
In that
I'm trying to build a very simple Scala standalone app using MLlib, but I
get the following error when trying to build the program: object mllib is not a
member of package org.apache.spark.
Please note I just migrated from 1.0.2 to 1.1.1.
Best Regards
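Not part of the original question: that error usually means the spark-mllib artifact is missing from the build, since MLlib ships separately from spark-core. A hedged build.sbt sketch, assuming the poster's 1.1.1 version:

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.1.1",
  "org.apache.spark" %% "spark-mllib" % "1.1.1"
)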
Hi Michael,
I think I have found the exact problem in my case. I see that we have
written something like the following in Analyzer.scala:
// TODO: pass this in as a parameter.
val fixedPoint = FixedPoint(100)
and
Batch("Resolution", fixedPoint,
  ResolveReferences ::
Hi
I am getting a stack overflow error:
Exception in thread "main" java.lang.StackOverflowError
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)
at
Thanks Akhil, but it is expecting a Function1 instead of a Function. I tried
writing a new class implementing Function1 but
got an error. Can you please help us get it resolved?
The JdbcRDD is created as:
JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 0, 1,
    getResultset,
Somehow I now get a better error trace; I believe it is for the same root
issue. Any advice on how to narrow this down further is highly appreciated:
...
14/12/10 07:15:03 ERROR PythonRDD: Python worker exited unexpectedly
(crashed)
org.apache.spark.api.python.PythonException: Traceback (most recent
Try changing this line
JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 0, 1,
    getResultset, ClassTag$.MODULE$.apply(String.class));
to
JdbcRDD rdd = new JdbcRDD(sc, getConnection, sql, 0, 100, 1,
    getResultset,
Hi Akhil,
I am getting the same error. I guess that the issue is in the Function1
implementation.
Is it enough if we override the apply method in the Function1 class?
Thanks
Deepa
From: Akhil Das ak...@sigmoidanalytics.com
To: Deepa Jayaveer deepa.jayav...@tcs.com
Cc: user@spark.apache.org
I see that somebody had already raised a PR for this but it hasn't been
merged.
https://issues.apache.org/jira/browse/SPARK-4339
Can we merge this in the next 1.2 RC?
Thanks
-Nitin
On Wed, Dec 10, 2014 at 11:50 AM, Nitin Goyal nitin2go...@gmail.com wrote:
Hi Michael,
I think I have found the
On Tue, Dec 9, 2014 at 11:32 AM, Mohamed Lrhazi
mohamed.lrh...@georgetown.edu wrote:
While trying simple examples of PySpark code, I systematically get these
failures when I try this. I don't see any prior exceptions in the output.
How can I debug further to find the root cause?
es_rdd =
Hi,
Unfortunately I am also getting the same error.
Has anybody solved it?
Exception in thread "main" java.lang.StackOverflowError
at scala.util.parsing.combinator.Parsers$$anon$3.apply(Parsers.scala:222)
at scala.util.parsing.combinator.Parsers$Parser$$anonfun$append$1.apply(Parsers.scala:254)