Hey Nathan,
Thanks for sharing, this is a very interesting post :) My comments are
inlined below.
Cheng
On 1/7/15 11:53 AM, Nathan McCarthy wrote:
Hi,
I’m trying to use a combination of SparkSQL and 'normal' Spark/Scala
via rdd.mapPartitions(…). Using the latest release 1.2.0.
Simple
Any ideas? :)
From: Nathan nathan.mccar...@quantium.com.au
Date: Wednesday, 7 January 2015 2:53 pm
To: user@spark.apache.org
Subject: SparkSQL schemaRDD MapPartitions calls
I am working with CDH5.2 (Spark 1.0.0) and wondering which version of Spark
comes with SparkSQL by default. Also, will SparkSQL come enabled to access
the Hive Metastore? Is there an easier way to enable Hive support without
having to build the code with various switches?
Thanks,
Abhi
--
Abhi
Disclaimer: this seems more of a CDH question, I'd suggest sending
these to the CDH mailing list in the future.
CDH 5.2 actually has Spark 1.1. It comes with SparkSQL built-in, but
it does not include the thrift server because of incompatibilities
with the CDH version of Hive. To use Hive support
Hi Kevin,
Say A has 10 ids, so you are pulling data from B's data source only for
these 10 ids?
What if you load A and B as separate SchemaRDDs and then do the join? Spark
will optimize the execution path anyway when an action is fired.
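A minimal sketch of that suggestion, assuming Spark 1.2, hypothetical paths, and B loaded as a plain file here rather than through the custom Data Source wrapper from the thread:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
// Load both sides as SchemaRDDs and register them so they can be joined in SQL.
val a = sqlContext.parquetFile("/path/to/A")   // hypothetical path for table A
a.registerTempTable("A")
val b = sqlContext.jsonFile("/path/to/B")      // hypothetical stand-in for B's source
b.registerTempTable("B")
// Spark SQL plans the join when an action forces evaluation.
val joined = sqlContext.sql("SELECT * FROM A JOIN B ON A.id = B.id")
joined.collect().foreach(println)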
On Mon, Jan 5, 2015 at 2:28 AM, Dai, Kevin yun...@ebay.com wrote:
Hi,
This package has moved here: https://github.com/databricks/spark-avro
On 1/6/15 5:12 AM, yanenli2 wrote:
Hi All,
I want to use SparkSQL to manipulate data in Avro format. I
found a solution at https://github.com/marmbrus/sql-avro . However it
doesn't compile successfully anymore
to concatenate column values in the
query like col1+'$'+col3. For some reason, this issue is not manifesting itself when I
do a single IF query.
Is there a concat function in SparkSQL? I can't find anything in the
documentation.
Thanks,
RK
On Sunday, January 4, 2015 7:42 PM, RK prk
Thanks for the reply! Now I know that this package has moved here:
https://github.com/databricks/spark-avro
Hi,
I’m trying to use a combination of SparkSQL and 'normal' Spark/Scala via
rdd.mapPartitions(…). Using the latest release 1.2.0.
Simple example; load up some sample data from parquet on HDFS (about 380m rows,
10 columns) on a 7 node cluster.
val t = sqlC.parquetFile(/user/n/sales
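Since SchemaRDD extends RDD[Row], the mapPartitions side can be sketched directly on the loaded table (hypothetical path, Spark 1.2 API):
// Count rows per partition with a plain Spark transformation, no SQL involved.
val sales = sqlC.parquetFile("/user/n/sales")   // hypothetical path
val rowsPerPartition = sales.mapPartitions(rows => Iterator(rows.size)).collect()
rowsPerPartition.foreach(println)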
Hi, All
Suppose I want to join two tables A and B as follows:
Select * from A join B on A.id = B.id
A is a file, while B is a database indexed by id, which I wrapped with the Data
Source API.
The desired join flow is:
1. Generate A's RDD[Row]
2. Generate B's RDD[Row] from A by
Can you paste the error log?
From: Dai, Kevin [mailto:yun...@ebay.com]
Sent: Monday, January 5, 2015 6:29 PM
To: user@spark.apache.org
Subject: Implement customized Join for SparkSQL
Hi, All
Suppose I want to join two tables A and B as follows:
Select * from A join B on A.id = B.id
Hi All,
I want to use SparkSQL to manipulate data in Avro format. I
found a solution at https://github.com/marmbrus/sql-avro . However it
doesn't compile successfully anymore with the latest code of Spark
version 1.2.0 or 1.2.1.
I then try to pull a copy from github stated
When I use a single IF statement like select IF(col1 != '', col1+'$'+col3,
col2+'$'+col3) from my_table, it works fine.
However, when I use a nested IF like select IF(col1 != '', col1+'$'+col3,
IF(col2 != '', col2+'$'+col3, '$')) from my_table, I am getting the following
exception.
Exception in
BTW, I am seeing this issue in Spark 1.1.1.
On Sunday, January 4, 2015 7:29 PM, RK prk...@yahoo.com.INVALID wrote:
When I use a single IF statement like select IF(col1 != '', col1+'$'+col3,
col2+'$'+col3) from my_table, it works fine.
However, when I use a nested IF like select IF(col1
The issue is happening when I try to concatenate column values in the query
like col1+'$'+col3. For some reason, this issue is not manifesting itself
when I do a single IF query.
Is there a concat function in SparkSQL? I can't find anything in the
documentation.
Thanks, RK
On Sunday
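Not from the original reply, but one sketch of a workaround: with a HiveContext, Hive's concat UDF is available, so the strings do not need the arithmetic '+' operator (table and column names taken from the question):
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
// IF and concat are both Hive UDFs exposed through HiveContext.
val result = hiveContext.sql(
  "SELECT IF(col1 != '', concat(col1, '$', col3), concat(col2, '$', col3)) FROM my_table")
result.collect().foreach(println)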
Most of the time a NoSuchMethodError means wrong classpath settings, and
some jar file is overridden by a wrong version. In your case it could be
netty.
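One sketch of how to check which jar actually provided the class at runtime (plain JVM reflection, nothing Spark-specific):
// Prints the code source of the netty class; an unexpected jar here usually
// means a conflicting netty version shadowing the expected one.
val codeSource = classOf[org.jboss.netty.channel.socket.nio.NioWorkerPool]
  .getProtectionDomain.getCodeSource
println(if (codeSource != null) codeSource.getLocation else "bootstrap classpath")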
On 1/3/15 1:36 PM, Niranda Perera wrote:
Hi all,
I am evaluating the Spark data sources API released with Spark 1.2.0. But
I'm getting a
Hi all,
I am evaluating the Spark data sources API released with Spark 1.2.0. But I'm
getting a java.lang.NoSuchMethodError:
org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(Ljava/util/concurrent/Executor;I)V
error running the program.
Error log:
15/01/03 10:41:30 ERROR ActorSystemImpl: Uncaught
files.
Does anyone know if something like this is supported, or whether this is a
reasonable thing to request?
Mick
From: ...@gmail.com
Sent: Wednesday, December 24, 2014 4:26 AM
To: user@spark.apache.org
Subject: SparkSQL: CREATE EXTERNAL TABLE with a SchemaRDD
Hi spark users,
I'm trying to create an external table using HiveContext after creating a
schemaRDD and saving the RDD into a parquet file on hdfs.
I would
Doh...figured it out.
Hi spark users,
I'm trying to create an external table using HiveContext after creating a
schemaRDD and saving the RDD into a parquet file on hdfs.
I would like to use the schema in the schemaRDD (rdd_table) when I create
the external table.
For example:
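A minimal sketch of one way to do this, assuming Spark 1.2 (whose HiveContext targets Hive 0.13, where STORED AS PARQUET is available), a hypothetical HDFS path, and placeholder column definitions that would have to mirror rdd_table.schema:
import org.apache.spark.sql.hive.HiveContext
val hiveContext = new HiveContext(sc)
// Write the SchemaRDD out as Parquet first.
rdd_table.saveAsParquetFile("hdfs:///tmp/rdd_table.parquet")   // hypothetical path
// Then point an external table at that location; the columns below are placeholders.
hiveContext.sql("""
  CREATE EXTERNAL TABLE my_table (id INT, name STRING)
  STORED AS PARQUET
  LOCATION 'hdfs:///tmp/rdd_table.parquet'
""")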
Could you please file a JIRA together with the Git commit you're using?
Thanks!
On 12/18/14 2:32 AM, Hao Ren wrote:
Hi,
When running SparkSQL branch 1.2.1 on EC2 standalone cluster, the following
query does not work:
create table debug as
select v1.*
from t1 as v1 left join t2 as v2
on v1
Hi,
When running SparkSQL branch 1.2.1 on EC2 standalone cluster, the following
query does not work:
create table debug as
select v1.*
from t1 as v1 left join t2 as v2
on v1.sku = v2.sku
where v2.sku is null
Both t1 and t2 have 200 partitions.
t1 has 10k rows, and t2 has 4k rows.
this query
Hi,
I am using SparkSQL on 1.1.0 branch.
The following code leads to a scala.MatchError
at
org.apache.spark.sql.catalyst.expressions.Cast.cast$lzycompute(Cast.scala:247)
val scm = StructType(inputRDD.schema.fields.init :+
  StructField("list",
    ArrayType(
      StructType
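For context, a self-contained sketch of building a schema of that shape (a nested ArrayType of StructType) and applying it, with hypothetical field names and the Spark 1.1/1.2 API; inputRDD is the poster's SchemaRDD, and rowRDD stands in for an RDD[Row] matching the new schema:
import org.apache.spark.sql._   // StructType, StructField, ArrayType, Row, ... in Spark 1.1/1.2
// Hypothetical element type for the nested array column.
val elementType = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("value", IntegerType, nullable = true)))
// Keep all but the last input field and append the array column.
val scm = StructType(inputRDD.schema.fields.init :+
  StructField("list", ArrayType(elementType), nullable = true))
// Pair an RDD[Row] shaped like scm with the hand-built schema.
val withSchema = sqlContext.applySchema(rowRDD, scm)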
It worked man.. Thanks a lot :)
All values in Hive are always nullable, though you should still not be
seeing this error.
It should be addressed by this patch:
https://github.com/apache/spark/pull/3150
On Fri, Dec 5, 2014 at 2:36 AM, Hao Ren inv...@gmail.com wrote:
Hi,
I am using SparkSQL on 1.1.0 branch.
The following
what to do about this. Hope you can help :)
Many thanks
SahanB
Hi Guys,
I am trying to use SparkSQL to convert an RDD to SchemaRDD so that I can
save it in parquet format.
A record in my RDD has the following format:
RDD1
{
field1:5,
field2: 'string',
field3: {'a':1, 'c':2}
}
I am using field3 to represent a sparse vector and it can have keys
Hi all,
I am new to Spark and currently I am trying to run a SparkSQL query on HBase
entity. For an entity with about 4000 rows, it will take about 12 seconds.
Is it expected? Is there any way to shorten the query process?
Here is the code snippet:
SparkConf sparkConf = new
SparkConf
Thank you for answering, this is all very helpful!
Hi,
I am trying to launch a spark 1.2 cluster with SparkSQL and custom
authentication. After launching the cluster using the ec2 scripts, I copied
the following hive-site.xml file into spark/conf dir:
  <property>
    <name>hive.server2.authentication</name>
    <value>CUSTOM</value>
  </property>
</configuration>
Hi,
Looks like the latest SparkSQL with Hive 0.12 has a bug in Parquet support.
I got the following exceptions:
org.apache.hadoop.hive.ql.parse.SemanticException: Output Format must
implement HiveOutputFormat, otherwise it should be either
IgnoreKeyTextOutputFormat or SequenceFileOutputFormat
/usr/lib/hive/lib doesn’t show any of the parquet
jars, but ls /usr/lib/impala/lib shows the jar we’re looking for as
parquet-hive-1.0.jar
Is it removed from latest Spark?
Jianshi
On Wed, Nov 26, 2014 at 2:13 PM, Jianshi Huang jianshi.hu...@gmail.com
wrote:
Hi,
Looks like the latest SparkSQL
Hi,
Is there any advantage to storing data as a parquet format, loading it using
the sparkSQL context, but never registering as a table/using sql on it?
Something like:
data = sqc.parquetFile(path)
results = data.map(lambda x: applyfunc(x.field))
Is this faster/more optimised
criterion. Other than that, you would also get compression, and likely
save processor cycles when parsing lines from text files.
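A small sketch of the column-pruning point, assuming the Scala API of Spark 1.1/1.2 and a hypothetical path and column name; only the selected column is read from the Parquet files, even without registering a table:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._                           // enables the 'field symbol syntax below
val data = sqlContext.parquetFile("/path/to/data.parquet")   // hypothetical path
// Selecting one column lets Parquet skip the others on disk; no SQL text needed.
val results = data.select('field).map(row => row(0))
results.take(10).foreach(println)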
On Mon, Nov 24, 2014 at 8:20 AM, mrm ma...@skimlinks.com wrote:
Hi,
Is there any advantage to storing data as a parquet format, loading it
using
the sparkSQL context
Hi,
I think you can try
cast(l.timestamp as string)='2012-10-08 16:10:36.0'
Thanks,
Daoyuan
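That suggestion in a full query, as a sketch; the table name logs is hypothetical, the alias l and the literal come from the thread, and sqlContext is assumed to be in scope:
// Compare the timestamp column as a string, per the suggestion above.
val result = sqlContext.sql(
  "SELECT * FROM logs l WHERE cast(l.timestamp AS string) = '2012-10-08 16:10:36.0'")
result.collect().foreach(println)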
-Original Message-
From: whitebread [mailto:ale.panebia...@me.com]
Sent: Sunday, November 23, 2014 12:11 AM
To: u...@spark.incubator.apache.org
Subject: Re: SparkSQL Timestamp query failure
: SparkSQL Timestamp query failure
Thanks for your answer Akhil,
I have already tried that and the query actually doesn't fail but it doesn't
return anything either as it should.
Using single quotes I think it reads it as a string and not as a timestamp.
I don't know how to solve
(“timestamp” is the keyword of a data type in
Hive/Spark SQL.)
From: Alessandro Panebianco [mailto:ale.panebia...@me.com]
Sent: Monday, November 24, 2014 11:12 AM
To: Wang, Daoyuan
Cc: u...@spark.incubator.apache.org
Subject: Re: SparkSQL Timestamp query failure
Hey Daoyuan,
following your suggestion I obtain
=19613i=1
Subject: Re: SparkSQL Timestamp query failure
Hey Daoyuan,
following your suggestion I obtain the same result as when I do:
where l.timestamp = '2012-10-08 16:10:36.0’
what happens using either your suggestion or simply using single quotes as I
just typed
)
at
org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
,
Alessandro
)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I would expect an SQL query on c would fail because c would not be known in
the schema of the older Parquet file.
What I'd be very interested in is how to add a new column as an incremental
new parquet file, and be able to somehow join the existing and new file, in
an efficient way. I.e., somehow
Hi,
I'm loading a bunch of json files and there seems to be problems with
specific files (either schema changes or incomplete files).
I'd like to catch the inconsistent files but I'm not sure how to do it.
This is the exception I get:
14/11/20 00:13:49 INFO cluster.YarnClientClusterScheduler:
Update:
I tried surrounding the problematic code with try and catch but that does
not do the trick:
try {
  val sqlContext = new org.apache.spark.sql.SQLContext(sc)
  import sqlContext._
  val jsonFiles = sqlContext.jsonFile("/requests.loading")
} catch {
  case _: Throwable => // Catching all exceptions and
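One sketch of how the inconsistent files might be isolated, assuming the inputs under /requests.loading can be listed individually and that sqlContext is in scope; each file is parsed on its own so a failure points at a specific path (slower than one jsonFile over the whole directory):
import scala.util.{Failure, Success, Try}
import org.apache.hadoop.fs.{FileSystem, Path}
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.listStatus(new Path("/requests.loading")).map(_.getPath.toString).foreach { file =>
  // Wrap schema inference plus a small action so any parse error is attributed to this file.
  Try(sqlContext.jsonFile(file).count()) match {
    case Success(n)  => println(s"OK  $file ($n records)")
    case Failure(ex) => println(s"BAD $file: ${ex.getMessage}")
  }
}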
as well as SparkSQL. My question is more on how to build out the RDD
files and best practices. I have data that is broken down by hour into
files on HDFS in avro format. Do I need to create a separate RDD for each
file? or using SparkSQL a separate SchemaRDD?
I want to be able to pull, let's say,
went wrong while
scanning this LZO compressed Parquet file. But unfortunately the stack
trace at hand doesn’t indicate the root cause.
Cheng
On 11/15/14 5:28 AM, Sadhan Sood wrote:
While testing SparkSQL on a bunch of parquet files (basically used to
be a partition for one of our hive tables
wrote:
While testing SparkSQL on a bunch of parquet files (basically used to
be a partition for one of our hive tables), I encountered this error:
import org.apache.spark.sql.SchemaRDD
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.conf.Configuration;
import
to Spark. I have begun to read to understand Spark's RDD
files as well as SparkSQL. My question is more on how to build out the
RDD
files and best practices. I have data that is broken down by hour into
files on HDFS in avro format. Do I need to create a separate RDD for
each
file? or using
On Tue, Nov 18, 2014 at 10:34 PM, Night Wolf nightwolf...@gmail.com wrote:
Is there a better way to mock this out and test Hive/metastore with
SparkSQL?
I would use TestHive which creates a fresh metastore each time it is
invoked.
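A minimal sketch of that, assuming the spark-hive test artifact (which contains TestHive) is on the test classpath:
// TestHive is a HiveContext backed by a temporary local metastore and warehouse,
// so tests start from a fresh catalog instead of touching a real metastore.
import org.apache.spark.sql.hive.test.TestHive
import TestHive._
sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sql("SELECT COUNT(*) FROM src").collect().foreach(println)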
find it here:
https://github.com/databricks/spark-avro
Bug reports welcome!
Michael
On Wed, Nov 19, 2014 at 1:02 PM, Sam Flint sam.fl...@magnetic.com
wrote:
Hi,
I am new to Spark. I have begun to read to understand Spark's RDD
files as well as SparkSQL. My question is more on how
, Michael Armbrust
mich...@databricks.com wrote:
What version of Spark SQL?
On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen zhpeng...@gmail.com
wrote:
Hi all,
We run SparkSQL on TPCDS benchmark Q19 with spark.sql.codegen=true,
we got exceptions as below, has anyone else seen these before
wrote:
Hi Michael,
We use Spark v1.1.1-rc1 with jdk 1.7.0_51 and scala 2.10.4.
On Tue, Nov 18, 2014 at 7:09 AM, Michael Armbrust
mich...@databricks.com wrote:
What version of Spark SQL?
On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen zhpeng...@gmail.com
wrote:
Hi all,
We run SparkSQL
Hi,
Just to give some context. We are using Hive metastore with csv Parquet
files as a part of our ETL pipeline. We query these with SparkSQL to do
some down stream work.
I'm curious what's the best way to go about testing Hive with SparkSQL? I'm
using 1.1.0
I see that the LocalHiveContext has been
What version of Spark SQL?
On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen zhpeng...@gmail.com wrote:
Hi all,
We run SparkSQL on TPCDS benchmark Q19 with spark.sql.codegen=true, we
got exceptions as below, has anyone else seen these before?
java.lang.ExceptionInInitializerError
Hi Michael,
We use Spark v1.1.1-rc1 with jdk 1.7.0_51 and scala 2.10.4.
On Tue, Nov 18, 2014 at 7:09 AM, Michael Armbrust mich...@databricks.com
wrote:
What version of Spark SQL?
On Sat, Nov 15, 2014 at 10:25 PM, Eric Zhen zhpeng...@gmail.com wrote:
Hi all,
We run SparkSQL on TPCDS
...@gmail.com wrote:
Hi all,
We run SparkSQL on TPCDS benchmark Q19 with spark.sql.codegen=true, we
got exceptions as below, has anyone else seen these before?
java.lang.ExceptionInInitializerError
at
org.apache.spark.sql.execution.SparkPlan.newProjection(SparkPlan.scala:92
Hi Cheng,
Thanks for your response. Here is the stack trace from the YARN logs:
Hi all,
We run SparkSQL on TPCDS benchmark Q19 with spark.sql.codegen=true, we got
exceptions as below, has anyone else seen these before?
java.lang.ExceptionInInitializerError
at
org.apache.spark.sql.execution.SparkPlan.newProjection(SparkPlan.scala:92
While testing SparkSQL on a bunch of parquet files (basically used to be a
partition for one of our hive tables), I encountered this error:
import org.apache.spark.sql.SchemaRDD
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path
Thanks Cheng, that was helpful. I noticed from UI that only half of the
memory per executor was being used for caching, is that true? We have a 2
TB sequence file dataset that we wanted to cache in our cluster with ~ 5TB
memory but caching still failed and what looked like from the UI was that
it
Hm… Have you tuned spark.storage.memoryFraction? By default, 60% of
memory is used for caching. You may refer to details from here
http://spark.apache.org/docs/latest/configuration.html
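A sketch of raising that fraction, with purely illustrative values:
import org.apache.spark.{SparkConf, SparkContext}
// Fraction of executor heap reserved for Spark's storage (cache); 0.6 is the default.
val conf = new SparkConf()
  .setAppName("cache-tuning")                      // hypothetical app name
  .set("spark.storage.memoryFraction", "0.7")      // illustrative value
val sc = new SparkContext(conf)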
On 11/15/14 5:43 AM, Sadhan Sood wrote:
Thanks Cheng, that was helpful. I noticed from UI that only half
on this?
Thanks in advance.
Thanks Cheng, just one more question - does that mean that we still need
enough memory in the cluster to uncompress the data before it can be
compressed again or does that just read the raw data as is?
On Wed, Nov 12, 2014 at 10:05 PM, Cheng Lian lian.cs@gmail.com wrote:
Currently there’s
No, the columnar buffer is built in a small batching manner, the batch
size is controlled by the spark.sql.inMemoryColumnarStorage.batchSize
property. The default value for this in master and branch-1.2 is 10,000
rows per batch.
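A sketch of adjusting that batch size before caching; the value shown is just the documented default:
// Rows packed into each in-memory columnar batch when a table is cached.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.batchSize", "10000")
sqlContext.cacheTable("xyz_cached")   // table name taken from the log below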
On 11/14/14 1:27 AM, Sadhan Sood wrote:
Thanks Cheng, Just
We are running spark on yarn with combined memory 1TB and when trying to
cache a table partition(which is 100G), seeing a lot of failed collect
stages in the UI and this never succeeds. Because of the failed collect, it
seems like the mapPartitions keep getting resubmitted. We have more than
This is the log output:
2014-11-12 19:07:16,561 INFO thriftserver.SparkExecuteStatementOperation
(Logging.scala:logInfo(59)) - Running query 'CACHE TABLE xyz_cached AS
SELECT * FROM xyz where date_prefix = 20141112'
2014-11-12 19:07:17,455 INFO Configuration.deprecation
On re running the cache statement, from the logs I see that when
collect(stage 1) fails it always leads to mapPartition(stage 0) for one
partition to be re-run. This can be seen from the collect log as well on
the container log:
org.apache.spark.shuffle.MetadataFetchFailedException: Missing an
We noticed while caching data from our hive tables which contain data in
compressed sequence file format that it gets uncompressed in memory when
getting cached. Is there a way to turn this off and cache the compressed
data as is ?
Currently there’s no way to cache the compressed sequence file directly.
Spark SQL uses in-memory columnar format while caching table rows, so we
must read all the raw data and convert them into columnar format.
However, you can enable in-memory columnar compression by setting
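Presumably this refers to the in-memory columnar compression flag; a sketch under that assumption:
// Assumed setting: spark.sql.inMemoryColumnarStorage.compressed turns on
// compression for the in-memory columnar cache.
sqlContext.setConf("spark.sql.inMemoryColumnarStorage.compressed", "true")
sqlContext.cacheTable("my_table")   // hypothetical table name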
This is not supported yet. It would be great if you could open a JIRA
(though I think apache JIRA is down ATM).
On Tue, Nov 4, 2014 at 9:40 AM, Terry Siu terry@smartfocus.com wrote:
I’m trying to execute a subquery inside an IN clause and am encountering
an unsupported language feature
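Not from the reply, but a common workaround sketch: an IN (subquery) predicate can often be rewritten as a LEFT SEMI JOIN, which HiveQL (and therefore HiveContext) does support; table and column names here are hypothetical:
val result = hiveContext.sql(
  """SELECT o.*
    |FROM orders o
    |LEFT SEMI JOIN vip_customers v ON o.customer_id = v.id""".stripMargin)
result.collect().foreach(println)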
core is that it performs really
well once you tune it properly.
As far as I understand, SparkSQL under the hood performs many of these
optimizations (order of Spark operations) and uses a more efficient storage
format. Is this assumption correct?
Has anyone done any comparison of SparkSQL
user@spark.apache.org
Subject: Re: Does SparkSQL work with custom defined SerDe?
Looks like it may be related to
https://issues.apache.org/jira/browse/SPARK-3807.
I will build from branch 1.1 to see if the issue is resolved.
Chen
On Tue, Oct 14
.
Cheng
On Fri, Oct 31, 2014 at 7:04 AM, Jean-Pascal Billaud j...@tellapart.com
wrote:
Hi,
While testing SparkSQL on top of our Hive metastore, I am getting
some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD
table.
Basically, I have a table mtable partitioned by some date
to collect column statistics, which causes this
issue. Filed SPARK-4182 to track this issue, will fix this ASAP.
Cheng
On Fri, Oct 31, 2014 at 7:04 AM, Jean-Pascal Billaud j...@tellapart.com
wrote:
Hi,
While testing SparkSQL on top of our Hive metastore, I am getting some
AM, Jean-Pascal Billaud j...@tellapart.com
wrote:
Hi,
While testing SparkSQL on top of our Hive metastore, I am getting
some java.lang.ArrayIndexOutOfBoundsException while reusing a cached RDD
table.
Basically, I have a table mtable partitioned by some date field in
hive and below
Hi,
I am using the latest Cassandra-Spark Connector to access Cassandra tables
form Spark. While I successfully managed to connect Cassandra using
CassandraRDD, the similar SparkSQL approach does not work. Here is my code
for both methods:
import com.datastax.spark.connector._
import
Cassandra
tables from Spark. While I successfully managed to connect Cassandra using
CassandraRDD, the similar SparkSQL approach does not work. Here is my code
for both methods:
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql
On Oct 31, 2014, at 1:25 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
I am using the latest Cassandra-Spark Connector to access Cassandra
tables from Spark. While I successfully managed to connect Cassandra using
CassandraRDD, the similar SparkSQL approach does not work. Here is my code
for both
I was really surprised to see the results here, esp. SparkSQL not
completing
http://www.citusdata.com/blog/86-making-postgresql-scale-hadoop-style
I was under the impression that SparkSQL performs really well because it
can optimize the RDD operations and load only the columns that are
required