Hi Erica,
On your cluster details, you can click on "Advanced", and then set those
parameters in the "Spark" tab. Hope that helps.
Thanks,
Subhash
On Thu, Feb 4, 2021 at 5:27 PM Erica Lin
wrote:
> Hello!
>
> Is there a way to set spark.sql.shuffle.partitions
> and spark.default.parallelism in
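Besides the cluster UI, those properties can also be set when building the session; a minimal sketch, assuming a standalone Scala job (the values are illustrative, not recommendations):

```scala
import org.apache.spark.sql.SparkSession

// Set the properties from the thread when building the session.
val spark = SparkSession.builder()
  .appName("shuffle-config-example")
  .config("spark.sql.shuffle.partitions", "200")
  .config("spark.default.parallelism", "200")
  .getOrCreate()

// spark.sql.shuffle.partitions can also be changed mid-session:
spark.conf.set("spark.sql.shuffle.partitions", "100")
```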
Hi David,
I’m not sure if that is possible, but why not just read the CSV file using the
Scala API, specifying those options, and then query it using SQL by creating a
temp view?
Thanks,
Subhash
Sent from my iPhone
> On Dec 8, 2018, at 12:39 PM, David Markovitz
> wrote:
>
> Hi
> Spark SQL
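The suggestion above might look like this; a sketch, with the path, options, and query as placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-via-sql").getOrCreate()

// Read the CSV with the Scala API, specifying options explicitly.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/path/to/data.csv")

// Expose it to SQL through a temp view.
df.createOrReplaceTempView("csv_data")
spark.sql("SELECT COUNT(*) FROM csv_data").show()
```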
Hi Spark Users,
We do a lot of processing in Spark using data that is in MS SQL server.
Today, I created a DataFrame against a table in SQL Server using the
following:
val dfSql=spark.read.jdbc(connectionString, table, props)
I noticed that every column in the DataFrame showed as nullable=true,
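For reference, a fuller sketch of that read (connection details are placeholders):

```scala
import java.util.Properties
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-read").getOrCreate()

// Placeholder connection details, for illustration only.
val connectionString = "jdbc:sqlserver://myhost:1433;databaseName=mydb"
val props = new Properties()
props.setProperty("user", "myuser")
props.setProperty("password", "mypassword")

val dfSql = spark.read.jdbc(connectionString, "dbo.MyTable", props)
dfSql.printSchema() // columns report nullable = true
```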
Hi Umar,
Could it be that spark.sql.sources.bucketing.enabled is not set to true?
Thanks,
Subhash
Sent from my iPhone
> On Jun 19, 2018, at 11:41 PM, umargeek wrote:
>
> Hi Folks,
>
> I am trying to save a spark data frame after reading from ORC file and add
> two new columns and finally tr
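A quick way to check and set that property, plus a sketch of a bucketed save, assuming an existing SparkSession (spark) and DataFrame (df); the bucket count, column, and table names are hypothetical:

```scala
// Check the current value, then enable bucketing if needed.
println(spark.conf.get("spark.sql.sources.bucketing.enabled"))
spark.conf.set("spark.sql.sources.bucketing.enabled", "true")

// Hypothetical bucketed save; bucketBy requires saveAsTable.
df.write
  .bucketBy(8, "id")
  .sortBy("id")
  .saveAsTable("bucketed_example")
```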
Hi Raymond,
If you set your master to local[*] instead of yarn-client, it should run on
your local machine.
Thanks,
Subhash
Sent from my iPhone
> On Jun 17, 2018, at 2:32 PM, Raymond Xie wrote:
>
> Hello,
>
> I am wondering how can I run spark job in my environment which is a single
> Ubu
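A minimal sketch of that change, assuming the session is built in code:

```scala
import org.apache.spark.sql.SparkSession

// local[*] runs Spark in-process, using all available cores,
// instead of submitting to a YARN cluster.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("local-run")
  .getOrCreate()
```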
're going to have (but it must be set in stone), and then they can
> try to pre-optimize the bucket for you.
>
>> On Thu, Mar 8, 2018 at 11:42 AM, Subhash Sriram
>> wrote:
>> Hey Spark user community,
>>
>> I am writing Parquet files from Spark to S3 using S3
Hey Spark user community,
I am writing Parquet files from Spark to S3 using S3a. I was reading this
article about improving S3 bucket performance, specifically about how it
can help to introduce randomness to your key names so that data is written
to different partitions.
https://aws.amazon.com/p
Hey everyone,
I have a use case where I will be processing data in Spark and then writing
it back to MS SQL Server.
Is it possible to use bulk insert functionality and/or batch the writes
back to SQL?
I am using the DataFrame API to write the rows:
df.write.jdbc(...)
Thanks in advance
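Spark's JDBC writer does batch its inserts, and the batch size is tunable; a sketch with placeholder connection details, assuming an existing DataFrame df:

```scala
import java.util.Properties

val props = new Properties()
props.setProperty("user", "myuser")          // placeholder
props.setProperty("password", "mypassword")  // placeholder
// Rows per JDBC batch insert; 1000 is the Spark default.
props.setProperty("batchsize", "10000")

df.write
  .mode("append")
  .jdbc("jdbc:sqlserver://myhost:1433;databaseName=mydb", "dbo.TargetTable", props)
```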
If you have the temp view name (table, for example), couldn't you do
something like this?
val dfWithColumn=spark.sql("select *, as new_column from
table")
Thanks,
Subhash
On Thu, Feb 1, 2018 at 11:18 AM, kant kodali wrote:
> Hi,
>
> Are you talking about df.withColumn() ? If so, thats not wha
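With the missing expression filled in (upper(name) here is purely a stand-in), the suggestion above would look like:

```scala
// "table" is the existing temp view; upper(name) is a placeholder
// for whatever expression should populate the new column.
val dfWithColumn = spark.sql(
  "SELECT *, upper(name) AS new_column FROM table")
```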
Hi Soheil,
We have a high availability cluster as well, but I never have to specify the
active master when writing, only the cluster name. It works regardless of which
node is the active master.
Hope that helps.
Thanks,
Subhash
Sent from my iPhone
> On Jan 18, 2018, at 5:49 AM, Soheil Pourb
There are some more properties specifically for YARN here:
http://spark.apache.org/docs/latest/running-on-yarn.html
Thanks,
Subhash
On Wed, Dec 13, 2017 at 2:32 PM, Subhash Sriram
wrote:
> http://spark.apache.org/docs/latest/configuration.html
>
> On Wed, Dec 13, 2017 at 2:31 PM, T
http://spark.apache.org/docs/latest/configuration.html
On Wed, Dec 13, 2017 at 2:31 PM, Toy wrote:
> Hi,
>
> Can you point me to the config for that please?
>
> On Wed, 13 Dec 2017 at 14:23 Marcelo Vanzin wrote:
>
>> On Wed, Dec 13, 2017 at 11:21 AM, Toy wrote:
>> > I'm wondering why am I seei
I was curious about this too, and found this. You may find it helpful:
http://www.tegdesign.com/converting-a-nested-json-document-to-csv-using-scala-hadoop-and-apache-spark/
Thanks,
Subhash
Sent from my iPhone
> On Dec 12, 2017, at 1:44 AM, Prabha K wrote:
>
> Any help on converting json to
Thanks. This is what I was looking for.
>
> one question, where do we need to specify the checkpoint directory in case
> of structured streaming?
>
> Thanks,
> Asmath
>
> On Wed, Oct 25, 2017 at 2:52 PM, Subhash Sriram
> wrote:
>
>> Hi Asmath,
>>
>>
Hi Asmath,
Here is an example of using structured streaming to read from Kafka:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredKafkaWordCount.scala
In terms of parsing the JSON, there is a from_json function that you can
use.
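Putting both pieces together, assuming an existing SparkSession (spark); the broker, topic, and JSON schema are placeholders:

```scala
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types._
import spark.implicits._

// Hypothetical schema for the JSON payload.
val schema = new StructType()
  .add("id", LongType)
  .add("name", StringType)

val kafkaDf = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092") // placeholder
  .option("subscribe", "my-topic")                // placeholder
  .load()

// Kafka values arrive as bytes; cast to string, then parse the JSON.
val parsed = kafkaDf
  .selectExpr("CAST(value AS STRING) AS json")
  .select(from_json($"json", schema).as("data"))
  .select("data.*")
```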
Hi Devender,
I have always gone with the 2nd approach, only so I don't have to chain a bunch
of .option() calls together. You should be able to use either.
Thanks,
Subhash
Sent from my iPhone
> On Apr 26, 2017, at 3:26 AM, Devender Yadav
> wrote:
>
> Hi All,
>
>
> I am using Spark 1.6.2
Would it be an option to just write the results of each job into separate
tables and then run a UNION on all of them at the end into a final target
table? Just thinking of an alternative!
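Sketched out, with hypothetical table names:

```scala
// Each job writes its own results table; a final step unions them.
val combined = spark.table("results_job1")
  .union(spark.table("results_job2"))
  .union(spark.table("results_job3"))

combined.write.mode("overwrite").saveAsTable("results_final")
```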
Thanks,
Subhash
Sent from my iPhone
> On Apr 20, 2017, at 3:48 AM, Rick Moritz wrote:
>
> Hi List,
>
>
Hi,
We use monotonically_increasing_id() as well, but just cache the table first
like Ankur suggested. With that method, we get the same keys in all derived
tables.
Thanks,
Subhash
Sent from my iPhone
> On Apr 7, 2017, at 7:32 PM, Everett Anderson wrote:
>
> Hi,
>
> Thanks, but that's usi
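The cache-first pattern looks roughly like this, assuming an existing DataFrame df:

```scala
import org.apache.spark.sql.functions.monotonically_increasing_id

// Cache after assigning IDs so every derived table sees the same keys;
// without the cache, re-evaluation can produce different IDs.
val withIds = df
  .withColumn("row_id", monotonically_increasing_id())
  .cache()

val derivedA = withIds.filter("category = 'a'") // placeholder predicates
val derivedB = withIds.filter("category = 'b'")
```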
We have a similar use case. We use the DataFrame API to cache data out of
Hive tables, and then run pretty complex scripts on them. You can register
your Hive UDFs to be used within Spark SQL statements if you want.
Something like this:
sqlContext.sql("CREATE TEMPORARY FUNCTION as ''")
If you h
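With hypothetical names filled in, the registration and use would look like:

```scala
// Function name and implementing class are hypothetical.
sqlContext.sql(
  "CREATE TEMPORARY FUNCTION my_udf AS 'com.example.hive.MyUdf'")

sqlContext.sql("SELECT my_udf(some_column) FROM some_hive_table").show()
```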
Could you create a view of the table on your JDBC data source and just query
that from Spark?
Thanks,
Subhash
Sent from my iPhone
> On Mar 7, 2017, at 6:37 AM, El-Hassan Wanas wrote:
>
> As an example, this is basically what I'm doing:
>
> val myDF = originalDataFrame.select(col(column
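Besides a database-side view, the JDBC source also accepts an inline subquery as the table name; a sketch, with placeholder connection details and query:

```scala
// The parenthesized query runs on the database side, so only the
// projected/filtered rows cross the wire. Names are placeholders.
val viewDf = spark.read.jdbc(
  connectionString,
  "(SELECT col1, col2 FROM my_table WHERE active = 1) AS pushed_down",
  props)
```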
Hi Allan,
Where is the data stored right now? If it's in a relational database, and you
are using Spark with Hadoop, I feel like it would make sense to import the data
into HDFS, just because it would be faster to access. You could use Sqoop to do
that.
In terms of having a l
If I am understanding your problem correctly, I think you can just create a
new DataFrame that is a transformation of sample_data by first registering
sample_data as a temp table.
//Register temp table
sample_data.createOrReplaceTempView("sql_sample_data")
//Create new DataSet with transformed values
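A hypothetical completion of that last step (the upper(name) expression is a stand-in):

```scala
val transformed = spark.sql(
  "SELECT id, upper(name) AS name_upper FROM sql_sample_data")
```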