Hi all,
We are switching from Scala 2.11 to 2.12 with a Spark 2.4.1 release
candidate, and so far this has been going pretty smoothly.
However, we do see some new serialization errors related to Function1,
Function2, etc.
they look like this:
ClassCastException: cannot assign instance of
java.lang.
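For context (a sketch, not from the original report): in Scala 2.12,
closures compile to Java 8 lambdas, so a Function1/Function2 captured by
a task is serialized through java.lang.invoke.SerializedLambda, and a
classpath mixing 2.11 and 2.12 artifacts can fail the cast back. A
minimal example of the kind of closure involved, assuming a SparkSession
named spark:

// Illustrative only: a Function1 closure shipped to executors. Under
// Scala 2.12 it compiles to a Java lambda and round-trips through
// java.lang.invoke.SerializedLambda during task serialization.
val parse: String => Int = _.trim.toInt   // a Function1[String, Int]
val total = spark.sparkContext
  .parallelize(Seq(" 1", "2 ", "3"))
  .map(parse)                             // closure is serialized here
  .sum()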
Hi,
Occasionally, Spark generates Parquet files that are only 4 bytes long;
their entire content is the "PAR1" magic header. ETL Spark jobs cannot
handle such corrupted files and ignore the whole partition containing
these poison-pill files, causing major data loss.
Spark also generates 0-byte Parquet files, but those can be ha
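One possible guard (a workaround sketch, not a fix for the root cause):
filter out files that contain nothing beyond the 4-byte "PAR1" magic
header before handing paths to the reader. Assumes a SparkSession named
spark; the partition path is a placeholder.

import org.apache.hadoop.fs.{FileSystem, Path}

// A valid Parquet file must be larger than the 4-byte "PAR1" magic
// header, so skip anything at or below that size.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val readablePaths = fs
  .listStatus(new Path("/data/partition"))   // placeholder path
  .filter(_.getLen > 4)                      // drop "PAR1"-only files
  .map(_.getPath.toString)
val df = spark.read.parquet(readablePaths: _*)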
A little late, but have you looked at https://livy.incubator.apache.org/?
It works well for us.
-Todd
On Thu, Mar 28, 2019 at 9:33 PM Jason Nerothin
wrote:
> Meant this one: https://docs.databricks.com/api/latest/jobs.html
>
> On Thu, Mar 28, 2019 at 5:06 PM Pat Ferrel wrote:
>
>> Thanks, are you
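For reference, a rough sketch of what a batch submission to Livy's REST
API (POST /batches) can look like; the host, jar path, and main class
below are placeholders, not values from this thread.

import java.net.{HttpURLConnection, URL}

// Livy listens on port 8998 by default; everything in the payload is
// a placeholder.
val conn = new URL("http://livy-host:8998/batches")
  .openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestMethod("POST")
conn.setRequestProperty("Content-Type", "application/json")
conn.setDoOutput(true)
val payload =
  """{"file": "hdfs:///jars/app.jar", "className": "com.example.Main"}"""
conn.getOutputStream.write(payload.getBytes("UTF-8"))
println(s"Livy responded with HTTP ${conn.getResponseCode}")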
Hi Team,
I am executing the same Spark code through both the Spark SQL API and
the DataFrame API; however, the Spark SQL version is taking longer than
expected. Please find the pseudocode below.
---
Case 1 : Spark SQL
-
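Since the pseudocode above is cut off: both APIs go through the same
Catalyst optimizer, so a useful first check is whether the SQL and
DataFrame versions produce the same physical plan. A minimal sketch,
assuming a DataFrame named df; the table and column names are
hypothetical.

// If the two plans printed below match, the runtimes should too; a
// difference in the .explain output usually accounts for the gap.
df.createOrReplaceTempView("events")      // hypothetical view name
spark.sql("SELECT key, COUNT(*) FROM events GROUP BY key").explain()
df.groupBy("key").count().explain()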
We observed the following bug on Spark 2.4.0:
scala> import org.apache.spark.sql.types._
scala> spark.createDataset(Seq((1,2))).write.partitionBy("_1").parquet("foo.parquet")
scala> val schema = StructType(Seq(StructField("_1", IntegerType), StructField("_2", IntegerType)))
scala> spark.read.schema(schema).parquet("foo.parquet").as[(Int, Int)]
How many tables? What DB?
On Fri, Mar 29, 2019 at 00:50 Surendra , Manchikanti <
surendra.manchika...@gmail.com> wrote:
> Hi Jason,
>
> Thanks for your reply, but I am looking for a way to extract all the
> tables in a database in parallel.
>
>
> On Thu, Mar 28, 2019 at 2:50 PM Jason Nerothin
> w
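One way to do that (a sketch under assumptions, not from the thread):
launch one JDBC read per table from the driver, each in its own Future,
so the extracts run concurrently. The connection URL, table names,
credentials, and output path are all placeholders.

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Each Future triggers an independent JDBC read and Parquet write, so
// several tables are extracted at once.
implicit val ec: ExecutionContext =
  ExecutionContext.fromExecutor(Executors.newFixedThreadPool(8))

val tables = Seq("customers", "orders", "line_items")  // placeholder list
val extracts = tables.map { table =>
  Future {
    spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/mydb")
      .option("dbtable", table)
      .option("user", "etl")
      .option("password", sys.env("DB_PASSWORD"))
      .load()
      .write.mode("overwrite").parquet(s"/warehouse/raw/$table")
  }
}
Await.result(Future.sequence(extracts), Duration.Inf)  // wait for all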
Have you tried Apache Livy?
On Fri, 29 Mar 2019 at 9:32 pm, Jianneng Li wrote:
> Hi Pat,
>
> Now that I understand your terminology better, the method I described was
> actually closer to spark-submit than what you referred to as
> "programmatically". You want to have SparkContext running in the
Hi Pat,
Now that I understand your terminology better, the method I described was
actually closer to spark-submit than what you referred to as
"programmatically". You want to have SparkContext running in the launcher
program, and also the driver somehow running on the cluster, and unfortunately
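For the "launch programmatically, driver on the cluster" shape, Spark's
own org.apache.spark.launcher.SparkLauncher is the closest built-in
option; note it launches the app rather than giving you a SparkContext
in the launcher process. A minimal sketch; jar path, main class, and
master are placeholders.

import org.apache.spark.launcher.{SparkAppHandle, SparkLauncher}

// Programmatic equivalent of spark-submit: this JVM only launches the
// app, and with deploy-mode "cluster" the driver runs on the cluster.
val handle: SparkAppHandle = new SparkLauncher()
  .setAppResource("hdfs:///jars/app.jar")
  .setMainClass("com.example.Main")
  .setMaster("yarn")
  .setDeployMode("cluster")
  .startApplication()
println(s"launched, current state: ${handle.getState}")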
Hello Jack,
You can also have a look at “Babar”; it has a nice “flame graph” feature
too, though I haven’t had the time to test it out.
https://github.com/criteo/babar
JC
Hi Jack,
You can try sparklens (https://github.com/qubole/sparklens). I think it
won't give details at as low a level as you're looking for, but it can help
you identify and remove performance bottlenecks.
~ Hariharan
On Fri, Mar 29, 2019 at 12:01 AM bo yang wrote:
> Yeah, these options are ve