Mailing lists matching spark.apache.org
commits@spark.apache.org
dev@spark.apache.org
issues@spark.apache.org
reviews@spark.apache.org
user@spark.apache.org
Re: java.lang.OutOfMemoryError while running SVD MLLib example
user-list.1001560.n3.nabble.com/java-lang-OutOfMemoryError-while-running-SVD-MLLib-example-tp14972p15083.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > ----- > To unsubscribe, e-mail: user-unsubscr.
Re: Access by name in "tuples" in Scala with Spark
60.n3.nabble.com/Access-by-name-in-tuples-in-Scala-with-Spark-tp15212.html
Re: Trouble getting filtering on field correct
sage in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Trouble-getting-filtering-on-field-correct-tp15728.html
Re: How to save ReceiverInputDStream to Hadoop using saveAsNewAPIHadoopFile
Re: spark-sql failing for some tables in hive
Re: Spark KMeans hangs at reduceByKey / collectAsMap
Re: pyspark - extract 1 field from string
spark-user-list.1001560.n3.nabble.com/pyspark-extract-1-field-from-string-tp16456.html
Re: reverse an rdd
t.1001560.n3.nabble.com/reverse-an-rdd-tp16602.html
Re: spark1.0 principal component analysis
Re: default parallelism bug?
iew this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/default-parallelism-bug-tp16787.html
Re: Spark 1.1.0 on Hive 0.13.1
wrote: > >> Hi, > >> > >> My Hive is 0.13.1, how to make Spark 1.1.0 run on Hive 0.13? Please > advise. > >> > >> Or, any news about when will Spark 1.1.0 on Hive 0.1.3.1 be available? > >> > >> Regards > >> Arthur >
Re: No module named pyspark - latest built
Re: why MatrixFactorizationModel private?
n context: > http://apache-spark-user-list.1001560.n3.nabble.com/why-MatrixFactorizationModel-private-tp19763p19783.html
Re: RDD saveAsObjectFile write to local file and HDFS
jectFile-write-to-local-file-and-HDFS-tp19898.html
Re: SVMWithSGD.run source code
de-tp20671.html
Re: spark-repl_1.2.0 was not uploaded to central maven repository.
ven-repository-tp20799.html
Re: Using TF-IDF from MLlib
http://apache-spark-user-list.1001560.n3.nabble.com/Using-TF-IDF-from-MLlib-tp19429p20876.html
Re: Spark Streaming: HiveContext within Custom Actor
com/Spark-Streaming-HiveContext-within-Custom-Actor-tp20892.html
[jira] [Updated] (SPARK-48994) Add support for interval types in the Variant spec
Affects Versions: 4.0.0 >Reporter: Harsh Motwani >Priority: Major > Labels: pull-request-available > > The Variant spec does not have support for the > [YearMonthIntervalType|https://spark.apache.org/docs/latest/api/java/org/apache/spark
spark git commit: [MINOR][DOCS] Remove Apache Spark Wiki address
in `README.md` and `docs/index.md`, too. These two lines are the last occurrence of that links. ``` All current wiki content has been merged into pages at http://spark.apache.org as of November 2016. Each page links to the new location of its information on the Spark web site. Obsolete wiki content
Re: [YARN] Small fix for yarn.Client to use buildPath (not Path.SEPARATOR)
o fix such small changes. >>>> >>>> [1] >>>> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala#L1298 >>>> [2] Path.SEPARATOR >>>> >>>
Re: [VOTE] Release Apache Spark 1.1.0 (RC2)
8, 2014 at 8:53 PM, Burak Yavuz wrote: > > +1. Tested MLlib algorithms on Amazon EC2, algorithms show speed-ups > between 1.5-5x compared to the 1.0.2 release. > > > > - Original Message - > > From: "Patrick Wendell" > > To: dev@spark.apache.org &
[jira] [Updated] (SPARK-43036) Spark structuring streaming with kinesis and rocksdb integration
the official documentation - [https://spark.apache.org/docs/latest/streaming-kinesis-integration.html] It does not mention about kinesis connector. Alternative is - [https://github.com/qubole/kinesis-sql] which is not active now. This is now handed over here - [https://github.com/roncemer/spark
[GitHub] [spark] HyukjinKwon commented on a diff in pull request #40092: [SPARK-42475][CONNECT][DOCS] Getting Started: Live Notebook for Spark Connect
;markdown", + "metadata": {}, + "source": [ +"# Quickstart: DataFrame with Spark Connect\n", +"\n", +"This is a short introduction and quickstart for the DataFrame with Spark Connect. A DataFrame with Spark Connect is virtually, conceptually i
Re: Spark 1.4.0 - Using SparkR on EC2 Instance
t; > I am having a bit of trouble finalizing the installation and usage of the > newest Spark version 1.4.0, deploying to an Amazon EC2 instance and using > RStudio to run on top of it. > > Using these instructions ( > http://spark.apache.org/docs/latest/ec2-scripts.html > <
RE: Using dynamic allocation and shuffle service in Standalone Mode
There’s a script to start it up under sbin, start-shuffle-service.sh. Run that on each of your worker nodes. From: Yuval Itzchakov<mailto:yuva...@gmail.com> Sent: Tuesday, March 8, 2016 2:17 PM To: Silvio Fiorito<mailto:silvio.fior...@granturing.com>; user@spark.apache.org
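A minimal sketch of the setup described above, assuming a Spark standalone cluster (`spark.shuffle.service.enabled` and `spark.dynamicAllocation.enabled` are the documented properties that pair with the external shuffle service):

```
# On each worker node, start the external shuffle service:
$SPARK_HOME/sbin/start-shuffle-service.sh

# Then, in the application's spark-defaults.conf (or via --conf):
spark.shuffle.service.enabled    true
spark.dynamicAllocation.enabled  true
```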
Re: How do I convert a data frame to broadcast variable?
Awesome, thanks Silvio! From: Silvio Fiorito mailto:silvio.fior...@granturing.com>> Date: Thursday, November 3, 2016 at 12:26 PM To: "Jain, Nishit" mailto:nja...@underarmour.com>>, Denny Lee mailto:denny.g....@gmail.com>>, "user@spark.apache.org<mailto:
Re: Spark Window Documentation
Hi Neeraj, I'd start from "Contributing Documentation Changes" in https://spark.apache.org/contributing.html Pozdrawiam, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books <https://books.japila.pl/> Follow me on https://twi
RE: Possible long lineage issue when using DStream to update a normal RDD
...@gmail.com] Sent: Friday, May 8, 2015 2:51 PM To: Shao, Saisai Cc: user@spark.apache.org Subject: Re: Possible long lineage issue when using DStream to update a normal RDD Thank you for this suggestion! But may I ask what's the advantage to use checkpoint instead of cache here? Cuz they both cut li
Re: Spark KMeans hangs at reduceByKey / collectAsMap
>> Thanks. >> Ray >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-tp16413p16428.html
[spark] branch branch-3.4 updated: [SPARK-42249][SQL] Refining html link for documentation in error messages
proposed in this pull request? This PR proposes to refine html link for documentation in error messages by introducing `SPARK_DOC_ROOT` into `core/src/main/scala/org/apache/spark/package.scala` that contains global directory for documentation root link: `https://spark.apache.org/docs/latest
Re: Help needed in R documentation generation
: Tuesday, February 27, 2018 9:13:18 AM To: Felix Cheung Cc: Mihály Tóth; dev@spark.apache.org Subject: Re: Help needed in R documentation generation Hi, Earlier, at https://spark.apache.org/docs/latest/api/R/index.html I see 1. sin as a title 2. description describes what sin does 3. usage
Re: RE : Re: HDFS small file generation problem
gt; After a CONCATENATE I suppose the records are still updatable. >> >> Tks to confirm if it can be solution for my use case. Or any other idea.. >> >> Thanks a lot ! >> Nicolas >> >> >> - Mail original - >> De: "Jörn Franke" >
Re: LinearRegressionWithSGD and Rank Features By Importance
ns using your trained model. It's not too complicated to implement manually, but Spark API has some support for this already: ML: http://spark.apache.org/docs/latest/ml-features.html#standardscaler MLlib: http://spark.apache.org/docs/latest/mllib-feature-extraction.html#standardscaler Mas
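The reply above notes that standardization is simple to implement by hand; here is a hedged plain-Python sketch of the idea (no Spark, function name is mine): scale each feature column to zero mean and unit variance, which is what `StandardScaler` automates.

```python
def standardize(rows):
    """Scale each feature column to zero mean and unit (population) variance."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[d] for row in rows) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((row[d] - means[d]) ** 2 for row in rows) / n
        stds.append(var ** 0.5 or 1.0)  # avoid dividing by zero for constant columns
    return [[(row[d] - means[d]) / stds[d] for d in range(dims)] for row in rows]

# Two points, two features: each column becomes mean 0, variance 1.
standardize([[1.0, 10.0], [3.0, 30.0]])  # → [[-1.0, -1.0], [1.0, 1.0]]
```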
Re: Dependency Problem with Spark / ScalaTest / SBT
t;, "javax.transaction"). exclude("org.eclipse.jetty.orbit", "javax.servlet") ) } resolvers += "Akka Repository
spark git commit: [SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site
Repository: spark Updated Branches: refs/heads/master 2559fb4b4 -> 7e0cd1d9b [SPARK-18073][DOCS][WIP] Migrate wiki to spark.apache.org web site ## What changes were proposed in this pull request? Updates links to the wiki to links to the new location of content on spark.apache.org. ##
Re: SparkR latest API docs missing?
I think the SparkR release always trails a little bit due to the additional CRAN processes. On Wed, May 8, 2019 at 11:23 AM Shivaram Venkataraman wrote: > > I just noticed that the SparkR API docs are missing at > https://spark.apache.org/docs/latest/api/R/index.html --- It looks &g
[jira] [Created] (SPARK-15228) pyspark.RDD.toLocalIterator Documentation
Type: Documentation Reporter: Ignacio Tartavull Priority: Trivial There is a little bug in the parsing of the documentation of http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.toLocalIterator -- This message was sent by Atlassian JIRA (v6.3.4#6332
[jira] [Created] (SPARK-37873) SQL Syntax links are broken
: Documentation Affects Versions: 3.2.0 Reporter: Alex Ott SQL Syntax links at [https://spark.apache.org/docs/latest/sql-ref.html] are broken -- This message was sent by Atlassian Jira (v8.20.1#820001) - To
[jira] [Created] (SPARK-38184) Fix malformatted ExpressionDescription of `decode`
Type: Improvement Components: Documentation Affects Versions: 3.3.0 Reporter: Xinrong Meng Fix malformatted ExpressionDescription of `Decode` https://spark.apache.org/docs/latest/api/sql/#decode -- This message was sent by Atlassian Jira (v8.20.1#820001
[jira] [Created] (SPARK-25991) Update binary for 2.4.0 release
Feature Components: Spark Core Affects Versions: 2.4.0 Reporter: Vladimir Tsvetkov Archive with 2.4.0 release contains old binaries https://spark.apache.org/downloads.html -- This message was sent by Atlassian JIRA (v7.6.3#76005
[jira] [Created] (SPARK-33547) Doc Type Construct Literal usage
Components: Documentation Affects Versions: 3.1.0 Reporter: angerszhu Add Doc about type construct literal in [https://spark.apache.org/docs/3.0.1/sql-ref-literals.html] -- This message was sent by Atlassian Jira (v8.3.4#803005
[GitHub] spark issue #20219: [SPARK-23025][SQL] Support Null type in scala reflection
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20219 `NullType` is not well supported in almost all the data sources. We did not mention it in our doc https://spark.apache.org/docs/latest/sql-programming-guide.html cc @cloud-fan
[GitHub] spark issue #20618: [SPARK-23329][SQL] Fix documentation of trigonometric fu...
Github user misutoth commented on the issue: https://github.com/apache/spark/pull/20618 @felixcheung, I have started a mail thread on d...@spark.apache.org with title _Help needed in R documentation generation_ because I did not feel it is directly related to this PR. Thanks for your
[GitHub] spark issue #21589: [SPARK-24591][CORE] Number of cores and executors in the...
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/21589 AFAIK, we always have num of executor and then num of core per executor right? https://spark.apache.org/docs/latest/configuration.html#execution-behavior maybe we should have the
[GitHub] spark issue #22593: [Streaming][DOC] Fix typo & format in DataStreamWriter.s...
Github user niofire commented on the issue: https://github.com/apache/spark/pull/22593 From https://spark.apache.org/docs/2.3.2/api/java/org/apache/spark/sql/streaming/DataStreamWriter.html ![image](https://user-images.githubusercontent.com/2295469/46749482-b3351400-cc6a-11e8
[GitHub] spark issue #19154: Fix DiskBlockManager crashing when a root local folder h...
s are proposed: http://spark.apache.org/contributing.html
[PR] wat [spark]
[GitHub] spark issue #22339: SPARK-17159 Significant speed up for running spark strea...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/22339 Hi, @ScrapCodes . Could you do the followings? - Update the title to `[SPARK-17159][SS]...` - Remove `Please review http://spark.apache.org/contributing.html ` from PR description
[GitHub] spark issue #22852: [SPARK-25023] Clarify Spark security documentation
Github user srowen commented on the issue: https://github.com/apache/spark/pull/22852 I think these are good changes. In a separate PR for the versions-specific docs, we could add a similar note to https://spark.apache.org/docs/latest/spark-standalone.html as much of the security
RE: visualize data from spark streaming
Gotta roll your own. Look at kafka and websockets for example. Sent from my Verizon Wireless 4G LTE smartphone Original message From: patcharee Date: 01/20/2016 2:54 PM (GMT-05:00) To: user@spark.apache.org Subject: visualize data from spark streaming Hi, How to
Re: Does filter on an RDD scan every data item ?
Looks like this has been supported from 1.4 release :) https://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.rdd.OrderedRDDFunctions -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item
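The point of the reply above is that a filter over sorted (ordered) data need not scan every item; purely as an illustration of that idea (plain Python, not the Spark API), a range filter on sorted keys can use binary search:

```python
from bisect import bisect_left, bisect_right

def filter_by_range(sorted_keys, lower, upper):
    """Keep keys in [lower, upper] via binary search rather than a full scan."""
    start = bisect_left(sorted_keys, lower)
    stop = bisect_right(sorted_keys, upper)
    return sorted_keys[start:stop]

filter_by_range([1, 3, 5, 7, 9], 3, 7)  # → [3, 5, 7]
```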
Re: Spark Certification
I was wondering that as well. Also is it fully updated for 1.6? Tim http://airisdata.com/ http://sparkdeveloper.com/ From: naga sharathrayapati mailto:sharathrayap...@gmail.com>> Date: Wednesday, February 10, 2016 at 11:36 PM To: "user@spark.apache.org<mailto:user@sp
Re: spark sql, creating literal columns in java.
This should work from java too: http://spark.apache.org/docs/1.3.1/api/java/index.html#org.apache.spark.sql.functions$ On Tue, May 5, 2015 at 4:15 AM, Jan-Paul Bultmann wrote: > Hey, > What is the recommended way to create literal columns in java? > Scala has the `lit` func
DataFrame DSL documentation
ct: https://spark.apache.org/docs/1.3.0/sql-programming-guide.html https://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.sql.DataFrame https://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.sql.functions$ Thanks, Gerard.
(spark) branch master updated: [SPARK-46141][SQL] Change default for spark.sql.legacy.ctePrecedencePolicy to CORRECTED
"42KD0", - "messageParameters" : { -"config" : "\"spark.sql.legacy.ctePrecedencePolicy\"", -"docroot" : "https://spark.apache.org/docs/latest";, -"name" : "`t`" - } -} +WithCTE +:- CTERelationDef , false
[jira] [Updated] (SPARK-27059) spark-submit on kubernetes cluster does not recognise k8s --master property
: {{C:\windows\system32>kubectl cluster-info }} {{Kubernetes master is running at https://: }} {{KubeDNS is running at https://:/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy}} Trying to run the SparkPi with the Spark I downloaded from [https://spark.apache.org/downloads.html]
RE: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression?
Bui, Tri Cc: user@spark.apache.org Subject: Re: Do I need to applied feature scaling via StandardScaler for LBFGS for Linear Regression? You can do something like the following. val rddVector = input.map({ case (response, vec) => { val newVec = MLUtils.appendBias(vec) newVec.to
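The truncated snippet above uses `MLUtils.appendBias` before training; the effect, appending a constant 1.0 so the intercept is learned as one more weight, can be sketched in plain Python (illustrative only, not the MLlib API):

```python
def append_bias(features):
    """Append a constant 1.0 feature so a linear model learns its intercept as a weight."""
    return list(features) + [1.0]

append_bias([0.5, 2.0])  # → [0.5, 2.0, 1.0]
```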
[beam-site] 09/11: Fix extraneous p tag and add table borders
-connectors.md index 72983b8..1edbc9a 100644 --- a/src/_posts/2018-08-16-review-input-streaming-connectors.md +++ b/src/_posts/2018-08-16-review-input-streaming-connectors.md @@ -25,7 +25,7 @@ Spark Structured Streaming supports [file sources](https://spark.apache.org/docs Below are the main
RE: Stochastic gradient descent performance
on this? I do understand that in cluster mode the network speed will kick in and then one can blame it. Best regards, Alexander From: Joseph Bradley [mailto:jos...@databricks.com] Sent: Thursday, April 02, 2015 10:51 AM To: Ulanov, Alexander Cc: dev@spark.apache.org Subject: Re: Stochastic
Re: [VOTE] Release Apache Spark 1.3.1 (RC2)
/github.com/apache/spark/pull/5302 >> > >> > [SPARK-6205] [CORE] UISeleniumSuite fails for Hadoop 2.x test with >> > NoClassDefFoundError >> > https://github.com/apache/spark/pull/4933 >>
[jira] [Updated] (SPARK-34179) examples provided in https://spark.apache.org/docs/latest/api/sql/index.html link not working
[ https://issues.apache.org/jira/browse/SPARK-34179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chetan Bhat updated SPARK-34179: Summary: examples provided in https://spark.apache.org/docs/latest/api/sql/index.html link not
[GitHub] [spark] gengliangwang commented on a diff in pull request #40269: [SPARK-42853][DOC] Updating the Style for the Spark Docs based on the Webpage
g +Tuning Guide +Job Scheduling +Security +Hardware Provisioning +Migration Guide + +Building Spark +
Re: .NET on Apache Spark?
To: Ruslan Dautkhanov<mailto:dautkha...@gmail.com>, pedro<mailto:ski.rodrig...@gmail.com> Cc: user@spark.apache.org<mailto:user@spark.apache.org> Unfortunately, afaik that project is long dead. It'd be an interesting project to create an intermediary protocol, perhaps using some
Re: Submitting Jobs Programmatically
p! Thanks & regards Arko On Fri, Feb 19, 2016 at 6:35 PM, Ted Yu wrote: > Please see https://spark.apache.org/docs/latest/spark-standalone.html > > On Fri, Feb 19, 2016 at 6:27 PM, Arko Provo Mukherjee > wrote: >> >> Hi, >> >> Thanks for your response, that
Re: Can not subscript to mailing list
jeff.sadow...@gmail.com" wrote on 10/20/2015 08:48:49 AM: > From: "jeff.sadow...@gmail.com" > To: user@spark.apache.org > Date: 10/20/2015 08:49 AM > Subject: Can not subscript to mailing list > > I am having issues subscribing to the user@spark.apache.org mailing list.
Re: Link existing Hive to Spark
ok.Is there no way to specify it in code, when I create SparkConf ? From: Todd Nist Sent: Friday, February 6, 2015 10:08 PM To: Ashutosh Trivedi (MT2013030) Cc: user@spark.apache.org Subject: Re: Link existing Hive to Spark You can always just add the entry
RE: spark sql performance
Okay Akhil! Thanks for the information. Thanks, Udbhav Agarwal From: Akhil Das [mailto:ak...@sigmoidanalytics.com] Sent: 13 March, 2015 12:34 PM To: Udbhav Agarwal Cc: user@spark.apache.org Subject: Re: spark sql performance Can't say that unless you try it. Thanks Best Regards On Fri, M
Re: problem with HiveContext inside Actor
>> Cc: Du Li mailto:l...@yahoo-inc.com.invalid>>, "user@spark.apache.org<mailto:user@spark.apache.org>" mailto:user@spark.apache.org>> Subject: Re: problem with HiveContext inside Actor - dev Is it possible that you are constructing more than one HiveContext in a sin
Re: [ANNOUNCE] Announcing Apache Spark 2.4.1
Hello, I'm not sure if this is the proper place to report it, but the 2.4.1 version of the config docs apparently didn't render right into HTML (scroll down to "Compression and Serialization") https://spark.apache.org/docs/2.4.1/configuration.html#available-properties By c
Re: Build changes after SPARK-13579
anytime you're building Spark, >>> >> that won't work anymore. >>> >> >>> >> You should now use "sbt package"; you'll still need "sbt assembly" if >>> >> you require one of the remaining assemblies (strea
Re: [PYSPARK] Python tests organization
Is it worth to come up with a proposal for this and float to dev? From: Reynold Xin Sent: Wednesday, January 11, 2017 9:47 AM To: Maciej Szymkiewicz; Saikat Kanjilal; dev@spark.apache.org Subject: Re: [PYSPARK] Python tests organization It would be good to
Re: [VOTE] Release Apache Spark 1.2.1 (RC3)
[ ] +1 Release this package as Apache Spark 1.2.1 > [ ] -1 Do not release this package because ... > > For a list of fixes in this release, see http://s.apache.org/Mpn. > > To learn more about Apache Spark, please see > http://spark.apache.org/
Re: [VOTE] Release Apache Spark 1.3.1
[ ] +1 Release this package as Apache Spark 1.3.1 > [ ] -1 Do not release this package because ... > > To learn more about Apache Spark, please see > http://spark.apache.org/ > > - Patrick
Re: spark-shell 1.5 doesn't seem to work in local mode
, 2015 12:14 PM To: dev@spark.apache.org Subject: Re: spark-shell 1.5 doesn't seem to work in local mode Thanks guys. I do have HADOOP_INSTALL set, but Spark 1.4.1 did not seem to mind. Seems like there's a difference in behavior between 1.5.0 and 1.4.1 for some reason. To the best of my kn
Re: Updating docs for running on Mesos
gt; Running Alongside Hadoop > - (trim this down) What trimming do you have in mind here? > > > > Does that work for people? > > > Thanks! > Andrew > > > PS Basically all the same: > > http://spark.apache.org/docs/0.6.0/running-on-mesos.html > http://spar
[jira] [Updated] (SPARK-39198) Cannot refer to nested CTE within a nested CTE in a subquery.
[ https://issues.apache.org/jira/browse/SPARK-39198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jarno Rajala updated SPARK-39198: - Environment: Tested on * Databricks runtime 10.4 * Spark 3.2.1 from [https://spark.apache.org
[jira] [Resolved] (SPARK-25795) Fix CSV SparkR SQL Example
e} > > - > https://github.com/apache/spark/blob/master/examples/src/main/r/RSparkSQLExample.R > - > https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc3-docs/_site/sql-programming-guide.html#manually-specifying-options > - > http://spark.apache.org/docs/2.3.2/sql-programm
[jira] [Updated] (SPARK-17794) 2.0.1 not in maven central repo?
Put the following into pom.xml as shown here: https://spark.apache.org/downloads.html {code:java} <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.11</artifactId> <version>2.0.1</version> </dependency> {code} Version 2.0.1 does not seem to exist in the Central Repository: https://repo1.maven.org/maven2/org/apache/spark/spark
[jira] [Comment Edited] (SPARK-9059) Update Python Direct Kafka Word count examples to show the use of HasOffsetRanges
9 PM: HasOffsetRanges is explained here http://spark.apache.org/docs/latest/streaming-kafka-integration.html#approach-2-direct-approach-no-receivers There's not an equivalent python code snippet like the java and scala ones. was (Author: c...@koeninger.org): HasOffsetRanges is expla
[jira] [Updated] (SPARK-32095) [DataSource V2] Documentation on SupportsReportStatistics Outdated?
functionality that explicitly wants the operators pushed down [2]. Is the > documentation for SupportsReportStatistics referring to something other than > [2] or should it be updated? > > [[1]https://spark.apache.org/docs/2.4.6/api/java/org/apache/spark/sql/sources/v2/reader/SupportsRe
[jira] [Updated] (SPARK-32185) User Guide - Monitoring
also https://github.com/apache/spark/tree/master/python/test_coverage to enable test coverage that include worker sides too. - Sentry Support \(?\) https://blog.sentry.io/2019/11/12/sentry-for-data-error-monitoring-with-pyspark - Link back https://spark.apache.org/docs/latest/monitoring.html
[jira] [Commented] (SPARK-29830) PySpark.context.Sparkcontext.binaryfiles improved memory with buffer
This > means it reads the full binary file immediately into memory, which is 1) > memory in-efficient 2) differs from the Scala implementation (see pyspark > here: > [https://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/context.html#SparkContext.binaryFiles). > > |ht
[jira] [Updated] (SPARK-29830) PySpark.context.Sparkcontext.binaryfiles improved memory with buffer
ient 2) differs from the Scala implementation (see pyspark > here: > [https://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/context.html#SparkContext.binaryFiles). > > |https://spark.apache.org/docs/2.4.0/api/python/_modules/pyspark/context.html#SparkContext.binaryFiles] &g
[jira] [Updated] (SPARK-32186) User Guide - Debugging
[ https://issues.apache.org/jira/browse/SPARK-32186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-32186: - Description: 1. Python Profiler: https://spark.apache.org/docs/2.3.0/api/python/_modules
[jira] [Created] (SPARK-31907) Spark SQL functions documentation refers to SQL API documentation without linking to it
-31907 Project: Spark Issue Type: Documentation Components: Documentation Affects Versions: 2.4.5 Reporter: Guilherme Beltramini h2. Problem description [The org.apache.spark.sql.functions documentation|http://spark.apache.org/docs/latest/api
[jira] [Updated] (SPARK-7084) Improve the saveAsTable documentation
sounds like it creates a hive table which can be accessed from hive. But it's not the case as discussed here [https://www.mailarchive.com/u...@spark.apache.org/msg26902.html] . This issue is to improve the documentation to reflect the same. (was: The documentation of saveTable is littl
[jira] [Resolved] (SPARK-45939) SPIP: Structured Streaming - Arbitrary State API v2
Affects Versions: 4.0.0 >Reporter: Anish Shrigondekar >Priority: Major > > SPIP: Structured Streaming - Arbitrary State API v2 > > We are planning to introduce a new operator for [Spark Structured > Streaming|https://spark.apache.org/streaming/
[jira] [Updated] (SPARK-35810) Deprecate ps.broadcast API
and [broadcast|http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.broadcast.html] function in PySpark as well. was: We have [ps.broadcast|https://koalas.readthedocs.io/en/latest/reference/api/databricks.koalas.broadcast.html] in pandas API on Spark, but
[jira] [Updated] (SPARK-18875) Fix R API doc generation by adding `DESCRIPTION` file
`. This issue aims to fix that. * Official Latest Website: http://spark.apache.org/docs/latest/api/R/index.html * Apache Spark 2.1.0-rc2: http://people.apache.org/~pwendell/spark-releases/spark-2.1.0-rc2-docs/api/R/index.html was: Currently, R API document index page has a broken link on
[jira] [Resolved] (SPARK-15778) Add 2.0.0-preview to dropdown / reorg description of previews at spark.apache.org/downloads.html
[ https://issues.apache.org/jira/browse/SPARK-15778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-15778. --- Resolution: Fixed Fix Version/s: 2.0.0 Live at http://spark.apache.org/downloads.html >
[jira] [Created] (SPARK-4724) JavaNetworkWordCount.java has a wrong import
{code:java} import org.apache.spark.streaming.Durations; {code} But according to the [documentation|https://spark.apache.org/docs/latest/api/java/org/apache/spark/streaming/package-summary.html], it should be [Duration|https://spark.apache.org/docs