Re: Spark 2.0: Rearchitecting Spark for Mobile, Local, Social

2015-04-01 Thread Kushal Datta
Reynold, what's the idea behind using LLVM? On Wed, Apr 1, 2015 at 12:31 AM, Akhil Das ak...@sigmoidanalytics.com wrote: Nice try :) Thanks Best Regards On Wed, Apr 1, 2015 at 12:41 PM, Reynold Xin r...@databricks.com wrote: Hi Spark devs, I've spent the last few months

Migrating from 1.2.1 to 1.3.0 - org.apache.spark.sql.api.java.Row

2015-04-01 Thread Niranda Perera
Hi, previously in 1.2.1, the result row from a Spark SQL query was an org.apache.spark.sql.api.java.Row. In 1.3.0 I do not see a sql.api.java package. So does this mean that even the SQL query result row is an implementation of org.apache.spark.sql.Row, such as GenericRow etc.? -- Niranda

Re: Migrating from 1.2.1 to 1.3.0 - org.apache.spark.sql.api.java.Row

2015-04-01 Thread Reynold Xin
Yup - we merged the Java and Scala APIs, so there is now a single set of APIs that supports both languages. See more at http://spark.apache.org/docs/latest/sql-programming-guide.html#unification-of-the-java-and-scala-apis On Tue, Mar 31, 2015 at 11:40 PM, Niranda Perera niranda.per...@gmail.com

Re: Spark 2.0: Rearchitecting Spark for Mobile, Local, Social

2015-04-01 Thread Tathagata Das
This is a significant effort that Reynold has undertaken, and I am super glad to see that it's finally taking a concrete form. Would love to see what the community thinks about the idea. TD On Wed, Apr 1, 2015 at 3:11 AM, Reynold Xin r...@databricks.com wrote: Hi Spark devs, I've spent the

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Ted Yu
bq. writing the output (to Amazon S3) failed What's the value of fs.s3.maxRetries ? Increasing the value should help. Cheers On Wed, Apr 1, 2015 at 8:34 AM, Romi Kuntsman r...@totango.com wrote: What about communication errors and not corrupted files? Both when reading input and when writing
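To illustrate the retry idea behind this suggestion, here is a minimal, generic sketch of retrying an operation with exponential backoff when it hits a transient error. It is not how fs.s3.maxRetries is implemented in Hadoop; the names with_retries, flaky_write, max_retries and base_delay are hypothetical and exist only for this example.

```python
import time


def with_retries(operation, max_retries=4, base_delay=0.5,
                 retry_on=(OSError,), sleep=time.sleep):
    """Call operation(), retrying on transient errors with exponential backoff.

    Hypothetical helper for illustration only. OSError is used as the
    stand-in for "transient" because socket.gaierror (DNS resolution
    failure) is a subclass of it.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except retry_on:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
            attempt += 1


# Demo: a write that fails twice with a temporary error, then succeeds.
calls = {"n": 0}
delays = []  # records the requested delays instead of really sleeping


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("temporary DNS resolution failure")
    return "written"


result = with_retries(flaky_write, sleep=delays.append)
```

With a budget like this, a short-lived DNS failure costs a few seconds of waiting rather than failing the whole job; a failure that outlasts the retry budget is still raised to the caller.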

Re: One corrupt gzip in a directory of 100s

2015-04-01 Thread Romi Kuntsman
What about communication errors rather than corrupted files? Both when reading input and when writing output. Currently the entire process fails if the last stage of writing the output (to Amazon S3) fails because of a very temporary DNS resolution issue (easily resolved by

RE: Can I call aggregate UDF in DataFrame?

2015-04-01 Thread Haopu Wang
Great! Thank you! From: Reynold Xin [mailto:r...@databricks.com] Sent: Thursday, April 02, 2015 8:11 AM To: Haopu Wang Cc: user; dev@spark.apache.org Subject: Re: Can I call aggregate UDF in DataFrame? You totally can.

Re: Unit test logs in Jenkins?

2015-04-01 Thread Patrick Wendell
Hey Marcelo, Great question. Right now, some of the more active developers have an account that allows them to log into this cluster to inspect logs (we copy the logs from each run to a node on that cluster). The infrastructure is maintained by the AMPLab. I will put you in touch with someone

Re: Storing large data for MLlib machine learning

2015-04-01 Thread Hector Yee
I use Thrift, then base64-encode the binary and save it as text-file lines that are snappy- or gzip-compressed. That makes it very easy to copy small chunks locally and play with subsets of the data without depending on HDFS / Hadoop for server stuff, for example. On Thu, Mar 26, 2015 at
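The storage layout described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: the records are arbitrary byte strings standing in for Thrift-serialized objects (no Thrift library is used), gzip is chosen over snappy because it is in the Python standard library, and the helper names write_records and read_records are hypothetical.

```python
import base64
import gzip
import os
import tempfile


def write_records(path, payloads):
    """Write each binary payload as one base64 text line in a gzip file."""
    with gzip.open(path, "wt", encoding="ascii") as f:
        for payload in payloads:
            # b64encode never emits newlines, so one record == one line,
            # even if the binary payload itself contains newline bytes.
            f.write(base64.b64encode(payload).decode("ascii") + "\n")


def read_records(path, limit=None):
    """Read payloads back; limit makes it cheap to sample a small subset."""
    records = []
    with gzip.open(path, "rt", encoding="ascii") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            records.append(base64.b64decode(line.rstrip("\n")))
    return records


# Demo with stand-in payloads (assumed; real ones would be Thrift bytes).
payloads = [b"\x00\x01binary\nwith newline", b"second record", b""]
path = os.path.join(tempfile.mkdtemp(), "part-00000.gz")
write_records(path, payloads)
restored = read_records(path)
sample = read_records(path, limit=2)
```

Because every record is a self-contained text line, a subset can be pulled out with ordinary line-oriented tools or read as plain text lines, at the cost of the roughly one-third size overhead that base64 adds before compression.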

RE: Stochastic gradient descent performance

2015-04-01 Thread Ulanov, Alexander
Sorry for bothering you again, but I think this is an important issue for the applicability of SGD in Spark MLlib. Could Spark developers please comment on it? -----Original Message----- From: Ulanov, Alexander Sent: Monday, March 30, 2015 5:00 PM To: dev@spark.apache.org Subject: Stochastic

RE: Storing large data for MLlib machine learning

2015-04-01 Thread Ulanov, Alexander
Thanks, sounds interesting! How do you load the files into Spark? Did you consider having multiple files instead of file lines? From: Hector Yee [mailto:hector@gmail.com] Sent: Wednesday, April 01, 2015 11:36 AM To: Ulanov, Alexander Cc: Evan R. Sparks; Stephen Boesch; dev@spark.apache.org

Re: Spark 2.0: Rearchitecting Spark for Mobile, Local, Social

2015-04-01 Thread Burak Yavuz
This is awesome! I can write the apps for it, to make the Web UI more functional! On Wed, Apr 1, 2015 at 12:37 AM, Tathagata Das tathagata.das1...@gmail.com wrote: This is a significant effort that Reynold has undertaken, and I am super glad to see that it's finally taking a concrete form.

RE: Storing large data for MLlib machine learning

2015-04-01 Thread Ulanov, Alexander
Jeremy, thanks for the explanation! What if you used the Parquet file format instead? You can still write a number of small files as you do, but you don't have to implement a writer/reader, because they are available for Parquet in various languages. From: Jeremy Freeman