I had downloaded the prebuilt package labeled "Spark 2.1.1 prebuilt with
Hadoop 2.7 or later" from the direct download link on spark.apache.org.
However, I am seeing compatibility errors running against a deployed HDFS
2.7.3. (See my earlier message about Flume DStream producing 0 records
after
I have already seen one example where data is generated using Spark; no reason
to think it's a bad idea, as far as I know.
You can check the code here; I'm not very sure, but I think there is something
there which generates data for the TPCDS benchmark and you can provide how much
data you want in
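As a sketch of the general idea (not the TPC-DS kit itself; table and column names here are made up for illustration), parallel data generation in Spark can be as simple as distributing a row range and mapping it to synthetic records:

```scala
import org.apache.spark.sql.SparkSession

object GenerateData {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("datagen").getOrCreate()
    import spark.implicits._

    // Scale the generated volume by changing numRows; Spark distributes
    // the range across partitions, so generation runs in parallel.
    val numRows = 1000000L
    val df = spark.range(numRows)
      .map(i => (i, s"customer_$i", (i % 100).toDouble))
      .toDF("id", "name", "amount")

    df.write.parquet("/tmp/generated")
    spark.stop()
  }
}
```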
Thanks & Best Regards,
Engr. Palash Gupta
Consultant, OSS/CEM/Big Data
Skype: palash2494
https://www.linkedin.com/in/enggpalashgupta
You should make HBase a data source (it seems we already have an HBase
connector?), create a DataFrame from HBase, and do the join in Spark SQL.
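A rough sketch of that approach, assuming an HBase connector (e.g. the shc connector) is on the classpath; the catalog JSON, table names, and columns below are all illustrative:

```scala
// Sketch only: the catalog maps HBase column families/qualifiers
// to DataFrame columns (shc-style catalog format).
val catalog = s"""{
  "table": {"namespace": "default", "name": "users"},
  "rowkey": "key",
  "columns": {
    "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
    "name": {"cf": "info",   "col": "name", "type": "string"}
  }
}"""

val hbaseDf = spark.read
  .options(Map("catalog" -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()

// Register the HBase-backed DataFrame and join it in Spark SQL.
hbaseDf.createOrReplaceTempView("hbase_users")
val joined = spark.sql(
  "SELECT h.id, h.name, e.event FROM hbase_users h JOIN events e ON h.id = e.user_id")
```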
> On 21 Jun 2017, at 10:17 AM, sunerhan1...@sina.com wrote:
>
> Hello,
> My scenario is like this:
> 1. val df = hivecontext/carboncontext.sql("sql")
>
After investigation, it looks like my Spark 2.1.1 jars got corrupted during
download - all good now... ;)
> On Jun 20, 2017, at 4:14 PM, Jean Georges Perrin wrote:
>
> Hey all,
>
> I was giving a run to 2.1.1 and got an error on one of my test programs:
>
> package
OK, some more info about this issue, to see if someone can shed some light on
what could be going on. I turned on debug logging for
org.apache.spark.streaming.scheduler in the driver process and this is what
gets thrown in the logs and keeps throwing it even after the downed HDFS
node is restarted.
Never mind!
I had a space at the end of my data which was not showing up in manual testing.
thanks
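For anyone hitting the same thing: a trailing space makes an anchored regex fail even though the value looks identical when printed. A minimal illustration in plain Scala (the pattern is hypothetical):

```scala
import scala.util.matching.Regex

val filter: Regex = "https?://example\\.com/\\w+".r

val clean = "http://example.com/page"
val dirty = "http://example.com/page "  // trailing space, invisible when printed

// matches() anchors the pattern to the whole string, so the space matters.
println(filter.pattern.matcher(clean).matches())  // true
println(filter.pattern.matcher(dirty).matches())  // false

// Trimming the input before matching avoids the surprise.
println(filter.pattern.matcher(dirty.trim).matches())  // true
```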
From: jeff saremi
Sent: Tuesday, June 20, 2017 2:48:06 PM
To: user@spark.apache.org
Subject: Bizarre diff in behavior between Scala REPL
I have this function which does regex matching in Scala. When I test it in the
REPL, I get the expected results.
When I use it as a UDF in Spark SQL, I get completely incorrect results.
Function:
import scala.util.matching.Regex
class UrlFilter(filters: Seq[String]) extends Serializable {
  val regexFilters = filters.map(new Regex(_))
  def accept(url: String): Boolean = regexFilters.exists(_.findFirstIn(url).isDefined)
}
It's in the spark-catalyst_2.11-2.1.1.jar since the logical query plans and
optimization also need to know about types.
On Tue, Jun 20, 2017 at 1:14 PM, Jean Georges Perrin wrote:
> Hey all,
>
> I was giving a run to 2.1.1 and got an error on one of my test programs:
>
> package
Hi,
How do we bootstrap the streaming job with the previous state when we do a
code change and redeploy? We use updateStateByKey to maintain the state and
store session objects and LinkedHashMaps in the checkpoint.
Thanks,
Swetha
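One common pattern for carrying state across a redeploy (since a code change usually invalidates the old checkpoint) is to snapshot the state to durable storage and feed it back in via the initialRDD overload of updateStateByKey. A hedged sketch, with made-up paths and types:

```scala
import org.apache.spark.HashPartitioner

// Periodically snapshot the state DStream (the one returned by
// updateStateByKey) so it survives a code change:
// stateStream.foreachRDD { rdd => rdd.saveAsObjectFile(s"/state/${System.currentTimeMillis}") }

// On restart with new code, seed the state from the last snapshot
// instead of relying on the (now incompatible) checkpoint:
val initialState = ssc.sparkContext.objectFile[(String, Long)]("/state/latest")

val updated = keyedStream.updateStateByKey(
  updateFunc,  // your existing update function
  new HashPartitioner(ssc.sparkContext.defaultParallelism),
  initialRDD = initialState)
```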
Thanks Vadim & Jörn... I will look into those.
jg
> On Jun 20, 2017, at 2:12 PM, Vadim Semenov
> wrote:
>
> You can launch one permanent spark context and then execute your jobs within
> the context. And since they'll be running in the same context, they can
Hey all,
I was giving a run to 2.1.1 and got an error on one of my test programs:
package net.jgp.labs.spark.l000_ingestion;
import java.util.Arrays;
import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import
You can launch one permanent spark context and then execute your jobs
within the context. And since they'll be running in the same context, they
can share data easily.
These two projects provide the functionality that you need:
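A minimal sketch of the shared-context idea itself (a single long-lived SparkSession, with "jobs" as method calls exchanging cached DataFrames; all names are illustrative):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SharedContext {
  val spark: SparkSession = SparkSession.builder
    .appName("permanent-context")
    .getOrCreate()
  import spark.implicits._

  // "Job" A and B each produce a result and cache it in the shared context.
  def jobA(): DataFrame = spark.range(100).toDF("a").cache()
  def jobB(): DataFrame = spark.range(100).toDF("b").cache()

  def main(args: Array[String]): Unit = {
    val a = jobA()
    val b = jobB()
    // Job C combines A's and B's results without any intermediate disk dump.
    val c = a.join(b, a("a") === b("b"))
    println(c.count())
    spark.stop()
  }
}
```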
You could express it all in one program; alternatively, use the Ignite in-memory
file system or the Ignite shared RDD (not sure if DataFrame is supported).
> On 20. Jun 2017, at 19:46, Jean Georges Perrin wrote:
>
> Hey,
>
> Here is my need: program A does something on a set of data and
Hi Assaf,
Thanks for the suggestion on checkpointing - I'll need to read up more on
that.
My current implementation seems to be crashing with a GC overhead limit
exceeded error if I'm keeping multiple persist calls for a large number of
files.
Thus, I was also thinking about the constant calls to
Hey,
Here is my need: program A does something on a set of data and produces
results, program B does that on another set, and finally, program C combines
the data of A and B. Of course, the easy way is to dump all on disk after A and
B are done, but I wanted to avoid this.
I was thinking of
BTW, this is running on Spark 2.1.1.
I have been trying to debug this issue, and what I have found till now is
that it is somehow related to the Spark WAL. The directory named
/receivedBlockMetadata seems to stop getting
written to after an HDFS node is killed and restarted. I
have
And we will be having a webinar on July 27 going into some more details. Stay
tuned.
Cheers
Jules
Sent from my iPhone
Pardon the dumb thumb typos :)
> On Jun 20, 2017, at 7:00 AM, Michael Mior wrote:
>
> It's still in the early stages, but check out Deep Learning
It is fine, but you have to design it so that generated rows are written in
large blocks for optimal performance.
The trickiest part of data generation is the conceptual part, such as the
probabilistic distributions, etc.
You also have to check that you use a good random generator; for some cases
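One way to handle the random-generator concern in a distributed setting is to seed each partition deterministically, so runs are reproducible and partitions don't repeat the same random stream. A sketch (partition counts, sample sizes, and the Gaussian choice are illustrative):

```scala
import java.util.Random
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("simgen").getOrCreate()

// Each partition gets its own deterministically seeded generator.
val samples = spark.sparkContext
  .parallelize(0 until 8, 8)
  .mapPartitionsWithIndex { (partId, _) =>
    val rng = new Random(42L + partId)
    // Draw from whatever distribution the simulation needs, e.g. Gaussian:
    Iterator.fill(1000000)(rng.nextGaussian())
  }
```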
Hi
Spark is a data analyzer, but would it be possible to use Spark as a data
generator or simulator?
My simulation can be very large, and I think a parallelized simulation using
Spark (in the cloud) could work.
Is that a good or a bad idea?
Regards
Esa Heikkinen
Hi,
I have seen that Databricks has higher-order functions
(https://docs.databricks.com/_static/notebooks/higher-order-functions.html,
https://databricks.com/blog/2017/05/24/working-with-nested-data-using-higher-order-functions-in-sql-on-databricks.html)
which basically allow you to do generic
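For reference, those notebooks use SQL lambdas over array columns, which at the time were Databricks-only and later landed in open-source Spark (2.4+). A small sketch of the syntax, with an illustrative table:

```scala
// Higher-order functions operate on array columns directly in SQL,
// without exploding and re-aggregating.
import spark.implicits._

val df = Seq((1, Seq(1, 2, 3))).toDF("id", "values")
df.createOrReplaceTempView("nested_data")

spark.sql("SELECT id, transform(values, x -> x + 1) AS incremented FROM nested_data").show()
spark.sql("SELECT id, filter(values, x -> x % 2 = 0) AS evens FROM nested_data").show()
```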
It's still in the early stages, but check out Deep Learning Pipelines from
Databricks
https://github.com/databricks/spark-deep-learning
--
Michael Mior
mm...@apache.org
2017-06-20 0:36 GMT-04:00 Gaurav1809 :
> Hi All,
>
> Similar to how we have machine learning library
Correction.
On Tue, Jun 20, 2017 at 5:27 PM, sujeet jog wrote:
> Below is the query; from the physical plan, it looks like the query is the
> same as that of cqlsh:
>
> val query = s"""(select * from model_data
> where TimeStamp > \'$timeStamp+\' and TimeStamp <=
>
Below is the query; from the physical plan, it looks like the query is the
same as that of cqlsh:
val query = s"""(select * from model_data
where TimeStamp > \'$timeStamp+\' and TimeStamp <=
\'$startTS+\'
and MetricID = $metricID)"""
println("Model query" + query)
val df
Hi,
Personally, I would inspect how dates are managed. What does your Spark code
look like? What does the explain say? Does TimeStamp get parsed the same
way?
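Concretely, the kinds of checks meant here could look like this (column names taken from the thread, the rest illustrative):

```scala
import org.apache.spark.sql.functions._

// Look at which predicates actually get pushed down to the source.
df.explain(true)

// Compare how the timestamp column is parsed on the Spark side versus
// the string literal interpolated into the query.
df.select(col("TimeStamp"),
          col("TimeStamp").cast("string"),
          unix_timestamp(col("TimeStamp")))
  .show(5, truncate = false)
```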
Best,
On Tue, Jun 20, 2017 at 12:52 PM, sujeet jog wrote:
> Hello,
>
> I have a table as below
>
> CREATE TABLE
Hello,
I have a table as below
CREATE TABLE analytics_db.ml_forecast_tbl (
"MetricID" int,
"TimeStamp" timestamp,
"ResourceID" timeuuid,
"Value" double,
PRIMARY KEY ("MetricID", "TimeStamp", "ResourceID")
)
select * from ml_forecast_tbl where "MetricID" = 1 and "TimeStamp" >
Hi Edwin,
I have faced a similar issue as well, and this behaviour is very abrupt. I
even created a question on Stack Overflow, but no solution yet.
https://stackoverflow.com/questions/43496205/spark-job-processing-time-increases-to-4s-without-explanation
For us, we sometimes had this constant
Hi all,
https://issues.apache.org/jira/browse/SPARK-19680
Is there any way to patch this issue? I have met the same problem.
2017-06-20
lk_spark
Note that depending on the number of iterations, the query plan for the
dataframe can become long and this can cause slowdowns (or even crashes).
A possible solution would be to checkpoint (or simply save and reload the
dataframe) every once in a while. When reloading from disk, the newly loaded
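A sketch of both options for truncating the plan every few iterations (the step function, interval, and paths are illustrative):

```scala
var df = initialDf
for (i <- 1 to numIterations) {
  df = step(df)  // whatever transformation each iteration applies
  if (i % 10 == 0) {
    // Option 1: checkpoint truncates the lineage in place
    // (requires spark.sparkContext.setCheckpointDir to have been called).
    df = df.checkpoint()

    // Option 2: save and reload, which also resets the plan:
    // df.write.mode("overwrite").parquet(s"/tmp/iter_$i")
    // df = spark.read.parquet(s"/tmp/iter_$i")
  }
}
```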