Re: Building Spark

2015-05-13 Thread Emre Sevinc
. Thanks Akhil -- Emre Sevinc

Re: How to deal with code that runs before foreach block in Apache Spark?

2015-05-06 Thread Emre Sevinc
. In practical terms, the lists on the executors are being filled-in but they are never committed and on the driver the opposite is happening. -kr, Gerard On Mon, May 4, 2015 at 3:34 PM, Emre Sevinc emre.sev...@gmail.com wrote: I'm trying to deal with some code that runs differently on Spark stand

How to deal with code that runs before foreach block in Apache Spark?

2015-05-04 Thread Emre Sevinc
I'm trying to deal with some code that runs differently on Spark stand-alone mode and Spark running on a cluster. Basically, for each item in an RDD, I'm trying to add it to a list, and once this is done, I want to send this list to Solr. This works perfectly fine when I run the following code in

Re: Spark Unit Testing

2015-04-21 Thread Emre Sevinc
resource that covers an approach (or approaches) for unit testing using Java. Regards jk -- Emre Sevinc

Re: override log4j.properties

2015-04-09 Thread Emre Sevinc
: Hello, How to override log4j.properties for a specific spark job? BR, Patcharee - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Emre Sevinc
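A common approach (a sketch, not necessarily the exact setup discussed in the thread; logger names and paths are assumptions) is to ship a custom log4j.properties alongside the job and point the JVMs at it:

```properties
# Illustrative log4j.properties for a Spark job
log4j.rootCategory=WARN, console
log4j.logger.com.mycompany.myjob=DEBUG
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c: %m%n
```

It would then be distributed and activated with something like `spark-submit --files log4j.properties --driver-java-options "-Dlog4j.configuration=file:log4j.properties" ...` so the driver (and, with the matching executor option, the executors) read it instead of Spark's bundled defaults.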

Re: Query REST web service with Spark?

2015-04-01 Thread Emre Sevinc
, the total number of calls to the service is expected to be low, so it would be ideal to do the whole job in Spark as we scour the data. I don't see anything obvious in the API or on Google relating to making REST calls from a Spark job. Is it possible? Thanks, Alec -- Emre Sevinc

Re: log4j.properties in jar

2015-03-31 Thread Emre Sevinc
, Is it possible to put the log4j.properties in the application jar such that the driver and the executors use this log4j file. Do I need to specify anything while submitting my app so that this file is used? Thanks, Udit -- Emre Sevinc

Re: Why doesn't the --conf parameter work in yarn-cluster mode (but works in yarn-client and local)?

2015-03-24 Thread Emre Sevinc
system properties. -Sandy On Tue, Mar 24, 2015 at 4:25 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello Sandy, Your suggestion does not work when I try it locally: When I pass --conf key=someValue and then try to retrieve it like: SparkConf sparkConf = new SparkConf

Why doesn't the --conf parameter work in yarn-cluster mode (but works in yarn-client and local)?

2015-03-23 Thread Emre Sevinc
Hello, According to Spark Documentation at https://spark.apache.org/docs/1.2.1/submitting-applications.html : --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces wrap “key=value” in quotes (as shown). And indeed, when I use that parameter, in my
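For context, a minimal illustration of the mechanism (the property name is hypothetical): spark-submit only forwards `--conf` keys that begin with `spark.` into the application's SparkConf; other keys are dropped with an "Ignoring non-spark config property" warning, which is consistent with the behavior described in this thread.

```
spark-submit --class com.myModule --master yarn-cluster \
  --conf spark.myapp.key=someValue \
  myModule.jar

# In the job, via Spark's Java API:
#   SparkConf sparkConf = new SparkConf();
#   String value = sparkConf.get("spark.myapp.key");
```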

Re: log files of failed task

2015-03-23 Thread Emre Sevinc
-- Emre Sevinc

Re: Writing Spark Streaming Programs

2015-03-19 Thread Emre Sevinc
with is are there any other considerations I need to think about when deciding this? are there any recommendations you can make in regards to this? Regards jk -- Emre Sevinc

Re: Why can't Spark Streaming recover from the checkpoint directory when using a third party library for processing multi-line JSON?

2015-03-04 Thread Emre Sevinc
still get the same exception. Why doesn't getOrCreate ignore that Hadoop configuration part (which normally works, e.g. when not recovering)? -- Emre On Tue, Mar 3, 2015 at 3:36 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a Spark Streaming application (that uses Spark 1.2.1

Is FileInputDStream returned by fileStream method a reliable receiver?

2015-03-04 Thread Emre Sevinc
Is FileInputDStream returned by fileStream method a reliable receiver? In the Spark Streaming Guide it says: There can be two kinds of data sources based on their *reliability*. Sources (like Kafka and Flume) allow the transferred data to be acknowledged. If the system receiving data from

Re: Why can't Spark Streaming recover from the checkpoint directory when using a third party library for processing multi-line JSON?

2015-03-04 Thread Emre Sevinc
-submit? Could you give the command you used? TD On Wed, Mar 4, 2015 at 12:42 AM, Emre Sevinc emre.sev...@gmail.com wrote: I've also tried the following: Configuration hadoopConfiguration = new Configuration(); hadoopConfiguration.set(multilinejsoninputformat.member, itemSet

Why can't Spark Streaming recover from the checkpoint directory when using a third party library for processing multi-line JSON?

2015-03-03 Thread Emre Sevinc
Hello, I have a Spark Streaming application (that uses Spark 1.2.1) that listens to an input directory, and when new JSON files are copied to that directory processes them, and writes them to an output directory. It uses a 3rd party library to process the multi-line JSON files (

Re: Issues reading in Json file with spark sql

2015-03-02 Thread Emre Sevinc
According to Spark SQL Programming Guide: jsonFile - loads data from a directory of JSON files where each line of the files is a JSON object. Note that the file that is offered as jsonFile is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a
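As an illustration of that format (values made up), `jsonFile` expects newline-delimited JSON — one complete, self-contained object per line — rather than a single pretty-printed document:

```json
{"name": "Alice", "age": 30}
{"name": "Bob", "age": 25}
```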

Re: Which one is faster / consumes less memory: collect() or count()?

2015-02-26 Thread Emre Sevinc
On Thu, Feb 26, 2015 at 4:20 PM, Sean Owen so...@cloudera.com wrote: Yea we discussed this on the list a short while ago. The extra overhead of count() is pretty minimal. Still you could wrap this up as a utility method. There was even a proposal to add some 'materialize' method to RDD. I

Re: Which one is faster / consumes less memory: collect() or count()?

2015-02-26 Thread Emre Sevinc
at 1:28 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a piece of code to force the materialization of RDDs in my Spark Streaming program, and I'm trying to understand which method is faster and has less memory consumption: javaDStream.foreachRDD(new

Re: Which one is faster / consumes less memory: collect() or count()?

2015-02-26 Thread Emre Sevinc
-knowledge-base/content/best_practices/dont_call_collect_on_a_very_large_rdd.html — FG On Thu, Feb 26, 2015 at 2:28 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a piece of code to force the materialization of RDDs in my Spark Streaming program, and I'm trying to understand

Which one is faster / consumes less memory: collect() or count()?

2015-02-26 Thread Emre Sevinc
Hello, I have a piece of code to force the materialization of RDDs in my Spark Streaming program, and I'm trying to understand which method is faster and has less memory consumption: javaDStream.foreachRDD(new FunctionJavaRDDString, Void() { @Override public Void call(JavaRDDString
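For reference, the two candidates look roughly like this in Spark's Java API (a sketch; `javaDStream` is the stream from the question, and Spark's classes are assumed on the classpath). `count()` traverses each partition and returns only a long to the driver, while `collect()` copies every element into driver memory, so `count()` is generally the cheaper way to force materialization:

```java
javaDStream.foreachRDD(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) {
        rdd.count();  // forces evaluation; only a long travels to the driver
        // rdd.collect() would also materialize, but ships every element to the driver
        return null;
    }
});
```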

Re: Get filename in Spark Streaming

2015-02-24 Thread Emre Sevinc
this into Dstream RDD. val inputStream = ssc.textFileStream(/hdfs Path/) inputStream is Dstreamrdd and in foreachrdd , am doing my processing inputStream.foreachRDD(rdd = { * //how to get filename here??* }) Can you please help. On Thu, Feb 5, 2015 at 11:15 PM, Emre Sevinc emre.sev

Re: Can you add Big Industries to the Powered by Spark page?

2015-02-24 Thread Emre Sevinc
: I've added it, thanks! On Fri, Feb 20, 2015 at 12:22 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, Could you please add Big Industries to the Powered by Spark page at https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ? Company Name: Big Industries URL

Re: Where to look for potential causes for Akka timeout errors in a Spark Streaming Application?

2015-02-23 Thread Emre Sevinc
Nist tsind...@gmail.com wrote: Hi Emre, Have you tried adjusting these: .set(spark.akka.frameSize, 500).set(spark.akka.askTimeout, 30).set(spark.core.connection.ack.wait.timeout, 600) -Todd On Fri, Feb 20, 2015 at 8:14 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, We

Re: Streaming Linear Regression

2015-02-20 Thread Emre Sevinc
-- Emre Sevinc

Can you add Big Industries to the Powered by Spark page?

2015-02-20 Thread Emre Sevinc
Hello, Could you please add Big Industries to the Powered by Spark page at https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ? Company Name: Big Industries URL: http://www.bigindustries.be/ Spark Components: Spark Streaming Use Case: Big Content Platform Summary:

Where to look for potential causes for Akka timeout errors in a Spark Streaming Application?

2015-02-20 Thread Emre Sevinc
Hello, We are building a Spark Streaming application that listens to a directory on HDFS, and uses the SolrJ library to send newly detected files to a Solr server. When we put 10.000 files to the directory it is listening to, it starts to process them by sending the files to our Solr server but

Re: Streaming Linear Regression

2015-02-20 Thread Emre Sevinc
-- Emre Sevinc

In a Spark Streaming application, what might be the potential causes for util.AkkaUtils: Error sending message in 1 attempts and java.util.concurrent.TimeoutException: Futures timed out and

2015-02-19 Thread Emre Sevinc
Hello, We have a Spark Streaming application that watches an input directory, and as files are copied there the application reads them and sends the contents to a RESTful web service, receives a response and write some contents to an output directory. When testing the application by copying a

Re: In a Spark Streaming application, what might be the potential causes for util.AkkaUtils: Error sending message in 1 attempts and java.util.concurrent.TimeoutException: Futures timed out and

2015-02-19 Thread Emre Sevinc
On Thu, Feb 19, 2015 at 12:27 PM, Tathagata Das t...@databricks.com wrote: What version of Spark are you using? TD Spark version is 1.2.0 (running on Cloudera CDH 5.3.0) -- Emre Sevinç

Re: Spark Streaming output cannot be used as input?

2015-02-18 Thread Emre Sevinc
-- Emre Sevinc

Re: Magic number 16: Why doesn't Spark Streaming process more than 16 files?

2015-02-18 Thread Emre Sevinc
.count.println() would be different than just println(), but maybe I am missing something also. Imran On Mon, Feb 16, 2015 at 7:49 AM, Emre Sevinc emre.sev...@gmail.com wrote: Sean, In this case, I've been testing the code on my local machine and using Spark locally, so I all the log output

Re: Re: Problem with 1 master + 2 slaves cluster

2015-02-18 Thread Emre Sevinc
On Wed, Feb 18, 2015 at 10:23 AM, bit1...@163.com bit1...@163.com wrote: Sure, thanks Akhil. A further question : Is local file system(file:///) not supported in standalone cluster? FYI: I'm able to write to local file system (via HDFS API and using file:/// notation) when using Spark. --

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-18 Thread Emre Sevinc
use the override model of the typesafe config: reasonable defaults go in the reference.conf (within the jar). Environment-specific overrides go in the application.conf (alongside the job jar) and hacks are passed with -Dprop=value :-) -kr, Gerard. On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc

[POWERED BY] Can you add Big Industries to the Powered by Spark page?

2015-02-18 Thread Emre Sevinc
Hello, Could you please add Big Industries to the Powered by Spark page at https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ? Company Name: Big Industries URL: http://www.bigindustries.be/ Spark Components: Spark Streaming Use Case: Big Content Platform Summary:

Re: Class loading issue, spark.files.userClassPathFirst doesn't seem to be working

2015-02-18 Thread Emre Sevinc
: Sigmoid Analytics] http://htmlsig.com/www.sigmoidanalytics.com *Arush Kharbanda* || Technical Teamlead ar...@sigmoidanalytics.com || www.sigmoidanalytics.com -- Emre Sevinc

Re: Class loading issue, spark.files.userClassPathFirst doesn't seem to be working

2015-02-18 Thread Emre Sevinc
On Wed, Feb 18, 2015 at 4:54 PM, Dmitry Goldenberg dgoldenberg...@gmail.com wrote: Thank you, Emre. It seems solrj still depends on HttpClient 4.1.3; would that not collide with Spark/Hadoop's default dependency on HttpClient set to 4.2.6? If that's the case that might just solve the problem.

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Emre Sevinc
/ On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm using Spark 1.2.1 and have a module.properties file, and in it I have non-Spark properties, as well as Spark properties, e.g.: job.output.dir=file:///home/emre/data/mymodule/out I'm trying to pass

Re: Magic number 16: Why doesn't Spark Streaming process more than 16 files?

2015-02-16 Thread Emre Sevinc
of stopping at 16. -- Emre On Mon, Feb 16, 2015 at 12:56 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have an application in Java that uses Spark Streaming 1.2.1 in the following manner: - Listen to the input directory. - If a new file is copied to that input directory process

Re: Magic number 16: Why doesn't Spark Streaming process more than 16 files?

2015-02-16 Thread Emre Sevinc
of that code tells you only 16 files were processed. On Feb 16, 2015 1:18 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello Sean, I did not understand your question very well, but what I do is checking the output directory (and I have various logger outputs at various stages showing the contents

Re: Magic number 16: Why doesn't Spark Streaming process more than 16 files?

2015-02-16 Thread Emre Sevinc
...@cloudera.com wrote: How are you deciding whether files are processed or not? It doesn't seem possible from this code. Maybe it just seems so. On Feb 16, 2015 12:51 PM, Emre Sevinc emre.sev...@gmail.com wrote: I've managed to solve this, but I still don't know exactly why my solution works

Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Emre Sevinc
Hello, I'm using Spark 1.2.1 and have a module.properties file, and in it I have non-Spark properties, as well as Spark properties, e.g.: job.output.dir=file:///home/emre/data/mymodule/out I'm trying to pass it to spark-submit via: spark-submit --class com.myModule --master local[4]
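A sketch of the split this thread leads to (illustrative values): `spark-submit --properties-file` only forwards keys that start with `spark.` and warns "Ignoring non-spark config property" for the rest, so application-specific keys have to be loaded by the application itself (e.g. with `java.util.Properties.load()`):

```properties
# module.properties (illustrative values)
# Forwarded by spark-submit (keys must start with "spark."):
spark.executor.memory=2g
# NOT forwarded -- load keys like this one in the application yourself:
job.output.dir=file:///home/emre/data/mymodule/out
```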

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Emre Sevinc
mechanism you like to retain your own properties. On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm using Spark 1.2.1 and have a module.properties file, and in it I have non-Spark properties, as well as Spark properties, e.g.: job.output.dir=file

Magic number 16: Why doesn't Spark Streaming process more than 16 files?

2015-02-16 Thread Emre Sevinc
Hello, I have an application in Java that uses Spark Streaming 1.2.1 in the following manner: - Listen to the input directory. - If a new file is copied to that input directory process it. - Process: contact a RESTful web service (running also locally and responsive), send the contents of the

Documentation error in MLlib - Clustering?

2015-02-13 Thread Emre Sevinc
Hello, I was trying the streaming kmeans clustering example in the official documentation at: http://spark.apache.org/docs/1.2.0/mllib-clustering.html But I've got a type error when I tried to compile the code: [error] found :

Re: How to log using log4j to local file system inside a Spark application that runs on YARN?

2015-02-12 Thread Emre Sevinc
at 4:29 AM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm building an Apache Spark Streaming application and cannot make it log to a file on the local filesystem when running it on YARN. How can achieve this? I've set log4.properties file so that it can successfully write to a log

Re: Get filename in Spark Streaming

2015-02-05 Thread Emre Sevinc
...@gmail.com wrote: Hi All, We have filename with timestamp say ABC_1421893256000.txt and the timestamp needs to be extracted from file name for further processing.Is there a way to get input file name picked up by spark streaming job? Thanks in advance Subacini -- Emre Sevinc

Re: How to define a file filter for file name patterns in Apache Spark Streaming in Java?

2015-02-03 Thread Emre Sevinc
2, 2015 at 6:34 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter for file names when creating an InputDStream https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/streaming/dstream/InputDStream.html

How to define a file filter for file name patterns in Apache Spark Streaming in Java?

2015-02-02 Thread Emre Sevinc
Hello, I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter for file names when creating an InputDStream https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/streaming/dstream/InputDStream.html by invoking the fileStream
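The filter itself is just a predicate on the file name; a minimal sketch (the `.json` suffix rule and the hidden-file check are assumptions for illustration, not from the thread):

```java
// Name-based filter for illustration. In Spark Streaming's Java API this
// predicate would be wrapped in a Function<Path, Boolean> and passed to
// JavaStreamingContext.fileStream(...).
public class JsonPathFilter {
    public static boolean accept(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        // Skip hidden/in-progress files and anything that is not JSON.
        return !name.startsWith(".") && name.endsWith(".json");
    }
}
```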

Re: Spark streaming - tracking/deleting processed files

2015-02-02 Thread Emre Sevinc
-- Emre Sevinc

Re: Exception when using HttpSolrServer (httpclient) from within Spark Streaming: java.lang.NoSuchMethodError: org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault()Lorg/apache/http/con

2015-01-29 Thread Emre Sevinc
/questions/4716310/is-there-a-way-to-exclude-a-maven-dependency-globally (I don't know if a provided dependency will work without a specific version number so I'm just making a guess here.) On Wed Jan 28 2015 at 11:24:02 AM Emre Sevinc emre.sev...@gmail.com wrote: When I examine the dependencies

Re: Exception when using HttpSolrServer (httpclient) from within Spark Streaming: java.lang.NoSuchMethodError: org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault()Lorg/apache/http/con

2015-01-28 Thread Emre Sevinc
with the Maven shade plugin. On Wed Jan 28 2015 at 8:00:22 AM Emre Sevinc emre.sev...@gmail.com wrote: Hello, I'm using *Spark 1.1.0* and *Solr 4.10.3*. I'm getting an exception when using *HttpSolrServer* from within Spark Streaming: 15/01/28 13:42:52 ERROR Executor: Exception in task 0.0 in stage

Exception when using HttpSolrServer (httpclient) from within Spark Streaming: java.lang.NoSuchMethodError: org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault()Lorg/apache/http/conn/sc

2015-01-28 Thread Emre Sevinc
Hello, I'm using *Spark 1.1.0* and *Solr 4.10.3*. I'm getting an exception when using *HttpSolrServer* from within Spark Streaming: 15/01/28 13:42:52 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.NoSuchMethodError:
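A common remedy for this kind of `NoSuchMethodError` (an older httpclient on the cluster classpath shadowing the newer one SolrJ needs) is to relocate your own copy of HttpComponents with the maven-shade-plugin; a sketch of the relevant fragment (the shaded package name is an assumption):

```xml
<!-- Inside maven-shade-plugin's <configuration>: rename our HttpComponents
     classes so they cannot clash with the older ones Spark/Hadoop provide. -->
<relocations>
  <relocation>
    <pattern>org.apache.http</pattern>
    <shadedPattern>myapp.shaded.org.apache.http</shadedPattern>
  </relocation>
</relocations>
```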

Re: Exception when using HttpSolrServer (httpclient) from within Spark Streaming: java.lang.NoSuchMethodError: org.apache.http.impl.conn.SchemeRegistryFactory.createSystemDefault()Lorg/apache/http/con

2015-01-28 Thread Emre Sevinc
in version 4.3.1 of httpcomponents so even if by chance one of them did rely on commons-httpclient there wouldn't be a class conflict. On Wed Jan 28 2015 at 9:19:20 AM Emre Sevinc emre.sev...@gmail.com wrote: This is what I get: ./bigcontent-1.0-SNAPSHOT.jar:org/apache/http/impl/conn

Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
Hello, I have a piece of code that runs inside Spark Streaming and tries to get some data from a RESTful web service (that runs locally on my machine). The code snippet in question is: Client client = ClientBuilder.newClient(); WebTarget target =
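For context, the truncated snippet follows the standard JAX-RS 2.0 client pattern; completed here as a hedged sketch (the endpoint URL and media type are placeholders, not from the original message):

```java
// JAX-RS 2.0 client usage; requires a client implementation such as
// Jersey 2.x on the classpath. The endpoint below is a placeholder.
Client client = ClientBuilder.newClient();
WebTarget target = client.target("http://localhost:8080/myservice");
String response = target.request(MediaType.APPLICATION_JSON)
                        .get(String.class);
client.close();
```

The later messages in this thread point at the likely cause of the unit-test/cluster discrepancy: YARN pulls in Jersey 1.9, which predates the `javax.ws.rs` 2.0 API that `ClientBuilder` belongs to.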

Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote: I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example? For reference, my complete pom.xml looks like: project xmlns=http://maven.apache.org/POM/4.0.0; xmlns:xsi=

Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
, which in turn only appear in examples, so that's unlikely to be it. I'd take a look with 'mvn dependency:tree' on your own code first. Maybe you are including JavaEE 6 for example? On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com wrote: Hello, I have a piece of code

Re: Why does consuming a RESTful web service (using javax.ws.rs.* and Jersey) work in unit test but not when submitted to Spark?

2014-12-24 Thread Emre Sevinc
, Emre Sevinc emre.sev...@gmail.com wrote: It seems like YARN depends an older version of Jersey, that is 1.9: https://github.com/apache/spark/blob/master/yarn/pom.xml When I've modified my dependencies to have only: dependency groupIdcom.sun.jersey/groupId

Re: Unit testing and Spark Streaming

2014-12-12 Thread Emre Sevinc
On Fri, Dec 12, 2014 at 2:17 PM, Eric Loots eric.lo...@gmail.com wrote: How can the log level in test mode be reduced (or extended when needed) ? Hello Eric, The following might be helpful for reducing the log messages during unit testing: http://stackoverflow.com/a/2736/236007 -- Emre

How can I make Spark Streaming count the words in a file in a unit test?

2014-12-08 Thread Emre Sevinc
Hello, I've successfully built a very simple Spark Streaming application in Java that is based on the HdfsCount example in Scala at https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala . When I submit this application to

How can I compile only the core and streaming (so that I can get test utilities of streaming)?

2014-12-05 Thread Emre Sevinc
Hello, I'm currently developing a Spark Streaming application and trying to write my first unit test. I've used Java for this application, and I also need use Java (and JUnit) for writing unit tests. I could not find any documentation that focuses on Spark Streaming unit testing, all I could

Re: How can I compile only the core and streaming (so that I can get test utilities of streaming)?

2014-12-05 Thread Emre Sevinc
Hello, Specifying '-DskipTests' on commandline worked, though I can't be sure whether first running 'sbt assembly' also contributed to the solution. (I've tried 'sbt assembly' because branch-1.1's README says to use sbt). Thanks for the answer. Kind regards, Emre Sevinç