Thanks
Akhil
--
Emre Sevinc
In practical terms, the lists on the executors are being filled in, but
they are never committed, and on the driver the opposite is happening.
-kr, Gerard
On Mon, May 4, 2015 at 3:34 PM, Emre Sevinc emre.sev...@gmail.com
wrote:
I'm trying to deal with some code that runs differently on Spark
stand-alone mode and Spark running on a cluster. Basically, for each item
in an RDD, I'm trying to add it to a list, and once this is done, I want to
send this list to Solr.
This works perfectly fine when I run the following code in
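A minimal sketch of the usual fix (assuming SolrJ 4.x; the URL and field name are invented): instead of accumulating items into a driver-side list, which the executors only mutate in their own serialized copies, build and send one batch per partition on the executors:

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// rdd: the JavaRDD<String> whose items should end up in Solr.
rdd.foreachPartition(items -> {
    // The client is created inside the closure, so it is constructed on the
    // executor and nothing non-serializable crosses the driver boundary.
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    List<SolrInputDocument> batch = new ArrayList<>();
    while (items.hasNext()) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("content", items.next());
        batch.add(doc);
    }
    if (!batch.isEmpty()) {
        solr.add(batch);    // one round-trip per partition
        solr.commit();
    }
    solr.shutdown();
});

Alternatively, for small RDDs, collect() brings the items back to the driver, where a single driver-side list works as expected.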
resource that covers an approach (or approaches) for
unit testing using Java.
Regards
jk
--
Emre Sevinc
Hello,
How to override log4j.properties for a specific spark job?
BR,
Patcharee
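One commonly suggested approach (a sketch; paths, class, and jar names are examples) is to ship a job-specific log4j.properties and point both the driver and executor JVMs at it:

spark-submit \
  --class com.example.MyJob \
  --master yarn-cluster \
  --files /local/path/my-log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=my-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=my-log4j.properties" \
  my-job.jar

Files shipped with --files land in each container's working directory, which is why the bare file name usually resolves for -Dlog4j.configuration here.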
--
Emre Sevinc
, the total number of calls to the service is expected to be
low, so it would be ideal to do the whole job in Spark as we scour the data.
I don't see anything obvious in the API or on Google relating to making
REST calls from a Spark job. Is it possible?
Thanks,
Alec
--
Emre Sevinc
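To the REST question above: yes, it is possible; there is nothing Spark-specific about it, since any JVM HTTP client can run inside a transformation. A sketch with plain java.net (the endpoint is invented); mapPartitions is used so that per-partition setup could be hoisted out of the loop if a heavier client were involved:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

JavaRDD<String> responses = inputRdd.mapPartitions((Iterator<String> records) -> {
    List<String> out = new ArrayList<>();
    while (records.hasNext()) {
        URL url = new URL("http://localhost:8080/service?q=" + records.next());
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in = new BufferedReader(
                 new InputStreamReader(conn.getInputStream()))) {
            out.add(in.readLine());   // first line only, for brevity
        } finally {
            conn.disconnect();
        }
    }
    return out;   // the Spark 1.x Java API expects an Iterable here
});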
Is it possible to put the log4j.properties in the application jar such
that the driver and the executors use this log4j file? Do I need to specify
anything while submitting my app so that this file is used?
Thanks,
Udit
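If the file sits at the root of the application jar, one possibility (a sketch; the resource name is an example) is to reference it as a classpath resource when submitting, since log4j treats a plain, non-URL value of log4j.configuration as a classpath lookup:

spark-submit \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=my-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=my-log4j.properties" \
  --class com.example.MyApp my-app.jar

Whether this takes effect can depend on when the application jar lands on the JVM classpath, so the --files variant shown earlier in this archive is often the more robust option.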
--
Emre Sevinc
system properties.
-Sandy
On Tue, Mar 24, 2015 at 4:25 AM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello Sandy,
Your suggestion does not work when I try it locally:
When I pass
--conf key=someValue
and then try to retrieve it like:
SparkConf sparkConf = new SparkConf
Hello,
According to Spark Documentation at
https://spark.apache.org/docs/1.2.1/submitting-applications.html :
--conf: Arbitrary Spark configuration property in key=value format. For
values that contain spaces wrap “key=value” in quotes (as shown).
And indeed, when I use that parameter, in my
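For reference, SparkConf only picks up properties whose names start with "spark.", so a custom key passed with --conf is typically namespaced (the key and values below are examples):

spark-submit --conf spark.myapp.output.dir=file:///tmp/out --class com.example.MyApp my-app.jar

and then, in the driver:

SparkConf sparkConf = new SparkConf();
String outputDir = sparkConf.get("spark.myapp.output.dir");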
--
Emre Sevinc
with. Are there any other considerations I need to think about when deciding
this? Are there any recommendations you can make in this regard?
Regards
jk
--
Emre Sevinc
still get the same exception.
Why doesn't getOrCreate ignore that Hadoop configuration part (which
normally works, e.g. when not recovering)?
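For context, the getOrCreate pattern under discussion looks roughly like this (a sketch; the app name and checkpoint path are invented, and the Hadoop key is the one mentioned elsewhere in this thread). The factory is only invoked when no checkpoint exists, which would explain why settings made inside it are missing after recovery:

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.api.java.JavaStreamingContextFactory;

final SparkConf sparkConf = new SparkConf().setAppName("MyStreamingApp"); // example
final String checkpointDir = "hdfs:///tmp/checkpoint";                    // example path

JavaStreamingContextFactory factory = new JavaStreamingContextFactory() {
    @Override
    public JavaStreamingContext create() {
        JavaStreamingContext jssc =
            new JavaStreamingContext(sparkConf, new Duration(30000));
        jssc.checkpoint(checkpointDir);
        // Anything set here, including Hadoop configuration, only runs on a
        // fresh start; on recovery the whole context is rebuilt from the
        // checkpoint and this factory is never called.
        jssc.sparkContext().hadoopConfiguration()
            .set("multilinejsoninputformat.member", "itemSet");
        return jssc;
    }
};
JavaStreamingContext jssc = JavaStreamingContext.getOrCreate(checkpointDir, factory);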
--
Emre
On Tue, Mar 3, 2015 at 3:36 PM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello,
I have a Spark Streaming application (that uses Spark 1.2.1
Is the FileInputDStream returned by the fileStream method a reliable receiver?
In the Spark Streaming Guide it says:
There can be two kinds of data sources based on their *reliability*.
Sources (like Kafka and Flume) allow the transferred data to be
acknowledged. If the system receiving data from
-submit? Could you give the
command you used?
TD
On Wed, Mar 4, 2015 at 12:42 AM, Emre Sevinc emre.sev...@gmail.com
wrote:
I've also tried the following:
Configuration hadoopConfiguration = new Configuration();
hadoopConfiguration.set("multilinejsoninputformat.member", "itemSet
Hello,
I have a Spark Streaming application (that uses Spark 1.2.1) that listens
to an input directory, and when new JSON files are copied to that directory
processes them, and writes them to an output directory.
It uses a 3rd party library to process the multi-line JSON files (
According to Spark SQL Programming Guide:
jsonFile - loads data from a directory of JSON files where each line of the
files is a JSON object.
Note that the file that is offered as jsonFile is not a typical JSON file.
Each line must contain a separate, self-contained valid JSON object. As a
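Concretely, a file that jsonFile can read looks like this (one complete object per line), not like a pretty-printed JSON document or array:

{"id": 1, "name": "alpha"}
{"id": 2, "name": "beta"}
{"id": 3, "name": "gamma"}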
On Thu, Feb 26, 2015 at 4:20 PM, Sean Owen so...@cloudera.com wrote:
Yea we discussed this on the list a short while ago. The extra
overhead of count() is pretty minimal. Still you could wrap this up as
a utility method. There was even a proposal to add some 'materialize'
method to RDD.
I
at 1:28 PM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
I have a piece of code to force the materialization of RDDs in my Spark
Streaming program, and I'm trying to understand which method is faster
and
has less memory consumption:
javaDStream.foreachRDD(new
-knowledge-base/content/best_practices/dont_call_collect_on_a_very_large_rdd.html
—
FG
On Thu, Feb 26, 2015 at 2:28 PM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
I have a piece of code to force the materialization of RDDs in my Spark
Streaming program, and I'm trying to understand
Hello,
I have a piece of code to force the materialization of RDDs in my Spark
Streaming program, and I'm trying to understand which method is faster and
has less memory consumption:
javaDStream.foreachRDD(new Function<JavaRDD<String>, Void>() {
@Override
public Void call(JavaRDD<String>
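Spelled out completely, the two variants being compared might look like this (a sketch; the bodies are reduced to just the materialization calls):

// Variant 1: count() forces evaluation and only ships one long back to the driver.
javaDStream.foreachRDD(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) {
        rdd.count();   // materializes the RDD; the result is discarded
        return null;
    }
});

// Variant 2: a no-op foreach also forces evaluation without collecting anything.
javaDStream.foreachRDD(new Function<JavaRDD<String>, Void>() {
    @Override
    public Void call(JavaRDD<String> rdd) {
        rdd.foreach(new VoidFunction<String>() {
            @Override
            public void call(String s) { /* intentionally empty */ }
        });
        return null;
    }
});

Per the comment above, the extra overhead of count() is minimal, so the difference between the two is mostly cosmetic.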
this into Dstream RDD.
val inputStream = ssc.textFileStream("/hdfs Path/")
inputStream is a DStream of RDDs, and in foreachRDD I am doing my processing:
inputStream.foreachRDD(rdd => {
  // how to get filename here??
})
Can you please help.
On Thu, Feb 5, 2015 at 11:15 PM, Emre Sevinc emre.sev
I've added it, thanks!
On Fri, Feb 20, 2015 at 12:22 AM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
Could you please add Big Industries to the Powered by Spark page at
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ?
Company Name: Big Industries
URL
Nist tsind...@gmail.com wrote:
Hi Emre,
Have you tried adjusting these:
.set("spark.akka.frameSize", "500").set("spark.akka.askTimeout",
"30").set("spark.core.connection.ack.wait.timeout", "600")
-Todd
On Fri, Feb 20, 2015 at 8:14 AM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
We
--
Emre Sevinc
Hello,
Could you please add Big Industries to the Powered by Spark page at
https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark ?
Company Name: Big Industries
URL: http://www.bigindustries.be/
Spark Components: Spark Streaming
Use Case: Big Content Platform
Summary:
Hello,
We are building a Spark Streaming application that listens to a directory
on HDFS, and uses the SolrJ library to send newly detected files to a Solr
server. When we put 10,000 files into the directory it is listening to, it
starts to process them by sending the files to our Solr server but
--
Emre Sevinc
Hello,
We have a Spark Streaming application that watches an input directory, and
as files are copied there the application reads them and sends the contents
to a RESTful web service, receives a response and write some contents to an
output directory.
When testing the application by copying a
On Thu, Feb 19, 2015 at 12:27 PM, Tathagata Das t...@databricks.com wrote:
What version of Spark are you using?
TD
Spark version is 1.2.0 (running on Cloudera CDH 5.3.0)
--
Emre Sevinç
--
Emre Sevinc
.count.println() would be different than just println(),
but maybe I am missing something also.
Imran
On Mon, Feb 16, 2015 at 7:49 AM, Emre Sevinc emre.sev...@gmail.com
wrote:
Sean,
In this case, I've been testing the code on my local machine and using
Spark locally, so I see all the log output
On Wed, Feb 18, 2015 at 10:23 AM, bit1...@163.com bit1...@163.com wrote:
Sure, thanks Akhil.
A further question: is the local file system (file:///) not supported in a
standalone cluster?
FYI: I'm able to write to the local file system (via the HDFS API, using the
file:/// notation) when using Spark.
--
Use the override model of Typesafe Config: reasonable defaults go
in the reference.conf (within the jar). Environment-specific overrides go
in the application.conf (alongside the job jar) and hacks are passed with
-Dprop=value :-)
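To illustrate that layering (a minimal sketch using the Typesafe Config API; the keys and values are invented):

reference.conf (packaged inside the jar):
myapp.solr.url = "http://localhost:8983/solr"
myapp.batch.size = 100

application.conf (next to the job jar, per environment):
myapp.solr.url = "http://solr.internal:8983/solr"

and in the code:

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

// load() layers system properties (-Dprop=value) over application.conf
// over reference.conf, highest precedence first.
Config config = ConfigFactory.load();
String solrUrl = config.getString("myapp.solr.url");
int batchSize = config.getInt("myapp.batch.size");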
-kr, Gerard.
On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc
*Arush Kharbanda* || Technical Teamlead
ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
--
Emre Sevinc
On Wed, Feb 18, 2015 at 4:54 PM, Dmitry Goldenberg dgoldenberg...@gmail.com
wrote:
Thank you, Emre. It seems solrj still depends on HttpClient 4.1.3; would
that not collide with Spark/Hadoop's default dependency on HttpClient set
to 4.2.6? If that's the case, that might just solve the problem.
On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
I'm using Spark 1.2.1 and have a module.properties file, and in it I have
non-Spark properties, as well as Spark properties, e.g.:
job.output.dir=file:///home/emre/data/mymodule/out
I'm trying to pass
of stopping at 16.
--
Emre
On Mon, Feb 16, 2015 at 12:56 PM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello,
I have an application in Java that uses Spark Streaming 1.2.1 in the
following manner:
- Listen to the input directory.
- If a new file is copied to that input directory process
of that code tells you only 16 files were processed.
On Feb 16, 2015 1:18 PM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello Sean,
I did not understand your question very well, but what I do is checking
the output directory (and I have various logger outputs at various stages
showing the contents
...@cloudera.com wrote:
How are you deciding whether files are processed or not? It doesn't seem
possible from this code. Maybe it just seems so.
On Feb 16, 2015 12:51 PM, Emre Sevinc emre.sev...@gmail.com wrote:
I've managed to solve this, but I still don't know exactly why my
solution works
Hello,
I'm using Spark 1.2.1 and have a module.properties file, and in it I have
non-Spark properties, as well as Spark properties, e.g.:
job.output.dir=file:///home/emre/data/mymodule/out
I'm trying to pass it to spark-submit via:
spark-submit --class com.myModule --master local[4]
mechanism you like to retain your own properties.
On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
I'm using Spark 1.2.1 and have a module.properties file, and in it I have
non-Spark properties, as well as Spark properties, e.g.:
job.output.dir=file
Hello,
I have an application in Java that uses Spark Streaming 1.2.1 in the
following manner:
- Listen to the input directory.
- If a new file is copied to that input directory process it.
- Process: contact a RESTful web service (also running locally, and
responsive), send the contents of the
Hello,
I was trying the streaming kmeans clustering example in the official
documentation at:
http://spark.apache.org/docs/1.2.0/mllib-clustering.html
But I've got a type error when I tried to compile the code:
[error] found :
at 4:29 AM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
I'm building an Apache Spark Streaming application and cannot make it
log to
a file on the local filesystem when running it on YARN. How can I achieve
this?
I've set up the log4j.properties file so that it can successfully write to a log
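For reference, a log4j.properties along these lines (the path is an example) writes to a local file once the YARN containers actually load it, e.g. by shipping it with --files and the extraJavaOptions flags sketched earlier in this archive:

log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/tmp/myapp/spark-app.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=5
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

One caveat: on YARN each executor writes to a path local to its own node, so the logs end up scattered across the cluster unless YARN log aggregation is used instead.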
...@gmail.com wrote:
Hi All,
We have filenames with a timestamp, say ABC_1421893256000.txt, and the
timestamp needs to be extracted from the file name for further processing. Is
there a way to get the input file name picked up by the Spark streaming job?
Thanks in advance
Subacini
--
Emre Sevinc
2, 2015 at 6:34 PM, Emre Sevinc emre.sev...@gmail.com wrote:
Hello,
I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter
for file names when creating an InputDStream
https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/streaming/dstream/InputDStream.html
Hello,
I'm using Apache Spark Streaming 1.2.0 and trying to define a file filter
for file names when creating an InputDStream
https://spark.apache.org/docs/1.2.0/api/java/org/apache/spark/streaming/dstream/InputDStream.html
by invoking the fileStream
--
Emre Sevinc
/questions/4716310/is-there-a-way-to-exclude-a-maven-dependency-globally
(I don't know if a provided dependency will work without a specific
version number so I'm just making a guess here.)
On Wed Jan 28 2015 at 11:24:02 AM Emre Sevinc emre.sev...@gmail.com
wrote:
When I examine the dependencies
with the Maven shade plugin.
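For the record, the relocation trick with the shade plugin looks roughly like this in the pom.xml (a sketch; the shaded package name is arbitrary), so that the job's HttpClient classes no longer clash with the older ones on the cluster classpath:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Rewrites org.apache.http.* in the fat jar, and all references
                 to it, avoiding the NoSuchMethodError described below. -->
            <pattern>org.apache.http</pattern>
            <shadedPattern>shaded.org.apache.http</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>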
On Wed Jan 28 2015 at 8:00:22 AM Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
I'm using *Spark 1.1.0* and *Solr 4.10.3*. I'm getting an exception when
using *HttpSolrServer* from within Spark Streaming:
15/01/28 13:42:52 ERROR Executor: Exception in task 0.0 in stage
Hello,
I'm using *Spark 1.1.0* and *Solr 4.10.3*. I'm getting an exception when
using *HttpSolrServer* from within Spark Streaming:
15/01/28 13:42:52 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError:
in version 4.3.1
of httpcomponents so even if by chance one of them did rely on
commons-httpclient there wouldn't be a class conflict.
On Wed Jan 28 2015 at 9:19:20 AM Emre Sevinc emre.sev...@gmail.com
wrote:
This is what I get:
./bigcontent-1.0-SNAPSHOT.jar:org/apache/http/impl/conn
Hello,
I have a piece of code that runs inside Spark Streaming and tries to get
some data from a RESTful web service (that runs locally on my machine). The
code snippet in question is:
Client client = ClientBuilder.newClient();
WebTarget target =
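For context, a self-contained version of that kind of call with the JAX-RS 2.0 client API (the endpoint is invented; the snippet above is truncated, so this is only a guess at its shape):

import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.MediaType;

Client client = ClientBuilder.newClient();
WebTarget target = client.target("http://localhost:8080/api/data"); // example URL
String body = target.request(MediaType.APPLICATION_JSON).get(String.class);
client.close();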
On Wed, Dec 24, 2014 at 1:46 PM, Sean Owen so...@cloudera.com wrote:
I'd take a look with 'mvn dependency:tree' on your own code first.
Maybe you are including JavaEE 6 for example?
For reference, my complete pom.xml looks like:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi=
,
which in turn only appear in examples, so that's unlikely to be it.
I'd take a look with 'mvn dependency:tree' on your own code first.
Maybe you are including JavaEE 6 for example?
On Wed, Dec 24, 2014 at 12:02 PM, Emre Sevinc emre.sev...@gmail.com
wrote:
Hello,
I have a piece of code
, Emre Sevinc emre.sev...@gmail.com
wrote:
It seems like YARN depends on an older version of Jersey, namely 1.9:
https://github.com/apache/spark/blob/master/yarn/pom.xml
When I've modified my dependencies to have only:
<dependency>
<groupId>com.sun.jersey</groupId>
On Fri, Dec 12, 2014 at 2:17 PM, Eric Loots eric.lo...@gmail.com wrote:
How can the log level in test mode be reduced (or extended when needed) ?
Hello Eric,
The following might be helpful for reducing the log messages during unit
testing:
http://stackoverflow.com/a/2736/236007
--
Emre
Hello,
I've successfully built a very simple Spark Streaming application in Java
that is based on the HdfsWordCount example in Scala at
https://github.com/apache/spark/blob/branch-1.1/examples/src/main/scala/org/apache/spark/examples/streaming/HdfsWordCount.scala
.
When I submit this application to
Hello,
I'm currently developing a Spark Streaming application and trying to write
my first unit test. I've used Java for this application, and I also need to
use Java (and JUnit) for writing unit tests.
I could not find any documentation that focuses on Spark Streaming unit
testing, all I could
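One minimal pattern (a sketch, not an established framework): drive a local JavaStreamingContext with queueStream and assert on results collected into an in-memory list on the driver:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class WordLengthTest {
    @Test
    public void mapsWordsToLengths() throws Exception {
        JavaStreamingContext jssc =
            new JavaStreamingContext("local[2]", "test", new Duration(500));
        // queueStream turns pre-built RDDs into batches; no filesystem needed.
        Queue<JavaRDD<String>> queue = new LinkedList<>();
        queue.add(jssc.sparkContext().parallelize(Arrays.asList("spark", "streaming")));

        final List<Integer> results = new ArrayList<>();
        JavaDStream<Integer> lengths = jssc.queueStream(queue).map(String::length);
        // foreachRDD runs on the driver, so collecting into a local list is safe here.
        lengths.foreachRDD(rdd -> { results.addAll(rdd.collect()); return null; });

        jssc.start();
        Thread.sleep(2000);        // crude: let a couple of batches run
        jssc.stop(true, true);
        assertEquals(Arrays.asList(5, 9), results);
    }
}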
Hello,
Specifying '-DskipTests' on the command line worked, though I can't be sure
whether first running 'sbt assembly' also contributed to the solution.
(I've tried 'sbt assembly' because branch-1.1's README says to use sbt).
Thanks for the answer.
Kind regards,
Emre Sevinç