= 1224.6 MB. Storage limit = 1397.3 MB.
Therefore, I repartitioned the RDDs for better memory utilisation, which
resolved the issue.
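For anyone hitting the same thing, a minimal sketch of the pattern in Java
(Spark 1.6 API; the input path and partition count are only illustrative):

JavaRDD<String> lines = sc.textFile("hdfs:///path/to/input"); // sc: an existing JavaSparkContext
JavaRDD<String> repartitioned = lines.repartition(8);         // smaller partitions fit in storage memory
repartitioned.persist(StorageLevel.MEMORY_ONLY());            // org.apache.spark.storage.StorageLevel
repartitioned.count(); // an action is still needed before the RDD appears under the Storage tab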
Kind regards,
Guru
On 11 October 2016 at 11:23, diplomatic Guru <diplomaticg...@gmail.com>
wrote:
> @Song, I have called an action but it did not help.
> Regards,
> Chin Wei
>
> On Tue, Oct 11, 2016 at 6:14 AM, diplomatic Guru <diplomaticg...@gmail.com
> > wrote:
>
>> Hello team,
>>
>> Spark version: 1.6.0
>>
>> I'm trying to persist some data into memory for reuse. However,
>
Hello team,
Spark version: 1.6.0
I'm trying to persist some data into memory for reuse. However, when I call
rdd.cache() or rdd.persist(StorageLevel.MEMORY_ONLY()) it does not store the
data, as I cannot see any RDD information under the WebUI (Storage tab).
Therefore I tried
Hello all,
I have built a Spark batch model using MLlib and a Streaming online model.
Now I would like to load the offline model in the streaming job, then apply
and update it. Could you please advise me how to do it? Is there an example to
look at? The streaming model does not appear to allow saving or loading, but I
wanted to find out.
Thanks.
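One possible approach (a sketch only, not something confirmed in this thread):
save the batch model, then seed a StreamingLinearRegressionWithSGD with its
weights so the streaming job keeps updating it. Roughly, in Java:

// Assumes the batch job saved a LinearRegressionModel; the path is illustrative.
LinearRegressionModel offline =
    LinearRegressionModel.load(jsc.sc(), "hdfs:///models/offline-lr"); // jsc: JavaSparkContext

StreamingLinearRegressionWithSGD online = new StreamingLinearRegressionWithSGD()
    .setInitialWeights(offline.weights());       // start from the offline weights

online.trainOn(trainingStream);                  // JavaDStream<LabeledPoint> of new data
JavaDStream<Double> predictions =
    online.predictOn(featureStream);             // JavaDStream<Vector> to score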
On 21 June 2016 at 13:55, Sean Owen <so...@cloudera.com> wrote:
> There's nothing inherently wrong with a regression predicting a
> negative value. What is the issue, more specifically?
>
> On Tue, Jun 21, 2016 at 1:38 PM, diplomatic Guru
> <diplomaticg...@gmail.com> wrote:
Hello all,
I have a forecasting job that uses linear regression, but sometimes it
produces a negative prediction. How do I prevent this?
Thanks.
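(Not from this thread, just a common workaround.) If the forecast quantity can
never be negative, you can clamp the prediction, or train on log-transformed
labels and exponentiate afterwards. A minimal Java sketch of the clamping:

double raw = model.predict(features);   // model and features as in your existing job
double forecast = Math.max(0.0, raw);   // page views etc. cannot go below zero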
Hello,
I'm trying to find an example of using StreamingLinearRegression in Java,
but couldn't find any. There are examples for Scala but not for Java. Has
anyone got an example that I can take a look at?
Thanks.
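For what it's worth, a rough and untested Java sketch (paths, batch interval
and feature count are placeholders) using StreamingLinearRegressionWithSGD,
the same class behind the Scala examples:

JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

JavaDStream<LabeledPoint> training = jssc.textFileStream("hdfs:///train")
    .map(line -> LabeledPoint.parse(line));      // "(label,[f1,f2,...])" format
JavaDStream<LabeledPoint> test = jssc.textFileStream("hdfs:///test")
    .map(line -> LabeledPoint.parse(line));

StreamingLinearRegressionWithSGD model = new StreamingLinearRegressionWithSGD()
    .setInitialWeights(Vectors.zeros(2));        // feature count is illustrative

model.trainOn(training);
model.predictOnValues(test.mapToPair(lp -> new Tuple2<>(lp.label(), lp.features())))
     .print();

jssc.start();
jssc.awaitTermination();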
Hello all, I was wondering if it is possible to use H2O with Spark
Streaming for online prediction?
Could someone verify this for me?
On 8 March 2016 at 14:06, diplomatic Guru <diplomaticg...@gmail.com> wrote:
> Hello all,
>
> I'm using Random Forest for my machine learning (batch), I would like to
> use online prediction using Streaming job. However, the document on
Hello all,
I'm using Random Forest for my machine learning (batch) job, and I would like
to do online prediction in a Streaming job. However, the documentation only
mentions linear algorithms for streaming regression. Could we not use other
algorithms?
Losses.scala
>
> When passing the Loss, you should be able to do something like:
>
> Losses.fromString("leastSquaresError")
>
> On Mon, Feb 29, 2016 at 10:03 AM, diplomatic Guru <
> diplomaticg...@gmail.com> wrote:
>
>> It's strange as you are co
Kevin Mellott <kevin.r.mell...@gmail.com>
wrote:
> Looks like it should be present in 1.3 at
> org.apache.spark.mllib.tree.loss.AbsoluteError
>
>
> spark.apache.org/docs/1.3.0/api/java/org/apache/spark/mllib/tree/loss/AbsoluteError.html
>
> On Mon, Feb 29, 2016 at 9:46 AM, d
> object, since that object implements the Loss interface.
> For example.
>
> val loss = new AbsoluteError()
> boostingStrategy.setLoss(loss)
>
> On Mon, Feb 29, 2016 at 9:33 AM, diplomatic Guru <diplomaticg...@gmail.com
> > wrote:
>
>> Hi Kevin,
>>
>> Y
Hello guys,
I think the default loss for regression is Squared Error, but how do I change
that to Absolute Error in Java?
Could you please show me an example?
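A minimal sketch in Java, based on the suggestions above (if I remember
correctly, "leastAbsoluteError" is the name Losses.fromString expects for
absolute error):

BoostingStrategy boostingStrategy = BoostingStrategy.defaultParams("Regression");
boostingStrategy.setLoss(Losses.fromString("leastAbsoluteError")); // default is leastSquaresError

GradientBoostedTreesModel model =
    GradientBoostedTrees.train(trainingData, boostingStrategy);    // trainingData: JavaRDD<LabeledPoint>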
> You should have at the end for January and PageA something like:
>
> LabeledPoint (label , (0,0,1,0,0,01,1.0,2.0,3.0))
>
> Pass the LabeledPoint to the ML model.
>
> test it.
>
> PS: label is what you want to predict.
>
> On 02/02/2016, at 20:44, diplomatic Guru <diplomatic
Could you let me know what I'm doing wrong?
PS: My cluster is running Spark 1.3.0, which doesn't support StringIndexer and
OneHotEncoder, but for testing this I've installed 1.6.0 on my local machine.
Cheers.
On 2 February 2016 at 10:25, Jorge Machado <jom...@me.com> wrote:
> Hi Guru,
>
>
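On Spark 1.3, where StringIndexer and OneHotEncoder are not available, the
encoding Jorge describes can be built by hand. A rough Java sketch; the
feature layout (12 one-hot month slots followed by three numeric features) is
only an example:

// label = the value to predict (e.g. next month's unique views); month is 1-12
double[] features = new double[12 + 3];
features[month - 1] = 1.0;                 // one-hot encoded month
features[12] = prevMonthViews;             // illustrative numeric features
features[13] = twoMonthsAgoViews;
features[14] = threeMonthsAgoViews;

LabeledPoint point = new LabeledPoint(nextMonthViews, Vectors.dense(features));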
Any suggestions please?
On 29 January 2016 at 22:31, diplomatic Guru <diplomaticg...@gmail.com>
wrote:
> Hello guys,
>
> I'm trying to understand how I could predict next month's page views based
> on the previous access pattern.
>
> For example, I've collected statistic
Hello guys,
I'm trying to understand how I could predict next month's page views based
on the previous access pattern.
For example, I've collected statistics on page views:
e.g.
Page,UniqueView
---------------
pageA,1
pageB,999
...
pageZ,200
I aggregate the statistics monthly.
Hello guys,
I've been trying to read an Avro file using Spark's DataFrame API, but it's
throwing this error:
java.lang.NoSuchMethodError:
org.apache.spark.sql.SQLContext.read()Lorg/apache/spark/sql/DataFrameReader;
This is what I've done so far:
I've added the dependency to pom.xml:
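For reference, a minimal sketch of the read itself, assuming Spark 1.4+
(SQLContext.read() only exists from 1.4, and an older Spark on the classpath
is the usual cause of that NoSuchMethodError) and the com.databricks
spark-avro data source:

SQLContext sqlContext = new SQLContext(sc);   // sc: an existing JavaSparkContext
DataFrame events = sqlContext.read()
    .format("com.databricks.spark.avro")
    .load("hdfs:///path/to/events.avro");     // path is illustrative
events.printSchema();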
Hello team,
I need to present the Spark job performance to my management. I could get
the execution time by measuring the starting and finishing time of the job
(includes overhead). However, I am not sure how to get the other metrics,
e.g. CPU, I/O, memory, etc.
I want to measure the individual job,
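One option for per-job numbers (only a sketch, with a deliberately minimal
metric set) is to register a SparkListener and log task metrics such as run
time and GC time:

// Spark 1.x Java API: extend JavaSparkListener and register it on the SparkContext.
class MetricsListener extends JavaSparkListener {
  @Override
  public void onTaskEnd(SparkListenerTaskEnd taskEnd) {
    if (taskEnd.taskMetrics() != null) {
      System.out.println("stage=" + taskEnd.stageId()
          + " runTimeMs=" + taskEnd.taskMetrics().executorRunTime()
          + " gcTimeMs=" + taskEnd.taskMetrics().jvmGCTime());
    }
  }
}

sc.sc().addSparkListener(new MetricsListener());  // sc: the JavaSparkContext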
Hello,
I know how I could clear old state depending on the input value: if some
condition determines that the state is old, then returning null will
invalidate the record. But this is only feasible if a new record arrives that
matches the old key. What if no new data arrives for that key?
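One way (a sketch, assuming updateStateByKey is in use): the update function
is invoked for every existing key on every batch, even when no new records
arrived for that key, so the state can carry a last-updated timestamp and be
dropped once it is too old. Hypothetical Java, with the Guava Optional used by
the Spark 1.x Java API:

// Hypothetical state class: a running count plus the time it was last updated.
class PageState implements java.io.Serializable {
  long count;
  long lastUpdated = System.currentTimeMillis();
}

Function2<List<Long>, Optional<PageState>, Optional<PageState>> updateFunc =
    (values, state) -> {
      long now = System.currentTimeMillis();
      PageState s = state.isPresent() ? state.get() : new PageState();
      if (!values.isEmpty()) {
        for (Long v : values) s.count += v;
        s.lastUpdated = now;
      } else if (now - s.lastUpdated > 60 * 60 * 1000L) {
        return Optional.absent();  // stale for over an hour: drop without waiting for new data
      }
      return Optional.of(s);
    };

JavaPairDStream<String, PageState> state = pairs.updateStateByKey(updateFunc);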
Hello team,
I was wondering whether it is a good idea to have multiple hosts and multiple
ports for a Spark job. Let's say that there are two hosts and each has 2
ports; is this a good idea? If this is not an issue, then what is the best way
to do it? Currently, we pass them as a comma-separated argument.
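In case it helps, a sketch of one way to handle it, assuming these are socket
text streams and the argument looks like
"host1:9999,host1:9998,host2:9999,host2:9998" (the format is an assumption):

String[] endpoints = args[0].split(",");
List<JavaDStream<String>> streams = new ArrayList<>();
for (String endpoint : endpoints) {
  String[] hostPort = endpoint.split(":");
  streams.add(jssc.socketTextStream(hostPort[0], Integer.parseInt(hostPort[1])));
}
// Each receiver occupies a core; union them into a single DStream for processing.
JavaDStream<String> all = jssc.union(streams.get(0), streams.subList(1, streams.size()));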
I have an issue with a Spark Streaming job that appears to be running but
not producing any results. Therefore, I would like to enable debugging mode
to get as much logging as possible.
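A minimal sketch of turning the Spark loggers up from the driver (editing
log4j.properties on the executors is the more complete route; the package
names below are just the usual ones):

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

Logger.getLogger("org.apache.spark").setLevel(Level.DEBUG);
Logger.getLogger("org.apache.spark.streaming").setLevel(Level.DEBUG);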
Hello All,
When I checked my running Streaming job on the WebUI, I can see that some
RDDs are listed that I never asked to be cached. What's more, they are
growing! What are they? Are they the state (updateStateByKey)?
Only the rows in white are being
I know it uses a lazy evaluation model, which is why I was wondering.
On 27 October 2015 at 19:02, Uthayan Suthakar
wrote:
> Hello all,
>
> What I wanted to do is configure the spark streaming job to read the
> database using JdbcRDD and cache the results. This should occur only
Hello All,
I have a Spark Streaming job that should do some action only if the RDD is
not empty. This can be done easily with a Spark batch RDD, as I could
.take(1) and check whether it is empty or not. But this cannot be done on a
Spark Streaming DStream (JavaPairInputDStream).
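As suggested in the reply below, RDD.isEmpty() (available from Spark 1.3) can
be checked inside foreachRDD, since each batch of a DStream is just an RDD. A
rough Java sketch with placeholder key/value types:

stream.foreachRDD(new Function<JavaPairRDD<String, Long>, Void>() {
  @Override
  public Void call(JavaPairRDD<String, Long> rdd) {
    if (!rdd.isEmpty()) {
      // perform the action only when this batch actually contains data
    }
    return null;
  }
});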
r 2015 at 18:00, diplomatic Guru <diplomaticg...@gmail.com>
wrote:
>
> Hello All,
>
> I have a Spark Streaming job that should do some action only if the RDD
> is not empty. This can be done easily with the spark batch RDD as I could
> .take(1) and check whether it is empty
<t...@databricks.com> wrote:
> What do you mean by checking when a "DStream is empty"? DStream represents
> an endless stream of data, and at point of time checking whether it is
> empty or not does not make sense.
>
> FYI, there is RDD.isEmpty()
>
>
>
> On Wed
>
> ---
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
> On 16 Sep 2015, at 15:46
I have a mapper that emits key/value pairs (composite keys and composite
values separated by commas).
e.g
*key:* a,b,c,d *Value:* 1,2,3,4,5
*key:* a1,b1,c1,d1 *Value:* 5,4,3,2,1
...
...
*key:* a,b,c,d *Value:* 5,4,3,2,1
I could easily SUM these values using reduceByKey.
e.g.
reduceByKey(new
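A sketch of the element-wise sum in Java, assuming the input lines are
tab-separated into key and value (the separator and field types are
assumptions):

JavaPairRDD<String, double[]> parsed = lines.mapToPair(line -> {
  String[] kv = line.split("\t");              // "a,b,c,d<TAB>1,2,3,4,5"
  String[] fields = kv[1].split(",");
  double[] values = new double[fields.length];
  for (int i = 0; i < fields.length; i++) {
    values[i] = Double.parseDouble(fields[i]);
  }
  return new Tuple2<>(kv[0], values);
});

JavaPairRDD<String, double[]> summed = parsed.reduceByKey((a, b) -> {
  double[] out = new double[a.length];
  for (int i = 0; i < a.length; i++) {
    out[i] = a[i] + b[i];
  }
  return out;
});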
stream or the older stream?
Such problems of binding used to occur in the older push-based approach,
hence we built the polling stream (pull-based).
On Tue, Aug 18, 2015 at 4:45 AM, diplomatic Guru diplomaticg...@gmail.com
wrote:
I'm testing the Flume + Spark integration example (flume count
append to do
bulk inserts to oracle.
On Thu, Jul 23, 2015 at 1:12 AM, diplomatic Guru diplomaticg...@gmail.com
wrote:
Thanks Robin for your reply.
I'm pretty sure that writing to Oracle is what is taking longer, as writing
to HDFS only takes ~5 minutes.
The job is writing about ~5 Million
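For the bulk-insert suggestion above, a rough Java sketch using
foreachPartition with JDBC batching (table name, columns and batch size are
purely illustrative):

results.foreachPartition(rows -> {                      // results: JavaPairRDD<String, Long>
  Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
  conn.setAutoCommit(false);
  PreparedStatement stmt =
      conn.prepareStatement("INSERT INTO page_stats (page, views) VALUES (?, ?)");
  int pending = 0;
  while (rows.hasNext()) {
    Tuple2<String, Long> row = rows.next();
    stmt.setString(1, row._1());
    stmt.setLong(2, row._2());
    stmt.addBatch();
    if (++pending % 1000 == 0) stmt.executeBatch();     // flush every 1000 rows
  }
  stmt.executeBatch();
  conn.commit();
  stmt.close();
  conn.close();
});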
Hello all,
We are having a major performance issue with Spark, which is holding us back
from going live.
We have a job that carries out computation on log files and writes the
results into an Oracle DB.
The reducer 'reduceByKey' has been set to a parallelism of 4, as we don't want
to establish too many connections.
be a performance
problem.
Robin
On 22 Jul 2015, at 19:11, diplomatic Guru diplomaticg...@gmail.com
wrote:
Hello all,
We are having a major performance issue with Spark, which is holding us back
from going live.
We have a job that carries out computation on log files and writes the
results into Oracle
...@sigmoidanalytics.com wrote:
Here's an example https://github.com/przemek1990/spark-streaming
Thanks
Best Regards
On Thu, Jul 9, 2015 at 4:35 PM, diplomatic Guru diplomaticg...@gmail.com
wrote:
Hello all,
I'm trying to configure the flume to push data into a sink so that my
stream job could pick
Hello all,
I'm trying to configure Flume to push data into a sink so that my streaming
job can pick up the data. My events are in JSON format, but the Spark + Flume
integration [1] document only refers to the Avro sink.
[1] https://spark.apache.org/docs/latest/streaming-flume-integration.html
I
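Note (my reading of [1], worth double-checking): the Avro sink is only the
transport between Flume and Spark; the event body can still be JSON, which the
job then parses itself. A minimal Java sketch of the receiving side (host and
port are placeholders):

JavaReceiverInputDStream<SparkFlumeEvent> flumeStream =
    FlumeUtils.createStream(jssc, "0.0.0.0", 41414);

JavaDStream<String> jsonBodies = flumeStream.map(event ->
    new String(event.event().getBody().array(), StandardCharsets.UTF_8));
// jsonBodies can now be parsed with any JSON library.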
Hello guys,
I'm after some advice on Spark performance.
I have a MapReduce job that reads inputs, carries out a simple calculation,
and writes the results into HDFS. I've implemented the same logic in a Spark
job. When I tried both jobs on the same datasets, I got different execution
times, which is
I want to store the Spark application arguments, such as the input and output
files, in a Java properties file and pass that file to the Spark driver. I'm
using spark-submit to submit the job but couldn't find a parameter to pass the
properties file. Have you got any suggestions?
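spark-submit's --properties-file is meant for Spark configuration, so for
application settings one approach (a sketch; the key names are made up) is to
pass the file's path as an ordinary argument, or ship it with --files, and
read it in the driver with java.util.Properties:

Properties props = new Properties();
try (InputStream in = new FileInputStream(args[0])) {   // e.g. "app.properties"
  props.load(in);
}
String inputPath = props.getProperty("input.path");
String outputPath = props.getProperty("output.path");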
Hello All,
I have a Spark job that throws java.lang.OutOfMemoryError: GC overhead
limit exceeded.
The job is trying to process a file of size 4.5 GB.
I've tried the following Spark configuration:
--num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G
I tried increasing more
Hello all,
I was wondering if it is possible to have a high-latency batch processing job
coexist within a Spark Streaming job? If it's possible, could we share the
state of the batch job with the Spark Streaming job?
For example, when the long-running batch computation is complete, could we
...@databricks.com
wrote:
Yeah, you'll need to run `sbt publish-local` to push the jars to your
local maven repository (~/.m2) and then depend on version 1.0.0-SNAPSHOT.
On Thu, Apr 24, 2014 at 9:58 AM, diplomatic Guru
diplomaticg...@gmail.com wrote:
It's a simple application based on the People
It worked!! Many thanks for your brilliant support.
On 24 April 2014 18:20, diplomatic Guru diplomaticg...@gmail.com wrote:
Many thanks for your prompt reply. I'll try your suggestions and will get
back to you.
On 24 April 2014 18:17, Michael Armbrust mich...@databricks.com wrote:
Oh
Hello Team,
I'm new to Spark and just came across Spark SQL, which appears to be
interesting, but I'm not sure how I could get it.
I know it's an alpha version, but I'm not sure if it's available to the
community yet.
Many thanks.
Raj.