I have 3 Spark Masters colocated with my ZooKeeper nodes and 2 Worker nodes,
so my NameNodes are on the same nodes as my Spark Masters and my DataNodes
are on the same nodes as my Spark Workers. Is that correct? How do I set up
HDFS with ZooKeeper?
On Fri, Feb 3, 2017 at 10:27 PM, Mark Hamstra
wrote:
> yes
>
> On Fri, Feb 3, 2017 at 10:08 PM, kant kodali wrote:
>
>> can I use Spark Standalone with HDFS but no YARN?
>>
>> Thanks!
>>
>
>
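For reference, HDFS high availability with ZooKeeper is usually wired up
along these lines (a sketch only; the nameservice id and hostnames are
placeholders, and HDFS HA in Hadoop 2.x supports exactly two NameNodes, so
only two of the three master nodes would carry one):

  # hdfs-site.xml
  dfs.nameservices = mycluster
  dfs.ha.namenodes.mycluster = nn1,nn2
  dfs.ha.automatic-failover.enabled = true

  # core-site.xml
  fs.defaultFS = hdfs://mycluster
  ha.zookeeper.quorum = zk1:2181,zk2:2181,zk3:2181

With automatic failover enabled, the ZKFC daemons use the ZooKeeper quorum
to elect the active NameNode, so the same three ZooKeeper nodes can serve
both Spark master election and HDFS failover.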
yes
On Fri, Feb 3, 2017 at 10:08 PM, kant kodali wrote:
> can I use Spark Standalone with HDFS but no YARN?
>
> Thanks!
>
can I use Spark Standalone with HDFS but no YARN?
Thanks!
Sorry, I should just do this:
./start-slave.sh spark://x.x.x.x:7077,y.y.y.y:7077,z.z.z.z:7077
But what about export SPARK_MASTER_HOST="x.x.x.x y.y.y.y z.z.z.z"? Don't
I need to have that on my worker node?
Thanks!
On Fri, Feb 3, 2017 at 4:57 PM, kant kodali wrote:
> Hi,
>
> How do I start a
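For what it's worth, a sketch of how this is usually set up in standalone
HA mode (addresses are placeholders): SPARK_MASTER_HOST is read by the
master to choose its own bind address, one host per machine, while a worker
just takes the full master list on the command line and needs no
SPARK_MASTER_HOST at all:

  # spark-env.sh on each master (that machine's own address only):
  export SPARK_MASTER_HOST=x.x.x.x

  # on each worker:
  ./sbin/start-slave.sh spark://x.x.x.x:7077,y.y.y.y:7077,z.z.z.z:7077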
Hi,
How do I start a slave? Just run the start-slave.sh script? But then I
don't understand the following. I put this in spark-env.sh on the worker
machine:
export SPARK_MASTER_HOST="x.x.x.x y.y.y.y z.z.z.z"
but start-slave.sh doesn't seem to take the SPARK_MASTER_HOST env variable,
so I did th
I may have found my problem. We have a Scala wrapper on top of spark-submit
to run the shell command through Scala.
We were effectively eating the exit code from spark-submit in that wrapper.
When I looked at what the actual exit code was, stripping away the wrapper,
I got 1.
So I think spark-submit is
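A minimal sketch of a wrapper that propagates the code instead of eating it
(the spark-submit arguments here are placeholders):

  import scala.sys.process._

  object SubmitWrapper {
    def main(args: Array[String]): Unit = {
      // '!' runs the external process and returns its exit code
      val exitCode = Seq("spark-submit", "--class", "com.example.Main", "app.jar").!
      sys.exit(exitCode) // propagate spark-submit's code to our caller
    }
  }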
Hey Asher,
A phone call may be the best way to discuss all of this. But in short:
1. It is quite easy to add custom pipelines/models to MLeap. All of our
out-of-the-box transformers can serve as a good example of how to do this.
We are also putting together documentation on how to do this in our docs
Hi,
➜ spark git:(master) ✗ ./bin/spark-submit whatever || echo $?
Error: Cannot load main class from JAR file:/Users/jacek/dev/oss/spark/whatever
Run with --help for usage help or --verbose for debug output
1
I see exit code 1, and there are other cases that return 1 too.
Pozdrawiam,
Jacek Laskowski
https://
Hello,
+1, I have exactly the same issue. I need the exit code to make decisions
about Oozie executing actions. spark-submit always returns 0 when the
exception is caught. From Spark 1.5 to 1.6.x, I still have the same issue...
It would be great to fix it, or to know if there is some workaround about
Hi,
An interesting case. You don't use Spark resources whatsoever.
Creating a SparkConf does not use YARN...yet. I think any run mode
would have the same effect. So, although spark-submit could have
returned exit code 1, the use case touches Spark very little.
What version is that? Do you see "Th
Hi,
Yes. Forget about SQLContext. It's been merged into SparkSession as of
Spark 2.0 (same for HiveContext).
Long live SparkSession! :-)
Jacek
On 3 Feb 2017 7:48 p.m., "☼ R Nair (रविशंकर नायर)" <
ravishankar.n...@gmail.com> wrote:
All,
In Spark 1.6.0, we used
val jdbcDF = sqlContext.read.
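A minimal sketch of the Spark 2.x equivalent, assuming a JDBC source (the
URL, table, and user are placeholders):

  import org.apache.spark.sql.SparkSession

  val spark = SparkSession.builder()
    .appName("JdbcRead")
    .getOrCreate()

  val jdbcDF = spark.read
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/mydb")
    .option("dbtable", "myschema.mytable")
    .option("user", "myuser")
    .load()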
Asher,
I found a Maven profile for Scala 2.11 and removed it. Now it brings in 2.10.
I ran some code and got further, but now I get the error below when I do a
“df.show”.
java.lang.AbstractMethodError
at org.apache.spark.Logging$class.log(Logging.scala:50)
at
org.apache.spark.sql.execu
You can see in the tree what's pulling in 2.11. Your option then will be to
either shade them and add an explicit dependency on Scala 2.10.5 in your pom.
Alternatively, you can explore upgrading your project to 2.11 (which will
require using a 2.11 build of Spark).
On Fri, Feb 3, 2017 at 2:03 PM, Benjam
Hi All,
I wrote a test script which always throws an exception, as below:

import org.apache.spark.SparkConf

object Test {
  def main(args: Array[String]) {
    try {
      val conf = new SparkConf().setAppName("Test")
      throw new RuntimeException("Some Exception")
      println("all done!")
    } catch {
      case e: Exception => println("caught: " + e) // swallowed, so the JVM exits 0
    }
  }
}
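A sketch of one way to surface the failure: exit non-zero (or simply
rethrow) in the catch block, so the driver JVM, and therefore spark-submit,
reports it:

  } catch {
    case e: Exception =>
      e.printStackTrace()
      sys.exit(1) // or `throw e`; either way spark-submit returns non-zero
  }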
Asher,
You’re right. I don’t see anything but 2.11 being pulled in. Do you know where
I can change this?
Cheers,
Ben
> On Feb 3, 2017, at 10:50 AM, Asher Krim wrote:
>
> Sorry for my persistence, but did you actually run "mvn dependency:tree
> -Dverbose=true"? And did you see only scala 2.10.5 being pulled in?
Hi there,
Are you sure that the cluster nodes where the executors run have network
connectivity to the elastic cluster?
Speaking of which, why don't you use:
https://github.com/elastic/elasticsearch-hadoop#apache-spark ?
Cheers,
Anastasios
On Fri, Feb 3, 2017 at 7:10 PM, Dmitry Goldenberg
wrot
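A minimal write-path check with the connector (a sketch; the host, index,
and type names are placeholders, and the exact artifact depends on your
ES/Spark versions):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.elasticsearch.spark._

  val conf = new SparkConf()
    .setAppName("EsWriteCheck")
    .set("es.nodes", "elastic-host") // must be reachable from the executors
    .set("es.port", "9200")
  val sc = new SparkContext(conf)

  // write one trivial document
  sc.makeRDD(Seq(Map("id" -> 1, "msg" -> "ping")))
    .saveToEs("spark_test/doc")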
Sorry for my persistence, but did you actually run "mvn dependency:tree
-Dverbose=true"? And did you see only scala 2.10.5 being pulled in?
On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim wrote:
> Asher,
>
> It’s still the same. Do you have any other ideas?
>
> Cheers,
> Ben
>
>
> On Feb 3, 2017,
All,
In Spark 1.6.0, we used
val jdbcDF = sqlContext.read.format(-)
for creating a data frame through JDBC.
In Spark 2.1.x, we have seen this is
val jdbcDF = *spark*.read.format(-)
Does that mean we should not be using sqlContext going forward? Also, we
see that sqlContext is not auto
I have a bunch of questions for you, Hollin:
How easy is it to add support for custom pipelines/models?
Are Spark MLlib models supported?
We currently run Spark in local mode in an API service. It's not super
terrible, but performance is a constant struggle. Have you benchmarked any
performance dif
Thanks, Fernando. But I need to have only 1 row for a given user and date,
with very low latency. So none of your options work for me.
On Fri, Feb 3, 2017 at 10:34 AM, Fernando Avalos wrote:
> Hi Shyla,
>
> Maybe I am wrong, but I can see two options here.
>
> 1.- Do some grouping before insert to
Hi All,
I wanted to add more info.
The first column is the user and the third is the period, and my key is
(userid, date). For a given user and date combination I want to see only 1
row. My problem is that PT0H10M0S is overwritten by PT0H9M30S, even though
the order of the rows in the RDD is PT0H
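One way to get a single row per key is to collapse the RDD before writing,
so write order stops mattering. A sketch, assuming the longer period should
win and that the case class, keyspace, and column names below are
placeholders:

  import java.time.Duration
  import com.datastax.spark.connector._
  import org.apache.spark.rdd.RDD

  case class Visit(userId: String, date: String, period: String, day: String)

  def writeDeduped(rdd: RDD[Visit]): Unit = {
    val deduped = rdd
      .map(v => ((v.userId, v.date), v))
      .reduceByKey { (a, b) =>
        // compare the ISO-8601 durations properly, not lexicographically
        if (Duration.parse(a.period).compareTo(Duration.parse(b.period)) >= 0) a else b
      }
      .values
    deduped.saveToCassandra("my_keyspace", "my_table",
      SomeColumns("user_id", "date", "period", "day"))
  }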
Hi,
Any reason why we might be getting this error? The code seems to work fine
in the non-distributed mode but the same code when run from a Spark job is
not able to get to Elastic.
Spark version: 2.0.1 built for Hadoop 2.4, Scala 2.11
Elastic version: 2.3.1
I've verified the Elastic hosts and
Asher,
It’s still the same. Do you have any other ideas?
Cheers,
Ben
> On Feb 3, 2017, at 8:16 AM, Asher Krim wrote:
>
> Did you check the actual maven dep tree? Something might be pulling in a
> different version. Also, if you're seeing this locally, you might want to
> check which version
I'll clean up any .m2 or .ivy directories and try again.
I ran this on our lab cluster for testing.
Cheers,
Ben
On Fri, Feb 3, 2017 at 8:16 AM Asher Krim wrote:
> Did you check the actual maven dep tree? Something might be pulling in a
> different version. Also, if you're seeing this locally
Hey Aseem,
We have built pipelines that execute several string indexers, one-hot
encoders, scaling, and a random forest or linear regression at the end.
Execution time for the linear regression was on the order of 11
microseconds, a bit longer for random forest. This can be further optimized
by us
Did you check the actual Maven dependency tree? Something might be pulling
in a different version. Also, if you're seeing this locally, you might want
to check which version of the Scala SDK your IDE is using.
Asher Krim
Senior Software Engineer
On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim wrote:
> Hi
Hi,
Can you guys tell me if the two pieces of code below return the same thing:
(((DoubleObjectInspector) ins2).get(obj)) and ((DoubleWritable) obj).get()?
code 1)
public Object get(Object name) {
  int pos = getPos((String) name);
  if (pos < 0) return null;
  Stri
Hi,
I'm new to Spark Streaming.
I'm using the 2.10 builds of spark core and spark streaming.
My issue is that when I try to use JavaPairDStream.foreachRDD:
test.foreachRDD(new Function<JavaPairRDD, Void>() {
  public Void call(JavaPairRDD rdd) {
    currentResponseCodeCounts =
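If this fails to compile against Spark 2.x, one likely cause: the
Function<..., Void> overload of foreachRDD was removed there, and the Java
API takes a VoidFunction instead. A sketch, with the pair types
(String, Long) being an assumption:

  import org.apache.spark.api.java.JavaPairRDD;
  import org.apache.spark.api.java.function.VoidFunction;

  test.foreachRDD(new VoidFunction<JavaPairRDD<String, Long>>() {
    @Override
    public void call(JavaPairRDD<String, Long> rdd) {
      // update the counts from this batch's RDD here
    }
  });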
Does this support Java 7?
On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal wrote:
> Is computational time for predictions on the order of few milliseconds (<
> 10 ms) like the old mllib library?
>
> On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins wrote:
>
>> Hey everyone,
>>
>>
>> Some of you may h
Is computational time for predictions on the order of a few milliseconds
(< 10 ms) like the old mllib library?
On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins wrote:
> Hey everyone,
>
>
> Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits about
> MLeap and how you can use it to b
Hi,
Is a bipartite projection possible with GraphX?
Rdd1
#id name
1 x1
2 x2
3 x3
4 x4
5 x5
6 x6
7 x7
8 x8
Rdd2
#id name
10001 y1
10002 y2
10003 y3
10004 y4
10005 y5
10006 y6
EdgeList
# src_id dest_id
1 10001
1 10002
2
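A sketch of one way to build the x-side projection with plain pair-RDD
joins before handing the result to GraphX (names and types are assumptions;
two x-vertices become connected when they share a y-neighbour):

  import org.apache.spark.graphx._
  import org.apache.spark.rdd.RDD

  // edgeList: RDD[(VertexId, VertexId)] of (xId, yId) pairs, e.g. (1L, 10001L)
  def projectOntoX(edgeList: RDD[(VertexId, VertexId)]): Graph[Int, Int] = {
    val byY = edgeList.map(_.swap)          // (yId, xId)
    val projected = byY.join(byY)           // x pairs sharing the same y
      .map { case (_, (a, b)) => (a, b) }
      .filter { case (a, b) => a < b }      // drop self-pairs and duplicates
      .distinct()
    Graph.fromEdgeTuples(projected, defaultValue = 0)
  }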
Hi Team,
Actually, I figured out something.
While the Hive Java UDF executed in Hive gives output with 10 decimal
digits of precision, in Spark the same UDF gives results rounded off to 6
decimal digits. How do I stop that? It's the same Java UDF jar file used in
both Hive and Spark.
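If the rounding turns out to happen in how Spark types or formats the UDF's
result rather than in the UDF itself, one hedged thing to try is pinning the
output type in the query (the function and table names here are
hypothetical):

  // register the same jar/UDF as in Hive, then force a wide decimal
  spark.sql("CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUdf'")
  spark.sql("SELECT CAST(my_udf(col) AS DECIMAL(20,10)) FROM my_table").show()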
Hello All,
This is the content of my RDD, which I am saving to a Cassandra table.
But it looks like the 2nd row is written first and then the first row
overwrites it, so I end up with bad output.
(494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H9M30S, WEDNESDAY)
(494bce4f393b474980290b8d1b6ebef9, 20