Re: Re: how to run a bash shell distributed in spark

2015-05-25 Thread madhu phatak
Hi, you can use the pipe operator if you are running a shell or Perl script over some data. More information is on my blog: http://blog.madhukaraphatak.com/pipe-in-spark/. Regards, Madhukara Phatak http://datamantra.io/ On Mon, May 25, 2015 at 8:02 AM, luohui20...@sina.com wrote: Thanks Akhil,
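A minimal sketch of the pipe approach in Scala, assuming a hypothetical script /path/to/score.sh that reads records from stdin and writes results to stdout:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("pipe-example"))
    val records = sc.parallelize(Seq("rec1", "rec2", "rec3"))
    // each partition's elements are fed to the script's stdin, one per line;
    // the script's stdout lines become the elements of the resulting RDD
    val piped = records.pipe("/path/to/score.sh")
    piped.collect().foreach(println)

Since pipe runs one copy of the script per partition, the work is spread across whichever executors hold the partitions.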

IntelliJ IDEA import Spark source code error

2015-05-25 Thread huangzheng
Hi all, I want to start learning the Spark source code. I cloned the Spark code from git, executed the sbt gen-idea command, and imported the project into IntelliJ, but I get the error below. Could anyone help me? The Spark version is 1.4 and the operating system is Windows 7.

RE: Using Spark like a search engine

2015-05-25 Thread ankur chauhan
Hi, I am sure you can use Spark for this, but it seems like a problem that should be delegated to a text-indexing technology such as Elasticsearch, or something else based on Lucene, to serve the requests. Spark can be used to prepare the data that is fed to the indexing service. Using Spark

The stage is slow when I have a for loop inside (Java)

2015-05-25 Thread allanjie
Hi all, I have only one stage, a mapToPair, and inside its function I have a for loop that iterates about 133433 times. It then becomes slow; when I replace 133433 with just 133, it runs very fast. But I think this should be a simple operation even in plain Java. You can look at the

Re: IntelliJ IDEA import Spark source code error

2015-05-25 Thread Yi Zhang
I am not sure what happened. According to your screenshot, it just shows a warning message rather than an error. But I suggest you try Maven instead: mvn idea:idea. On Monday, May 25, 2015 2:48 PM, huangzheng 1106944...@qq.com wrote:

Re: Using Spark like a search engine

2015-05-25 Thread ayan guha
Yes, Spark will be useful for the following areas of your application: 1. running the same function on every CV in parallel and scoring it; 2. improving the scoring function through better access to classification and clustering algorithms, within and beyond MLlib. These are the first benefits you can start with, and then
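A minimal sketch of point 1, where cvs: Seq[CV] and score: CV => Double are hypothetical stand-ins for the poster's data and scoring function (both must be serializable, since the closure is shipped to the executors):

    val ranked = sc.parallelize(cvs)
      .map(cv => (score(cv), cv))
      .top(20)(Ordering.by(_._1)) // the 20 best-scoring CVs, computed in parallel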

Re: Using Spark like a search engine

2015-05-25 Thread Сергей Мелехин
Hi, Ankur! Thanks for your reply! CVs are just a bunch of IDs; each ID represents some object of some class (e.g. class=JOB, object=SW Developer). We have already processed the texts and extracted all the facts, so we don't need to do any text processing in Spark, just to run the scoring function on many, many

WebSphere MQ as a data source for Apache Spark Streaming

2015-05-25 Thread umesh9794
I was digging into the possibilities for WebSphere MQ as a data source for Spark Streaming because it is needed in one of our use cases. I learned that MQTT http://mqtt.org/ is the protocol that supports communication with MQ data structures, but since I am a newbie to Spark Streaming I

Re: Re: Re: how to run a bash shell distributed in spark

2015-05-25 Thread luohui20001
Thanks, madhu and Akhil. I modified my code as below; however, I think it is not very distributed. Do you have a better idea for running this app more efficiently and in a more distributed way? I added some comments with my understanding: import org.apache.spark._ import www.celloud.com.model._ object GeneCompare3 {

Spark SQL through Java code: facing an issue

2015-05-25 Thread vinayak
Hi all, I am new to Spark and trying to execute Spark SQL through Java code as below: package com.ce.sql; import java.util.List; import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import

Re: WebSphere MQ as a data source for Apache Spark Streaming

2015-05-25 Thread Arush Kharbanda
Hi Umesh, you can connect Spark Streaming to MQTT; refer to the example: https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/streaming/MQTTWordCount.scala Thanks Arush On Mon, May 25, 2015 at 3:43 PM, umesh9794 umesh.chaudh...@searshc.com
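For reference, a minimal sketch of wiring MQTT into Spark Streaming along the lines of that example, assuming the spark-streaming-mqtt artifact is on the classpath (the broker URL and topic below are hypothetical):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.mqtt.MQTTUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("MQFeed"), Seconds(10))
    // each MQTT message published on the topic arrives as one string in the stream
    val lines = MQTTUtils.createStream(ssc, "tcp://broker.example.com:1883", "queue/updates")
    lines.print()
    ssc.start()
    ssc.awaitTermination()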

Tasks randomly stall when running on mesos

2015-05-25 Thread Reinis Vicups
Hello, I am using Spark 1.3.1-hadoop2.4 with Mesos 0.22.1 with ZooKeeper, running on a cluster with 3 nodes on 64-bit Ubuntu. My application is compiled against Spark 1.3.1 (apparently with a Mesos 0.21.0 dependency), Hadoop 2.5.1-mapr-1503 and Akka 2.3.10. Only with this combination have I

Re: Tasks randomly stall when running on mesos

2015-05-25 Thread Iulian Dragoș
On Mon, May 25, 2015 at 2:43 PM, Reinis Vicups sp...@orbit-x.de wrote: Hello, I am using Spark 1.3.1-hadoop2.4 with Mesos 0.22.1 with ZooKeeper, running on a cluster with 3 nodes on 64-bit Ubuntu. My application is compiled against Spark 1.3.1 (apparently with a Mesos 0.21.0 dependency),

Re: Re: Re: how to run a bash shell distributed in spark

2015-05-25 Thread Akhil Das
Can you tell us what exactly you are trying to achieve? Thanks Best Regards On Mon, May 25, 2015 at 5:00 PM, luohui20...@sina.com wrote: Thanks, madhu and Akhil. I modified my code as below; however, I think it is not very distributed. Do you have a better idea for running this app more

Re: Using Log4j for logging messages inside lambda functions

2015-05-25 Thread Akhil Das
Try this way: object Holder extends Serializable { @transient lazy val log = Logger.getLogger(getClass.getName) } val someRdd = spark.parallelize(List(1, 2, 3)) someRdd.map { element => Holder.log.info(s"$element will be processed"); element + 1
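(The @transient lazy val pattern works because the logger is excluded from serialization and re-created lazily on each executor the first time it is touched, so no Logger instance ever needs to cross the wire.)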

Re: How to use zookeeper in Spark Streaming

2015-05-25 Thread Akhil Das
If you want a notification after every batch completes, then you can simply implement the StreamingListener https://spark.apache.org/docs/1.3.1/api/scala/index.html#org.apache.spark.streaming.scheduler.StreamingListener interface, which has methods like onBatchCompleted, onBatchStarted, etc., in which
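A minimal sketch of such a listener, assuming a StreamingContext named ssc:

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    class BatchNotifier extends StreamingListener {
      // invoked on the driver after each batch finishes processing
      override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
        println(s"batch at ${batch.batchInfo.batchTime} completed")
      }
    }

    ssc.addStreamingListener(new BatchNotifier)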

DataFrame. Conditional aggregation

2015-05-25 Thread Masf
Hi. In a DataFrame, how can I execute a conditional expression inside an aggregation? For example, can I translate this SQL statement to a DataFrame?: SELECT name, SUM(IF table.col2 > 100 THEN 1 ELSE table.col1) FROM table GROUP BY name Thanks -- Regards. Miguel

Re: Tasks randomly stall when running on mesos

2015-05-25 Thread Reinis Vicups
Hello, I assume I am running Spark in fine-grained mode, since I haven't changed the default here. One question regarding 1.4.0-RC1: is there a Maven snapshot repository I could use for my project config? (I know that I have to download the source and run make-distribution for the executor as well)

Using Log4j for logging messages inside lambda functions

2015-05-25 Thread Spico Florin
Hello! I would like to use the logging mechanism provided by log4j, but I'm getting: Exception in thread main org.apache.spark.SparkException: Task not serializable - Caused by: java.io.NotSerializableException: org.apache.log4j.Logger. The code (and the problem) that I'm using resembles

Re: Spark updateStateByKey fails with class leak when using case classes - resend

2015-05-25 Thread rsearle
Further experimentation indicates these problems only occur when master is local[*]. There are no issues if a standalone cluster is used.

Re: IPv6 support

2015-05-25 Thread Akhil Das
Hi Kevin, did you try adding a host name for the IPv6 address? I have a few IPv6 boxes; Spark failed for me when I used just the IPv6 addresses, but it works fine when I use the host names. Here's an entry in my /etc/hosts: 2607:5300:0100:0200::::0a4d hacked.work My spark-env.sh file:

Re: Tasks randomly stall when running on mesos

2015-05-25 Thread Reinis Vicups
Great hints, you guys! Yes, spark-shell worked fine with Mesos as master. I haven't tried to execute multiple RDD actions in a row, though (I did a couple of successful counts on the HBase tables I am working with in several experiments, but nothing that would compare to the stuff my Spark jobs are

Re: Tasks randomly stall when running on mesos

2015-05-25 Thread Dean Wampler
Here is a link to builds of 1.4 RC2: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/ For a Maven repo, I believe the RC2 artifacts are here: https://repository.apache.org/content/repositories/orgapachespark-1104/ A few experiments you might try: 1. Does spark-shell work?

SparkSQL's performance: contacting the namenode and datanode to unnecessarily check all partitions for a query of specific partitions

2015-05-25 Thread ogoh
Hello, I am using SparkSQL 1.3.0 and Hive 0.13.1 on AWS YARN. My Hive table, an external table, is partitioned by date and hour. I expected that a query over certain partitions would read only the data files of those partitions. I turned on TRACE-level logging for the ThriftServer, since the query

Re: DataFrame. Conditional aggregation

2015-05-25 Thread ayan guha
CASE WHEN col2 > 100 THEN 1 ELSE col2 END On 26 May 2015 00:25, Masf masfwo...@gmail.com wrote: Hi. In a DataFrame, how can I execute a conditional expression inside an aggregation? For example, can I translate this SQL statement to a DataFrame?: SELECT name, SUM(IF table.col2 > 100 THEN 1 ELSE
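In the DataFrame API this would look roughly as follows, a sketch assuming Spark 1.4's when/otherwise column functions and a DataFrame named df:

    import org.apache.spark.sql.functions._

    // SELECT name, SUM(CASE WHEN col2 > 100 THEN 1 ELSE col1 END) FROM table GROUP BY name
    val result = df.groupBy("name")
      .agg(sum(when(col("col2") > 100, lit(1)).otherwise(col("col1"))).as("total"))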

Implementing custom RDD in Java

2015-05-25 Thread Swaranga Sarma
Hello, I have a custom data source and I want to load the data into Spark to perform some computations. For this I see that I might need to implement a new RDD for my data source. I am a complete Scala noob and I am hoping that I can implement the RDD in Java only. I looked around the internet

Re: Re: Re: Re: how to run a bash shell distributed in spark

2015-05-25 Thread luohui20001
I am indeed trying to run some shell scripts in my Spark app, hoping they run more concurrently on my Spark cluster. However, I am not sure whether my code will run concurrently on my executors. Diving into my code, you can see that I am trying to: 1. split both db and sample into 21 small files. That

Re: Using Log4j for logging messages inside lambda functions

2015-05-25 Thread Wesley Miao
The reason it didn't work for you is that the function you registered with someRdd.map will run on the worker/executor side, not in your driver program. You need to be careful not to accidentally close over objects instantiated in your driver program, like the log object in

Re: Using Spark like a search engine

2015-05-25 Thread Alex Chavez
Сергей, A simple implementation would be to create a DataFrame of CVs by issuing a Spark SQL query against your Postgres database, persist it in memory, and then map F over it at query time and return the top
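A sketch of that first step with the Spark 1.3-era JDBC data source (the connection URL and table name are hypothetical):

    val cvs = sqlContext.load("jdbc", Map(
      "url" -> "jdbc:postgresql://db.example.com:5432/recruiting",
      "dbtable" -> "cvs"))
    cvs.persist() // keep the CVs in memory across scoring queries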

Re: Spark SQL High GC time

2015-05-25 Thread Nick Travers
Hi Yuming - I was running into the same issue with larger worker nodes a few weeks ago. The way I managed to get around the high GC time, as per the suggestion of some others, was to break each worker node up into individual workers of around 10G in size, dividing the cores accordingly. The other
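A sketch of what that can look like in spark-env.sh on each standalone worker node; the sizes below are illustrative, not a recommendation:

    # run several smaller workers per node instead of one large heap
    export SPARK_WORKER_INSTANCES=6
    export SPARK_WORKER_MEMORY=10g
    export SPARK_WORKER_CORES=4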

Re: Implementing custom RDD in Java

2015-05-25 Thread Swaranga Sarma
My data is in S3 and is indexed in Dynamo. For example, if I want to load data for a given time range, I will first need to query Dynamo for the S3 file keys for the corresponding time range and then load them in Spark. The files may not always be in the same S3 path prefix, hence

Re: Re: is there any easier way to define a custom RDD in Java

2015-05-25 Thread Ted Yu
Please take a look at: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala core/src/test/java/org/apache/spark/JavaJdbcRDDSuite.java Cheers On Mon, May 25, 2015 at 8:39 PM, swaranga sarma.swara...@gmail.com wrote: Has this changed now? Can a new RDD be implemented in Java?

Re: Using Spark like a search engine

2015-05-25 Thread Сергей Мелехин
Thanks, I'll give it a try! Best regards, Сергей Мелехин. 2015-05-26 12:56 GMT+10:00 Alex Chavez alexkcha...@gmail.com: Сергей, A simple implementation would be to create a DataFrame of CVs by issuing a Spark SQL query against your Postgres database, persist it in memory, and then map F