(nullable = true)
The function for formatting addresses:
def formatAddress(address: String): String
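A minimal sketch of how the suggested function might be wired in as a UDF. The formatting logic and the "address" column name below are placeholder assumptions (the thread does not show them), and `df` is assumed to be the existing DataFrame:

```scala
import org.apache.spark.sql.functions.udf

// Placeholder body: the real formatting logic is not shown in the thread.
def formatAddress(address: String): String =
  address.trim.replaceAll("\\s+", " ")

// Wrap it as a UDF and apply it to an assumed "address" column.
val formatAddressUdf = udf(formatAddress _)
val formatted = df.withColumn("address", formatAddressUdf(df("address")))
```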
Best regards,
Hao Wang
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h
Schema)
>
> Thanks!
>
> - Terry
>
> On Wed, Sep 16, 2015 at 2:10 AM, Hao Wang wrote:
>
>> Hi,
>>
>> I created a dataframe with 4 string columns (city, state, country,
>> zipcode).
>> I then applied the following nested schema to it by creating a cust
Hi,
I created a dataframe with 4 string columns (city, state, country, zipcode).
I then applied the following nested schema to it by creating a custom
StructType. When I run df.take(5), it gives the exception below as expected.
The question is how I can convert the Rows in the dataframe to confor
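One way to make the Rows conform, sketched under the assumption that the four flat columns should end up nested under a single struct column (the "address" name is a guess): rather than forcing a nested StructType onto flat Rows, build the nested column from the flat ones so the schema matches the data.

```scala
import org.apache.spark.sql.functions.struct

// df is the existing flat DataFrame with columns city, state, country, zipcode.
val nested = df.select(
  struct(df("city"), df("state"), df("country"), df("zipcode")).as("address")
)
// printSchema should now show "address" as a struct of the four fields.
nested.printSchema()
```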
ml
> <https://www.mail-archive.com/user@spark.apache.org/msg30204.html>
>
> On June 13, 2015, at 5:41 AM, Hao Wang wrote:
>
>
> Hi,
>
> I have a bunch of large log files on Hadoop. Each line contains a log and its
> severity. Is there a way that I can u
I am currently using filter inside a loop of all severity levels to do this,
which I think is pretty inefficient. It has to read the entire data set once
for each severity. I wonder if there is a more efficient way that takes just
one pass of the data? Thanks.
Best,
Hao Wang
> On Jun 13, 2
[INFO] log2
[ERROR] log3
[INFO] log4
Output:
error_file
[ERROR] log1
[ERROR] log3
info_file
[INFO] log2
[INFO] log4
Best,
Hao Wang
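A sketch of a single-pass alternative, assuming RDDs and an era-appropriate Hadoop output format (the paths and the severity-parsing rule are assumptions): key each line by its severity tag, then let `MultipleTextOutputFormat` route records into per-severity files, so the data is read only once.

```scala
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

// Routes each (severity, line) pair into a directory named after the key.
class SeverityOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.toString + "/" + name
}

val logs = sc.textFile("hdfs://.../logs")  // placeholder input path
val keyed = logs.map { line =>
  // Assumes lines look like "[ERROR] ..."; use the bracketed tag as the key.
  val severity = line.takeWhile(_ != ']').stripPrefix("[")
  (severity, line)
}
keyed.saveAsHadoopFile("hdfs://.../by-severity",  // placeholder output path
  classOf[String], classOf[String], classOf[SeverityOutputFormat])
```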
Hi, all
I am wondering, if I use Spark-shell to scan a large file for lines
containing "error", whether the shell returns results while the job is
still executing, or only after the job has completely finished.
Regards,
Wang Hao(王灏)
CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Hi, all
Yes, it's a name of Wikipedia article. I am running WikipediaPageRank
example of Spark Bagels.
I am wondering whether there is any relation to the buffer size of Kryo.
The page rank can sometimes be finished successfully, and sometimes not,
because this kind of Kryo exception happens too many times, which b
Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com
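If the failures do turn out to be Kryo buffer overflows, one knob to try is the serializer buffer size; a sketch follows (the property name changed across Spark versions, so check your release's configuration docs):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // In 1.0-era releases the buffer size was configured in megabytes:
  .set("spark.kryoserializer.buffer.mb", "64")
```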
On Thu, Jul 17, 2014 at 2:58 AM, Tathagata Das
wrote:
> Is the class that is not found in the wikipediapagerank jar?
>
> TD
>
>
> On Wed, Jul 16, 2014 at 12:32 AM, Hao Wang wrote:
>
>> Thanks for your
Regards,
Wang Hao(王灏)
CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com
On Wed, Jul 16, 2014 at 1:41 PM, Tathagata Das
wrote:
> Are you using classes from external libraries tha
I am running the WikipediaPageRank example in Spark and hit the same
problem as you:
14/07/16 11:31:06 DEBUG DAGScheduler: submitStage(Stage 6)
14/07/16 11:31:06 ERROR TaskSetManager: Task 6.0:450 failed 4 times;
aborting job
14/07/16 11:31:06 INFO DAGScheduler: Failed to run foreach at
Bagel.s
Hi, All
In Spark, "spark.driver.host" defaults to the driver's hostname, so the Akka
actor system will listen on a URL like akka.tcp://hostname:port. However,
when a user tries to use spark-submit to run an application, the user may set
"--master spark://192.168.1.12:7077".
Then, the *AppClient* in *S
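A possible workaround sketch for that mismatch (the IP and port below are assumptions): set spark.driver.host explicitly so the Akka URL advertises a reachable address rather than the default hostname.

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.driver.host", "192.168.1.11")  // assumed reachable driver IP
  .set("spark.driver.port", "7078")          // optional: pin the port too
```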
Hi, Wei
You may try setting the JVM opts in *spark-env.sh* as follows to prevent or
mitigate GC pauses:
export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit -XX:+UseConcMarkSweepGC
-Xmx2g -XX:MaxPermSize=256m"
There are more options you could add, please just Google :)
Regards,
Wang Hao(王灏)
CloudTeam | S
Hi, Laurent
You could set spark.executor.memory and the heap size by the following methods:
1. in your conf/spark-env.sh:
*export SPARK_WORKER_MEMORY=38g*
*export SPARK_JAVA_OPTS="-XX:-UseGCOverheadLimit
-XX:+UseConcMarkSweepGC -Xmx2g -XX:MaxPermSize=256m"*
2. you could also add modification for
Hi, all
When I try to run Spark PageRank using:
./bin/spark-submit \
--master spark://192.168.1.12:7077 \
--class org.apache.spark.examples.bagel.WikipediaPageRank \
~/Documents/Scala/WikiPageRank/target/scala-2.10/wikipagerank_2.10-1.0.jar \
hdfs://192.168.1.12:9000/freebase-13G 0.05 100 Tru
> This is not removed. We moved it here:
> http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html
> If you're building with SBT, and you don't specify the
> SPARK_HADOOP_VERSION, then it defaults to 1.0.4.
>
> Andrew
>
>
> 2014-06-12 6:24 GMT-07:0
Hi, all
Why did the Spark 1.0.0 official doc remove the instructions for building
Spark against a corresponding Hadoop version?
Does it mean that I don't need to specify the Hadoop version when I build my
Spark 1.0.0 with `sbt/sbt assembly`?
Regards,
Wang Hao(王灏)
CloudTeam | School of Software Engineering
Shanghai
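For reference, the sbt build in that era picked the Hadoop version up from an environment variable; a sketch (the version numbers are just examples):

```sh
# Build against a specific Hadoop version:
SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly

# Build with YARN support as well:
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
```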
Hi, Zuhair
In my experience, you could try the following steps to avoid OOM in
Spark:
1. Increase JVM memory by adding export SPARK_JAVA_OPTS="-Xmx2g"
2. Use .persist(storage.StorageLevel.MEMORY_AND_DISK) instead of .cache()
3. Have you set spark.executor.memory value? It's 512m by de
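A sketch of step 2 (the input path is a placeholder): MEMORY_AND_DISK spills partitions that do not fit in memory to disk, instead of dropping and recomputing them.

```scala
import org.apache.spark.storage.StorageLevel

val data = sc.textFile("hdfs://.../input")  // placeholder path
// Unlike .cache() (MEMORY_ONLY), this falls back to disk under memory pressure.
data.persist(StorageLevel.MEMORY_AND_DISK)
```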
Hi, all
I am running a 30GB Wikipedia dataset on a 7-server cluster, using
WikipediaPageRank under examples/bagel.
My Spark version is bae07e3 [behind 1] fix different versions of
commons-lang dependency and apache/spark#746 addendum
The problem is that the job will fail after several stages becau
...@gmail.com
On Sun, May 18, 2014 at 1:52 PM, Hao Wang wrote:
> Hi, all
>
> *Spark version: bae07e3 [behind 1] fix different versions of commons-lang
> dependency and apache/spark#746 addendum*
>
> I have six worker nodes and four of them have this NoClassDefFoundError when
> I use
Hi,
I am running some examples of Spark on a cluster. Because I need to modify
some source code, I have to re-compile the whole of Spark using `sbt/sbt
assembly`, which takes a long time.
I have tried `mvn package` under the examples directory, but it failed because
of some dependency problems. Any way
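One thing that may help avoid the full rebuild, assuming the module is named `examples` in Spark's pom.xml: ask Maven to build just that module plus whatever it depends on, rather than starting from the examples directory.

```sh
# -pl selects the module, -am also builds its upstream dependencies.
mvn -pl examples -am -DskipTests package
```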
Hi,
Oracle JDK and OpenJDK, which one is better or preferred for Spark?
Regards,
Wang Hao(王灏)
CloudTeam | School of Software Engineering
Shanghai Jiao Tong University
Address:800 Dongchuan Road, Minhang District, Shanghai, 200240
Email:wh.s...@gmail.com
Hi, all
*Spark version: bae07e3 [behind 1] fix different versions of commons-lang
dependency and apache/spark#746 addendum*
I have six worker nodes and four of them have this NoClassDefFoundError when
I use the start-slaves.sh script on my driver node. However, running ./bin/spark-class
org.apache.spark.