How big is your dataset, and what is the vocabulary size? -Xiangrui

On Sun, Jan 4, 2015 at 11:18 PM, Eric Zhen <zhpeng...@gmail.com> wrote:
> Hi,
>
> When we run MLlib Word2Vec (spark-1.1.0), the driver gets stuck at 100% CPU
> usage. Here is the jstack output:
>
> "main" prio=10 tid=0x0000000040112800 nid=0x46f2 runnable
> [0x000000004162e000]
>    java.lang.Thread.State: RUNNABLE
>         at
> java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1847)
>         at
> java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1778)
>         at java.io.DataOutputStream.writeInt(DataOutputStream.java:182)
>         at java.io.DataOutputStream.writeFloat(DataOutputStream.java:225)
>         at
> java.io.ObjectOutputStream$BlockDataOutputStream.writeFloats(ObjectOutputStream.java:2064)
>         at
> java.io.ObjectOutputStream.writeArray(ObjectOutputStream.java:1310)
>         at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1154)
>         at
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
>         at
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
>         at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
>         at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
>         at
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
>         at
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
>         at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
>         at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
>         at
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1518)
>         at
> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1483)
>         at
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1400)
>         at
> java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1158)
>         at
> java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:330)
>         at
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
>         at
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
>         at
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
>         at
> org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
>         at org.apache.spark.SparkContext.clean(SparkContext.scala:1242)
>         at org.apache.spark.rdd.RDD.mapPartitionsWithIndex(RDD.scala:610)
>         at
> org.apache.spark.mllib.feature.Word2Vec$$anonfun$fit$1.apply$mcVI$sp(Word2Vec.scala:291)
>         at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
>         at org.apache.spark.mllib.feature.Word2Vec.fit(Word2Vec.scala:290)
>         at com.baidu.inf.WordCount$.main(WordCount.scala:31)
>         at com.baidu.inf.WordCount.main(WordCount.scala)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at
> org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
>         at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
>         at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> --
> Best Regards
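
For context on why Xiangrui asks about vocabulary size: the stack trace shows the driver inside ClosureCleaner.ensureSerializable writing large float arrays (writeFloats) while cleaning the closure passed to mapPartitionsWithIndex in Word2Vec.fit. A plausible reading (an assumption, not confirmed in this thread) is that the closure captures the model weight arrays, each on the order of vocabSize * vectorSize floats, so serialization time grows with the vocabulary. A rough, hypothetical back-of-envelope sketch, assuming two such Float arrays are captured:

```scala
// Hypothetical estimate of the serialized closure payload, assuming the
// Word2Vec training closure captures two model arrays (often called
// syn0/syn1 in word2vec implementations), each vocabSize * vectorSize
// Floats at 4 bytes apiece. Names and the factor of two are assumptions
// for illustration, not taken from the Spark source in this thread.
object Word2VecSizeEstimate {
  def closureBytes(vocabSize: Long, vectorSize: Int): Long =
    2L * vocabSize * vectorSize * 4L // two Float arrays, 4 bytes per Float

  def main(args: Array[String]): Unit = {
    // e.g. a 1M-word vocabulary with 100-dimensional vectors
    val bytes = closureBytes(1000000L, 100)
    println(f"~${bytes / (1024.0 * 1024.0)}%.0f MB per serialized closure")
  }
}
```

With a 1M-word vocabulary and 100-dimensional vectors this comes to roughly 800 MB per closure serialization, repeated once per iteration of the loop at Word2Vec.scala:290, which would match the long 100%-CPU stretch in ObjectOutputStream seen above.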
