You need to instantiate the server inside the foreachPartition block; otherwise Spark will attempt to serialize it to the task along with the closure. See the design patterns section in the Spark Streaming programming guide.
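
Roughly like this (an untested sketch that reuses the SOLR_SERVER_URL, batchSize, solrCollection, and Utils.sendBatchToSolr names from your snippet):

solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
        public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
                // Created here, on the executor, so the client is never serialized.
                HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL);
                try {
                        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
                        while (solrDocIterator.hasNext()) {
                                batch.add(solrDocIterator.next());
                                if (batch.size() >= batchSize) {
                                        Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                                        // Assumes sendBatchToSolr doesn't clear the list itself;
                                        // your version never clears it, so batches keep growing.
                                        batch.clear();
                                }
                        }
                        if (!batch.isEmpty()) {
                                Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                        }
                } finally {
                        solrServer.shutdown(); // releases the underlying HttpClient
                }
        }
});

If creating a client per partition turns out to be too expensive, the same guide suggests a lazily initialized singleton connection per executor JVM instead.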


Jose Fernandez | Principal Software Developer
jfernan...@sdl.com |


-----Original Message-----
From: dgoldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Wednesday, February 18, 2015 1:54 PM
To: user@spark.apache.org
Subject: NotSerializableException: 
org.apache.http.impl.client.DefaultHttpClient when trying to send documents to 
Solr

I'm using SolrJ in a Spark program. When I try to send the docs to Solr, I get a NotSerializableException on the DefaultHttpClient. Is there a possible fix or workaround?

I'm using Spark 1.2.1 with Hadoop 2.4; SolrJ is version 4.0.0.

final HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL); ...

JavaRDD<SolrInputDocument> solrDocs = rdd.map(new Function<Row, SolrInputDocument>() {
        public SolrInputDocument call(Row r) {
                return r.toSolrDocument();
        }
});

solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
        public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
                List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

                while (solrDocIterator.hasNext()) {
                        SolrInputDocument inputDoc = solrDocIterator.next();
                        batch.add(inputDoc);
                        if (batch.size() >= batchSize) {
                                Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                        }
                }
                if (!batch.isEmpty()) {
                        Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                }
        }
});

----------------

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1478)
        at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:789)
        at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:195)
        at org.apache.spark.api.java.JavaRDD.foreachPartition(JavaRDD.scala:32)
        at com.kona.motivis.spark.proto.SparkProto.execute(SparkProto.java:158)
        at com.kona.motivis.spark.proto.SparkProto.main(SparkProto.java:186)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.http.impl.client.DefaultHttpClient
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
        at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
        at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)








---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
