You need to instantiate the server inside the foreachPartition block; otherwise Spark will attempt to serialize it with the task closure. See the design patterns section of the Spark Streaming programming guide.
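Something like this should work (an untested sketch based on your snippet below — note I've also added a batch.clear() after each send, since your loop never clears the batch and would resend earlier docs, plus a shutdown() of the client when the partition finishes):

solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
    public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
        // Create the client here, on the executor, so it is never
        // serialized with the closure.
        HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL);
        try {
            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            while (solrDocIterator.hasNext()) {
                batch.add(solrDocIterator.next());
                if (batch.size() >= batchSize) {
                    Utils.sendBatchToSolr(solrServer, solrCollection, batch);
                    batch.clear(); // your version never cleared the batch
                }
            }
            if (!batch.isEmpty()) {
                Utils.sendBatchToSolr(solrServer, solrCollection, batch);
            }
        } finally {
            solrServer.shutdown();
        }
    }
});

SOLR_SERVER_URL, solrCollection, and batchSize are just strings/ints, so they serialize fine. If creating a client per partition proves too expensive, the pattern from the streaming guide is a lazily initialized static singleton or connection pool, so each executor JVM reuses one client across partitions.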
Jose Fernandez | Principal Software Developer
jfernan...@sdl.com

-----Original Message-----
From: dgoldenberg [mailto:dgoldenberg...@gmail.com]
Sent: Wednesday, February 18, 2015 1:54 PM
To: user@spark.apache.org
Subject: NotSerializableException: org.apache.http.impl.client.DefaultHttpClient when trying to send documents to Solr

I'm using SolrJ in a Spark program. When I try to send the docs to Solr, I get a NotSerializableException on the DefaultHttpClient. Is there a possible fix or workaround? I'm using Spark 1.2.1 with Hadoop 2.4; SolrJ is version 4.0.0.

final HttpSolrServer solrServer = new HttpSolrServer(SOLR_SERVER_URL);
...
JavaRDD<SolrInputDocument> solrDocs = rdd.map(new Function<Row, SolrInputDocument>() {
    public SolrInputDocument call(Row r) {
        return r.toSolrDocument();
    }
});

solrDocs.foreachPartition(new VoidFunction<Iterator<SolrInputDocument>>() {
    public void call(Iterator<SolrInputDocument> solrDocIterator) throws Exception {
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
        while (solrDocIterator.hasNext()) {
            SolrInputDocument inputDoc = solrDocIterator.next();
            batch.add(inputDoc);
            if (batch.size() >= batchSize) {
                Utils.sendBatchToSolr(solrServer, solrCollection, batch);
            }
        }
        if (!batch.isEmpty()) {
            Utils.sendBatchToSolr(solrServer, solrCollection, batch);
        }
    }
});

----------------
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1478)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:789)
    at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:195)
    at org.apache.spark.api.java.JavaRDD.foreachPartition(JavaRDD.scala:32)
    at com.kona.motivis.spark.proto.SparkProto.execute(SparkProto.java:158)
    at com.kona.motivis.spark.proto.SparkProto.main(SparkProto.java:186)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: org.apache.http.impl.client.DefaultHttpClient
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1183)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
    at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
    at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
    at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
    at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:73)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)