Very interesting... thanks Thom. I haven't given HttpClient a shot yet, but I will soon.
-S

On 7/31/06, Thom Nelson <[EMAIL PROTECTED]> wrote:
I had a similar problem and was able to fix it in Solr by manually
buffering the responses to a StringWriter before sending them to Tomcat.
Essentially, Tomcat's buffer will only hold so much, and at that point it
blocks (which is why it always hangs at a constant number of documents).
However, a better solution (still to be implemented) is to use smarter code
on the client that reads the response at the same time as it is sending
input. That is not too difficult to do, though it is best done with two
threads (i.e. fire off a thread to read the response before you send any
data). Seeing as the HttpClient code probably does this already, I'll most
likely end up using that.

On 7/31/06, sangraal aiken <[EMAIL PROTECTED]> wrote:
> Those are some great ideas Chris... I'm going to try some of them out. I'll
> post the results when I get a chance to do more testing. Thanks.
>
> At this point I can work around the problem by ignoring Solr's response,
> but this is obviously not ideal. I would feel better knowing what is
> causing the issue as well.
>
> -Sangraal
>
> On 7/29/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> >
> > : Sure, the method that does all the work updating Solr is the
> > : doUpdate(String s) method in the GanjaUpdate class I'm pasting below.
> > : It's hanging when I try to read the response... the last output I
> > : receive in my log is "Got Reader...".
> >
> > I don't have the means to try out this code right now ... but I can't see
> > any obvious problems with it (there may be somewhere that you are opening
> > a stream or reader and not closing it, but I didn't see one) ... I notice
> > you are running this client on the same machine as Solr (hence the
> > localhost URLs). Did you by any chance try running the client on a
> > separate machine to see if the number of updates before it hangs changes?
> >
> > My money is still on a filehandle resource limit somewhere ...
> > if you are running on a system that has "lsof" (on some Unix/Linux
> > installations you need sudo/su root permissions to run it), you can use
> > "lsof -p ####" to look up what files/network connections are open for a
> > given process. You can try running that on both the client pid and the
> > Solr server pid once it's hung. You'll probably see a lot of Jar files in
> > use for both, but if you see more than a few XML files open by the
> > client, or more than 1 TCP connection open by either the client or the
> > server, there's your culprit.
> >
> > I'm not sure what Windows equivalent of lsof may exist.
> >
> > Wait ... I just had another thought....
> >
> > You are using InputStreamReader to deal with the InputStreams of your
> > remote XML files, but you aren't specifying a charset, so it's using
> > your system default, which may be different from the charset of the
> > original XML files you are pulling from the URL. That (I *think*) means
> > that your InputStreamReader may in some cases fail to read all of the
> > bytes of the stream, which might leave some dangling filehandles (I'm
> > just guessing on that part ... I'm not actually sure what happens in
> > that case).
> >
> > What if you simplify your code (for the purposes of testing) and just put
> > the post-transform version of ganja-full.xml in a big ass String variable
> > in your Java app and just call GanjaUpdate.doUpdate(bigAssString) over
> > and over again ... does that cause the same problem?
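The filehandle theory can also be checked from inside the client JVM, without needing `lsof` or root access. This is a minimal sketch, assuming a JVM that provides the `com.sun.management.UnixOperatingSystemMXBean` extension (OpenJDK/HotSpot on Linux or another Unix); elsewhere, such as on Windows, the method returns null. The class name `FdMonitor` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdMonitor {

    /**
     * Returns {open descriptors, descriptor limit} for the current JVM, or
     * null when the platform MXBean does not expose these counts.
     * Hypothetical helper; relies on the com.sun.management extension.
     */
    public static long[] descriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),  // descriptors open right now
                unix.getMaxFileDescriptorCount()    // per-process limit
            };
        }
        return null; // e.g. Windows, or a JVM without the extension
    }
}
```

Logging both numbers before each `doUpdate` call would show whether the open count climbs toward the limit as updates accumulate. It only covers the client process; the Solr/Tomcat side would still need `lsof` or the same call made inside that JVM.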
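On the charset point: `getXML()` in this thread decodes the downloaded bytes with the platform default charset and then re-encodes each line as UTF-8, so non-ASCII characters can be altered whenever the two charsets differ. Whether that has anything to do with the hang is untested. One way to avoid decoding at all, sketched here under the assumption that the file should be saved exactly as served, is to copy raw bytes and leave encoding detection to the XML parser. `RawDownload.saveTo` is a hypothetical helper built only on `java.net.URL` and `java.nio.file.Files`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class RawDownload {

    /**
     * Copies the bytes behind `url` into `target` unchanged and returns the
     * number of bytes written. No Reader is involved, so no charset is
     * guessed and nothing is re-encoded.
     */
    public static long saveTo(URL url, Path target) throws IOException {
        URLConnection conn = url.openConnection();
        // try-with-resources closes the stream even if the copy fails,
        // so the connection's handle is not left dangling.
        try (InputStream in = conn.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

This needs Java 7 or later (NIO.2 and try-with-resources).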
> >
> > :
> > : ----------
> > :
> > : package com.iceninetech.solr.update;
> > :
> > : import com.iceninetech.xml.XMLTransformer;
> > :
> > : import java.io.*;
> > : import java.net.HttpURLConnection;
> > : import java.net.URL;
> > : import java.util.logging.Logger;
> > :
> > : public class GanjaUpdate {
> > :
> > :   private String updateSite = "";
> > :   private String XSL_URL = "http://localhost:8080/xsl/ganja.xsl";
> > :
> > :   private static final File xmlStorageDir = new File("/source/solr/xml-dls/");
> > :
> > :   final Logger log = Logger.getLogger(GanjaUpdate.class.getName());
> > :
> > :   public GanjaUpdate(String siteName) {
> > :     this.updateSite = siteName;
> > :     log.info("GanjaUpdate is primed and ready to update " + siteName);
> > :   }
> > :
> > :   public void update() {
> > :     StringWriter sw = new StringWriter();
> > :
> > :     try {
> > :       // transform gawkerInput XML to SOLR update XML
> > :       XMLTransformer transform = new XMLTransformer();
> > :       log.info("About to transform ganjaInput XML to Solr Update XML");
> > :       transform.transform(getXML(), sw, getXSL());
> > :       log.info("Completed ganjaInput/SolrUpdate XML transform");
> > :
> > :       // Write transformed XML to Disk.
> > :       File transformedXML = new File(xmlStorageDir, updateSite + ".sml");
> > :       FileWriter fw = new FileWriter(transformedXML);
> > :       fw.write(sw.toString());
> > :       fw.close();
> > :
> > :       // post to Solr
> > :       log.info("About to update Solr for site " + updateSite);
> > :       String result = this.doUpdate(sw.toString());
> > :       log.info("Solr says: " + result);
> > :       sw.close();
> > :     } catch (Exception e) {
> > :       e.printStackTrace();
> > :     }
> > :   }
> > :
> > :   public File getXML() {
> > :     String XML_URL = "http://localhost:8080/" + updateSite + "/ganja-full.xml";
> > :
> > :     // check for file
> > :     File localXML = new File(xmlStorageDir, updateSite + ".xml");
> > :
> > :     try {
> > :       if (localXML.createNewFile() && localXML.canWrite()) {
> > :         // open connection
> > :         log.info("Downloading: " + XML_URL);
> > :         URL url = new URL(XML_URL);
> > :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > :         conn.setRequestMethod("GET");
> > :
> > :         // Read response to File
> > :         log.info("Storing XML to File" + localXML.getCanonicalPath());
> > :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
> > :             updateSite + ".xml"));
> > :
> > :         BufferedReader rd = new BufferedReader(new InputStreamReader(
> > :             conn.getInputStream()));
> > :         String line;
> > :         while ((line = rd.readLine()) != null) {
> > :           line = line + '\n'; // add break after each line. It preserves formatting.
> > :           fos.write(line.getBytes("UTF8"));
> > :         }
> > :
> > :         // close connections
> > :         rd.close();
> > :         fos.close();
> > :         conn.disconnect();
> > :         log.info("Got the XML... File saved.");
> > :       }
> > :     } catch (Exception e) {
> > :       e.printStackTrace();
> > :     }
> > :
> > :     return localXML;
> > :   }
> > :
> > :   public File getXSL() {
> > :     StringBuffer retVal = new StringBuffer();
> > :
> > :     // check for file
> > :     File localXSL = new File(xmlStorageDir, "ganja.xsl");
> > :
> > :     try {
> > :       if (localXSL.createNewFile() && localXSL.canWrite()) {
> > :         // open connection
> > :         log.info("Downloading: " + XSL_URL);
> > :         URL url = new URL(XSL_URL);
> > :         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > :         conn.setRequestMethod("GET");
> > :         // Read response
> > :         BufferedReader rd = new BufferedReader(new InputStreamReader(
> > :             conn.getInputStream()));
> > :         String line;
> > :         while ((line = rd.readLine()) != null) {
> > :           line = line + '\n';
> > :           retVal.append(line);
> > :         }
> > :         // close connections
> > :         rd.close();
> > :         conn.disconnect();
> > :
> > :         log.info("Got the XSLT.");
> > :
> > :         // output file
> > :         log.info("Storing XSL to File" + localXSL.getCanonicalPath());
> > :         FileOutputStream fos = new FileOutputStream(new File(xmlStorageDir,
> > :             "ganja.xsl"));
> > :         fos.write(retVal.toString().getBytes());
> > :         fos.close();
> > :         log.info("File saved.");
> > :       }
> > :     } catch (Exception e) {
> > :       e.printStackTrace();
> > :     }
> > :     return localXSL;
> > :   }
> > :
> > :   private String doUpdate(String sw) {
> > :     StringBuffer updateResult = new StringBuffer();
> > :     try {
> > :       // open connection
> > :       log.info("Connecting to and preparing to post to SolrUpdate servlet.");
> > :       URL url = new URL("http://localhost:8080/update");
> > :       HttpURLConnection conn = (HttpURLConnection) url.openConnection();
> > :       conn.setRequestMethod("POST");
> > :       conn.setRequestProperty("Content-Type", "application/octet-stream");
> > :       conn.setDoOutput(true);
> > :       conn.setDoInput(true);
> > :       conn.setUseCaches(false);
> > :
> > :       // Write to server
> > :       log.info("About to post to SolrUpdate servlet.");
> > :       DataOutputStream output = new DataOutputStream(conn.getOutputStream());
> > :       output.writeBytes(sw);
> > :       output.flush();
> > :       output.close();
> > :       log.info("Finished posting to SolrUpdate servlet.");
> > :
> > :       // Read response
> > :       log.info("Ready to read response.");
> > :       BufferedReader rd = new BufferedReader(new InputStreamReader(
> > :           conn.getInputStream()));
> > :       log.info("Got reader....");
> > :       String line;
> > :       while ((line = rd.readLine()) != null) {
> > :         log.info("Writing to result...");
> > :         updateResult.append(line);
> > :       }
> > :       rd.close();
> > :
> > :       // close connections
> > :       conn.disconnect();
> > :
> > :       log.info("Done updating Solr for site" + updateSite);
> > :     } catch (Exception e) {
> > :       e.printStackTrace();
> > :     }
> > :
> > :     return updateResult.toString();
> > :   }
> > : }
> > :
> > :
> > : On 7/28/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:
> > : >
> > : > : I'm sure... it seems like Solr is having trouble writing to a Tomcat
> > : > : response that's been inactive for a bit. It's only 30 seconds
> > : > : though, so I'm not entirely sure why that would happen.
> > : >
> > : > But didn't you say you don't have this problem when you use curl, just
> > : > your Java client code?
> > : >
> > : > Did you try Yonik's Python test client? Or the Java client in Jira?
> > : >
> > : > Looking over the Java client code you sent, it's not clear if you are
> > : > reading the response back, or closing the connections ... can you post
> > : > a more complete sample app that exhibits the problem for you?
> > : >
> > : > -Hoss
> > : >
> > :
> >
> > -Hoss
> >
>
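Thom's suggestion, firing off a thread that reads the response before any data is sent, can be sketched independently of Solr. This is a minimal sketch over a plain pair of streams; `DuplexExchange.exchange` is a hypothetical helper, not part of Solr or HttpClient. It assumes access to the raw connection streams (for example from a `java.net.Socket`), because as far as I know the JDK's `HttpURLConnection` used in `doUpdate` does not hand back the response stream until the request body has been completely written. It also assumes the server can tell where the request ends on its own (e.g. from a `Content-Length` header) and closes its side once the response is done.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DuplexExchange {

    /**
     * Writes `request` to `toServer` while a background thread drains
     * `fromServer`. Because the response is consumed as it arrives, a server
     * that starts answering before it has read the whole request cannot fill
     * its output buffer and stop reading, which would leave the client stuck
     * in write(). This method does not close `toServer`; the caller must make
     * the end of the request recognisable to the server.
     */
    public static byte[] exchange(InputStream fromServer, OutputStream toServer, byte[] request)
            throws IOException, InterruptedException {
        ExecutorService readerThread = Executors.newSingleThreadExecutor();
        try {
            // 1. Start reading the response before any request data is sent.
            Future<byte[]> response = readerThread.submit(() -> {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = fromServer.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return buf.toByteArray();
            });

            // 2. Send the request body on the calling thread.
            toServer.write(request);
            toServer.flush();

            // 3. Wait until the reader thread has seen end-of-stream.
            return response.get();
        } catch (ExecutionException e) {
            throw new IOException("reading the response failed", e.getCause());
        } finally {
            readerThread.shutdown();
        }
    }
}
```

With a socket, `request` would hold the request line, headers and XML body, and the two streams would come from `socket.getInputStream()` and `socket.getOutputStream()`. The lambda requires Java 8 or later.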