No need to run a separate web server. I actually do HTTP updates from an extra servlet configured into the Solr webserver. It might seem a little odd, but same-system TCP sockets are extremely fast and low overhead.
The additional flexibility is nice, too. If I find a bug in the indexing code in production, I can fix it locally and update from the fixed copy over HTTP while I wait for a push of code to production. Modern HTTP and TCP are very fast and very reliable, so don't count out the HTTP/XML interface before trying it.

wunder
==
Search Guy
Netflix

On 8/27/07 9:18 PM, "climbingrose" <[EMAIL PROTECTED]> wrote:

> Agree. I was actually thinking of developing the embedded version early this
> year for one of my projects. I'm sure it will be needed in cases where
> running another web server is overkill.
>
> On 8/28/07, Jonathan Woods <[EMAIL PROTECTED]> wrote:
>>
>> I don't think you should apologise for highlighting embedded usage. For
>> circumstances in which you're at liberty to run a Solr instance in the same
>> JVM as an app which uses it, I find it very strange that you should have to
>> use anything _other_ than embedded, and jump through all the unnecessary
>> hoops (XML conversion, HTTP transport) that this implies. It's a bit like
>> suggesting you should throw away Java method invocations altogether, and
>> write everything in XML-RPC.
>>
>> Bit of a pet issue of mine! I'll be creating a JIRA issue on the subject
>> soon.
>>
>> Jon
>>
>>> -----Original Message-----
>>> From: Sundling, Paul [mailto:[EMAIL PROTECTED]
>>> Sent: 28 August 2007 03:24
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Embedded about 50% faster for indexing
>>>
>>> At this point I think I'm going to recommend against embedded,
>>> regardless of any performance advantage. The level of
>>> documentation is just too low, while the XML API is clearly
>>> documented. It's clear that XML is preferred.
>>>
>>> The embedded example on the wiki is pretty good, but until
>>> multiple core support comes out in the next version, you have
>>> to use multiple SolrCore. If they are accessed in the same
>>> webapp, then you can't just set JNDI (since you can only have
>>> one value).
>>> So you have to use a Config object as alluded to
>>> in the example. However, you look at the code and there is
>>> no javadoc for the constructor. The constructor args are
>>> (String name, InputStream is, String prefix). I think name
>>> is a unique name for the solr core, but that is a guess.
>>> InputStream may be a stream to the solr home, but it could be
>>> anything. Prefix may be a URI prefix. These are all guesses
>>> without trying to read through the code.
>>>
>>> When I look at SolrCore, it looks like it's a singleton, so
>>> maybe I can't even access more than one SolrCore using
>>> embedded anyway. :( So I apologize for highlighting Embedded.
>>>
>>> Anyway it's clear how to do multiple solr cores using XML.
>>> You just have different post URI for the different cores.
>>> You can easily inject that with Spring and externalize the
>>> config. Simple and easy. So I concede XML is the way to go. :)
>>>
>>> Paul Sundling
>>>
>>> -----Original Message-----
>>> From: Mike Klaas [mailto:[EMAIL PROTECTED]
>>> Sent: Monday, August 27, 2007 5:50 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Embedded about 50% faster for indexing
>>>
>>>
>>> On 27-Aug-07, at 12:44 PM, Sundling, Paul wrote:
>>>
>>>> Whether embedded solr should give me a performance boost or not, it
>>>> did. :) I'm not surprised, since it skips XML parsing. Although you never
>>>> know where cycles are used for sure until you profile.
>>>
>>> It certainly is possible that XML parsing dwarfs indexing, but I'd
>>> expect that only to occur under very light analysis and field storage
>>> workloads.
>>>
>>>> I tried doing more records per post (200) and it was actually slightly
>>>> slower and seemed to require more memory. This makes sense because you
>>>> have to take up more memory for the StringBuilder to store the much
>>>> larger XML. For 10,000 it was much slower. For that size I would need
>>>> to use XML streaming or something to make it work.
>>>>
>>>> The solr war was on the same machine, so network overhead was only from
>>>> using loopback.
>>>
>>> The big question is still your connection handling strategy: are you
>>> using persistent http connections? Are you threadedly indexing?
>>>
>>> cheers,
>>> -Mike
>>>
>>>> Paul Sundling
>>>>
>>>> -----Original Message-----
>>>> From: climbingrose [mailto:[EMAIL PROTECTED]
>>>> Sent: Monday, August 27, 2007 12:22 AM
>>>> To: solr-user@lucene.apache.org
>>>> Subject: Re: Embedded about 50% faster for indexing
>>>>
>>>>
>>>> Haven't tried the embedded server but I think I have to agree with Mike.
>>>> We're currently sending 2000 job batches to SOLR server and the amount
>>>> of time required to transfer documents over http is insignificant
>>>> compared with the time required to index them. So I do think unless you
>>>> are sending documents one by one, embedded SOLR shouldn't give you much
>>>> more performance boost.
>>>>
>>>> On 8/25/07, Mike Klaas <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>> On 24-Aug-07, at 2:29 PM, Wu, Daniel wrote:
>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of
>>>>>>> Yonik Seeley
>>>>>>> Sent: Friday, August 24, 2007 2:07 PM
>>>>>>> To: solr-user@lucene.apache.org
>>>>>>> Subject: Re: Embedded about 50% faster for indexing
>>>>>>>
>>>>>>> One thing I'd like to avoid is everyone trying to embed just for
>>>>>>> performance gains. If there is really that much difference, then we
>>>>>>> need a better way for people to get that without resorting to Java
>>>>>>> code.
>>>>>>>
>>>>>>> -Yonik
>>>>>>>
>>>>>>
>>>>>> Theoretically and practically, embedded solution will be faster than
>>>>>> going through http/xml.
>>>>>
>>>>> This is only true if the http interface adds significant overhead to
>>>>> the cost of indexing a document, and I don't see why this should be
>>>>> so, as indexing is relatively heavyweight.
>>>>> Setting up the connection
>>>>> could be expensive, but this can be greatly mitigated by sending more
>>>>> than one doc per http request, using persistent connections, and
>>>>> threading.
>>>>>
>>>>> -Mike
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>>
>>>> Cuong Hoang
>>>
>>>
>>>
>>>
>>>
>>
>>
>
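For anyone who wants to try the HTTP/XML route before ruling it out, here is a rough sketch of the batching discussed in the quoted thread: several docs per POST, sent through one reused client. Treat it as an illustration under assumptions, not anyone's production code. The class and method names (SolrBatchPost, buildAddXml, partition, post) are made up for this example, the update URL is whatever your Solr install exposes and must be passed in, and java.net.http.HttpClient requires Java 11 or later. Threading is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: names are illustrative, not part of Solr.
public class SolrBatchPost {

    // Escape the characters that are special in XML text and attribute values.
    public static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build one <add> message that carries every document in the batch.
    public static String buildAddXml(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("<add>");
        for (Map<String, String> doc : docs) {
            sb.append("<doc>");
            for (Map.Entry<String, String> f : doc.entrySet()) {
                sb.append("<field name=\"").append(escapeXml(f.getKey())).append("\">")
                  .append(escapeXml(f.getValue())).append("</field>");
            }
            sb.append("</doc>");
        }
        return sb.append("</add>").toString();
    }

    // Split the documents into consecutive batches of at most batchSize.
    public static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(i, end)));
        }
        return batches;
    }

    // POST one XML message and return the HTTP status code.
    // Reusing the same HttpClient lets it keep connections alive between posts.
    public static int post(HttpClient client, String updateUrl, String xml) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample documents; real field names depend on your schema.
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", String.valueOf(i));
            doc.put("title", "Job <" + i + ">");
            docs.add(doc);
        }

        // Assumption: the update URL is supplied as the first argument.
        // Without it, the batches are only printed and nothing is sent.
        String updateUrl = args.length > 0 ? args[0] : null;
        HttpClient client = HttpClient.newHttpClient();

        for (List<Map<String, String>> batch : partition(docs, 2)) {
            String xml = buildAddXml(batch);
            if (updateUrl == null) {
                System.out.println(xml);
            } else {
                System.out.println("add -> HTTP " + post(client, updateUrl, xml));
            }
        }
        if (updateUrl != null) {
            // Ask Solr to commit so the added documents become searchable.
            System.out.println("commit -> HTTP " + post(client, updateUrl, "<commit/>"));
        }
    }
}
```

Run without arguments, it just prints the three batches. The batch size of 2 is only for the demo; the thread mentions 200 and 2000 docs per post with different results, so measure against your own documents.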