Hello Grant,
>
>Haven't forgotten about you, but I've been traveling and then into
>some US Holidays here. Happy Thanksgiving!
>
>To confirm I am understanding, you are seeing a slowdown between 1.3-dev
>from April and one from September, right?

Yep. Here are the MD5 hashes:-

fergus: md5 *.war
MD5 (solr-bc.war) = 8d4f95628d6978c959d63d304788bc25
MD5 (solr-nightly.war) = 10281455a66b0035ee1f805496d880da

This is the META-INF/MANIFEST.MF from a recent nightly build. (slow)

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.5.0_06-b05 (Sun Microsystems Inc.)
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.3.0.2008.11.13.08.16.12
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: nightly exported - yonik - 2008-11-13 08:16:12
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5

This is the manifest from the war file we were given on the course

Manifest-Version: 1.0
Ant-Version: Apache Ant 1.7.0
Created-By: 1.5.0_13-121 ("Apple Computer, Inc.")
Extension-Name: org.apache.solr
Specification-Title: Apache Solr Search Server
Specification-Version: 1.2.2008.04.04.08.09.14
Specification-Vendor: The Apache Software Foundation
Implementation-Title: org.apache.solr
Implementation-Version: 1.3-dev exported - erik - 2008-04-04 08:09:14
Implementation-Vendor: The Apache Software Foundation
X-Compile-Source-JDK: 1.5
X-Compile-Target-JDK: 1.5

I have copied both war files to a web site

http://www.twig.me.uk/solr/solr-bc.war (solr 1.3 dev == bootcamp)
http://www.twig.me.uk/solr/solr-nightly.war (nightly)

Regards Fergus.

>Can you produce an MD5 hash of the WAR file or something, such that I
>can know I have the exact bits. Better yet, perhaps you can put those
>files up somewhere where they can be downloaded.
>
>Thanks,
>Grant
>
>On Nov 26, 2008, at 10:54 AM, Fergus McMenemie wrote:
>
>> Hello Grant,
>>
>> Not much good with Java profilers (yet!) so I thought I
>> would send a script!
>>
>> Details... details!
>> Having decided to produce a script to
>> replicate the 1.2 vs 1.3 speed problem, the required rigor
>> revealed a lot more.
>>
>> 1) The faster version I have previously referred to as 1.2,
>> was actually a "1.3-dev" I had downloaded as part of the
>> solr bootcamp class at ApacheCon Europe 2008. The ID
>> string in the CHANGES.txt document is:-
>> $Id: CHANGES.txt 643465 2008-04-01 16:10:19Z gsingers $
>>
>> 2) I did actually download and speed test a version of 1.2
>> from the internet. Its CHANGES.txt id is:-
>> $Id: CHANGES.txt 543263 2007-05-31 21:19:02Z yonik $
>> Speed wise it was about the same as 1.3 at 64min. It also
>> had lots of char set issues and is ignored from now on.
>>
>> 3) The version I was planning to use, till I found this
>> speed issue, was the "latest" official version:-
>> $Id: CHANGES.txt 694377 2008-09-11 17:40:11Z klaas $
>> I also verified the behavior with a nightly build.
>> $Id: CHANGES.txt 712457 2008-11-09 01:24:11Z koji $
>>
>> Anyway, the following script indexes the content in 22min
>> for the 1.3-dev version and takes 68min for the newer releases
>> of 1.3. I took the conf directory from the 1.3dev (bootcamp)
>> release and used it to replace the conf directory from the
>> official 1.3 release. The 3x slow down was still there; it is
>> not a configuration issue!
>> =================================
>>
>> #! /bin/bash
>>
>> # This script assumes a /usr/local/tomcat link to whatever version
>> # of tomcat you have installed. I have "apache-tomcat-5.5.20" Also
>> # /usr/local/tomcat/conf/Catalina/localhost contains no solr.xml.
>> # All the following was done as root.
>>
>>
>> # I have a directory /usr/local/ts which contains four versions of solr. The
>> # "official" 1.2 along with two 1.3 releases and a version of 1.2 or a 1.3beta
>> # I got while attending a solr bootcamp.
>> # I indexed the same content using the
>> # different versions of solr as follows:
>> cd /usr/local/ts
>> if [ "" ]
>> then
>>    echo "Starting from a-fresh"
>>    sleep 5 # allow time for me to interrupt!
>>    cp -Rp apache-solr-bc/example/solr ./solrbc #bc = bootcamp
>>    cp -Rp apache-solr-nightly/example/solr ./solrnightly
>>    cp -Rp apache-solr-1.3.0/example/solr ./solr13
>>
>>    # the gaz is regularly updated and its name keeps changing :-) The page
>>    # http://earth-info.nga.mil/gns/html/namefiles.htm has a link to the latest
>>    # version.
>>    curl "http://earth-info.nga.mil/gns/html/geonames_dd_dms_date_20081118.zip" > geonames.zip
>>    unzip -q geonames.zip
>>    # delete corrupt blips!
>>    perl -i -n -e 'print unless
>>       ($. > 2128495 and $. < 2128505) or
>>       ($. > 5944254 and $. < 5944260)
>>       ;' geonames_dd_dms_date_20081118.txt
>>    #following was used to detect bad short records
>>    #perl -a -F\\t -n -e ' print "line $. is bad with ",scalar(@F)," args\n" if (@F != 26);' geonames_dd_dms_date_20081118.txt
>>
>>    # my set of fields and copyfields for the schema.xml
>>    fields='
>> <fields>
>>    <field name="UNI" type="string" indexed="true" stored="true" required="true" />
>>    <field name="CCODE" type="string" indexed="true" stored="true"/>
>>    <field name="DSG" type="string" indexed="true" stored="true"/>
>>    <field name="CC1" type="string" indexed="true" stored="true"/>
>>    <field name="LAT" type="sfloat" indexed="true" stored="true"/>
>>    <field name="LONG" type="sfloat" indexed="true" stored="true"/>
>>    <field name="MGRS" type="string" indexed="false" stored="true"/>
>>    <field name="JOG" type="string" indexed="false" stored="true"/>
>>    <field name="FULL_NAME" type="string" indexed="true" stored="true"/>
>>    <field name="FULL_NAME_ND" type="string" indexed="true" stored="true"/>
>>    <!--field name="text" type="text" indexed="true" stored="false" multiValued="true"/ -->
>>    <!--field name="timestamp" type="date" indexed="true"
>>       stored="true" default="NOW" multiValued="false"/-->
>> '
>>    copyfields='
>> </fields>
>> <copyField source="FULL_NAME" dest="text"/>
>> <copyField source="FULL_NAME_ND" dest="text"/>
>> '
>>
>>    # add in my fields and copyfields
>>    perl -i -p -e "print qq($fields) if s/<fields>//;" solr*/conf/schema.xml
>>    perl -i -p -e "print qq($copyfields) if s[</fields>][];" solr*/conf/schema.xml
>>    # change the unique key and mark the "id" field as not required
>>    perl -i -p -e "s/<uniqueKey>id/<uniqueKey>UNI/i;" solr*/conf/schema.xml
>>    perl -i -p -e 's/required="true"//i if m/<field name="id"/;' solr*/conf/schema.xml
>>    # enable remote streaming in solrconfig file
>>    perl -i -p -e 's/enableRemoteStreaming="false"/enableRemoteStreaming="true"/;' solr*/conf/solrconfig.xml
>> fi
>>
>> # some constants to keep the curl command shorter
>> skip="MODIFY_DATE,RC,UFI,DMS_LAT,DMS_LONG,FC,PC,ADM1,ADM2,POP,ELEV,CC2,NT,LC,SHORT_FORM,GENERIC,SORT_NAME"
>> file=`pwd`"/geonames.txt"
>>
>> export JAVA_OPTS=" -Xmx512M -Xms512M -Dsolr.home=`pwd`/solr -Dsolr.solr.home=`pwd`/solr"
>>
>> echo 'Getting ready to index the data set using solrbc (bc = bootcamp)'
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>> then
>>    echo "Tomcat would not shutdown"
>>    exit
>> fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-bc/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrbc solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrbc (bc = bootcamp)"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>> echo "Getting ready to index the data set using solrnightly"
>> /usr/local/tomcat/bin/shutdown.sh
>> sleep 15
>> if [ -n "`ps awxww | grep tomcat | grep -v grep`" ]
>> then
>>    echo "Tomcat would not shutdown"
>>    exit
>> fi
>> rm -r /usr/local/tomcat/webapps/solr*
>> rm -r /usr/local/tomcat/logs/*.out
>> rm -r /usr/local/tomcat/work/Catalina/localhost/solr
>> cp apache-solr-nightly/example/webapps/solr.war /usr/local/tomcat/webapps
>> rm solr # rm the symbolic link
>> ln -s solrnightly solr
>> rm -r solr/data
>> /usr/local/tomcat/bin/startup.sh
>> sleep 10 # give solr time to launch and setup
>> echo "Starting indexing at " `date` " with solrnightly"
>> time curl "http://localhost:8080/solr/update/csv?commit=true&stream.file=$file&escape=%00&separator=%09&skip=$skip"
>>
>>
>>> On Nov 20, 2008, at 9:18 AM, Fergus McMenemie wrote:
>>>
>>>> Hello Grant,
>>>>
>>>>> Were you overwriting the existing index or did you also clean out the
>>>>> Solr data directory, too? In other words, was it a fresh index, or an
>>>>> existing one? And was that also the case for the 22 minute time?
>>>>
>>>> No, in each case it was a new index. I store the indexes (the "data" dir)
>>>> outside the solr home directory. For the moment I rm -rf the index dir
>>>> after each edit to the solrconfig.xml or schema.xml file and reindex
>>>> from scratch. The relaunch of tomcat recreates the index dir.
>>>>
>>>>> Would it be possible to profile the two instances and see if you notice
>>>>> anything different?
>>>> I don't understand this. Do you mean run a profiler against the tomcat
>>>> image as indexing takes place, or somehow compare the indexes?
>>>
>>> Something like JProfiler or any other Java profiler.
>>>
>>>>
>>>> I was thinking of making a short script that replicates the results,
>>>> and posting it here, would that help?
>>>
>>> Very much so.
>>>
>>>>
>>>>> Thanks,
>>>>> Grant
>>>>>
>>>>> On Nov 19, 2008, at 8:25 AM, Fergus McMenemie wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have a CSV file with 6M records which took 22min to index with
>>>>>> solr 1.2. I then stopped tomcat, replaced the solr stuff inside
>>>>>> webapps with version 1.3, wiped my index and restarted tomcat.
>>>>>>
>>>>>> Indexing the exact same content now takes 69min. My machine has
>>>>>> 2GB of RAM and tomcat is running with $JAVA_OPTS -Xmx512M -Xms512M.
>>>>>>
>>>>>> Are there any tweaks I can use to get the original index time
>>>>>> back? I read through the release notes and was expecting a
>>>>>> speed up. I saw the bit about increasing ramBufferSizeMB and set
>>>>>> it to 64MB; it had no effect.
>>>>>> --
>>
>> --
>>
>> ===============================================================
>> Fergus McMenemie               Email:[EMAIL PROTECTED]
>> Techmore Ltd                   Phone:(UK) 07721 376021
>>
>> Unix/Mac/Intranets             Analyst Programmer
>> ===============================================================
>
>--------------------------
>Grant Ingersoll
>
>Lucene Helpful Hints:
>http://wiki.apache.org/lucene-java/BasicsOfPerformance
>http://wiki.apache.org/lucene-java/LuceneFAQ

--

===============================================================
Fergus McMenemie               Email:[EMAIL PROTECTED]
Techmore Ltd                   Phone:(UK) 07721 376021

Unix/Mac/Intranets             Analyst Programmer
===============================================================
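
For anyone downloading the two war files, a check along the following lines should confirm the bits match the MD5 hashes quoted at the top of this message. It is only a sketch: the helper names md5_of and check_war are made up for illustration, and it assumes either md5sum (Linux) or md5 (Mac OS X / BSD) is on the PATH.

```shell
#!/bin/bash
# Sketch only: md5_of and check_war are hypothetical helper names,
# not part of Solr or of the indexing script quoted above.

# Print the MD5 hex digest of a file using whichever tool is installed.
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{print $1}'   # GNU coreutils (Linux)
    else
        md5 -q "$1"                      # BSD / Mac OS X
    fi
}

# Compare a file's digest with an expected value.
# Returns 0 on a match, 1 on a mismatch, 2 if the file cannot be read.
check_war() {
    local file="$1" expected="$2" actual
    if [ ! -r "$file" ]; then
        echo "MISSING  $file"
        return 2
    fi
    actual=$(md5_of "$file")
    if [ "$actual" = "$expected" ]; then
        echo "OK       $file"
    else
        echo "MISMATCH $file (got $actual)"
        return 1
    fi
}

# Usage, with the hashes posted in this message:
#   check_war solr-bc.war      8d4f95628d6978c959d63d304788bc25
#   check_war solr-nightly.war 10281455a66b0035ee1f805496d880da
```

A mismatch most likely means a truncated download or a different build, in which case timing comparisons against the 22min and 68min figures would not be like for like.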