Thanks Adam!

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself.
from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
otherwise we all die.

--- On Thu, 12/16/10, Adam Estrada <estrada.a...@gmail.com> wrote:

> From: Adam Estrada <estrada.a...@gmail.com>
> Subject: Re: bulk commits
> To: solr-user@lucene.apache.org
> Date: Thursday, December 16, 2010, 6:18 PM
>
> One very important thing I forgot to mention is that you will have to
> increase the Java heap size for larger data sets.
>
> Set JAVA_OPT to something acceptable.
>
> Adam
>
> On Thu, Dec 16, 2010 at 3:27 PM, Yonik Seeley <yo...@lucidimagination.com> wrote:
>
> > On Thu, Dec 16, 2010 at 3:06 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> > > That easy, huh? Heck, this gets better and better.
> > >
> > > BTW, how about escaping?
> >
> > The CSV escaping? It's configurable to allow for loading different
> > CSV dialects.
> >
> > http://wiki.apache.org/solr/UpdateCSV
> >
> > By default it uses double quote encapsulation, like Excel would.
> > The bottom of the wiki page shows how to configure tab separators and
> > backslash escaping like MySQL produces by default.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >
> > > Dennis Gearon
> > >
> > > ----- Original Message ----
> > > From: Adam Estrada <estrada.adam.gro...@gmail.com>
> > > To: Dennis Gearon <gear...@sbcglobal.net>; solr-user@lucene.apache.org
> > > Sent: Thu, December 16, 2010 10:58:47 AM
> > > Subject: Re: bulk commits
> > >
> > > This is how I import a lot of data from a CSV file. There are close to 100k
> > > records in there. Note that you can either pre-define the column names using
> > > the fieldnames param like I did here *or* include header=true which will
> > > automatically pick up the column header if your file has it.
> > >
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,lat,lng,countrycode,population,elevation,gtopo30,timezone,modificationdate,cat&stream.file=C:\tmp\cities1000.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > >
> > > This seems to load everything into some kind of temporary location before
> > > it's actually committed. If something goes wrong there is a rollback feature
> > > that will undo anything that happened before the commit.
> > >
> > > As far as batching a bunch of files, I copied and pasted the following into
> > > Cygwin and it worked just fine.
> > >
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,lat,lng,countrycode,population,elevation,gtopo30,timezone,modificationdate,cat&stream.file=C:\tmp\cities1000.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xab.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xac.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xad.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xae.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xaf.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xag.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xah.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xai.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xaj.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xak.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xal.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xam.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xan.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xao.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl "http://localhost:8983/solr/update/csv?commit=true&separator=%2C&fieldnames=id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate&stream.file=C:\tmp\xap.csv&overwrite=true&stream.contentType=text/plain;charset=utf-8"
> > > curl http://localhost:8983/solr/update -H "Content-Type: text/xml" --data-binary '<optimize/>'
> > >
> > > Adam
> > >
> > > On Thu, Dec 16, 2010 at 1:44 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> > >
> > >> Might be CSV or tab delimited text.
> > >>
> > >> Sent from Yahoo! Mail on Android
> > >>
> > >> ------------------------------
> > >> * From: * Adam Estrada <estrada.adam.gro...@gmail.com>;
> > >> * To: * <solr-user@lucene.apache.org>;
> > >> * Subject: * Re: bulk commits
> > >> * Sent: * Thu, Dec 16, 2010 6:35:17 PM
> > >>
> > >> What is it that you are trying to commit?
> > >>
> > >> a
> > >>
> > >> On Thu, Dec 16, 2010 at 1:03 PM, Dennis Gearon <gear...@sbcglobal.net> wrote:
> > >>
> > >> > What have people found as the best way to do bulk commits either from the
> > >> > web or from a file on the system?
> > >> >
> > >> > Dennis Gearon
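
For anyone finding this thread later: the pasted batch of curl commands quoted above could probably be shortened to a loop. What follows is only a minimal sketch, not something posted in the thread. The variable names SOLR_URL, CSV_DIR, FIELDS and DRY_RUN and the helper functions csv_update_url and run are made up for illustration; the URL parameters, the field list and the chunk names xab through xap are copied from the quoted commands. It prints the commands by default and only calls curl when DRY_RUN=0.

```shell
#!/bin/sh
# Sketch: batch-load CSV chunks through Solr's CSV update handler.
# Assumed, not from the thread: the variable names below, the helper
# functions, and the dry-run switch. Parameters mirror the quoted commands.
set -eu

default_dir='C:\tmp'                        # directory used in the thread
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CSV_DIR="${CSV_DIR:-$default_dir}"
DRY_RUN="${DRY_RUN:-1}"                     # 1 = print commands, 0 = run curl
FIELDS="id,name,asciiname,latitude,longitude,featureclass,featurecode,countrycode,admin1code,admin2code,admin3code,admin4code,population,elevation,gtopo30,timezone,modificationdate"

# Print the update URL for one CSV file ($1 = path as seen by the Solr server).
csv_update_url() {
  printf '%s/update/csv?commit=true&separator=%%2C&fieldnames=%s&stream.file=%s&overwrite=true&stream.contentType=text/plain;charset=utf-8' \
    "$SOLR_URL" "$FIELDS" "$1"
}

# Either print the curl command line or execute it, depending on DRY_RUN.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf 'curl'
    printf ' "%s"' "$@"
    printf '\n'
  else
    curl "$@"
  fi
}

# One request per chunk, xab.csv through xap.csv, as in the quoted batch.
for suffix in b c d e f g h i j k l m n o p; do
  run "$(csv_update_url "${CSV_DIR}\\xa${suffix}.csv")"
done

# Optimize once after all chunks are loaded, as the quoted batch does.
run "$SOLR_URL/update" -H "Content-Type: text/xml" --data-binary '<optimize/>'
```

Note that cities1000.csv uses a different field list in the quoted commands, so it is left out of the loop. Like the original commands, this commits after every file; whether a single commit at the end would be better is not something the thread settles.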