RE: Questions about Solr's security

2011-11-03 Thread Jaeger, Jay - DOT
It seems to me that this issue needs to be addressed in the FAQ and in the tutorial, and that somewhere there should be a /select lock-down "how to". This is not obvious to many (most?) users of Solr. It certainly wasn't obvious to me before I read this. JRJ -Original Message- From:

RE: change solr url

2011-11-03 Thread Jaeger, Jay - DOT
The file that he refers to, web.xml, is inside the solr WAR file in folder web-inf. That WAR file is in ...\example\webapps. You would have to uncomment the section under and change the to something else. But, as the comments in the section explain, you would also have to make other cha

RE: large scale indexing issues / single threaded bottleneck

2011-11-03 Thread Jaeger, Jay - DOT
Shishir, we have 35 million "documents", and should be doing about 5000-1 new "documents" a day, but with very small "documents": 40 fields which have at most a few terms, with many being single terms. You may occasionally see some impact from top level index merges but those should be

RE: Difficulties Installing Solr with Jetty 7.x

2011-10-27 Thread Jaeger, Jay - DOT
ce to do Solr-specific configuration apart from $JETTY_HOME/etc/, and my intuition is that these files shouldn't be messed with unless the intention is to affect global container-wide behavior. Which I don't. I'm only trying to get Solr running. I may want to run other apps, so I'd

RE: Upgratding the Index from 1.4.1 to 3.4 using replication

2011-10-26 Thread Jaeger, Jay - DOT
I very much doubt that would work: different versions of Lucene involved, and Solr replication does just a streamed file copy, nothing fancy. JRJ -Original Message- From: Nemani, Raj [mailto:raj.nem...@turner.com] Sent: Wednesday, October 26, 2011 12:55 PM To: solr-user@lucene.apache.o

RE: Difficulties Installing Solr with Jetty 7.x

2011-10-26 Thread Jaeger, Jay - DOT
ERRATA, that should be the *SOLR* web.xml (not the Jetty web.xml). Sorry for the confusion. -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Wednesday, October 26, 2011 4:02 PM To: 'solr-user@lucene.apache.org' Subject: RE: Difficulties Installing

RE: Difficulties Installing Solr with Jetty 7.x

2011-10-26 Thread Jaeger, Jay - DOT
From your logs, it looks like the Solr library is being found just fine, and that the servlet is initing OK. Does your Jetty configuration specify index.jsp in a welcome list? We had that problem in WebSphere: we got 404's the same way, and the cure was to modify the Jetty web.xml to include

RE: some basic information on Solr

2011-10-26 Thread Jaeger, Jay - DOT
It didn't look like that, but maybe. Our experience has been very very good. I don't think we have seen a crash in our prototype to date (though that prototype is also not very busy). We have had as many as four cores, with as many as 35 million "documents". -Original Message- From

RE: Loading data to SOLR first time ( taking too long)

2011-10-26 Thread Jaeger, Jay - DOT
- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Tuesday, October 25, 2011 4:03 PM To: 'solr-user@lucene.apache.org' Subject: RE: Loading data to SOLR first time ( taking too long) My goodness. We do 4 million in about 1/2 HOUR (7+ million in 40 minutes). First ques

RE: Replication issues with multiple Slaves

2011-10-26 Thread Jaeger, Jay - DOT
download them. By keeping older commits we were able to work around this issue. > > -Original Message- > From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] > Sent: 25 October 2011 20:48 > To: solr-user@lucene.apache.org > Subject: RE: Replication issues with multi

RE: Loading data to SOLR first time ( taking too long)

2011-10-25 Thread Jaeger, Jay - DOT
My goodness. We do 4 million in about 1/2 HOUR (7+ million in 40 minutes). First question: Are you somehow forcing Solr to do a commit for each and every record? If so, that way leads to the house of PAIN. The thing to do next, I suppose, might be to try and figure out whether the issue is i

RE: Replication issues with multiple Slaves

2011-10-25 Thread Jaeger, Jay - DOT
I noted that in these messages the left hand side is lower case collection, but the right hand side is upper case Collection. Assuming you did a cut/paste, could you have a core name mismatch between a master and a slave somehow? Otherwise (shudder): could you be doing a commit while the repli

RE: Points to processing hastags

2011-10-25 Thread Jaeger, Jay - DOT
Sounds like a possible application of solr.PatternTokenizerFactory http://lucene.apache.org/solr/api/org/apache/solr/analysis/PatternTokenizerFactory.html You could use copyField to copy the entire string to a separate field (or set of fields) that are processed by patterns. JRJ -Origina

RE: sort non-roman character strings last

2011-10-25 Thread Jaeger, Jay - DOT
As far as I know, in the index, a string that is zero length is still a string, and would not count as "missing". The CSV importer has a way to not index empty entries, but once it is in the index, it is in the index -- as an empty string. i.e. String silly = null; Is not the same thi

RE: sort non-roman character strings last

2011-10-25 Thread Jaeger, Jay - DOT
Could you replace it with something that will sort it last instead of an empty string? (Say, for example, replacement="{}"). This would still give something that looks empty to a person, and would sort last. BTW, it looks to me as though your pattern only requires that the input contain just

RE: some basic information on Solr

2011-10-25 Thread Jaeger, Jay - DOT
website but found it was really technical, since we are not on the developer side and we just want some basic information or numbers about its usage. Thanks for your answer, anyway. 2011/10/24 Jaeger, Jay - DOT > 1. Solr, proper, does not index "files". An adjunct called Solr Ce

RE: indexing key value pair into lucene solr index

2011-10-24 Thread Jaeger, Jay - DOT
Maybe put them in a single string field (or any other field type that is not analyzed -- certainly not text) using some character separator that will connect them, but won't confuse the Solr query parser? So maybe you start out with key value pairs of Key1 value1 Key2 value2 Key3 value3 Prepro
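The single-field idea above could be sketched roughly as follows. This is only a minimal Python illustration: the underscore separator and the helper name `join_pairs` are assumptions (neither appears in the original message), and the separator would have to be one that never occurs in the data and does not confuse the Solr query parser.

```python
# Hypothetical preprocessing step: glue each key to its value with a
# separator so the pair can be indexed as one untokenized string value.
SEPARATOR = "_"  # assumption: never occurs in keys or values


def join_pairs(pairs, sep=SEPARATOR):
    """Return one combined string per (key, value) pair."""
    joined = []
    for key, value in pairs:
        # Refuse ambiguous input rather than produce a pair that
        # cannot be split back apart later.
        if sep in key or sep in value:
            raise ValueError(f"separator {sep!r} occurs in pair {(key, value)!r}")
        joined.append(f"{key}{sep}{value}")
    return joined


tokens = join_pairs([("Key1", "value1"), ("Key2", "value2"), ("Key3", "value3")])
print(tokens)
```

Each resulting string would then be one value of a multi-valued, non-analyzed field, so a query for a key and its value together matches only documents where that exact pair was indexed.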

RE: some basic information on Solr

2011-10-24 Thread Jaeger, Jay - DOT
1. Solr, proper, does not index "files". An adjunct called Solr Cell can. See http://wiki.apache.org/solr/ExtractingRequestHandler . That article describes which kinds of files Solr Cell can handle. 2. I have no idea what you mean by "incidents per year". Please explain. 3. Even though

RE: Optimization /Commit memory

2011-10-24 Thread Jaeger, Jay - DOT
; > On Thu, Oct 20, 2011 at 6:23 PM, Jaeger, Jay - DOT > wrote: > >> Well, since the OS RAM includes the JVM RAM, that is part of your >> requirement, yes? Aside from the JVM and normal OS requirements, all you >> need OS RAM for is file caching. Thus, for updates, the O

RE: OS Cache - Solr

2011-10-20 Thread Jaeger, Jay - DOT
Instances not solr cores. We get an avg response time of below 1 sec. The number of documents is not large in most of the instances; some of the instances have about 5 lakh (500,000) documents on average. Regards Sujatha On Thu, Oct 20, 2011 at 3:35 AM, Jaeger, Jay - DOT wrote: > 200 instances of what? The S

RE: Optimization /Commit memory

2011-10-20 Thread Jaeger, Jay - DOT
Instances, combined Index Size is 14GB. Maximum Individual Index Size is 2.5GB. So my requirement for OS RAM is 14GB + 3 * 2.5GB ~= 22GB. Correct? Regards Sujatha On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT wrote: > Commit does not particularly spike disk or memory usage, unless you a

RE: how was developed solr admin page and the UI part?

2011-10-20 Thread Jaeger, Jay - DOT
It certainly is possible to develop search pages, update pages, etc. in any architecture you like: I think I'd suggest looking at SolrJ if you want to do that. http://wiki.apache.org/solr/Solrj PLEASE: Go read through the documentation and tutorial and browse thru the Wiki and FAQ. It's a

RE: Optimization /Commit memory

2011-10-19 Thread Jaeger, Jay - DOT
Commit does not particularly spike disk or memory usage, unless you are adding a very large number of documents between commits. A commit can cause a need to merge indexes, which can increase disk space temporarily. An optimize is *likely* to merge indexes, which will usually increase disk spa

RE: add thumnail image for search result

2011-10-19 Thread Jaeger, Jay - DOT
It won't do it for you automatically. I suppose you might create the thumbnail image beforehand, Base64 encode it, and add it as a stored, non-indexed, binary field (see schema: solr.BinaryField) when you index the document. JRJ -Original Message- From: hadi [mailto:md.anb...@gmail.com
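The Base64 step suggested above might look like this in Python. It is a sketch only: the function name and the placeholder bytes are made up for illustration, and how the encoded string is then sent to Solr is not shown.

```python
import base64


def encode_thumbnail(image_bytes: bytes) -> str:
    """Base64-encode raw thumbnail bytes into an ASCII string."""
    return base64.b64encode(image_bytes).decode("ascii")


# Placeholder bytes standing in for a real, pre-generated thumbnail file.
fake_thumbnail = b"\x89PNG\r\n\x1a\n"
encoded = encode_thumbnail(fake_thumbnail)

# Decoding gives back exactly the bytes that were encoded.
decoded = base64.b64decode(encoded)
print(decoded == fake_thumbnail)
```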

RE: How to update document with solrj?

2011-10-19 Thread Jaeger, Jay - DOT
Solr does not have an "update" per se: you have to re-add the document. A document with the same value for the field defined as the uniqueKey will replace any existing document with that key (you do not have to query and explicitly delete it first). JRJ -Original Message- From: hadi
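The replace-on-re-add behaviour described above can be pictured with a toy model. This is not Solr or SolrJ code; `ToyIndex` is a hypothetical stand-in that only mimics the "same uniqueKey replaces the existing document" rule.

```python
class ToyIndex:
    """Toy stand-in for an index keyed by a uniqueKey field (not Solr code)."""

    def __init__(self, unique_key):
        self.unique_key = unique_key
        self.docs = {}

    def add(self, doc):
        # Adding a document whose key already exists overwrites the old
        # one; no query-and-delete step is needed first.
        self.docs[doc[self.unique_key]] = dict(doc)


index = ToyIndex("id")
index.add({"id": "42", "title": "old title"})
index.add({"id": "42", "title": "new title"})  # the "update" is just a re-add

print(len(index.docs), index.docs["42"]["title"])
```

Note that the whole document is re-added, so any field left out of the second `add` is gone from the stored document.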

RE: OS Cache - Solr

2011-10-19 Thread Jaeger, Jay - DOT
200 instances of what? The Solr application with lucene, etc. per usual? Solr cores? ??? Either way, 200 seems to be very very very many: unusually so. Why so many? If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB per Solr instance. If you have 200 instances of S

RE: how was developed solr admin page and the UI part?

2011-10-19 Thread Jaeger, Jay - DOT
I believe that if you have the Solr distribution, you have the source for the web UI already: it is just .jsp pages. They are inside the solr .war file. JRJ -Original Message- From: nagarjuna [mailto:nagarjuna.avul...@gmail.com] Sent: Wednesday, October 19, 2011 12:07 AM To: solr-user@

RE: How to retreive multiple documents using one unique field?

2011-10-18 Thread Jaeger, Jay - DOT
I do not believe that it will work as you have written it, unless you put an application in between to read that XML and then call Solr with what it expects. See http://wiki.apache.org/solr/UpdateXmlMessages You need to have: unique-value-if-any-1 abc 123 un

RE: Getting errors thrown from sun.nio.ch.FileDispatcher with native or simple or single lock .Please , i need help in resolving the issue.

2011-10-18 Thread Jaeger, Jay - DOT
As others have reported, I also did not get your image. I am interested in your situation because we will deploy to WAS 7 in production, and have tested there. One thing I noted that might point to a possible problem you might have: 1. "The owner of the files created in the 2 environment

RE: Xsl for query output

2011-10-17 Thread Jaeger, Jay - DOT
It depends upon whether you want Solr to do the XSL processing, or the browser. After fussing a bit, and doing some reading and thinking, we decided it was best to let the browser do the work, at least in our case. If the browser is doing the processing, you don't need to modify solrconfig.xml

RE: Error loading class 'solr.extraction.ExtractingRequestHandler'

2011-10-17 Thread Jaeger, Jay - DOT
It sounds like maybe you either have not told Solr where the Solr home directory is, or, more likely, have not copied the jar files for this particular class into the right directory (typically a "lib" directory) so Tomcat cannot find that class. There is other correspondence on this list that

RE: Replication with an HA master

2011-10-13 Thread Jaeger, Jay - DOT
One thing to consider is the case where the JVM is up, but the system is otherwise unavailable (say, a NIC failure, firewall failure, load balancer failure) - especially if you use a SAN (whose connection is different from the normal network). In such a case the old master might have uncommitte

RE: capacity planning

2011-10-13 Thread Jaeger, Jay - DOT
We have used a VMware VM for testing our index (currently around 3GB) and it has been just fine - at most maybe a 10 to 20% penalty, if that, even when CPU bound. We also plan to use a VM for production. What hypervisor one uses matters - sometimes a lot. -Original Messag

RE: Pls help :-) ! calling external ws/db to fetch field instead of own index?

2011-10-13 Thread Jaeger, Jay - DOT
Perhaps integrate this using a javascript or other application front end to query solr, get the key to the database, and then run off to get the data? -Original Message- From: Ikhsvaku S [mailto:ikhsv...@gmail.com] Sent: Tuesday, October 11, 2011 2:47 PM To: solr-user@lucene.apache.org

RE: what is the recommended way to store locations?

2011-10-06 Thread Jaeger, Jay - DOT
We do much the same (along with name, address, postal code, etc.). However, we use AND when we search: the more data someone can provide, the fewer and more applicable their search results. JRJ -Original Message- From: Jason Toy [mailto:jason...@gmail.com] Sent: Thursday, October 06, 2

RE: "Private" text fields

2011-10-06 Thread Jaeger, Jay - DOT
My thought about this, based on some work we did when we considered using Solr to index our LAN files: 1) If it matters - if someone misusing the private tags is a real issue (and it sounds like it would be), then I think you need an application out in front to enforce this (a good idea with So

RE: composite Unique Keys?

2011-10-05 Thread Jaeger, Jay - DOT
We generated our own concatenated key (original customer, who may historically have different addresses, etc.). If there is a way for Solr to do that automatigically, I'd love to hear about it. I don't think that the extra bytes for the key itself (String vs. binary integer) is all that much o

RE: Weird issues when upgrading from 1.4 to 3.4

2011-10-03 Thread Jaeger, Jay - DOT
I have no idea what might be causing your memory to increase like that (we haven't run 3.4, and our index so far has been at most 28 million rows with maybe 40 fields), but just as an aside, depending upon what you meant by "we drop the whole index", I'd think it might work better to do an righ

RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
If you are asking how to tell which of 94000 records failed in a SINGLE HTTP update request, I have no idea, but I suspect that you cannot necessarily tell. It might help if you copied and pasted what you find in the solr log for the failure (see my previous response for how to figure out where

RE: Errors in requesthandler statistics

2011-09-29 Thread Jaeger, Jay - DOT
I am no expert, but based on my experience, the information you are looking for should indeed be in your logs. There are at least three logs you might look for / at: - An HTTP request log - The solr log - Logging by the application server / JVM Some information is available at http://wiki.apac

RE: About solr distributed search

2011-09-29 Thread Jaeger, Jay - DOT
I am no expert, but here is my take and our situation. Firstly, are you asking what the minimum number of documents is before it makes *any* sense at all to use a distributed search, or are you asking what the maximum number of documents is before a distributed search is essentially required?

RE: 32-bit to 64-bit

2011-09-29 Thread Jaeger, Jay - DOT
Are you changing just the host OS or the JVM, or both, from 32 bit to 64 bit? If it is just the OS, the answer is definitely no, you don't need to do anything more than copy. If the answer is the JVM, I *think* the answer is still no, but others more authoritative than I may wish to respond. -

RE: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Jaeger, Jay - DOT
cores adminPath="/admij/cores" Was that a cut and paste? If so, the /admij/cores is presumably incorrect, and ought to be /admin/cores -Original Message----- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Wednesday, September 28, 2011 4:10 PM To:

RE: Trouble configuring multicore / accessing admin page

2011-09-28 Thread Jaeger, Jay - DOT
One time when we had that problem, it was because one or more cores had a broken XML configuration file. Another time, it was because solr/home was not set right in the servlet container. Another time it was because we had an older EAR pointing to a newer release Solr home directory. Given wha

RE: strange performance issue with many shards on one server

2011-09-28 Thread Jaeger, Jay - DOT
one server Jaeger, Jay - DOT, on 28/09/2011 18:40, wrote: > That would still show up as the CPU being busy. > I don't know how the program (top, htop, whatever) displays the value but when the cpu has a cache miss definitely that thread sits and waits for a number of clock cyc

RE: strange performance issue with many shards on one server

2011-09-28 Thread Jaeger, Jay - DOT
That would still show up as the CPU being busy. -Original Message- From: Federico Fissore [mailto:feder...@fissore.org] Sent: Wednesday, September 28, 2011 6:12 AM To: solr-user@lucene.apache.org Subject: Re: strange performance issue with many shards on one server Frederik Kraus, on 28

RE: A fieldType for a address street

2011-09-26 Thread Jaeger, Jay - DOT
We used copyField to copy the address to two fields: 1. Which contains just the first token up to the first whitespace 2. Which copies all of it, but translates to lower case. Then our users can enter either a street number, a street name, or both. We copied all of it to the second field bec
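What the two destination fields end up holding can be illustrated in plain Python. This only mimics the effect of the analysis described above; the function names and the sample address are invented, and the real work would be done by the fieldType definitions in the schema.

```python
def street_number_field(address: str) -> str:
    """Keep just the first token, up to the first whitespace."""
    parts = address.split(None, 1)
    return parts[0] if parts else ""


def lowercase_field(address: str) -> str:
    """Keep the whole address, translated to lower case."""
    return address.lower()


address = "123 North Main St"  # made-up sample value
number_value = street_number_field(address)
full_value = lowercase_field(address)
print(number_value, "|", full_value)
```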

RE: SOLR Index Speed

2011-09-26 Thread Jaeger, Jay - DOT
500 / second would be 1,800,000 per hour (much more than 500K documents). 1) how big is each document? 2) how big are your index files? 3) as others have recently written, make sure you don't give your JRE so much memory that your OS is starved for memory to use for file system cache. JRJ --
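The arithmetic behind that first sentence, spelled out as a short Python check (the variable names are just for illustration):

```python
docs_per_second = 500
seconds_per_hour = 60 * 60

# 500 documents/second sustained for one hour.
docs_per_hour = docs_per_second * seconds_per_hour

# At that rate, 500K documents would take well under an hour.
seconds_for_500k = 500_000 / docs_per_second
minutes_for_500k = seconds_for_500k / 60

print(docs_per_hour, seconds_for_500k, round(minutes_for_500k, 1))
```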

RE: Replication and ExternalFileField

2011-09-15 Thread Jaeger, Jay - DOT
Actually, Windoze also has symbolic links. You have to manipulate them from the command line, but they do exist. http://en.wikipedia.org/wiki/NTFS_symbolic_link -Original Message- From: Per Osbeck [mailto:per.osb...@lbi.com] Sent: Thursday, September 15, 2011 7:15 AM To: solr-user@lu

RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
nd its too much. When I send a set of random queries (10-20 queries per second) response times go crazy (8 seconds to 60+ seconds). On Wed, Sep 14, 2011 at 6:07 PM, Jaeger, Jay - DOT wrote: > I don't have enough experience with filter queries to advise well on when > to use fq vs. pu

RE: glassfish, solrconfig.xml and SolrException: Error loading DataImportHandler

2011-09-14 Thread Jaeger, Jay - DOT
Some things to think about: When solr starts up, solr should report the location of solr home. Is it what you expect? Is there any security on the "dist" directory that would prevent solr from accessing it? Is there a classloader policy set on glassfish that could be getting in the way? (y

RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
o 6000m, particularly given your relatively modest number of documents (2,000,000). I was trying everything before asking here. 5. Machine characteristics, particularly operating system and physical memory on the machine. OS => Debian 6.0, Physical Memory => 32 GB, CPU => 2x Intel Quad Cor

RE: Performance troubles with solr

2011-09-14 Thread Jaeger, Jay - DOT
I think folks are going to need a *lot* more information. Particularly 1. Just what does your "test script" do? Is it doing updates, or just queries of the sort you mentioned below? 2. If the test script is doing updates, how are those updates being fed to Solr? 3. What version of Solr

RE: EofException with Solr in Jetty

2011-09-14 Thread Jaeger, Jay - DOT
I have not used SolrJ, but it probably is worth considering as a possible suspect. Also, do you have anything in between the client and the Solr server (a firewall, load balancer, etc.?) that might play games with HTTP connections? You might want to start up a network trace on the server or net

RE: Schema fieldType y-m-d ?!?!

2011-09-14 Thread Jaeger, Jay - DOT
Just add a bogus 0 timestamp after it when you index it. That is what we did. Dates are not stored or indexed as characters, anyway, so space would not be any different one way or the other. JRJ -Original Message- From: stockii [mailto:stock.jo...@googlemail.com] Sent: Wednesday, Sep
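Appending the zero timestamp at index time could be done with a small helper like the one below. It is a sketch that assumes the incoming value is a plain `YYYY-MM-DD` string; the function name is hypothetical.

```python
from datetime import date


def to_solr_date(ymd: str) -> str:
    """Turn 'YYYY-MM-DD' into the full UTC timestamp form Solr date fields expect."""
    parsed = date.fromisoformat(ymd)  # raises ValueError on a malformed date
    return f"{parsed.isoformat()}T00:00:00Z"


stamped = to_solr_date("2011-09-14")
print(stamped)
```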

RE: index not created

2011-09-14 Thread Jaeger, Jay - DOT
> changed the configuration to point it to my solr dir and started it again You might look in your logs to see where Solr thinks the Solr home directory is and/or if it complains about not being able to find it. As a guess, it can't find it, perhaps because solr.solr.home does not point to the

RE: EofException with Solr in Jetty

2011-09-14 Thread Jaeger, Jay - DOT
Looking at the source for Jetty, line 149 in Jetty's HttpOutput java file looks like this: if (_closed) throw new IOException("Closed"); < [http://www.jarvana.com/jarvana/view/org/eclipse/jetty/aggregate/jetty-all/7.1.0.RC0/jetty-all-7.1.0.RC0-sources.jar!/org/ec

RE: Out of memory

2011-09-13 Thread Jaeger, Jay - DOT
numDocs is not the number of documents in memory. It is the number of documents currently in the index (which is kept on disk). Same goes for maxDocs, except that it is a count of all of the documents that have ever been in the index since it was created or optimized (including deleted documen

RE: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
Nicely put. ;^) -Original Message- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Tuesday, September 13, 2011 9:16 AM To: solr-user@lucene.apache.org Subject: Re: can indexing information stored in db rather than filesystem? On Sep 13, 2011, at 6:51 AM, kiran.bodigam wrote:

RE: can indexing information stored in db rather than filesystem?

2011-09-13 Thread Jaeger, Jay - DOT
I don't think you understand. Solr does not have the code to do that. It just isn't there, nor would I expect it would ever be there. Solr is open source though. You could look at the code and figure out how to do it (though why anyone would do that remains beyond my ability to understand).

RE: How to serach on specific file types ?

2011-09-12 Thread Jaeger, Jay - DOT
Some possibilities: 1) Put the file extension into your index (that is what we did when we were testing indexing documents with Solr) 2) Put a mime type for the document into your index. 3) Put the whole file name / URL into your index, and match on part of the name. This will give some false p
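For possibility 1, the extension could be pulled out of the file name before indexing. A minimal Python sketch follows; the helper name is an assumption, and the Solr field the value goes into is not shown.

```python
import os.path


def file_extension(filename: str) -> str:
    """Return the lower-cased extension without the dot, or '' if there is none."""
    _, ext = os.path.splitext(filename)
    return ext[1:].lower()


extensions = [file_extension(name) for name in ("report.PDF", "archive.tar.gz", "README")]
print(extensions)
```

Only the last suffix is kept, so `archive.tar.gz` is indexed as `gz`; whether that is what you want depends on how users are expected to search.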

RE: Master Slave Question

2011-09-12 Thread Jaeger, Jay - DOT
You could prevent queries to the master by limiting what IP addresses are allowed to communicate with it, or by modifying web.xml to put different security on /update vs. /select . We took a simplistic approach. We did some load testing, and discovered that we could handle our expected update

RE: question about StandardAnalyzer, differences between solr 1.4 and solr 3.3

2011-09-12 Thread Jaeger, Jay - DOT
Looking at the Wiki ( http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters ), it looks like the solr.StandardTokenizerFactory changed with Solr 3.1. We use solr.KeywordTokenizerFactory for our middle names (and then also throw in solr.LowerCaseFilterFactory to normalize to lower case).

RE: can indexing information stored in db rather than filesystem?

2011-09-08 Thread Jaeger, Jay - DOT
If you think about it, Lucene (upon which Solr is built) *is* a kind of DBMS - just not an RDBMS. After all, in the end, a DBMS stores its stuff in files, too. If you then turned around and mapped the stuff that Solr does into database tables, you would lose all of the performance advantages t

RE: Spellcheck

2011-09-08 Thread Jaeger, Jay - DOT
> " Following up from your message on the Nutch list. If q=*:* is showing you > empty elements, no fields are getting indexed." I don't think that is correct. I believe that the correct statement would be no fields are getting *** stored ***. If the fields were not getting indexed, they woul

RE: running SOLR on same server as your website

2011-09-07 Thread Jaeger, Jay - DOT
You could host Solr inside the same Tomcat container, or in a different servlet container (say, a second Tomcat instance) on the same server. Be aware of your OS memory requirements, though: In my experience, Solr performs best when it has lots of OS memory to cache index files (at least, if y

RE: how to run solr in apache server?

2011-09-07 Thread Jaeger, Jay - DOT
solr in apache server? Thank you for your reply, Jaeger, Jay - DOT. So I can conclude that Solr will run only on "application servers" (having servlet containers) and not on "web" servers, am I correct? And I have one more question: is it possible to add serv

RE: how to run solr in apache server?

2011-09-07 Thread Jaeger, Jay - DOT
Other containers that will support Solr: just about any JEE/J2EE container. We have tested under WebSphere Application Server Version 7 -- works fine. Oracle's web application server would presumably work, too -- just about anything. -Original Message- From: nagarjuna [mailto:nagarju

RE: how to run solr in apache server?

2011-09-07 Thread Jaeger, Jay - DOT
That is correct. Apache is not an *application* server. It is an HTTP *web* server. On its own it does not support running Java applications written to the JEE/J2EE servlet specification - like Solr. (Apache is also not written in Java, if that was what you meant). -Original Message-

RE: Synonyms Not Working when using SRC & DEST

2011-09-07 Thread Jaeger, Jay - DOT
Also, just to make one thing just a bit more clear. You can specify two different kinds of entries in synonym files. See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters (solr.SynonymFilterFactory) One is replacement, where the words before the "=>" are *replaced* by the right h

RE: Synonyms Not Working when using SRC & DEST

2011-09-07 Thread Jaeger, Jay - DOT
> I have a very huge schema spanning up to 10K lines , if I use query time it > will be huge hit for me because one term will be mapped to multiple terms . > similar in the case of allergy I think maybe you mean synonym file, rather than the schema? I doubt that the number of lines matters all t

RE: Synonyms Not Working when using SRC & DEST

2011-09-06 Thread Jaeger, Jay - DOT
It won't work given your current schema. To get the desired results, you would need to expand your synonyms at both index AND query time. Right now your schema seems to specify it only at index time. So, as the other respondent indicated, currently you replace allergy with the other list when

RE: how to write a script for indexing in windows to perform scheduling?

2011-09-06 Thread Jaeger, Jay - DOT
You seem to have two questions: 1) How to write a script to import data 2) How to schedule that in Windows For #1, I suggest that you visit the Solr tutorials at http://lucene.apache.org/solr/tutorial.html to learn what commands might be used to import data. You might find that you need to a

RE: copying one field to another using regex

2011-09-06 Thread Jaeger, Jay - DOT
Not quite sure what you are asking. You can certainly use copyField to copy a field, and then apply regex on the destination field's fieldType. We do that. JRJ -Original Message- From: alx...@aim.com [mailto:alx...@aim.com] Sent: Thursday, September 01, 2011 4:16 PM To: solr-user@luce

RE: Wildcard Query

2011-09-06 Thread Jaeger, Jay - DOT
I solved a similar kind of issue (where I actually needed multi-valued attributes, e.g. people with multiple or hyphenated last names) by including PositionFilterFactory in the filter list for the analyzer in such fields' fieldType, thereby setting the position of each value to 1. JRJ -Ori

RE: is it possible to do automatic indexing in solr ?

2011-09-01 Thread Jaeger, Jay - DOT
If you are indexing data, rather than "documents", another possibility is to use database triggers to fire off updates. -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Wednesday, August 31, 2011 9:13 AM To: solr-user@lucene.apache.org Subject: Re: is it possib

RE: core creation and instanceDir parameter

2011-08-31 Thread Jaeger, Jay - DOT
Well, if it is for creating a *new* core, Solr doesn't know it is pointing to your shared conf directory until after you create it, does it? JRJ -Original Message- From: Gérard Dupont [mailto:ger.dup...@gmail.com] Sent: Wednesday, August 31, 2011 8:17 AM To: solr-user@lucene.apache.org

RE: missing field in schema browser on solr admin

2011-08-30 Thread Jaeger, Jay - DOT
Also... Did he restart either his web app server container or at least the Solr servlet inside the container? JRJ -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Friday, August 26, 2011 5:29 AM To: solr-user@lucene.apache.org Subject: Re: missing field in sc

RE: add documents to the slave

2011-08-30 Thread Jaeger, Jay - DOT
Another way that occurs to me is that if you have a on the update URL(s) in your web.xml, you can map them to no groups / empty groups in the JEE container. JRJ -Original Message- From: simon [mailto:mtnes...@gmail.com] Sent: Tuesday, August 30, 2011 12:21 PM To: solr-user@lucene.apac

RE: Viewing the complete document from within the index

2011-08-30 Thread Jaeger, Jay - DOT
> I am trying > to peek into the index to see if my index-time synonym expansions are > working properly or not. For this I have successfully used the analysis page of the admin application that comes out of the box. Works really well for debugging schema changes. JRJ -Original Message

RE: how to deal with URLDatasource which needs authorization?

2011-08-29 Thread Jaeger, Jay - DOT
So, the question then seems to be: is there a way to place credentials in the URLDataSource. There doesn't seem to be an explicit user ID or password ( http://wiki.apache.org/solr/DataImportHandler#Configuration_of_URLDataSource_or_HttpDataSource ) but perhaps you can include them in URL fashi

RE: SolrServer instances

2011-08-29 Thread Jaeger, Jay - DOT
It sounds like the correspondent (Jonty) is thinking just in terms of SolrJ -- wanting to share that across multiple threads in an application server. In which case the question would be whether it would be possible/safe/efficient to share a single instantiation of the SolrJ class(es) across mul

RE: Solr in a windows shared hosting environment

2011-08-25 Thread Jaeger, Jay - DOT
ndows shared hosting environment Thank you! Since it's shared hosting, how do I install java? -Original Message----- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Thursday, August 25, 2011 4:34 PM To: solr-user@lucene.apache.org Subject: RE: Solr in a windows shared hosting e

RE: How to copy and extract information from a multi-line text before the tokenizer

2011-08-25 Thread Jaeger, Jay - DOT
"A programmer had a problem. He tried to solve it with regular expressions. Now he has two problems" :). A. That just isn't fair... 8^) (I can't think of very many things that have allowed me to perform more magic over my career than regular expressions, starting with SNOBOL. Uh oh: I ju

RE: Solr in a windows shared hosting environment

2011-08-25 Thread Jaeger, Jay - DOT
Yes, but since Solr is written in Java to run in a JEE container, you would host Solr in a web application server, either Jetty (which comes packaged), or something else (say, Tomcat or WebSphere or something like that). As a result, you aren't going to find anything that says how to run Solr un

RE: Best way to anchor solr searches?

2011-08-25 Thread Jaeger, Jay - DOT
I don't think it has to be quite so bleak as that, depending upon the number of queries done over a given timeframe, and the size of the result sets. Solr does cache the identifiers of "documents" returned by search results. See http://wiki.apache.org/solr/SolrCaching paying particular attent

RE: query

2011-08-24 Thread Jaeger, Jay - DOT
One way I had thought of doing this kind of thing: include in the index an "ACL" of some sort. The problem I see in your case is that the list of "friends" can presumably change over time. So, given that, one way would be to have a little application in between. The request goes to the appli

RE: how to deal with URLDatasource which needs authorization?

2011-08-24 Thread Jaeger, Jay - DOT
You could run the HTML import from Tika (see the Solr tutorial on the Solr website). The job that ran Tika would need the user/password of the site to be indexed, but Solr would not. (You might have to write a little script to get the HTML page using curl or wget or Nutch). Users could then s

RE: XSLT Exception

2011-08-18 Thread Jaeger, Jay - DOT
I am not an XSLT expert, but I believe that in XSLT, "not" is a function, rather than an operator. http://www.w3.org/TR/xpath-functions/#func-not So, not(contains()) rather than not contains() should presumably do the trick. -Original Message- From: Christopher Gross [mailto:cog

RE: Solr Copyfields

2011-08-18 Thread Jaeger, Jay - DOT
I would suggest #3, unless you have some very unusual performance requirements. It has the advantage of isolating your index environment requirements from the database. -Original Message- From: Nicholas Fellows [mailto:n...@djdownload.com] Sent: Thursday, August 18, 2011 8:40 AM To:

RE: Synonym and Whitespaces and optional TokenizerFactory

2011-08-18 Thread Jaeger, Jay - DOT
You could presumably do it with solr.PatternTokenizerFactory with the pattern set to .* as your Or, maybe, if Solr allows it, you don't use any tokenizer at all? Or, maybe you could use solr.WhitespaceTokenizerFactory, allowing it to split up the words, along with solr.WordDelimiterFilterFacto

RE: 'Stable' 4.0 version

2011-08-17 Thread Jaeger, Jay - DOT
> geospatial requirements Looking at your email address, no surprise there. 8^) > What insight can you share (if any) regarding moving forward to a later > nightly build? I used build 1271 (Solr 1.4.1, which seemed to be called Solr 4 at the time) during some testing, and it performed well

RE: Most current tik jar files that work with Solr 1.4.1

2011-08-17 Thread Jaeger, Jay - DOT
> What is the latest version of Tika that I can use with Solr 1.4.1? it > comes packaged with 0.4. I tried 0.8 and it no workie. When I was testing Tika last year, I used Solr build 1271 to get the most recent Tika I could get my hands on at the time. That was before Solr 3.1, so I expect it

RE: Solr 1.4.1 vs 3.3 (Speed)

2011-08-17 Thread Jaeger, Jay - DOT
It would perhaps help if you reported what you mean by "noticeably less time". What were your timings? Did you run the tests multiple times? One thing to watch for in testing: Solr performance is greatly affected by the OS file system cache. So make sure when testing that you use the same
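One way to make such timings comparable is to run each query several times after a warm-up pass and report the spread rather than a single number. The sketch below assumes a caller-supplied `run_query` callable (hypothetical; it would issue the same request against whichever Solr version is under test) and says nothing about how the request itself is made.

```python
import statistics
import time


def time_query(run_query, repeats=5, warmup=1):
    """Time a callable several times and summarise the samples in seconds."""
    # Warm-up runs are discarded so both systems are measured with
    # similarly warm caches (including the OS file system cache).
    for _ in range(warmup):
        run_query()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)

    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }
```

Running the same function with the same `repeats` and `warmup` values against both versions gives figures that can be reported side by side.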

RE: master unreachable - attempting simple replication

2011-08-17 Thread Jaeger, Jay - DOT
I'd suggest looking at the logs of the master to see if the request is getting thru or not, or if there are any errors logged there. If the master has a replication config error, it might show up there. We just went thru some master/slave troubleshooting. Here are some things that you might l

RE: Unable to get multicore working

2011-08-17 Thread Jaeger, Jay - DOT
okay, now. Thanks for the help. You guys saved me from the insane asylum. On Tuesday, 16 August, 2011 at 2:32 PM, Jaeger, Jay - DOT wrote: > That said, the logs are showing a different error now. Excellent! The site > schemas are loading! > > Great! > > "SEVERE: org.apa

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
now. Excellent! The site schemas are loading! Looks like the site schemas have an issue: "SEVERE: org.apache.solr.common.SolrException: Unknown fieldtype 'long' specified on field area_id" Errr. Why would `long` be an invalid type? On Tuesday, 16 August, 2011 at 2:06 PM, Jaeg

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
Whoops: That was Solr 4.0 (which pre-dates 3.1). I doubt very much that the release matters, though: I expect the behavior would be the same. -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Tuesday, August 16, 2011 4:04 PM To: solr-user

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
I tried on my own test environment -- pulling the default core parameter out, under Solr 3.1 I got exactly your symptom: an error 404. HTTP ERROR 404 Problem accessing /solr/admin/index.jsp. Reason: missing core name in path The log showed: 2011-08-

RE: Unable to get multicore working

2011-08-16 Thread Jaeger, Jay - DOT
them, besides 404 errors. On Tuesday, 16 August, 2011 at 1:10 PM, Jaeger, Jay - DOT wrote: > Perhaps your admin doesn’t work because you don't have > defaultCoreName="whatever-core-you-want-by-default" in your tag? E.g.: > > > > Perhaps this was enough
