going to
help a user find nurse. I think part of this is that some people feel that
databases like MSSQL, MySQL should be able to provide a quality search
experience, but they just flat out don't. It's a separate utility.
Thanks Walter.
On 9/22/06, Walter Underwood [EMAIL PROTECTED] wrote
is not legal UTF-8.
Does Solr report parsing errors? It really should. Maybe a 400 Bad Request
response with a text/plain body showing the error message.
wunder
On 9/22/06 6:24 PM, James liu [EMAIL PROTECTED] wrote:
2006/9/23, Walter Underwood [EMAIL PROTECTED]:
On 9/21/06 5:37 PM, James liu [EMAIL
On 9/27/06 9:07 AM, Simon Willnauer [EMAIL PROTECTED]
wrote:
First I agree with yonik, the main point is to define which classes /
parts / mbeans should be exposed to JMX is the hard part and should be
planned carefully.
That is the hard part regardless of whether we use JMX or bare-metal
What is a good size for batching updates? My xml update docs are
around 600-700 bytes each right now.
wunder
--
Walter Underwood
Search Guru, Netflix
On 10/31/06 12:54 PM, Mike Klaas [EMAIL PROTECTED] wrote:
On 10/31/06, Walter Underwood [EMAIL PROTECTED] wrote:
What is a good size for batching updates? My xml update docs are
around 600-700 bytes each right now.
When I think of batches I think of documents sent before a
commit
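The batch-before-commit idea can be sketched as a small client that wraps several docs in one add element per POST and commits once at the end. This is a hedged sketch, not the poster's actual code: the update URL and field names are hypothetical, and real document text would need XML escaping.

```python
# Hedged sketch: batching <doc> elements into one <add> per POST, with a
# single commit at the end. URL and field names are hypothetical; real
# document values should be XML-escaped before use.
import urllib.request

SOLR_UPDATE = "http://localhost:8983/solr/update"

def build_add_xml(batch):
    """Wrap a list of {'id': ..., 'text': ...} dicts in one <add> element."""
    docs = "".join(
        f'<doc><field name="id">{d["id"]}</field>'
        f'<field name="text">{d["text"]}</field></doc>'
        for d in batch
    )
    return "<add>" + docs + "</add>"

def post_xml(xml: str) -> bytes:
    req = urllib.request.Request(
        SOLR_UPDATE,
        data=xml.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def index_in_batches(docs, batch_size=100):
    # Several documents per request, one commit at the very end.
    for i in range(0, len(docs), batch_size):
        post_xml(build_add_xml(docs[i:i + batch_size]))
    post_xml("<commit/>")
```

At 600-700 bytes per doc, a batch of 100 keeps each POST under ~70 KB, which is comfortably small.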
small corpus (65K docs) I was seeing over 240 qps
on my dev box (dual 3 GHz Xeon). I expect that it didn't touch
the disk at all, since the index is only 50 Meg.
wunder
--
Walter Underwood
Search Guru, Netflix
list, but when I view
the message, no attachment is available. Could you try sending this
attachment again?
Thanks --Joachim
Walter Underwood wrote:
I've done some testing using JMeter. I followed the instructions
in the JMeter FAQ for "How do I use external data files in my
test scripts"
in
other engines. Otherwise, you go nuts trying to get your analyzer
to handle .NET and vitamin a. I know that AltaVista and Inktomi
did this.
wunder
--
Walter Underwood
Search Guru, Netflix
can't just override a method of QueryParser to do
: this).
we could add this to the function parser, so _val_:ALL could return a
MatchAllDocsQuery ?
I was thinking something similar, maybe _solr:all. At Infoseek, we
hardcoded url:http to match all docs.
wunder
--
Walter Underwood
Search Guru
At some point, it would be simpler to write a custom response handler
and generate the output in your desired XML format.
wunder
On 12/5/06 1:52 PM, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
Hi,
the idea is to apply XSLT transformation on the result. But it seems that
I would have to
Analyzer before passing it along.
Why won't cdata work?
Some octet (byte) values are illegal in XML. Most of the ASCII control
characters are not allowed. If one of those is in an XML document,
it is a fatal error and must stop parsing in any conforming XML
parser.
wunder
--
Walter Underwood
On 1/3/07 9:33 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
On 1/3/07, Walter Underwood [EMAIL PROTECTED] wrote:
We tried several APIs and decided that the best was an array of
String with the odd elements containing the strings that needed
highlighting.
Good idea... the only thing I could
Ultraseek and the Googlebox are about your only
choice.
wunder
--
Walter Underwood
Search Guru, Netflix
Former Architect for Ultraseek
On 1/19/07 10:02 AM, Brian Lucas [EMAIL PROTECTED] wrote:
Walter Underwood wrote:
Use GET unless it really, really, really doesn't work. POST is
the wrong HTTP semantic for fetching information. Long query
strings are not a good enough reason. HTTP puts no limit on the
length of a URL
it, an AND default is a very bad
idea for nearly all sites.
wunder
--
Walter Underwood
Search Guru, Netflix
On 1/27/07 1:12 PM, Tracey Jaquith [EMAIL PROTECTED] wrote:
* To be fair, Michael StAck (our greatest help for prior SE life support)
has smartly pointed out that by making a smarter schema and strategy,
I could reduce the number of fields searched from 677 to 5, with the
same overall
We would never use JOIN. We denormalize for speed. Not a big deal.
wunder
==
Search Guru, Netflix
On 2/3/07 11:16 AM, Brian Whitman [EMAIL PROTECTED] wrote:
On Feb 2, 2007, at 4:46 PM, Ryan McKinley wrote:
I would LOVE to see a JOIN in SOLR.
I have an index of artists, albums, and
You can declare the top result to be 100% and scale from there.
Percent relevant is not a concept that really holds together.
What does it mean to be 100% relevant? I'm not even sure what
twice as relevant means.
A tf.idf engine, like Lucene, might not have a maximum score.
What if a document
Lucene/Solr does this automatically. That is how a tf.idf
engine works, it boosts rare words.
Do you have examples of problems or are you worrying about
something that might happen?
wunder
On 2/19/07 1:22 AM, rubdabadub [EMAIL PROTECTED] wrote:
Hi:
I was wondering how are you guys dealing
Indexing rates depend heavily on document size (text) and pre-indexing
processing. Other things probably matter, too, like number of fields.
My application is indexing 20X faster than Christian's, because I have
small documents (a few hundred bytes) that are extracted from an RDBMS
and submitted
Try running your submits while watching a CPU load meter.
Do this on a multi-CPU machine.
If all CPUs are busy, you are running as fast as possible.
If one CPU is busy (around 50% usage on a dual-CPU system),
parallel submits might help.
If no CPU is 100% busy, the bottleneck is probably disk
On 2/22/07 1:37 PM, Jack L [EMAIL PROTECTED] wrote:
I wonder what happens if I change the schema after some documents
have been inserted? Is this allowed at all? Will the index become
corrupted if I add/remove some fields? Or change the field properties?
The schema just controls the input
It is a bug, though. That should send an error message, not a
stack trace. --wunder
On 2/23/07 10:39 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
Oh, look at that, adding <field name="id1"/> took care of the bombing,
nice!
Thanks,
Otis
I tried posting that, like this:
$ java -jar
I was bit by this, too. It made getting started a lot harder.
I think I had something outside of an lst instead of inside.
More recently, I got a query time exception from a mis-formatted
mm field.
Right now, Solr accesses the DOM as needed (at runtime) to fetch
information. There isn't much
On 3/3/07 1:43 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
: Right now, Solr accesses the DOM as needed (at runtime) to fetch
: information. There isn't much up-front checking beyond the XML
: parser.
bingo, and adding more upfront checking is hard for at least two reasons i
can think
On 3/4/07 3:01 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
I'm actually having a hard time thinking of what kinds of just in time
DOM walking is delayed until request ... all of the field names are already
known, the analyzers are built, the requesthandlers and responsewriters
all exist and
Is anyone running Solr on Tomcat 6.0.10? Any issues?
I searched the archives and didn't see anything.
wunder
--
Walter Underwood
Search Guru, Netflix
Java 1.5.0_05 on Intel and PowerPC (IBM) plus any DST changes. --wunder
On 3/8/07 4:08 AM, James liu [EMAIL PROTECTED] wrote:
today i use tomcat 6.0.10,,,but no time to search.
tomorrow i will test it.
which java version you use?
2007/3/8, Walter Underwood [EMAIL PROTECTED
It is better to use application/xml. See RFC 3023.
Using text/xml; charset=UTF-8 will override the XML
encoding declaration. application/xml will not.
wunder
On 3/10/07 12:39 PM, Bertrand Delacretaz [EMAIL PROTECTED] wrote:
On 3/10/07, Morten Fangel [EMAIL PROTECTED] wrote:
...I send a
If it does something different, that is a bug. RFC 3023 is clear. --wunder
On 3/10/07 1:49 PM, Bertrand Delacretaz [EMAIL PROTECTED] wrote:
On 3/10/07, Walter Underwood [EMAIL PROTECTED] wrote:
It is better to use application/xml. See RFC 3023.
Using text/xml; charset=UTF-8 will override
What are you trying to achieve? Let's start with the problem
instead of picking one solution which Solr doesn't support. --wunder
On 3/10/07 5:08 PM, shai deljo [EMAIL PROTECTED] wrote:
How can i boost some tokens over others in the same field (at Index
time) ? If this is not supported
that have different importance.
I thought boosting would be an elegant way to take this into account.
Please advise,
On 3/10/07, Walter Underwood [EMAIL PROTECTED] wrote:
What are you trying to achieve? Let's start with the problem
instead of picking one solution which Solr doesn't support
That works if you keep track of all documents that have disappeared
since the last index run. Otherwise, you end up with orphans in
the search index, documents that exist in search, but not in the
real world, also known as serving 404's in results.
wunder
--
Walter Underwood
Search Guru, Netflix
You could also promote recent results with a function query term.
I've done that for news sites, where recency is an important
part of relevancy. --wunder
On 3/23/07 4:59 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
: Is there a way (in 1 query) to retrieve the best scoring X results and
:
I don't recommend defaulting to AND. This will increase the number
of failed searches (no hits) for your users. If one word is misspelled
in a multi-word AND query, you'll get no results. Since about 10% of
queries are misspelled and about half of queries are multi-word, that
will immediately
On 3/27/07 10:57 AM, Mike Klaas [EMAIL PROTECTED] wrote:
I agree with your point above, but I fear AND: bad! OR: good!
becoming dogma--often AND+spellcheck is the better option.
AND-with-spell-suggestion is better, but the spelling suggestion
needs to be really, really good. That is really
This does seem to be a Tomcat config problem. Start with this search
to find other e-mail strings on this:
http://www.google.com/search?q=SEVERE%3A+Error+filterStart
wunder
On 4/5/07 11:43 AM, Chris Hostetter [EMAIL PROTECTED] wrote:
: SEVERE: Error filterStart
: Apr 5, 2007 10:11:28 AM
A --wunder
On 4/6/07 10:51 AM, Yonik Seeley [EMAIL PROTECTED] wrote:
Quick poll... Solr 2.1 release planning is underway, and a new logo
may be a part of that.
What form of logo do you prefer, A or B? There may be further
tweaks to these pictures, but I'd like to get a sense of what the
Here is a late response, apache.org was rejecting our e-mails...
Allowing leading wildcards opens up a denial of service attack. It becomes
trivial to overload the search engine and take it out of service, just
hammer it with leading wildcard queries. Please leave the default as
disabled. If we
UTF-16 support should not require any changes to the XML parsing.
All XML parsers are required to support that encoding. The real
change is implementing RFC 3023 (XML Media Types) so that the
encoding can be specified over HTTP.
wunder
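The RFC 3023 point above can be illustrated with a tiny client sketch: by sending application/xml with no charset parameter, the encoding declaration (and BOM) inside the document stays authoritative, which is what UTF-16 submission needs. This is a hedged sketch; the update URL is hypothetical.

```python
# Hedged sketch: posting UTF-16 XML so the in-document encoding declaration
# is honored. Per RFC 3023, a charset parameter on text/xml would override
# the XML declaration, so application/xml with no charset is used instead.
import urllib.request

def make_update_request(xml_text: str,
                        url="http://localhost:8983/solr/update"):
    body = xml_text.encode("utf-16")  # Python's utf-16 codec prepends a BOM
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/xml"}
    )

req = make_update_request('<?xml version="1.0" encoding="UTF-16"?><add/>')
```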
On 4/23/07 11:13 AM, Mike Klaas [EMAIL PROTECTED] wrote:
Enable leading wildcards and try this:
type:changelog AND filename:*angel*
wunder
On 4/25/07 1:34 PM, Michael Kimsal [EMAIL PROTECTED] wrote:
Thanks. I'm still getting no results with your suggestion, though. I also tried
type:+changelog AND ( (filename:angel) OR (filename:angel*) OR
I agree that multi-word synonyms are an excellent way to do this.
This may sound like a hack, but you'd end up doing this even if
you had dedicated linguistic compound decomposition software.
Those usually use a dictionary of common words and the dictionary
rarely has all the words that are
I didn't remember that requirement, so I looked it up. It was added
in XML 1.0 2nd edition. Originally, unspecified encodings were open
for auto-detection.
Content type trumps encoding declarations, of course, per RFC 3023
and allowed by the XML spec.
wunder
On 5/9/07 4:19 PM, Mike Klaas [EMAIL
No problem. Use a boost function. In a DisMaxRequestHandler spec
in solrconfig.xml, specify this:
<str name="bf">
popularity^0.5
</str>
This value will be added to the score before ranking.
You will probably need to fuss with the multiplier to get the popularity
to the right proportion of
access log so you can correlate the
entries.
wunder
On 5/9/07 9:43 PM, Ian Holsman [EMAIL PROTECTED] wrote:
Walter Underwood wrote:
This is for monitoring -- what happened in the last 30 seconds.
Log file analysis doesn't really do that.
I would respectfully disagree.
Log file analysis
The boost is a way to adjust the weight of that field, just like you
adjust the weight of any other field. If the boost is dominating the
score, reduce the weight and vice versa.
wunder
On 5/10/07 9:22 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
: Is this correct? bf is a boosting
I solved something similar to this by creating a stemmer for part
numbers. Variations like -BN on the end can be treated as inflections
in the part number language, similar to plurals in English.
I used a set of regexes to match and transform, in some cases generating
multiple root part numbers.
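The regex match-and-transform approach described above might look something like this. The suffix patterns here are invented for illustration; the real rules would come from the actual part-numbering scheme.

```python
# Hedged sketch of regex "stemming" for part numbers: suffixes like "-BN"
# are treated as inflections and stripped to one or more root forms.
# The suffix patterns below are made up for illustration.
import re

SUFFIX_PATTERNS = [
    re.compile(r"^(?P<root>[A-Z0-9]+)-(?:BN|BK|WH)$"),  # color suffixes
    re.compile(r"^(?P<root>[A-Z0-9]+)/\d+$"),           # pack-size suffixes
]

def part_number_roots(part: str) -> list[str]:
    """Return candidate root part numbers (there may be more than one)."""
    roots = []
    for pat in SUFFIX_PATTERNS:
        m = pat.match(part)
        if m:
            roots.append(m.group("root"))
    return roots or [part]  # no rule matched: the part is its own root
```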
With a multi-valued field, is the length norm based on the individual
matched value (string) or on all the tokens in the field? I'm guessing
that it is the latter, and I expect I could find that in the source
or explain if I looked hard enough, but maybe someone already knows.
wunder
--
Walter
On 6/4/07 11:24 AM, Chris Hostetter [EMAIL PROTECTED] wrote:
: With a multi-valued field, is the length norm based on the individual
: matched value (string) or on all the tokens in the field? I'm guessing
: that it is the latter, and I expect I could find that in the source
: or explain if I
Solr doesn't have the URL of the document. The document is given
to Solr in an HTTP POST.
Solr is not a web spider, it is a search web service.
wunder
On 6/12/07 6:23 AM, Ard Schrijvers [EMAIL PROTECTED] wrote:
Hello Otis,
thanks for the info. Would it a be an improvement to be able to
Do we have a bug filed on this? Solr really should have complained
about the unknown element. --wunder
On 6/14/07 4:54 PM, Tiong Jeffrey [EMAIL PROTECTED] wrote:
arh! i spent 6-7 hours on this error and didn't see this! thanks!
On 6/15/07, Yonik Seeley [EMAIL PROTECTED] wrote:
On 6/14/07,
I used Solr with indexes on NFS and I do not recommend it.
It was either 100 or 1000 times slower than local disc
for indexing, I forget which. Unusable.
This is not a problem with Solr/Lucene, I have seen the
same NFS performance cost with other search engines.
wunder
On 6/21/07 3:22 AM, Otis
This is proper behavior according to RFC 3023. An encoding in the
XML declaration is ignored unless the content-type is application/xml.
wunder
On 6/25/07 8:27 AM, Yonik Seeley [EMAIL PROTECTED] wrote:
On 6/23/07, Chris Hostetter [EMAIL PROTECTED] wrote:
: Is it possible to use Windows
The Atom Publishing Protocol would be a good choice for a rest API to Solr.
That comes with a spec, interop testing, and an active community.
wunder
On 7/2/07 6:22 PM, Ian Holsman [EMAIL PROTECTED] wrote:
Hi.
I've been playing with Kettle (http://kettle.pentaho.org/ ) as a method
to inject
Solr doesn't have a record of what documents were accessed.
The document cache shows which documents were in the parts
of search result list which were served, but probably not
a count of those inclusions.
Luckily, this information is trivial to get from HTTP
server access logs. Look for
This caused me a certain amount of trouble, because the parser
errors with ill-formed queries. Try these:
foo -
TO HAVE AND HAVE NOT
wunder
On 8/1/07 12:47 AM, Chris Hostetter [EMAIL PROTECTED] wrote:
: StandardRequestHandler), but I also want to be able to use Lucene's
: boolean
You get that behavior by avoiding any extra syntax. Use this query:
a:valueAlpha b:valueBeta c:valueGamma
If one of the terms is very common and one is very rare, it might
not sort on pure existence. This is a tf.idf engine.
wunder
On 8/1/07 11:00 AM, Lance Lance [EMAIL PROTECTED] wrote:
Use the minimum match spec for a flexible version of all-terms
matching.
Before implementing all-terms matching, start logging the number of
searches that result in no matches. All-terms can cause big problems.
One wrong or misspelled word means no matches, and searchers don't
know how to fix
, Daniel Naber [EMAIL PROTECTED] wrote:
On Thursday 02 August 2007 18:46, Walter Underwood wrote:
Use the minimum match spec for a flexible version of all-terms
matching.
I think this is too difficult and unpredictable. I also don't know how I
should justify a setting like 75%, just because
At Infoseek, we ran a separate search index with today's updates
and merged that in once each day. It requires a little bit of
federated search to prefer the new content over the big index,
but the daily index can be very nimble for update.
wunder
On 8/22/07 7:58 AM, mike topper [EMAIL
How is the performance? For me, Solr got about 100 times faster for
update when I moved the files from NFS to local disk.
wunder
On 8/22/07 2:27 PM, Kasi Sankaralingam [EMAIL PROTECTED] wrote:
Instance (index server) for indexing. The index file data directory
reside on a NFS partition, I am
It should work fine to index them and search them. 13 million docs is
not even close to the limits for Lucene and Solr. Have you had problems?
wunder
On 8/23/07 7:30 AM, Jae Joo [EMAIL PROTECTED] wrote:
Is there any solution to handle 13 millions document shown as below?
Each document is not
No need to run a separate web server. I actually do HTTP updates from
an extra servlet configured into the Solr webserver. It might
seem a little odd, but same-system TCP sockets are extremely fast
and low overhead.
The additional flexibility is nice, too. If I find a bug in the
indexing code in
Sorry dude, I'm pining for Python and coding in Java. --wunder
On 8/30/07 6:57 PM, Erik Hatcher [EMAIL PROTECTED] wrote:
On Aug 30, 2007, at 6:31 PM, Mike Klaas wrote:
Another reason why people use stored procs is to prevent multiple
round-trips in a multi-stage query operation. This is
Not really. It is a very poor substitute for reading the release notes,
and sufficiently inadequate that it might not be worth the time.
Diffing the example with the previous release is probably more
instructive, but might or might not help for your application.
A config file checker would be
Legal discovery can have requirements like this. --wunder
On 9/7/07 4:47 AM, Brian Carmalt [EMAIL PROTECTED] wrote:
Lance Norskog schrieb:
Now I'm curious: what is the use case for documents this large?
Thanks,
Lance Norskog
It is a rare use case, but could become relevant for
Even if KStem isn't ASL, we could include the plug-in code
with notes about how to get the stemmer. Or, the Solr plug-in
could be contributed to the group that manages the KStem
distribution:
http://ciir.cs.umass.edu/cgi-bin/downloads/downloads.cgi
wunder
On 9/7/07 12:59 PM, Yonik Seeley
The straightforward solution is to not put your indexes on NFS. It is
slow and it causes failures like this.
I'm serious about that. I've seen several different search engines
(not just Solr/Lucene) get very slow and unreliable when the indexes
were on NFS.
wunder
On 9/13/07 10:59 AM, Kasi
. Same for mit. In English, that is the Massachusetts
Institute of Technology.
wunder
==
Walter Underwood
Search Guy, Netflix
On 9/14/07 2:09 PM, Marc Bechler [EMAIL PROTECTED] wrote:
Hi Tom,
thanks for your professional response -- works fine and looks good :-).
Since I am playing around
You could MD4 the parts you care about, store that, fetch it and compare.
If there is a reliable timestamp, you could use that. But that would be
app-dependent.
In general, you need to store some info about each source document
and figure out whether it is new. This gets much hairier with a web
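The "hash the parts you care about, store it, fetch it and compare" idea can be sketched as below. The email mentions MD4; md5 is used here as a stand-in since hashlib's MD4 support varies by OpenSSL build, and the field choice is application-specific.

```python
# Hedged sketch of content-hash change detection. md5 stands in for the
# MD4 mentioned in the email; which fields go into the hash is up to the
# application.
import hashlib

def content_fingerprint(doc: dict, fields=("title", "body")) -> str:
    h = hashlib.md5()
    for f in fields:
        h.update(str(doc.get(f, "")).encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "") != ("a", "b")
    return h.hexdigest()

def needs_reindex(doc: dict, stored_fingerprint: str) -> bool:
    # Compare the freshly computed hash against the one stored at the
    # last index run; a mismatch means the source document changed.
    return content_fingerprint(doc) != stored_fingerprint
```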
This would probably work, but the approach has a subtle flaw.
If a query has one word that matches a lot of titles, but a
phrase that matches a description, the best result will be shown
far too low, after all the titles.
A better approach is to weight the titles a bit higher than the
That seems well within Solr's capabilities, though you should come up
with a desired queries/sec figure.
Solr's query rate varies widely with the configuration -- how many
fields, fuzzy search, highlighting, facets, etc.
Essentially, Solr uses Lucene, a modern search core. It has performance
and
No one can answer that, because it depends on how you configure Solr.
How many fields do you want to search? Are you using fuzzy search?
Facets? Highlighting?
We are searching a much smaller collection, about 250K docs, with
great success. We see 80 queries/sec on each of four servers, and
Accent transforms are language-specific, so an accent filter
should take an ISO language code as an argument.
Some examples:
* In French and English, a dieresis is a hint to pronounce neighboring
vowels separately, as in coöp, naïve, or Noël.
* In German, ü transforms to ue.
* In Swedish, ö
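A language-keyed folding table captures the idea that the right transform differs per language. This is a hedged sketch with illustrative, incomplete tables, not a real linguistic filter.

```python
# Hedged sketch: language-keyed accent folding. German folds ü to ue,
# while English just drops the dieresis. Tables are illustrative only.
ACCENT_MAPS = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"},
    "en": {"ö": "o", "ï": "i", "ë": "e"},  # coöp, naïve, Noël
}

def fold_accents(text: str, lang: str) -> str:
    table = ACCENT_MAPS.get(lang, {})
    # Unknown languages pass through unchanged rather than guessing.
    return "".join(table.get(ch, ch) for ch in text)
```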
I do not think it will be much faster. The data transfer time is small
compared to the indexing time.
The indexing will probably take less than a day, so if you spend more
than 30 minutes coding a faster method, the project will take longer.
wunder
On 9/28/07 6:06 AM, Jae Joo [EMAIL PROTECTED]
to upgrade.
Thanks everyone, this is a great piece of software.
wunder
--
Walter Underwood
Search Guy, Netflix
I think Chris Harris is doing that. I'll check it and touch it up
afterwards. Avoid race conditions. --wunder
On 10/2/07 4:26 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
: Here at Netflix, we switched over our site search to Solr two weeks ago.
That's great Walter ... could I persuade
, Walter Underwood [EMAIL PROTECTED] wrote:
Here at Netflix, we switched over our site search to Solr two weeks ago.
We've seen zero problems with the server. We average 1.2 million
queries/day on a 250K item index. We're running four Solr servers
with simple round-robin HTTP load-sharing
We don't use Solr replication. Each server is independent and
does its own indexing. This has several advantages:
* all installations are identical
* no single point of failure
* no inter-server version or config dependencies
* we can run a different version or config on one server for testing
Wow, well-formed HTML. That's a rare beast. --wunder
On 10/4/07 7:08 PM, Chris Hostetter [EMAIL PROTECTED] wrote:
if you have wellformed HTML documents, use an HTML parser to extract the
real content.
Solr is not an XML engine (or a MARC engine). It uses XML as an input format
for fielded data. It does not index or search arbitrary XML. You need to
convert your XML into Solr's format.
I would recommend expressing MARC in a Solr schema, then working on the
input XML. The input XML depends on
That is one seriously manly regex, but I'd recommend using the Tag Soup
parser instead:
http://ccil.org/~cowan/XML/tagsoup/
wunder
On 10/4/07 10:11 PM, J.J. Larrea [EMAIL PROTECTED] wrote:
It uses a PatternTokenizerFactory with a RegEx that swallows runs of HTML- or
XML-like tags:
We run multiple, identical, independent copies. No master/slave
dependencies. Yes, we run indexing N times for N servers, but
that's what CPU is for and I sleep better at night. It makes
testing and deployment trivial, too.
wunder
==
Walter Underwood
Search Guy, Netflix
On 10/8/07 4:05 AM
This even works if you request 0 results. --wunder
On 10/11/07 1:56 AM, Stefan Rinner [EMAIL PROTECTED] wrote:
On Oct 10, 2007, at 6:49 PM, Chris Hostetter wrote:
: I think search for *:* is the optimal code to do it. I don't
think you can
: do anything faster.
FYI: getting the data
There is a request handler in 1.2 for Atom. That might be close.
OpenSearch was a pretty poor design and is dead now, so I wouldn't
expect any new implementations. Google's GData (based on Atom)
reuses the few useful OpenSearch elements needed for things
like number of hits. Solr's Atom support
Also die in German and English. --wunder
On 10/18/07 4:16 AM, Andrzej Bialecki [EMAIL PROTECTED] wrote:
One example that I'm familiar with: words is and by in English and
in Swedish. Both words are stopwords in English, but they are content
words in Swedish (ice and village, respectively).
The question almost doesn't make sense, because SANs are so configurable.
It is like saying over a network without specifying whether the network
is dial-up or fiber.
A few things to note:
* The automatic backups are not synchronized with consistent index states,
so they are probably useless.
*
We've had some performance problems while Solr is indexing and also when it
starts with a cold cache. I'm still digging through our own logs, but I'd
like to get more info about this, so any ideas or info are welcome.
We have four Solr servers on dual CPU PowerPC machines, 2G of heap, about
Solr 1.1. --wunder
On 10/22/07 10:06 AM, Walter Underwood [EMAIL PROTECTED] wrote:
We've had some performance problems while Solr is indexing and also when it
starts with a cold cache. I'm still digging through our own logs, but I'd
like to get more info about this, so any ideas or info
We do an optimize after indexing, so the number of segments
isn't an issue. We have the default autowarming settings.
wunder
On 10/22/07 11:00 AM, Yonik Seeley [EMAIL PROTECTED] wrote:
On 10/22/07, Walter Underwood [EMAIL PROTECTED] wrote:
<lst name="appends">
<str name="fq">(pushstatus:A
On 10/25/07 12:11 AM, Chris Hostetter [EMAIL PROTECTED] wrote:
this type of question typically falls into two use cases:
1) targeted ads
2) sponsored results
3) Best bets (editorial results)
The query house should return House, M.D. as the first hit,
but that is rather hard to achieve
hurricane katrina is a very expensive query against a collection
focused on Hurricane Katrina. There will be many matches in many
documents. If you want to measure worst-case, this is fine.
I'd try other things, like:
* ninth ward
* Ray Nagin
* Audubon Park
* Canal Street
* French Quarter
* FEMA
/solr/admin/stats.jsp is XML with a stylesheet. It contains stuff
like this:
<stat name="numDocs">
266687
</stat>
wunder
On 11/1/07 7:39 PM, Papalagi Pakeha [EMAIL PROTECTED] wrote:
Hello,
Is there any way to get XML version of statistics like how many
documents are
He means extremely frequent and I agree. --wunder
On 11/2/07 1:51 AM, Haishan Chen [EMAIL PROTECTED] wrote:
Thanks for the advice. You certainly have a point. I believe you mean a query
term that appears in 5-10% of an index in a natural language corpus is
extremely INFREQUENT?
This is fairly straightforward and works well with the DisMax
handler. Index the text into three different fields with three
different sets of analyzers. Use something like this in the
request handler:
<requestHandler name="multimatch" class="solr.DisMaxRequestHandler">
<lst name="defaults">
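A fuller sketch of what such a handler spec might look like follows. The handler name, field names, and weights are all illustrative, not taken from the original message:

```xml
<requestHandler name="multimatch" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <!-- search the exact, stemmed, and phonetic variants of the same
         text, weighting exact matches highest -->
    <str name="qf">text_exact^4.0 text_stemmed^2.0 text_phonetic^1.0</str>
  </lst>
</requestHandler>
```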
Solr queries can't do updates, so passing on raw user queries is OK.
Solr errors for bad query syntax are not pretty, so you will want to
catch those and print a real error message.
wunder
On 11/6/07 8:52 AM, Micah Wedemeyer [EMAIL PROTECTED] wrote:
Are there any security risks to passing a
If you really, really need to preserve the XML structure, you'll
be doing a LOT of work to make Solr do that. It might be cheaper
to start with software that already does that. I recommend
MarkLogic -- I know the principals there, and it is some seriously
fine software. Not free or open, but very,
Some OSs split that 4GB into a 2GB data space and a 2GB instruction
space. To get a 64-bit address space, the CPU, OS, and JVM all need
to support 64 bits. There have been 64-bit Xeon chips since 2004,
the Linux 2.6 kernel supports 64-bit, and recent JVMs do, too.
If your Xeon supports 64 bits, you
I had a similar problem with three sources of keys that have collisions
between the values. I prefix a single letter for each source.
movies: M12345
people: P12345
and so on.
wunder
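The single-letter prefix scheme above is trivial to code; a hedged sketch, with the source names taken from the email and the function name invented:

```python
# Hedged sketch: namespace primary keys from several sources with a
# one-letter prefix so values can't collide in a single uniqueKey field.
SOURCE_PREFIX = {"movies": "M", "people": "P"}

def solr_key(source: str, raw_id: str) -> str:
    # "12345" from movies and "12345" from people map to distinct keys.
    return SOURCE_PREFIX[source] + str(raw_id)
```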
On 11/13/07 12:37 PM, Will Johnson [EMAIL PROTECTED] wrote:
key = sometimesUniqueField + _ +
I'm not an rsync expert, but I believe that /solr/ is a
virtual directory defined in the rsyncd config. It is mapped
to the real directory.
wunder
On 11/14/07 8:43 AM, Jae Joo [EMAIL PROTECTED] wrote:
In the snappuller, the solr is hardcoded. Should it be
${master_data_dir}?
# rsync over