Hi Furkan,
in order to change the BM25 parameter values k1 and b, the following XML
snippet needs to be added in your schema.xml configuration file:
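<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.3</float>
  <float name="b">0.7</float>
</similarity>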
It is even possible to specify the SimilarityFactory on individual index
fields. See [1] for more details.
Best
Sascha
[1]
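To make the roles of k1 and b concrete, here is a minimal sketch of the textbook BM25 score for a single term. It is an illustration only, not Lucene's implementation: the function name and arguments are made up for this example, and Lucene differs in details such as how it stores field lengths.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document (illustrative sketch)."""
    # Inverse document frequency, in the smoothed form that is always non-negative.
    idf = math.log(1.0 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # b controls how strongly the field length (relative to the average) matters.
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly the term frequency saturates.
    return idf * (tf * (k1 + 1.0)) / (tf + k1 * length_norm)

# One document with a 9-token field, term occurring once, k1=1.3 and b=0.7.
score = bm25_term_score(tf=1, doc_len=9, avg_doc_len=9,
                        doc_count=1, doc_freq=1, k1=1.3, b=0.7)
# Same term in a field twice as long as the average: the score drops.
long_score = bm25_term_score(tf=1, doc_len=18, avg_doc_len=9,
                             doc_count=1, doc_freq=1, k1=1.3, b=0.7)
print(score, long_score)
```

With b=0 the field length would have no influence at all; with b=1 it is fully normalized against the average length.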
of fieldLength does not match 8.
Is there some "magic" applied to the value of field length that goes beyond the
standard BM25 score formula?
If so, what is the idea behind this modification? If not, is this a Lucene /
Solr bug?
Best regards,
Sascha
--
Sascha Szott :: KOBV/ZIB :: +49 30 84185-457
Hi folks,
my Solr index consists of one document with a single-valued field "title" of
type "text_general". The title field was indexed with the content: 1 2 3 4 5 6 7
8 9. The field type text_general uses a StandardTokenizer which should result
in 9 tokens. The corresponding length of field
Hi Ming,
which Solr version are you using? In case you use one of the latest
versions (4.5 or above) try the new parameter facet.threads with a
reasonable value (4 to 8 gave me a massive performance speedup when
working with large facets, i.e. nTerms 10^7).
-Sascha
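A minimal sketch of a facet request that sets facet.threads. The host, the /select path and the field name "category" are placeholders assumed for this example; only the parameter names come from Solr.

```python
from urllib.parse import urlencode

def facet_url(base_url, field, threads):
    """Build a facet-only query URL that sets facet.threads (illustrative sketch)."""
    params = [
        ("q", "*:*"),
        ("rows", 0),                 # only the facet counts are of interest here
        ("facet", "true"),
        ("facet.field", field),
        ("facet.threads", threads),  # available in Solr 4.5 and above
    ]
    return base_url + "?" + urlencode(params)

# Hypothetical host and field name.
url = facet_url("http://localhost:8983/solr/select", "category", 4)
print(url)
```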
Mingfeng Yang wrote:
I
Hi folks,
is it possible to use the raw query parser with a disjunctive filter
query? Say, I have a field 'foo' and two values 'v1' and 'v2' (the field
values are free text and can contain any character). What I want is to
retrieve all documents satisfying fq=foo:(v1 OR v2). In case only one
Hi Mark,
Mark Miller wrote:
Still waiting on that issue. I think Andrzej should just update it to
trunk and commit - it's optional and defaults to off. Go vote :)
Sounds like the problem is already solved and the remaining work
consists of code integration? Can somebody estimate how much work
Hi folks,
a known limitation of the old distributed search feature is the lack of
distributed/global IDFs (#SOLR-1632). Does SolrCloud bring some improvements in
this direction?
Best regards,
Sascha
Hi,
wildcard and fuzzy queries are not analyzed.
-Sascha
Alok Bhandari alokomprakashbhand...@gmail.com wrote:
Hello ,
I am pushing Chuck Follett'.?.? in solr and when I query for this field
with query string field:Follett'.* I am getting 0 results.
field type declared is
fieldType
Hi,
I suppose you are using Solr 3.6. Then take a look at
http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
-Sascha
Alok Bhandari alokomprakashbhand...@gmail.com wrote:
Thanks for reply.
If I check the debug query through
Hi,
perhaps it's better to use a PHP Solr client library. I used
https://code.google.com/p/solr-php-client/
in a project of mine and it worked just fine.
-Sascha
Asif wrote:
I am indexing the file using php curl library. I am stuck here with the code
echo Stored in: . upload/ .
Hi,
did you include the fl parameter in the Solr query URL? If that's the case make
sure that the field name 'text' is mentioned there. You should also make sure
that the field definition (in schema.xml) for 'text' says stored=true,
otherwise the field will not be returned.
-Sascha
Hi,
Solritas uses the dismax query parser. The dismax config parameter 'qf'
specifies the index fields to be searched in. Make sure that 'name' is your
default search field.
-Sascha
Giovanni Gherdovich g.gherdov...@gmail.com wrote:
Hi all,
this morning I was very proud of myself since
Hi,
as far as I know Solr does not provide such a feature. If you cannot make any
assumptions on the numbers, choose an appropriate library that is able to
transform between numerical and non-numerical representations and populate the
search field with both versions at index-time.
-Sascha
Hi Mari,
it depends ...
* How many records are stored in your MySQL databases?
* How often will updates occur?
* How many db records / index documents are changed per update?
I would suggest starting with a single Solr core first. That way, you can
concentrate on the basics and do not need to
Hi,
depending on your needs, take a look at Apache ManifoldCF. It adds
document-level security on top of Solr.
-Sascha
On 23.03.2011 14:20, satya swaroop wrote:
Hi All,
As for my project Requirement i need to keep privacy for search of
files so that i need to modify the code of
Hi Paul,
did you increase the value of the maxFieldLength parameter in your
solrconfig.xml?
-Sascha
On 23.03.2011 17:05, Paul wrote:
I'm using solr 1.4.1.
I have a document that has a pretty big field. If I search for a
phrase that occurs near the start of that field, it works fine. If I
On 23.03.2011 18:52, Paul wrote:
I increased maxFieldLength and reindexed a small number of documents.
That worked -- I got the correct results. In 3 minutes!
Did you mark the field in question as stored = false?
-Sascha
I assume that if I reindex all my documents that all searches will
Hi,
have a look at Solr's ExtractingRequestHandler:
http://wiki.apache.org/solr/ExtractingRequestHandler
-Sascha
On 02.02.2011 16:49, Thumuluri, Sai wrote:
Good Morning,
I am planning to get started on indexing MS office using ApacheSolr -
can someone please direct me where I should
Hi folks,
I've made the same observation when working with Solr's
ExtractingRequestHandler on the command line (no browser interaction).
When issuing the following curl command
curl
'http://mysolrhost/solr/update/extract?extractOnly=true&extractFormat=text&wt=xml&resource.name=foo.pdf'
perfectly with the same returned
data in some AJAX environment.
On Tuesday 01 February 2011 18:29:06 Sascha Szott wrote:
Hi folks,
I've made the same observation when working with Solr's
ExtractingRequestHandler on the command line (no browser interaction).
When issuing the following curl command
Hi folks,
I've noticed an unexpected behavior while working with the various
built-in integer field types (int, tint, pint). It seems that the first
two are subject to type checking, while the last one is not.
I'll give you an example based on the example schema that is shipped out
)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
[...]
Is this a bug or did I miss something?
-Sascha
--
Sascha Szott :: KOBV
Hi Don,
you could give the HTTP method to be used as a second argument to the
QueryRequest constructor:
Hi folks,
why does FileListEntityProcessor ignore onError=continue and abort
indexing if a directory or a file does not exist?
I'm using both XPathEntityProcessor and FileListEntityProcessor with
onError set to continue. In case a directory or file is not present an
Exception is thrown and
(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
-Sascha
On 11.08.2010 15:18, Sascha Szott wrote:
Hi folks,
why does FileListEntityProcessor ignore
Hi,
Chris Hostetter wrote:
AND, OR, and NOT are just syntactic-sugar for modifying
the MUST, MUST_NOT, and SHOULD. The default op of OR only affects the
first clause of your query (R) because it doesn't have any modifiers --
Thanks for pointing that out!
-Sascha
the second clause has that
Hi Erick,
thanks for your explanations. But why are all docs being *removed* from
the set of all docs that contain R in their topic field? This would
correspond to a boolean AND and would stand in conflict with the clause
q.op=OR. This seems a bit strange to me.
Furthermore, Smiley & Pugh
Hi,
you can delete all docs that match a certain query:
<delete><query>uid:6-HOST*</query></delete>
-Sascha
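A minimal sketch of sending such a delete-by-query message from Python. The update URL is a placeholder, and the request is only built here; sending it is left as a commented-out line.

```python
from urllib.request import Request
from xml.sax.saxutils import escape

def build_delete_by_query(update_url, query):
    """Build (but do not send) a delete-by-query request for Solr's XML update handler."""
    # Escape &, < and > so that the query cannot break the XML message.
    body = "<delete><query>" + escape(query) + "</query></delete>"
    return Request(
        update_url + "?commit=true",
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )

# Hypothetical host; adjust to your installation.
req = build_delete_by_query("http://localhost:8983/solr/update", "uid:6-HOST*")
print(req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req)
```

Running the same query against /select first shows which documents would be removed.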
bbarani wrote:
Hi,
I am trying to delete a group of documents using wildcard. Something like
Hi,
does /select?q=uid:6-HOST* return any documents?
-Sascha
bbarani wrote:
Hi,
Thanks a lot for your reply..
I tried the below query
update?commit=true%20-H%20Content-Type:%20text/xml%20--data-binary%20'<delete><query>uid:6-HOST*</query></delete>'
But even now none of the documents are getting
Hi,
take a look inside Solr's log file. Are there any error messages with
respect to the update request?
Furthermore, you could try the following two commands instead:
curl "http://host:port/solr/update" --form-string
'stream.body=<delete><query>uid:6-HOST*</query></delete>'
curl
Hi folks,
I have a (multi-valued) field topic in my index which does not need to
exist in every document. Now, I'm struggling with formulating a query
that returns all documents that either have no topic field at all *or*
whose topic field value is R.
Unfortunately, the query
Hi Darren,
try mlt.fl=field1 field2
Best,
Sascha
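When the request URL is built programmatically, the space between the field names has to be URL-encoded. A minimal sketch, assuming the MoreLikeThis component on a /select handler; the host and the query id:42 are placeholders:

```python
from urllib.parse import urlencode

def mlt_url(base_url, query, fields):
    """Build a MoreLikeThis request with several space-separated mlt.fl fields."""
    params = [
        ("q", query),
        ("mlt", "true"),
        ("mlt.fl", " ".join(fields)),  # urlencode turns the space into '+'
    ]
    return base_url + "?" + urlencode(params)

# Hypothetical host and query.
url = mlt_url("http://localhost:8983/solr/select", "id:42", ["field1", "field2"])
print(url)
```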
Darren Govoni wrote:
Hi,
I read the wiki and tried about a dozen variations such as:
...mlt.fl=field1&mlt.fl=field2
and
...mlt.fl=field1,field2...
to specify more than one MLT field and it won't take. What's the trick?
Also, how to do it
Hi Joe & Markus,
sounds good! Maybe I should add a note on the Wiki page on
federated search [1].
Thanks,
Sascha
[1] http://wiki.apache.org/solr/FederatedSearch
Joe Calderon wrote:
yes, you can use distributed search across shards with different
schemas as long as the query only
Hi folks,
if I'm seeing it right Solr currently does not provide any support for
federated / meta searching. Therefore, I'd like to know if anyone has
already put efforts into this direction? Moreover, is federated / meta
search considered a scenario Solr should be able to deal with at all or
followed by (auskunft or profiauskunft) you mentioned will
occur.
Best,
Sascha
-Original Message-
From: Sascha Szott [mailto:sz...@zib.de]
Sent: Sunday, May 30, 2010 19:01
To: solr-user@lucene.apache.org
Subject: Re: strange results with query and hyphened words
Hi Markus,
I
by
the WordDelimiterFilter. What about using the PatternReplaceCharFilter
at query time to eliminate all intra-word hyphens?
-Sascha
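The effect of such a pattern replacement can be tried out outside of Solr first. A minimal sketch in Python; the regular expression is only an assumption about what "intra-word hyphen" should mean (a hyphen with a word character on both sides), and it shows the pattern, not the Solr configuration:

```python
import re

# A hyphen counts as intra-word if it has a word character on both sides.
INTRA_WORD_HYPHEN = re.compile(r"(?<=\w)-(?=\w)")

def strip_intra_word_hyphens(text):
    """Remove hyphens inside words; leave free-standing and leading hyphens alone."""
    return INTRA_WORD_HYPHEN.sub("", text)

print(strip_intra_word_hyphens("profi-auskunft"))
print(strip_intra_word_hyphens("auskunft - profi"))
```

A leading hyphen, as in -foo, is left untouched, so the prohibit operator of the query syntax still works.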
Sascha Szott wrote:
Hi Markus,
the default-config for index is:
filter class=solr.WordDelimiterFilterFactory generateWordParts=1
generateNumberParts=1 catenateWords=1
Hi Markus,
I was facing the same problem a few days ago and found an explanation in
the mail archive that clarifies my question regarding the usage of
Solr's WordDelimiterFilterFactory:
http://markmail.org/message/qoby6kneedtwd42h
Best,
Sascha
markus.rietz...@rzf.fin-nrw.de wrote:
i am
Hi Erick,
Erick Erickson wrote:
Ah, I may have misunderstood, I somehow got it in my mind
you were talking about the length of each term (as in string length).
But if you're looking at the field length as the count of terms, that's
another question, sorry for the confusion...
I have to ask,
Hi Birger,
Birger Lie wrote:
I don't think the boolean fields are mapped to on and off :)
You can use true and on interchangeably.
-Sascha
-birger
-Original Message-
From: Ilya Sterin [mailto:ster...@gmail.com]
Sent: 24. mai 2010 23:11
To: solr-user@lucene.apache.org
Subject:
Hi Erick,
Erick Erickson wrote:
Are you sure you want to recompute the length when sorting?
It's the classic time/space tradeoff, but I'd suggest that when
your index is big enough to make taking up some more space
a problem, it's far too big to spend the cycles calculating each
term length for
=onereturned response which contains
<b>query</b> should be bold</str>
</doc>
Regards
Prakash
-Original Message-
From: Sascha Szott [mailto:sz...@zib.de]
Sent: Monday, May 24, 2010 10:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting is not happening
Hi Prakash,
can you provide
1
<str>query</str>
<str>facet</str>
</arr>
</requestHandler>
On 2010-05-25, at 3:32 AM, Sascha Szott wrote:
Hi Birger,
Birger Lie wrote:
I don't think the boolean fields are mapped to on and off :)
You can use true and on interchangeably.
-Sascha
-birger
-Original Message-
From: Ilya
Hi folks,
is it possible to sort by field length without having to (redundantly)
save the length information in a separate index field? At first, I
thought to accomplish this using a function query, but I couldn't find
an appropriate one.
Thanks in advance,
Sascha
Hi Prakash,
more importantly, check the field type and its associated analyzer. In
case you use a non-tokenized type (e.g., string), highlighting will
not appear if only a partial field match exists (only exact matches,
i.e. the query coincides with the field value, will be highlighted). If
Prakash
-Original Message-
From: Sascha Szott [mailto:sz...@zib.de]
Sent: Monday, May 24, 2010 10:29 PM
To: solr-user@lucene.apache.org
Subject: Re: Highlighting is not happening
Hi Prakash,
more importantly, check the field type and its associated analyzer. In
case you use a non
Hi Ilya,
Ilya Sterin wrote:
I'm trying to perform a faceted search without any luck. Result set
doesn't return any facet information...
http://localhost:8080/solr/select/?q=title:*&facet=on&facet.field=title
I'm getting the result set, but no facet information present? Is there
something else
Hi folks,
what's the idea behind the fact that no text analysis (e.g. lowercasing)
is performed on wildcarded search terms?
In my context this behaviour seems to be counter-intuitive (I guess
that's the case in the majority of applications) and my application
needs to lowercase any input
Hi Robert,
thanks, you're absolutely right. I should refine my initial
question to: What's the idea behind the fact that no *lowercasing* is
performed on wildcarded search terms if the field in question contains a
LowercaseFilter in its associated field type definition?
-Sascha
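As long as wildcard terms are not analyzed, the lowercasing has to happen on the client side. A minimal sketch for a single search term; the helper name is made up, and it only makes sense for fields whose index analyzer lowercases its tokens:

```python
def lowercase_wildcard_term(term):
    """Lowercase a single search term if it contains a wildcard character."""
    # Terms without wildcards pass through the field's analyzer anyway,
    # so they are returned unchanged.
    if "*" in term or "?" in term:
        return term.lower()
    return term

print(lowercase_wildcard_term("Follett'.*"))
print(lowercase_wildcard_term("Follett"))
```

Do not apply this to a complete query string: lowercasing would also affect operators such as AND and OR.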
Hi,
maybe you would like to have a look at solr.ShingleFilterFactory [1] to
expand your autosuggest to more than one term.
-Sascha
[1]
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
Blargy wrote:
Thanks for your help and especially your analyzer..
Hi,
I'm not sure if debugQuery=on is a feasible solution in a production
environment, as generating such extra information requires a reasonable
amount of computation.
-Sascha
Jon Baer wrote:
Does the standard debug component (?debugQuery=on) give you what you need?
Hi Serdar,
take a look at Solr's DataImportHandler:
http://wiki.apache.org/solr/DataImportHandler
Best,
Sascha
Serdar Sahin wrote:
Hi,
I am rather new to Solr and have a question.
We have around 200.000 txt files which are placed into the file cloud.
The file path is something similar to
Hi Yonik,
Yonik Seeley wrote:
Stephen, were you running stock Solr 1.4, or did you apply any of the
SolrJ patches?
I'm trying to figure out if anyone still has any problems, or if this
was fixed with SOLR-1711:
I'm using the latest trunk version (rev. 934846) and constantly running
into the
Hi Yonik,
thanks for your fast reply.
Yonik Seeley wrote:
Thanks for the report Sascha.
So after the hang, it never recovers? Some amount of hanging could be
visible if there was a commit on the Solr server or something else to
cause the solr requests to block for a while... but it should
Hi Luca,
could you add a note to the Wiki page [1]? Thanks!
-Sascha
[1] http://wiki.apache.org/solr/SolrJBoss
Luca Molteni wrote:
By the way, I finally solved it.
To deploy solr 1.3 in jboss 5, you simply have to remove
xercesImpl-2.8.1.jar
xml-apis-1.3.03.jar
From the WEB-INF/lib
markus.rietz...@rzf.fin-nrw.de wrote:
ok,
i was looking for all types of max but somehow didn't see the
maxFieldLength.
this is a global parameter, right? can this be defined on a field basis?
It's a global parameter counting the maximum number of tokens(!) - not
the number of characters
Hi,
can you post
* the output of MySQL's describe command for all tables/views referenced
in your DIH configuration
* the DIH configuration file (i.e., data-config.xml)
* the schema definition (i.e., schema.xml)
-Sascha
Jean-Michel Philippon-Nadeau wrote:
Hi,
It is my first install of
you very much.
L.M.
--
Sascha Szott
Kooperativer Bibliotheksverbund Berlin-Brandenburg (KOBV)
c/o Konrad-Zuse-Zentrum fuer Informationstechnik Berlin (ZIB)
Takustr. 7, D-14195 Berlin
Zimmer 4357
Telefon: (030) 841 85 - 457
Telefax: (030) 841 85 - 269
E-Mail: sz...@zib.de
WWW: http://www.kobv.de
with productId 220213. Since no default value is specified, Solr raises
an error when creating the index document.
-Sascha
Jean-Michel Philippon-Nadeau wrote:
Hi,
Thanks for the reply.
On Tue, 2010-02-02 at 16:57 +0100, Sascha Szott wrote:
* the output of MySQL's describe command for all tables/views
Luca Molteni wrote:
Actually, if I hard-code the value, it gives me the same error... interesting.
According to the error message:
The content of element type env-entry must match
(description?,env-entry-name,env-entry-value?,env-entry-type)
Maybe it helps to change the order of elements
).
-Sascha
[1] http://wiki.apache.org/solr/VelocityResponseWriter#line-93
[2]
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/response/QueryResponse.html
Quoting Sascha Szott sz...@zib.de:
Qiuyan,
with highlight can also be displayed in the web gui. I've added <bool
name="hl">true</bool>
Qiuyan,
with highlight can also be displayed in the web gui. I've added <bool
name="hl">true</bool> into the standard responseHandler and it already
works, i.e without velocity. But the same line doesn't take effect in
itas. Should i configure anything else? Thanks in advance.
First of all, just a
Hi,
Jay Fisher wrote:
I'm trying to find a way to formulate the following query in solrJ. This is
the only way I can get the desired result but I can't figure out how to get
solrJ to generate the same query string. It always generates a url that
starts with select and I need it to start with
, value:url1 of
res_url field is linked to value:1 of res_rank field, and all of them
are linked to the common field keyword.
I think that I should use a custom field analyzer or something like that,
but I don't know what to do.
Thanks for everything; any help would be much appreciated.
Sascha Szott
Hi,
you could create an additional index field res_ranked_url that contains
the concatenated value of an url and its corresponding rank, e.g.,
res_rank + " " + res_url
Then, q=res_ranked_url:"1 url1" retrieves all documents with url1 as the
first url.
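A minimal sketch of how the concatenated value and the matching query could be built on the client side. The helper names are made up for this example, and the quoted form assumes that rank and url are matched together as a single value or phrase:

```python
def ranked_url_value(rank, url):
    """Value to store in the additional res_ranked_url field at index time."""
    return f"{rank} {url}"

def ranked_url_query(rank, url):
    """Query for documents that have the given url at the given rank."""
    # Escape backslashes and double quotes, since the value is sent in quotes.
    value = ranked_url_value(rank, url).replace("\\", "\\\\").replace('"', '\\"')
    return 'res_ranked_url:"' + value + '"'

print(ranked_url_value(1, "url1"))
print(ranked_url_query(1, "url1"))
```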
A drawback of this
Hi Aleksander,
Aleksander Stensby wrote:
So i tried with curl:
curl http://server:8983/solr/update --data-binary '<optimize/>' -H
'Content-type:text/xml; charset=utf-8'
No difference here either... Am I doing anything wrong? Do i need to issue a
commit after the optimize?
Did you restart the
Hi Rafael,
Rafael Pappert wrote:
I try to enable the spellchecker in my 1.4.0 solr (running with tomcat 6 on
debian).
But I always get the following exception, when I try to open
http://localhost:8080/spell?:
The spellcheck=true pair is missing in your request. Try
Hi Jill,
just to make sure your index contains at least one document, what is the
output of
http://localhost:8080/solr/select?q=*:*&debugQuery=true&echoParams=all
Best,
Sascha
Jill Han wrote:
In fact, I just followed the instructions titled as Tomcat On Windows.
Here are the updates on my
Hi Folks,
is there any way to instruct MoreLikeThisHandler to sort results? I was
wondering that MLTHandler recognizes faceting parameters among others,
but it ignores the sort parameter.
Best,
Sascha
Pooja,
have a look at Solr's DataImportHandler. XPathEntityProcessor [1] should
suit your needs.
Best,
Sascha
[1] http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor
Pooja Verlani wrote:
Hi,
I want to index an xml like following:
<officer>
<name>John</name>
Piero,
it sounds like you're looking for an integration of Solr Cell and Solr's DIH
facility -- a feature that isn't implemented yet (but the issue is
already addressed in SOLR-1358).
As a workaround, you could store the extracted contents in plain text
files (either by using Solr Cell or Apache
encoded HTML. That's it!
Best,
Sascha
Erik Hatcher wrote:
Sascha,
Can you give me a test document that causes an issue? (maybe send me a
Solr XML document in private e-mail). I'll see what I can do once I
can see the issue first hand.
Erik
On Nov 18, 2009, at 2:48 PM, Sascha Szott
Hi,
I've played around with Solr's VelocityResponseWriter (which is indeed a
very useful feature for rapid prototyping). I've realized that Velocity
uses ISO-8859-1 as default character encoding. I've changed this setting
to UTF-8 in my velocity.properties file (inside the conf directory),
the VelocityResponseWriter returns a lot of
Unicode replacement characters (u+FFFD) instead.
-Sascha
On Nov 18, 2009, at 2:48 PM, Sascha Szott wrote:
Hi,
I've played around with Solr's VelocityResponseWriter (which is indeed
a very useful feature for rapid prototyping). I've realized
, Sascha Szott sz...@zib.de wrote:
Hi,
the problem you've described -- an integration of DataImportHandler (to
traverse the XML file and get the document urls) and Solr Cell (to extract
content afterwards) -- is already addressed in issue SOLR-1358 (
https://issues.apache.org/jira/browse/SOLR-1358
Hi,
the problem you've described -- an integration of DataImportHandler (to
traverse the XML file and get the document urls) and Solr Cell (to
extract content afterwards) -- is already addressed in issue SOLR-1358
(https://issues.apache.org/jira/browse/SOLR-1358).
Best,
Sascha
Kerwin
Noble Paul wrote:
Yes , open an issue . This is a trivial change
I've opened JIRA issue SOLR-1554.
-Sascha
On Thu, Nov 12, 2009 at 5:08 AM, Sascha Szott sz...@zib.de wrote:
Noble,
Noble Paul wrote:
DIH imports are really long running. There is a good chance that the
connection times out
capabilities, though issue SOLR-1352 mainly targets the latter. Is
your PDIH implementation able to deal with batch processing right now?
Best,
Sascha
On Thu, Nov 12, 2009 at 6:35 AM, Sascha Szott sz...@zib.de wrote:
Hi all,
I'm using the DIH in a parameterized way by passing request parameters
on adding a callback url to
DIH a month ago, but it seems that no issue was raised. So, up to now it's
only possible to implement an appropriate Solr EventListener. Should we
open an issue for supporting callback urls?
Best,
Sascha
On Tue, Nov 10, 2009 at 12:12 AM, Sascha Szott sz...@zib.de wrote
Hi all,
I'm using the DIH in a parameterized way by passing request parameters
that are used inside of my data-config. All imports end up in the same
index.
1. Is it considered as good practice to set up several DIH request
handlers, one for each possible parameter value?
2. In case the range
Hi all,
as stated in the Solr wiki, Solr 1.4 allows specifying an onError
attribute for *each* entity listed in the data config file (it is
considered as one of the default attributes).
Unfortunately, the SqlEntityProcessor does not recognize the attribute's
value -- i.e., in case an SQL
Hi,
Noble Paul നോബിള് नोब्ळ् wrote:
On Mon, Nov 9, 2009 at 4:24 PM, Sascha Szott sz...@zib.de wrote:
Hi all,
as stated in the Solr wiki, Solr 1.4 allows specifying an onError
attribute for *each* entity listed in the data config file (it is considered
as one of the default attributes
Hi all,
currently, DIH's import operation(s) only works asynchronously.
Therefore, after submitting an import request, DIH returns immediately,
while the import process (in case a large amount of data needs to be
indexed) continues asynchronously behind the scenes.
So, what is the
Hi Khai,
a few weeks ago, I was facing the same problem.
In my case, this workaround helped (assuming, you're using Solr 1.3):
For each row, extract the content from the corresponding pdf file using
a parser library of your choice (I suggest Apache PDFBox or Apache Tika
in case you need to
Hello,
is it possible (and if it is, how can I accomplish it) to configure DIH
to build up index documents by using content that resides in different
data sources?
Here is an example scenario:
Let's assume we have a table T with two columns, ID (which is the
primary key of T) and TITLE.
Hi Noble,
Noble Paul wrote:
isn't it possible to do this by having two datasources (one JDBC and
another File) and two entities . The outer entity can read from a DB
and the inner entity can read from a file.
Yes, it is. Here's my db-data-config.xml file:
<!-- definition of data sources -->
85 matches