From: Tomas Zerolo
There can be transformations or inflections, like the s in
Weihnachtsbaum (Weihnachten/Baum).
I remember from my linguistics studies that the terminus technicus
for these is Fugenmorphem (interstitial or joint morpheme) [...]
IANAL (I am not a linguist -- pun
Given an input of Windjacke (probably wind jacket in English), I'd
like the code that prepares the data for the index (tokenizer etc) to
understand that this is a Jacke (jacket) so that a query for Jacke
would include the Windjacke document in its result set.
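To make the requirement concrete, here is a minimal sketch of dictionary-based decompounding that also skips a linking element such as the s in Weihnachtsbaum. The class name, the four-word dictionary and the list of linking elements are assumptions chosen for illustration; this is not how Solr or Lucene implement their compound-word filters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class NaiveDecompounder {
    // Hypothetical toy dictionary; a real one would be far larger.
    private static final Set<String> DICTIONARY =
        new HashSet<>(Arrays.asList("wind", "jacke", "weihnacht", "baum"));

    // A few German linking elements; the empty string means "no linking element".
    private static final String[] LINKERS = { "", "s", "es", "n", "en" };

    /** Returns the dictionary parts of the word, or an empty list if no full split exists. */
    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        boolean ok = splitFrom(word.toLowerCase(Locale.ROOT), 0, parts);
        return ok ? parts : new ArrayList<>();
    }

    // Backtracking search: take a dictionary word, optionally skip a linker, recurse on the rest.
    private static boolean splitFrom(String w, int start, List<String> parts) {
        if (start == w.length()) {
            return !parts.isEmpty();
        }
        for (int end = start + 1; end <= w.length(); end++) {
            String candidate = w.substring(start, end);
            if (!DICTIONARY.contains(candidate)) {
                continue;
            }
            parts.add(candidate);
            for (String linker : LINKERS) {
                if (!w.startsWith(linker, end)) {
                    continue;
                }
                int next = end + linker.length();
                // A linking element may only sit between two parts, never at the very end.
                if (!linker.isEmpty() && next == w.length()) {
                    continue;
                }
                if (splitFrom(w, next, parts)) {
                    return true;
                }
            }
            parts.remove(parts.size() - 1);
        }
        return false;
    }
}
```

An index-time filter built on this idea would emit the parts as extra tokens next to the original word, so that a query for Jacke can match a document containing Windjacke.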
It appears to me that such an
. 2012 at 11:52, Michael Ludwig wrote:
Given an input of Windjacke (probably wind jacket in English),
I'd like the code that prepares the data for the index (tokenizer
etc) to understand that this is a Jacke (jacket) so that a
query for Jacke would include the Windjacke document in its
From: Markus Jelsma
We've done a lot of tests with the HyphenationCompoundWordTokenFilter
using an FOP XML file generated from TeX for the Dutch language and
have seen decent results. A bonus was that now some tokens can be
stemmed properly because not all compounds are listed in the
From: Walter Underwood
German noun decompounding is a little more complicated than it might
seem.
There can be transformations or inflections, like the s in
Weihnachtsbaum (Weihnachten/Baum).
I remember from my linguistics studies that the terminus technicus for
these is Fugenmorphem
the inclusion of a stopword list result in stopwords being of
top importance in the MoreLikeThis query?
Michael Ludwig
is happens?
Hi Akinori,
I guess you're using the DisMax query parser. Please read this entire
page: http://wiki.apache.org/solr/DisMaxRequestHandler
The parameter that allows you to tweak this is the mm parameter.
Michael Ludwig
Koji Sekiguchi wrote:
I'm not a Windows user, but I think you can use Linux command (e.g.
patch, to apply SOLR-284 patch to Solr nightly build) on cygwin
environment.
The standalone patch utility for Win32 is another option.
http://gnuwin32.sourceforge.net/packages/patch.htm
Michael Ludwig
Gurjot Singh wrote:
Hi,
Is there a way to monitor the number of search queries made on the
solr index.
http://localhost:8983/solr/admin/stats.jsp
Look for requests :.
Michael Ludwig
Radha C. wrote:
Is the spelling suggestion feature available in Solr? If yes, can
you point me to some documentation?
Have you tried googling for: solr spelling ? First hit:
http://wiki.apache.org/solr/SpellCheckComponent
Michael Ludwig
- Michael Ludwig
http://markmail.org/thread/dgi4llhc7x5wuroc
(BTW, the patch in SOLR-1204 is ready but still awaiting clarification.
See comments from June 11 and 18.)
My Config is :
spellcheck = 'true';
spellcheck.dictionary = 'jarowinkler'
spellcheck.onlyMorePopular = 'true'
spellcheck.build = 'false'
.
Exactly.
Could anyone navigate me?
Go to your analysis page, enter your field name (or type), check
verbose output, enter your query, and press Analyze.
http://localhost:8983/solr/admin/analysis.jsp
You'll probably find that the word for is removed as a so-called
stopword.
Michael Ludwig
. See:
filterCache/@size, queryResultCache/@size, documentCache/@size
http://markmail.org/thread/tb6aanicpt43okcm
Michael Ludwig
to think that drop-down boxes
(the values of which you control) are a nice match for the filter query,
whereas user-entered text is more likely to be a candidate for the main
query.
Michael Ludwig
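A minimal sketch of that split, assuming a hypothetical application in which the user's free text goes into q and each drop-down selection becomes its own fq parameter. The class and method names, and the choice to quote every drop-down value as a phrase, are illustrative assumptions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

class FilterQueryParams {
    // URL-encode with an explicit charset instead of the platform default.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /** User-entered text becomes q; every drop-down selection becomes one fq. */
    static String build(String userText, Map<String, String> dropDowns) {
        StringBuilder sb = new StringBuilder("q=").append(enc(userText));
        for (Map.Entry<String, String> e : dropDowns.entrySet()) {
            // Quote the value so a multi-word selection stays one phrase.
            sb.append("&fq=").append(enc(e.getKey() + ":\"" + e.getValue() + "\""));
        }
        return sb.toString();
    }
}
```

Because the drop-down values come from a fixed set, the same fq strings recur across requests, which is what makes them good candidates for the filter cache.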
MilkDud wrote:
Michael Ludwig-4 wrote:
What do you expect the user to enter?
* dream theater innocence faded - certainly wrong
* dream theater innocence faded - much better
Most likely they would just enter dream theater innocence faded, no
quotes. Without any quotes around any fields
-valued
track - every song or whatever, definitely multi-valued
Read up about multi-valued fields (sample schema.xml, for example, or
Google) if you're unsure what this is; your posting subject, however,
suggests you aren't.
Regards,
Michael Ludwig
!
Imagine it did one day!
Michael Ludwig
://issues.apache.org/jira/browse/SOLR-475
Michael Ludwig
For SolrJ, see this thread:
Using SolrJ with multicore/shards - ahammad
http://markmail.org/thread/qnytfrk4dytmgjis
if so, isn't there a better way to do that?
No idea.
Michael Ludwig
Rakhi Khatwani wrote:
On Thu, Jun 18, 2009 at 3:51 PM, Michael Ludwig m...@as-guides.com
wrote:
I don't know how we're supposed to use it. I did the following:
http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk
I am getting a page load error
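A distributed request passes shards as a separate parameter, joined to q with an ampersand, and lists the shard addresses separated by commas. The sketch below assembles such a URL; the class and method names are assumptions, and the query is assumed to be URL-safe already, so no encoding is applied.

```java
import java.util.List;

class ShardsUrl {
    /** Builds a select URL with a comma-separated shards parameter. */
    static String build(String coreUrl, String query, List<String> shards) {
        // q and shards are two separate parameters, so they are joined with '&'.
        return coreUrl + "/select?q=" + query + "&shards=" + String.join(",", shards);
    }
}
```

Calling build with the core URL http://flunder:8983/solr/xpg, the query bla and the two shard addresses from the message yields the request shown there.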
is nothing but unique key =
1001?
Yes, it is: q=id:1001
(1) Don't use DisMax here, that will not interpret field names.
(2) Replace id by whatever name you gave to your unique key field.
Michael Ludwig
the DisMaxRequestHandler and specify all fields you want to use in
your query in the qf parameter.
<!-- qf = query fields: list of fields with boost factor -->
<str name="qf">artist^3 album^2 track^1</str>
http://wiki.apache.org/solr/DisMaxRequestHandler
Michael Ludwig
what most people do, though nothing prevents the indexing
client from sending the same doc to multiple shards. In some
scenarios that's exactly what you want to do.
What kind of scenario would that be?
Michael Ludwig
--
A: Because it messes up the order in which people normally read text.
Q: Why
for something like solr
date range query. For example, see:
http://www.nabble.com/Date-Range-Query-%2B-Fields-to16108517.html
Michael Ludwig
://wiki.apache.org/solr/CoreAdmin
Michael Ludwig
I'd attribute that to the mm (minimum match) parameter, the meaning
of which you can understand reading the following page, which it would
probably make a lot of sense to read anyway:
http://wiki.apache.org/solr/DisMaxRequestHandler
Michael Ludwig
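As a rough illustration of what mm controls, the sketch below computes how many optional clauses must match for the two simplest forms of the parameter: a plain integer and a percentage. Negative values are read as the number or share of clauses that may be missing. The class name is hypothetical, conditional specs such as 2<75% are not handled, and the clamping to the range 0..n is an assumption of this sketch; the wiki page remains the authority.

```java
class MinShouldMatch {
    /** Number of optional clauses that must match, for simple integer or percentage specs. */
    static int required(String mm, int optionalClauses) {
        String spec = mm.trim();
        int result;
        if (spec.endsWith("%")) {
            int percent = Integer.parseInt(spec.substring(0, spec.length() - 1));
            // The cast truncates toward zero, i.e. the share is rounded down in magnitude.
            int share = (int) (optionalClauses * percent / 100f);
            // A negative percentage is the share of clauses allowed to be missing.
            result = percent < 0 ? optionalClauses + share : share;
        } else {
            int n = Integer.parseInt(spec);
            // A negative integer is the number of clauses allowed to be missing.
            result = n < 0 ? optionalClauses + n : n;
        }
        // Assumption: never require more clauses than exist, and never fewer than zero.
        return Math.max(0, Math.min(optionalClauses, result));
    }
}
```

With four optional clauses, a spec of 75% requires three of them to match, while a spec of 100% requires all four, which is one way a query can return fewer hits than expected.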
within a single field.
I added the comment in that I think that a wiki page discussing fq vs
q should also mention facet.query.
It now does: http://wiki.apache.org/solr/FilterQueryGuidance
Michael Ludwig
, definitely multi-valued
Michael Ludwig
and
update the indexes. is it possible to send the differences only
into shard 3 and then merge it at shard 3?
My (very limited) understanding of shards is that you repartition
your documents among shards and send each document to only one
shard. (Not sure this is correct.)
Michael Ludwig
perfect sense to store dates and times
in integers, depending on your use case and your client.
Michael Ludwig
reduced from their actual continuum
of values to three ranges {A,B,C}, you'd have to define three
facet.query parameters accordingly. A mere facet.field, on the
other hand, creates as many filters as there are unique values in
the field. Is that correct?
Michael Ludwig
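To illustrate the three-range case, the sketch below generates one facet.query parameter per range, using Solr's inclusive range syntax field:[a TO b]. The field name price and the range boundaries in the usage note are hypothetical, and the values are left un-encoded for readability; a real request would URL-encode them.

```java
import java.util.ArrayList;
import java.util.List;

class RangeFacets {
    /** One facet.query parameter per [low, high] pair, both bounds inclusive. */
    static List<String> facetQueries(String field, int[][] ranges) {
        List<String> params = new ArrayList<>();
        for (int[] r : ranges) {
            params.add("facet.query=" + field + ":[" + r[0] + " TO " + r[1] + "]");
        }
        return params;
    }
}
```

Calling facetQueries("price", new int[][] {{0, 99}, {100, 499}, {500, 999}}) produces three parameters, one per range, regardless of how many distinct values the field holds.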
Shalin Shekhar Mangar wrote:
On Mon, Jun 15, 2009 at 4:39 PM, Michael Ludwig m...@as-guides.com
wrote:
I think if you truncate dates to incomplete dates, you effectively
also lose all the date logic. You may still apply it, but what would
you take the result to mean? You can't regain
regular graph, then the notion of a
main item needs clarification.
Michael Ludwig
Michael Ludwig wrote:
Martin Davidsson wrote:
I've tried to read up on how to decide, when writing a query, what
criteria goes in the q parameter and what goes in the fq parameter,
to achieve optimal performance. Is there [...] some kind of rule of
thumb to help me decide how to split
be
overkill for your particular situation.
Michael Ludwig
in question, but I can't seem to
find the issue. Any suggestions?
Run: ant -verbose
Michael Ludwig
that the DisMaxRequestHandler is simply the standard request
handler with the default query parser set to the DisMax Query Parser.
So maybe you could program your own CustomDisMaxRequestHandler that
reuses the DisMax query parser (and probably other components) to
achieve what you want.
Michael Ludwig
should be part of your installation, or can be found on the web.
Quick overview:
ant -help
When I wrote ant -verbose, I meant ant -verbose your-target, so:
ant -verbose example
Michael Ludwig
used to analyze
the data in order to determine clusters, if I understand correctly.
Michael Ludwig
Fergus McMenemie wrote:
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com
wrote:
A filter query is cached, which means that it is the more useful
the more often it is repeated. We know how often certain queries
arise, or at least have the means to collect that data - so we
Yonik Seeley wrote:
Yep, all that sounds right.
An additional optimization counts terms for the documents *not* in the
set when the base set is over half the size of the index.
Cool :-) Thanks for confirming my assumptions!
Michael Ludwig
to be determined. Is that a correct assessment?
Michael Ludwig
=locationLubang, Philippinen/str
If you control how the client works, you could also consider using an
internationalization technology such as GNU Gettext for this purpose.
May or may not make sense in your particular situation.
Michael Ludwig
address.
Michael Ludwig
* flo...@name='score'] div ../@maxScore)/
The div is the XPath division operator. Should be a straightforward
mapping to any other language.
Michael Ludwig
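The same division of each document's score by the response's maxScore could look like this in Java. This is a minimal sketch; the class and method names are assumptions, and the scores are taken to be already parsed from the response.

```java
class ScoreNormalizer {
    /** Divides every score by maxScore, so the best hit ends up at 1.0. */
    static double[] normalize(double[] scores, double maxScore) {
        if (maxScore <= 0) {
            throw new IllegalArgumentException("maxScore must be positive");
        }
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / maxScore;
        }
        return out;
    }
}
```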
ashokc wrote:
Do I have to declare 'field1' also to be stored? 'field1' is never
returned in the response.
I find the following Wiki page helpful when dealing with @stored,
@indexed and friends:
http://wiki.apache.org/solr/FieldOptionsByUseCase
Michael Ludwig
the application to apply filtering
by category, incidentally, using faceting, which is a typical usage
pattern, I guess.
Michael Ludwig
?
Michael Ludwig
expensive.
Michael Ludwig
percent (YMMV), it
might not be worth the effort.
Michael Ludwig
process based on top N (say 100) hits for this but it is my last
option.
Also a very interesting data mining question! I'm sorry I don't have any
answers for you. Maybe someone else does.
Best,
Michael Ludwig
), and (b) collecting all the
pesky little terms from the new structure mapping documents to term
numbers?
So basically, depending on expediency, you (a) know the facets and count
the documents which display them, or you (b) take the documents and see
what facets they have?
Michael Ludwig
, the number of your search
terms, and the number of your facets.
I assume this is an expensive operation.
Michael Ludwig
Shalin Shekhar Mangar wrote:
On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com
wrote:
A filter query should probably be orthogonal to the primary query,
which means in plain English: unrelated to the primary query. To give
an example, I have a field category, which
Shalin Shekhar Mangar wrote:
On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig m...@as-guides.com
wrote:
Given the following three filtering scenarios of (a) x:bla, (b)
y:blub, and (c) x:bla AND y:blub, will I end up with two or three
distinct filters? In other words, may filters be composites
Shalin Shekhar Mangar wrote:
No, both filters and queries are computed on the entire index.
My comment was related to the A filter query should probably be
orthogonal to the primary query... part. I meant that both kinds of
use-cases are common.
Got it. Thanks :-)
Michael Ludwig
and if possible, give a patch?
Please see: https://issues.apache.org/jira/browse/SOLR-1204
Regards,
Michael Ludwig
to the Introduction of:
http://wiki.apache.org/solr/SpellCheckComponent
Michael Ludwig
.
IMHO, a name conveying the actual meaning, along the lines of
suggest, would make more sense.
Michael Ludwig
out in the thread referred to above, it seems you want to
use the spellcheck.q parameter for anything but what can
be encoded in ASCII. Is that true?
Michael Ludwig
Shalin Shekhar Mangar wrote:
On Mon, May 11, 2009 at 2:46 PM, Michael Ludwig m...@as-guides.com
wrote:
Could you give an example of how the spellcheck.q parameter can be
brought into play to (take non-ASCII characters into account, so
that Käse isn't mishandled) given the following example
]+:-)
Michael Ludwig
<!DOCTYPE Urmel [
<!ENTITY egpe_from_the_net
SYSTEM "http://lobster.as-guides.com/ds/solr.schema.ent">
<!ENTITY egpe_from_the_local_disk
SYSTEM "egpe-local.ent">
]>
<Urmel>
&egpe_from_the_net;
&egpe_from_the_local_disk;
</Urmel>
C:\MILU\dev\XML # type egpe-local.ent
<eins/>
<zwei/>
<drei/>
Michael Ludwig
a Solr/Lucene newbie, this approach might have a
disadvantage that escapes me, which is why other people haven't made
this particular suggestion. If so, I'd be happy to learn why this isn't
preferable.
Michael Ludwig
ignorance of the 'ineluctable filter query' and will have
to read up on that one.
I meant a filter query that the application tags onto the query on
behalf of the user and without the user being able to do anything about
it so he cannot circumvent the filter.
Best regards,
Michael Ludwig
the result of the above, which is plain wrong, reads:
[(k,0,1,type=ALPHANUM), (se,2,4,type=ALPHANUM)]
Thanks.
Michael Ludwig
overlaps and hence redundancy?
Michael Ludwig
encoding not getting supported by Solr.
Did you make sure to not rely on your platform default encoding
(Charset) when constructing the InputStreamReader? If in doubt, take
a look at the InputStreamReader constructors.
Michael Ludwig
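A minimal sketch of the point about constructors: pass the Charset explicitly to the InputStreamReader instead of using the one-argument constructor, which falls back to the platform default. The class name and the in-memory byte source are assumptions for illustration; a real client would wrap a file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class ExplicitCharsetRead {
    /** Decodes the bytes with the given charset, never with the platform default. */
    static String read(byte[] data, Charset charset) {
        // Two-argument constructor: the charset is stated, not inherited from the platform.
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading the UTF-8 bytes of Käse with StandardCharsets.UTF_8 returns the word intact, whereas decoding the same bytes with a single-byte charset such as ISO-8859-1 turns the ä into two unrelated characters.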
Matt Weber wrote:
http://wiki.apache.org/solr/MultipleIndexes
Thanks, Mark. Your explanation and the pointer to the Wiki have
clarified things for me.
Michael Ludwig
Otis Gospodnetic wrote:
Attribute values for fields should be inherited from attribute values
of their field types.
Thanks, that answers my question pertaining to @indexed and @stored in
the fieldtype and field elements in schema.xml.
Michael Ludwig
in the tutorial
and run Solr in Jetty as per the distribution, which works out of the
box:
http://lucene.apache.org/solr/tutorial.html
Michael Ludwig
.
Or even do a string replacement s/8983/8080/g on the Solr doc you're
viewing.
Michael Ludwig
uday kumar maddigatla wrote:
My intention is to use 8080 as port.
Is there any other way that Solr will post the files in 8080 port
Solr doesn't post, it listens.
Use the curl utility as indicated in the documentation.
http://wiki.apache.org/solr/UpdateXmlMessages
Michael Ludwig
dream up.
Seriously, read the docs, it'll help you :-)
Michael Ludwig
that I could limit my search to, as per Otis' post?
(4) And is that what's called a core here?
(5) Or, failing (3), and lumping everything together in one search
domain (core?), would I use that type field to limit my search to
a particular type of data?
Michael Ludwig
?
Michael Ludwig
encode, decode, newEncoder,
newDecoder.
Michael Ludwig
<doc>
<field name="id">1001</field>
<field name="title">BMP plus 1 #x1;</field>
</doc>
</add>
Maybe the test script output says that such characters cannot be used
for querying. Hardly relevant if you consider that the BMP comprises
even languages such as Telugu, Bopomofo and French.
Best,
Michael
profiling for your specific scenario.
The rule of thumb here is probably: Get what you need.
Michael Ludwig
) 164, 's', 'e' };
System.out.println(Charset.defaultCharset().displayName());
System.out.println(new String(bytes));
System.out.println(new String(bytes, Charset.forName("UTF-8")));
}
}
Output:
windows-1252
KÃ¤se (bad)
Käse (good)
Michael Ludwig
over strings,
I rather want something like this:
<str><b>Eumel</b> NDR Ländermagazine</str>
There could be a parameter hl.xml which I could use to request
modified XML like this:
hl.xml=em
hl.xml=b
This would allow smoother processing technologies like XSLT.
Is such a feature available?
Michael
82 matches