AW: Lexical analysis tools for German language data

2012-04-13 Thread Michael Ludwig
From: Tomas Zerolo There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem (interstitial or joint morpheme) [...] IANAL (I am not a linguist -- pun

Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its result set. It appears to me that such an

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its result set. It appears to me that such an

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
. 2012 at 11:52, Michael Ludwig wrote: Given an input of Windjacke (probably wind jacket in English), I'd like the code that prepares the data for the index (tokenizer etc) to understand that this is a Jacke (jacket) so that a query for Jacke would include the Windjacke document in its

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
From: Markus Jelsma We've done a lot of tests with the HyphenationCompoundWordTokenFilter using a FOP XML file generated from TeX for the Dutch language and have seen decent results. A bonus was that now some tokens can be stemmed properly because not all compounds are listed in the

AW: Lexical analysis tools for German language data

2012-04-12 Thread Michael Ludwig
From: Walter Underwood German noun decompounding is a little more complicated than it might seem. There can be transformations or inflections, like the s in Weihnachtsbaum (Weihnachten/Baum). I remember from my linguistics studies that the terminus technicus for these is Fugenmorphem
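The decompounding idea discussed in this thread can be sketched in a few lines. This is only a minimal illustration, assuming a tiny hand-made lexicon and a hypothetical list of linking elements (Fugenmorpheme); it is not how HyphenationCompoundWordTokenFilter or any other Lucene filter works internally.

```python
# Hypothetical linking elements that may sit between compound parts.
LINKING = ("", "s", "es", "n", "en", "e")


def decompound(word, lexicon, min_len=3):
    """Split a compound into lexicon words, skipping linking elements.

    Returns a list of parts, or an empty list if no split is found.
    """
    word = word.lower()
    if word in lexicon:
        return [word]
    for i in range(min_len, len(word) - min_len + 1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        for link in LINKING:
            if not rest.startswith(link):
                continue
            tail = rest[len(link):]
            if len(tail) < min_len:
                continue
            parts = decompound(tail, lexicon, min_len)
            if parts:
                return [head] + parts
    return []


# Toy lexicon, made up for this example.
lexicon = {"wind", "jacke", "weihnacht", "baum"}
print(decompound("Windjacke", lexicon))
print(decompound("Weihnachtsbaum", lexicon))
```

At index time the parts would be added as extra tokens next to the original word, so that a query for Jacke also matches Windjacke. A real lexicon produces ambiguous splits, which this greedy sketch does not try to resolve.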

Re: Implementing PhraseQuery and MoreLikeThis Query in one app

2009-07-02 Thread Michael Ludwig
the inclusion of a stopword list result in stopwords being of top importance in the MoreLikeThis query? Michael Ludwig

Re: Search for phrase including prepositions

2009-07-01 Thread Michael Ludwig
is happens? Hi Akinori, I guess you're using the DisMax query parser. Please read this entire page: http://wiki.apache.org/solr/DisMaxRequestHandler The parameter that allows you to tweak this is the mm parameter. Michael Ludwig

Re: Installing a patch in a solr nightly on Windows

2009-07-01 Thread Michael Ludwig
Koji Sekiguchi wrote: I'm not a Windows user, but I think you can use Linux commands (e.g. patch, to apply the SOLR-284 patch to a Solr nightly build) in a Cygwin environment. The standalone patch utility for Win32 is another option. http://gnuwin32.sourceforge.net/packages/patch.htm Michael Ludwig

Re: Monitor search traffic

2009-07-01 Thread Michael Ludwig
Gurjot Singh wrote: Hi, Is there a way to monitor the number of search queries made on the solr index? http://localhost:8983/solr/admin/stats.jsp Look for requests :. Michael Ludwig

Re: spelling suggestion in solr.

2009-06-30 Thread Michael Ludwig
Radha C. wrote: Is the spelling suggestion feature available in solr? If yes, can you point me to some documentation? Have you tried googling for: solr spelling ? First hit: http://wiki.apache.org/solr/SpellCheckComponent Michael Ludwig

Re: SOLR SpeelChecker and german Umlauts

2009-06-30 Thread Michael Ludwig
- Michael Ludwig http://markmail.org/thread/dgi4llhc7x5wuroc (BTW, the patch in SOLR-1204 is ready but still awaiting clarification. See comments from June 11 and 18.) My Config is : spellcheck = 'true'; spellcheck.dictionary = 'jarowinkler' spellcheck.onlyMorePopular = 'true' spellcheck.build = 'false

Re: Search for phrase including prepositions

2009-06-30 Thread Michael Ludwig
. Exactly. Could anyone navigate me? Go to your analysis page, enter your field name (or type), check verbose output, enter your query, and press Analyze. http://localhost:8983/solr/admin/analysis.jsp You'll probably find that the word for is removed as a so-called stopword. Michael Ludwig

Re: nested dismax queries

2009-06-29 Thread Michael Ludwig
. See: filterCache/@size, queryResultCache/@size, documentCache/@size http://markmail.org/thread/tb6aanicpt43okcm Michael Ludwig

Re: nested dismax queries

2009-06-29 Thread Michael Ludwig
to think that drop-down boxes (the values of which you control) are a nice match for the filter query, whereas user-entered text is more likely to be a candidate for the main query. Michael Ludwig
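The split described here can be sketched as follows: free text goes into q, and each controlled drop-down value becomes its own fq. The field names and values are hypothetical, and the sketch does not escape quotes inside values.

```python
from urllib.parse import urlencode


def build_params(user_text, selections):
    """Map free text to q and each drop-down selection to one fq."""
    params = [("q", user_text)]
    for field, value in sorted(selections.items()):
        params.append(("fq", f'{field}:"{value}"'))
    return params


# Hypothetical form input: one text box, two drop-downs.
params = build_params("wind jacket", {"category": "outdoor", "brand": "acme"})
print(urlencode(params))
```

Sending each selection as a separate fq rather than one combined clause is a design choice in this sketch: each filter can then be cached and reused on its own.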

Re: Searching across multivalued fields

2009-06-19 Thread Michael Ludwig
MilkDud wrote: Michael Ludwig-4 wrote: What do you expect the user to enter? * dream theater innocence faded - certainly wrong * dream theater innocence faded - much better Most likely they would just enter dream theater innocence faded, no quotes. Without any quotes around any fields

Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig
-valued track - every song or whatever, definitely multi-valued Read up about multi-valued fields (sample schema.xml, for example, or Google) if you're unsure what this is; your posting subject, however, suggests you aren't. Regards, Michael Ludwig

Re: Few Queries regarding indexes in Solr

2009-06-18 Thread Michael Ludwig
! Imagine it did one day! Michael Ludwig

Re: FilterCache issue

2009-06-18 Thread Michael Ludwig
://issues.apache.org/jira/browse/SOLR-475 Michael Ludwig

Re: Distributed querying using solr multicore.

2009-06-18 Thread Michael Ludwig
For SolrJ, see this thread: Using SolrJ with multicore/shards - ahammad http://markmail.org/thread/qnytfrk4dytmgjis if so, isnt there a better way to do that? No idea. Michael Ludwig

Re: Distributed querying using solr multicore.

2009-06-18 Thread Michael Ludwig
Rakhi Khatwani wrote: On Thu, Jun 18, 2009 at 3:51 PM, Michael Ludwig m...@as-guides.com wrote: I don't know how we're supposed to use it. I did the following: http://flunder:8983/solr/xpg/select?q=bla&shards=flunder:8983/solr/xpg,flunder:8983/solr/kk i am getting a page load error
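A request like the one quoted above needs q and shards as separate parameters joined by an ampersand. A minimal sketch for building such a URL, reusing the host and core names from the message; the helper name is made up for this example:

```python
from urllib.parse import urlencode


def sharded_select_url(base, query, shards):
    """Build a select URL whose shards parameter lists every core to query."""
    params = {"q": query, "shards": ",".join(shards)}
    # Keep ',', ':' and '/' readable inside the shards value.
    return base + "/select?" + urlencode(params, safe=",:/")


url = sharded_select_url(
    "http://flunder:8983/solr/xpg",
    "bla",
    ["flunder:8983/solr/xpg", "flunder:8983/solr/kk"],
)
print(url)
```

Note that the shard entries carry no http:// prefix, matching the form used in the quoted request.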

Re: Searching across multivalued fields

2009-06-18 Thread Michael Ludwig
is nothing but unique key = 1001? Yes, it is: q=id:1001 (1) Don't use DisMax here, that will not interpret field names. (2) Replace id by whatever name you gave to your unique key field. Michael Ludwig

Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig
the DisMaxRequestHandler and specify all fields you want to use in your query in the qf parameter. <!-- qf = query fields: list of fields with boost factor --> <str name="qf">artist^3 album^2 track^1</str> http://wiki.apache.org/solr/DisMaxRequestHandler Michael Ludwig
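The same qf value can also be sent per request instead of being configured in solrconfig.xml. A small sketch, assuming the parser is selected with defType=dismax (depending on the Solr version and configuration, a request handler chosen via qt may be used instead); the function name is hypothetical:

```python
from urllib.parse import urlencode


def dismax_params(user_query, boosts):
    """Build DisMax parameters with one boosted entry per query field."""
    qf = " ".join(f"{field}^{boost}" for field, boost in boosts.items())
    return {"defType": "dismax", "q": user_query, "qf": qf}


params = dismax_params("dream theater", {"artist": 3, "album": 2, "track": 1})
print(urlencode(params))
```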

Re: Few Queries regarding indexes in Solr

2009-06-17 Thread Michael Ludwig
what most people do, though nothing prevents the indexing client from sending the same doc to multiple shards. In some scenarios that's exactly what you want to do. What kind of scenario would that be? Michael Ludwig -- A: Because it messes up the order in which people normally read text. Q: Why

Re: what date format to pass for search in Solr?

2009-06-17 Thread Michael Ludwig
for something like solr date range query. For example, see: http://www.nabble.com/Date-Range-Query-%2B-Fields-to16108517.html Michael Ludwig

Re: Could solr build two different indexes?

2009-06-17 Thread Michael Ludwig
://wiki.apache.org/solr/CoreAdmin Michael Ludwig

Re: Solr Query | Field:value with dismaxquery

2009-06-17 Thread Michael Ludwig
I'd attribute that to the mm (minimum match) parameter, the meaning of which you can understand reading the following page, which it would probably make a lot of sense to read anyway: http://wiki.apache.org/solr/DisMaxRequestHandler Michael Ludwig

Re: fq vs. q

2009-06-17 Thread Michael Ludwig
within a single field. I added the comment in that I think that a wiki page discussing fq vs q should also mention facet.query. It now does: http://wiki.apache.org/solr/FilterQueryGuidance Michael Ludwig

Re: Searching across multivalued fields

2009-06-17 Thread Michael Ludwig
, definitely multi-valued Michael Ludwig

Re: Few Queries regarding indexes in Solr

2009-06-16 Thread Michael Ludwig
and update the indexes. is it possible to send the differences only into shard 3 and then merge it at shard 3? My (very limited) understanding of shards is that you repartition your documents among shards and send each document to only one shard. (Not sure this is correct.) Michael Ludwig

Re: fq vs. q

2009-06-15 Thread Michael Ludwig
perfect sense to store dates and times in integers, depending on your use case and your client. Michael Ludwig

Re: fq vs. q

2009-06-15 Thread Michael Ludwig
reduced from their actual continuum of values to three ranges {A,B,C}, you'd have to define three facet.query parameters accordingly. A mere facet.field, on the other hand, creates as many filters as there are unique values in the field. Is that correct? Michael Ludwig
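The three-range case could be expressed as one facet.query per range. A minimal sketch, assuming a hypothetical numeric field named price and made-up range bounds:

```python
def range_facet_params(field, bounds):
    """Return one facet.query parameter per (low, high) range."""
    params = [("facet", "true")]
    for low, high in bounds:
        params.append(("facet.query", f"{field}:[{low} TO {high}]"))
    return params


# Three hypothetical ranges standing in for {A, B, C}.
params = range_facet_params("price", [(0, 100), (101, 500), (501, "*")])
for name, value in params:
    print(name, "=", value)
```

A plain facet.field=price would instead produce one count per distinct value in the field, which is the contrast drawn in the message.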

Re: fq vs. q

2009-06-15 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Mon, Jun 15, 2009 at 4:39 PM, Michael Ludwig m...@as-guides.com wrote: I think if you truncate dates to incomplete dates, you effectively also lose all the date logic. You may still apply it, but what would you take the result to mean? You can't regain

Re: Joins or subselects in solr

2009-06-15 Thread Michael Ludwig
regular graph, then the notion of a main item needs clarification. Michael Ludwig

Re: fq vs. q

2009-06-12 Thread Michael Ludwig
Michael Ludwig wrote: Martin Davidsson wrote: I've tried to read up on how to decide, when writing a query, what criteria goes in the q parameter and what goes in the fq parameter, to achieve optimal performance. Is there [...] some kind of rule of thumb to help me decide how to split

Re: Customizing results

2009-06-11 Thread Michael Ludwig
be overkill for your particular situation. Michael Ludwig

Re: Build Failed

2009-06-11 Thread Michael Ludwig
in question, but I can't seem to find the issue. Any suggestions? Run: ant -verbose Michael Ludwig

Re: dismax parsing applied to specific fields

2009-06-11 Thread Michael Ludwig
that the DisMaxRequestHandler is simply the standard request handler with the default query parser set to the DisMax Query Parser. So maybe you could program your own CustomDisMaxRequestHandler that reuses the DisMax query parser (and probably other components) to achieve what you want. Michael Ludwig

Re: Build Failed

2009-06-11 Thread Michael Ludwig
should be part of your installation, or can be found on the web. Quick overview: ant -help When I wrote ant -verbose, I meant ant -verbose your-target, so: ant -verbose example Michael Ludwig

Re: Faceting on text fields

2009-06-11 Thread Michael Ludwig
used to analyze the data in order to determine clusters, if I understand correctly. Michael Ludwig

Re: fq vs. q

2009-06-10 Thread Michael Ludwig
Fergus McMenemie wrote: On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com wrote: A filter query is cached, which means that it is the more useful the more often it is repeated. We know how often certain queries arise, or at least have the means to collect that data - so we

Re: Faceting on text fields

2009-06-10 Thread Michael Ludwig
Yonik Seeley wrote: Yep, all that sounds right. An additional optimization counts terms for the documents *not* in the set when the base set is over half the size of the index. Cool :-) Thanks for confirming my assumptions! Michael Ludwig

Re: Faceting on text fields

2009-06-10 Thread Michael Ludwig
to be determined. Is that a correct assessment? Michael Ludwig

Re: Customizing results

2009-06-10 Thread Michael Ludwig
=locationLubang, Philippinen/str If you control how the client works, you could also consider using an internationalization technology such as GNU Gettext for this purpose. May or may not make sense in your particular situation. Michael Ludwig

Re: How to disable posting updates from a remote server

2009-06-10 Thread Michael Ludwig
address. Michael Ludwig

Re: Solr relevancy score - conversion

2009-06-10 Thread Michael Ludwig
* float[@name='score'] div ../@maxScore)/ The div is the XPath division operator. Should be a straightforward mapping to any other language. Michael Ludwig
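Mapped to Python, the division of each document score by maxScore might look like the sketch below. The document dictionaries are made up, and the guard against a non-positive maxScore is an addition not taken from the message.

```python
def relative_scores(docs, max_score):
    """Divide each document's score by max_score, giving values up to 1.0."""
    if max_score <= 0:
        return [0.0 for _ in docs]
    return [doc["score"] / max_score for doc in docs]


# Hypothetical result list with its maxScore.
docs = [{"id": "1001", "score": 2.0}, {"id": "1002", "score": 0.5}]
print(relative_scores(docs, 2.0))
```

Multiplying the result by 100 gives a percentage, if that is the representation wanted.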

Re: copyfield and 'store' and highlighting

2009-06-10 Thread Michael Ludwig
ashokc wrote: Do I have to declare 'field1' also to be stored? 'field1' is never returned in the response. I find the following Wiki page helpful when dealing with @stored, @indexed and friends: http://wiki.apache.org/solr/FieldOptionsByUseCase Michael Ludwig

Re: fq vs. q

2009-06-09 Thread Michael Ludwig
the application to apply filtering by category, incidentally, using faceting, which is a typical usage pattern, I guess. Michael Ludwig

filterCache/@size, queryResultCache/@size, documentCache/@size

2009-06-09 Thread Michael Ludwig
? Michael Ludwig

Re: filter on millions of IDs from external query

2009-06-09 Thread Michael Ludwig
expensive. Michael Ludwig

Re: Field Compression

2009-06-09 Thread Michael Ludwig
percent (YMMV), it might not be worth the effort. Michael Ludwig

Re: Faceting on text fields

2009-06-09 Thread Michael Ludwig
process based on top N (say 100) hits for this but it is my last option. Also a very interesting data mining question! I'm sorry I don't have any answers for you. Maybe someone else does. Best, Michael Ludwig

Re: Faceting on text fields

2009-06-09 Thread Michael Ludwig
), and (b) collecting all the pesky little terms from the new structure mapping documents to term numbers? So basically, depending on expediency, you (a) know the facets and count the documents which display them, or you (b) take the documents and see what facets they have? Michael Ludwig

Re: statistics about word distances in solr

2009-06-09 Thread Michael Ludwig
, the number of your search terms, and the number of your facets. I assume this is an expensive operation. Michael Ludwig

Re: fq vs. q

2009-06-09 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Tue, Jun 9, 2009 at 7:25 PM, Michael Ludwig m...@as-guides.com wrote: A filter query should probably be orthogonal to the primary query, which means in plain English: unrelated to the primary query. To give an example, I have a field category, which

Re: filterCache/@size, queryResultCache/@size, documentCache/@size

2009-06-09 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Tue, Jun 9, 2009 at 7:47 PM, Michael Ludwig m...@as-guides.com wrote: Given the following three filtering scenarios of (a) x:bla, (b) y:blub, and (c) x:bla AND y:blub, will I end up with two or three distinct filters? In other words, may filters be composites

Re: fq vs. q

2009-06-09 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: No, both filters and queries are computed on the entire index. My comment was related to the A filter query should probably be orthogonal to the primary query... part. I meant that both kinds of use-cases are common. Got it. Thanks :-) Michael Ludwig

Re: SpellCheckComponent: queryAnalyzerFieldType

2009-06-05 Thread Michael Ludwig
and if possible, give a patch? Please see: https://issues.apache.org/jira/browse/SOLR-1204 Regards, Michael Ludwig

Re: spell checking

2009-06-05 Thread Michael Ludwig
to the Introduction of: http://wiki.apache.org/solr/SpellCheckComponent Michael Ludwig

Re: spell checking

2009-06-04 Thread Michael Ludwig
. IMHO, a name conveying the actual meaning, along the lines of suggest, would make more sense. Michael Ludwig

SpellCheckComponent: queryAnalyzerFieldType

2009-06-04 Thread Michael Ludwig
out in the thread referred to above, it seems you want to use the spellcheck.q parameter for anything but what can be encoded in ASCII. Is that true? Michael Ludwig

Re: French and SpellingQueryConverter

2009-05-19 Thread Michael Ludwig
Shalin Shekhar Mangar wrote: On Mon, May 11, 2009 at 2:46 PM, Michael Ludwig m...@as-guides.com wrote: Could you give an example of how the spellcheck.q parameter can be brought into play to (take non-ASCII characters into account, so that Käse isn't mishandled) given the following example

Re: French and SpellingQueryConverter

2009-05-19 Thread Michael Ludwig
]+:-) Michael Ludwig

Re: Replication master+slave

2009-05-15 Thread Michael Ludwig
<!DOCTYPE Urmel [ <!ENTITY egpe_from_the_net SYSTEM "http://lobster.as-guides.com/ds/solr.schema.ent"> <!ENTITY egpe_from_the_local_disk SYSTEM "egpe-local.ent"> ]> <Urmel> &egpe_from_the_net; &egpe_from_the_local_disk; </Urmel> C:\MILU\dev\XML # type egpe-local.ent <eins/> <zwei/> <drei/> Michael Ludwig

Re: Selective Searches Based on User Identity

2009-05-13 Thread Michael Ludwig
a Solr/Lucene newbie, this approach might have a disadvantage that escapes me, which is why other people haven't made this particular suggestion. If so, I'd be happy to learn why this isn't preferable. Michael Ludwig

Re: Selective Searches Based on User Identity

2009-05-13 Thread Michael Ludwig
ignorance of the 'ineluctable filter query' and will have to read up on that one. I meant a filter query that the application tags onto the query on behalf of the user and without the user being able to do anything about it so he cannot circumvent the filter. Best regards, Michael Ludwig

Re: French and SpellingQueryConverter

2009-05-11 Thread Michael Ludwig
the result of the above, which is plain wrong, reads: [(k,0,1,type=ALPHANUM), (se,2,4,type=ALPHANUM)] Thanks. Michael Ludwig

Organizing multiple searchers around overlapping subsets of data

2009-05-08 Thread Michael Ludwig
overlaps and hence redundancy? Michael Ludwig

Re: What are the Unicode encodings supported by Solr?

2009-05-08 Thread Michael Ludwig
encoding not getting supported by Solr. Did you make sure to not rely on your platform default encoding (Charset) when constructing the InputStreamReader? If in doubt, take a look at the InputStreamReader constructors. Michael Ludwig

Re: Multi-index Design

2009-05-06 Thread Michael Ludwig
Matt Weber wrote: http://wiki.apache.org/solr/MultipleIndexes Thanks, Mark. Your explanation and the pointer to the Wiki have clarified things for me. Michael Ludwig

Re: schema.xml: default values for @indexed and @stored

2009-05-06 Thread Michael Ludwig
Otis Gospodnetic wrote: Attribute values for fields should be inherited from attribute values of their field types. Thanks, that answers my question pertaining to @indexed and @stored in the fieldtype and field elements in schema.xml. Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
in the tutorial and run Solr in Jetty as per the distribution, which works out of the box: http://lucene.apache.org/solr/tutorial.html Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
. Or even do a string replacement s/8983/8080/g on the Solr doc you're viewing. Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
uday kumar maddigatla wrote: My intention is to use 8080 as port. Is there any other way that Solr will post the files in 8080 port? Solr doesn't post, it listens. Use the curl utility as indicated in the documentation. http://wiki.apache.org/solr/UpdateXmlMessages Michael Ludwig

Re: unable to run the solr in tomcat 5.0

2009-05-06 Thread Michael Ludwig
dream up. Seriously, read the docs, it'll help you :-) Michael Ludwig

Re: Multi-index Design

2009-05-05 Thread Michael Ludwig
that I could limit my search to, as per Otis' post? (4) And is that what's called a core here? (5) Or, failing (3), and lumping everything together in one search domain (core?), would I use that type field to limit my search to a particular type of data? Michael Ludwig

schema.xml: default values for @indexed and @stored

2009-05-04 Thread Michael Ludwig
? Michael Ludwig

Re: Problem adding unicoded docs to Solr through SolrJ

2009-04-30 Thread Michael Ludwig
encode, decode, newEncoder, newDecoder. Michael Ludwig

Re: UTF8 compatibility

2009-04-29 Thread Michael Ludwig
<doc> <field name="id">1001</field> <field name="title">BMP plus 1 #x1;</field> </doc> </add> Maybe the test script output says that such characters cannot be used for querying. Hardly relevant if you consider that the BMP comprises even languages such as Telugu, Bopomofo and French. Best, Michael

Re: Performance and number of search results

2009-04-29 Thread Michael Ludwig
profiling for your specific scenario. The rule of thumb here is probably: Get what you need. Michael Ludwig

Re: Problem adding unicoded docs to Solr through SolrJ

2009-04-29 Thread Michael Ludwig
) 164, 's', 'e' }; System.out.println(Charset.defaultCharset().displayName()); System.out.println(new String(bytes)); System.out.println(new String(bytes, Charset.forName("UTF-8"))); } } Output: windows-1252 KÃ¤se (bad) Käse (good) Michael Ludwig
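The effect shown in this Java fragment can be reproduced in a few lines of Python. This is only an illustration of the same encoding mismatch, with cp1252 standing in for the windows-1252 platform default; it is not the code from the message.

```python
# UTF-8 bytes for "Käse": the ä becomes the two bytes 195 and 164.
raw = "Käse".encode("utf-8")

# Decoding with the wrong charset turns those two bytes into two characters.
bad = raw.decode("cp1252")

# Naming the charset explicitly recovers the original text.
good = raw.decode("utf-8")

print(list(raw[1:3]))
print(bad, "(bad)")
print(good, "(good)")
```

The remedy is the same in both languages: pass the charset explicitly rather than relying on the platform default.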

Highlighting using XML instead of strings?

2009-04-29 Thread Michael Ludwig
over strings, I rather want something like this: <str><b>Eumel</b> NDR Ländermagazine</str> There could be a parameter hl.xml which I could use to request modified XML like this: hl.xml=em hl.xml=b This would allow smoother processing with technologies like XSLT. Is such a feature available? Michael