Re: Blåbærsyltetøy v.s. Räksmörgås

2013-05-22 Thread Petite Abeille
On May 22, 2013, at 7:08 PM, Karl Wettin wrote: >> * Use a filter after ASCIIFoldingFilter that discriminates against all use of ae, oe, >> oo, and other combinations of double vowels, keeping just the first one. > > I ended up with that solution. > > https://issues.apache.org/jira/browse/LUCENE-5013
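
The idea quoted above (fold to ASCII first, then keep only the first letter of a double vowel) can be sketched in a few lines of Python. This is only an illustration, not the code attached to LUCENE-5013: the function name `fold_scandinavian`, the small folding table, and the exact set of vowel pairs (aa, ae, ao, oe, oo) are assumptions made for the example.

```python
import re

# Hypothetical stand-in for an ASCII folding step; only the handful of
# Scandinavian letters needed for the example are covered.
ASCII_FOLD = {"å": "a", "ä": "a", "æ": "ae", "ö": "o", "ø": "o"}

# Double-vowel spellings commonly typed in place of å/ä/æ and ö/ø.
# The exact set is an assumption for this sketch.
DOUBLE_VOWEL = re.compile(r"a[aeo]|o[eo]")


def fold_scandinavian(term: str) -> str:
    """Fold a term so that native and ASCII spellings compare equal."""
    lowered = term.lower()
    ascii_folded = "".join(ASCII_FOLD.get(ch, ch) for ch in lowered)
    # Keep only the first vowel of each matched pair.
    return DOUBLE_VOWEL.sub(lambda m: m.group(0)[0], ascii_folded)


if __name__ == "__main__":
    for word in ("Blåbærsyltetøy", "blaabaersyltetoey", "Räksmörgås", "raeksmoergaas"):
        print(word, "->", fold_scandinavian(word))
```

With this folding applied at both index and query time, "Blåbærsyltetøy" and "blaabaersyltetoey" end up as the same term, at the cost of occasionally conflating unrelated words that happen to contain those vowel pairs.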

[OT] Text Summarizer?

2013-02-17 Thread Petite Abeille
Hello, A bit off topic, but… could someone recommend a text summarizer? Something along the lines of Open Text Summarizer or such: http://libots.sourceforge.net What's the state of the art in text summarizer at the moment? Thanks in advance. Cheers, PA. --

Re: Solr/Lucene + Oracle Database seamless integration

2012-10-23 Thread Petite Abeille
On Oct 23, 2012, at 10:35 PM, Maximiliano Keen wrote: > Scotas combines and synchronizes the high-performance, full-featured > Solr/Lucene text search engine with the industry leading Oracle Database's > performance, scalability, security, and reliability. How does this compare/contrast to O

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-08 Thread Petite Abeille
On Feb 8, 2012, at 10:14 AM, Danil ŢORIN wrote: > For example if you only query data for 1 month intervals, and you > partition by date, you can calculate in which shard your data can be > found, and query just that shard. This is what one calls "partition pruning" in database terms. http://en.
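
To make the "calculate in which shard your data can be found" step concrete, here is a minimal Python sketch of date-based partition pruning. It assumes one shard per calendar month named `YYYY-MM`; that naming scheme and the function names `shard_for` and `shards_for_range` are hypothetical and not taken from the thread.

```python
from datetime import date
from typing import List


def shard_for(day: date) -> str:
    """Name of the (hypothetical) monthly shard holding documents for `day`."""
    return f"{day.year:04d}-{day.month:02d}"


def shards_for_range(start: date, end: date) -> List[str]:
    """Return only the monthly shards that can contain hits in [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    shards = []
    year, month = start.year, start.month
    # Walk month by month until the month of `end` has been included.
    while (year, month) <= (end.year, end.month):
        shards.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return shards


if __name__ == "__main__":
    print(shards_for_range(date(2011, 12, 15), date(2012, 2, 8)))
```

A query restricted to a date range is then sent only to the shards returned by `shards_for_range`; every other shard is skipped without being opened.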

Re: Strategy for large index files

2012-01-08 Thread Petite Abeille
On Jan 8, 2012, at 6:32 AM, Cheng wrote: > Hi, my servlet application is running a large index of 20G. I don't think > it can be loaded to RAM at one time. > > What are the general strategies to improve the search and write performance? Got money? http://www.ramsan.com/ http://www.fusionio.co

Re: ElasticSearch

2011-11-17 Thread Petite Abeille
On Nov 17, 2011, at 9:03 PM, Yonik Seeley wrote: >> dude, look at this query... its insane isn't it :) > > Sorry... what's the equivalent you'd like instead? > Or if you're just unjustifiably bitching about Solr again, maybe I > should take a stroll through Lucene land and bitch about > incompre

Re: Bet you didn't know Lucene can...

2011-10-31 Thread Petite Abeille
On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote: > similarity-preserving hash function was calculated on each sentence, and the > hash was added as a field. The property of the hash was that similar > documents (sentences) would produce a similar hash, with only some bit-level > perturbati
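
The quoted message does not say which similarity-preserving hash was used. As one well-known example of the family, here is a small SimHash-style sketch in Python: similar token sets tend to yield fingerprints that differ in only a few bits. The 64-bit width, whitespace tokenization, and the use of SHA-256 as the per-token hash are all assumptions for illustration.

```python
import hashlib

BITS = 64  # fingerprint width; an arbitrary choice for this sketch


def _token_hash(token: str) -> int:
    """Stable 64-bit hash of one token (first 8 bytes of its SHA-256)."""
    digest = hashlib.sha256(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    """SimHash-style fingerprint over the whitespace-separated tokens of `text`."""
    weights = [0] * BITS
    for token in text.lower().split():
        h = _token_hash(token)
        for bit in range(BITS):
            # Each token votes +1 or -1 on every bit position.
            weights[bit] += 1 if (h >> bit) & 1 else -1
    fingerprint = 0
    for bit, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    s1 = simhash("the quick brown fox jumps over the lazy dog")
    s2 = simhash("the quick brown fox jumped over the lazy dog")
    print(hamming_distance(s1, s2))
```

Stored as a field, such a fingerprint lets near-duplicate sentences be found by comparing Hamming distances instead of full text.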

Re: Language Identifier with Lucene?

2011-10-22 Thread Petite Abeille
On Oct 22, 2011, at 2:49 AM, Luca Rondanini wrote: > I usually use Nutch for this but, just for fun, I tried to create a language > identifier based on Lucene only. Talking of which: Google's Compact Language Detector http://blog.mikemccandless.com/2011/10/language-detection-with-googles-compac

Re: Wikileaks Iraq log

2010-12-01 Thread Petite Abeille
On Dec 1, 2010, at 7:29 AM, Seid Muhie wrote: > anybody who can give me a hint please http://lmgtfy.com/?q=WikiLeaks+War+Diary%3A+Iraq+War+Logs - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional co

Re: NAS vs SAN vs Server Disk RAID

2010-02-26 Thread Petite Abeille
On Feb 25, 2010, at 12:54 AM, Andrew Bruno wrote: > Since the disk IO on the server is high, our datacenter engineers suggested > we look at NAS or SAN, for performance gain, and for future growth. Alternatively, get a stack of RamSan and call it a day: http://www.ramsan.com/products/products.h

Re: Storing a Lucene Index on a SAN Storage: good idea?

2009-09-27 Thread Petite Abeille
On Sep 27, 2009, at 7:07 PM, J.J. Larrea wrote: While there was indeed a measurable performance hit, which as one might expect took the form of a broader distribution of request latencies, the mean time was if I recall only 15%-20% worse for the virtualized/SAN configuration. This was adj

Re: Storing a Lucene Index on a SAN Storage: good idea?

2009-09-27 Thread Petite Abeille
On Sep 26, 2009, at 9:36 AM, Matthias Hess wrote: Does anybody have good or bad experiences with SAN disks? Yea, make sure that your "high end" SAN doesn't end up storing your files on one physical disk :P In short, a SAN can be fiendishly difficult to set up properly, especially if it ha

Re: phonetic encoders for other languages?

2009-08-23 Thread Petite Abeille
On Aug 23, 2009, at 2:35 PM, Paul Libbrecht wrote: I will need to use phonetic analyzers to do "phonetic search". I know of the Metaphone analyzers and use them but they're really only known to work for English. Double Metaphone? http://en.wikipedia.org/wiki/Double_Metaphone -- PA. http:

Re: "People you might know" ( a la Facebook) - *slightly offtopic*

2009-03-17 Thread Petite Abeille
On Mar 17, 2009, at 2:32 PM, Aaron Schon wrote: how would I go about recommending that Jane Doe connect to Frank Jones? Hope you can help a newbie by pointing out where I should be looking. You might as well read something about it to get you started: "Programming Collective Intelligence" http

Re: Lucene vs. Database

2008-10-02 Thread Petite Abeille
On Oct 2, 2008, at 9:41 AM, agatone wrote: Now I have to go detailed into every one of them and write down stuff. Couple of handy guidelines: http://www.w3.org/DesignIssues/Principles.html E.g. "Principle of Least Power" :) Cheers, -- PA. http://alt.textdrive.com/nanoki/

Re: Lucene vs. Database

2008-10-01 Thread Petite Abeille
On Oct 1, 2008, at 9:43 AM, agatone wrote: I'm working on a project that has big database in the background (some tables have about 150 rows). We decided to use Lucene for "faster" search. Our search works similar as all searches: you write search string, get list of hits with detail link

Re: Haloe (Lucene package) released!

2008-09-08 Thread Petite Abeille
On Sep 8, 2008, at 7:49 PM, Marcus Herou wrote: :) Whoof so much high quality info and at the same time a huge amount of useless data, splogs and spam. Incidentally, if your search needs are humbler and do not require the full firepower of mighty Lucene, SQLite provides a very handy Full

Re: Haloe (Lucene package) released!

2008-09-08 Thread Petite Abeille
On Sep 8, 2008, at 6:43 AM, Marcus Herou wrote: the ShardedSolrDocumentIndexer will be used frequently now when we will index the entire Blogosphere. Yes, you will indeed need all the help you can muster! :) blogosphere, noun A poisonous environment of methane, self-satisfaction and other h

Re: Accent Insensitive Search

2008-07-16 Thread Petite Abeille
On Jul 16, 2008, at 10:58 AM, [EMAIL PROTECTED] wrote: A simple example: a search for Kraków should also bring up Krakow in the search results. As pointed out previously, you need to transliterate your input using something like ISOLatin1AccentFilter or such. For example, searching for 'aaiun' should r
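
Outside of Lucene, the same kind of transliteration can be approximated with Unicode decomposition. The sketch below is a rough stand-in for an accent-stripping filter, not the Lucene filter itself; the function names `strip_accents` and `accent_insensitive_equal` are made up for the example.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_equal(a: str, b: str) -> bool:
    """Compare two terms while ignoring case and accents."""
    return strip_accents(a).lower() == strip_accents(b).lower()


if __name__ == "__main__":
    print(strip_accents("Kraków"))
    print(accent_insensitive_equal("Kraków", "krakow"))
```

Note that this only handles letters that decompose into a base letter plus a combining mark; letters such as "ł" or "ø" have no such decomposition and need an explicit mapping table.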

Re: SOC: Lulu, a Lua implementation of Lucene

2008-03-01 Thread Petite Abeille
Hi Marvin, On Mar 1, 2008, at 2:33 AM, Marvin Humphrey wrote: How fast is Lua's method dispatch, compared to Java's? Fast enough. http://luajit.org/luajit_performance.html That has a huge impact on performance, since *everything* is a method in Lucene -- down to writeByte(). The plan i

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 11:39 AM, Mathieu Lecarme wrote: For me, Lua is just a glue between C coded object, a super config file. Like used in lighttpd or WoW. Here is an online demo of a wiki engine implemented purely in Lua: http://svr225.stepx.com:3388/a http://svr225.stepx.com:3388/nanoki

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 8:37 PM, Simon Willnauer wrote: or go to http://lucene.apache.org/lucy/ Looks rather, hmmm, inactive: http://svn.apache.org/viewvc/lucene/lucy/ Is there any working code anywhere?

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 3:42 PM, Mathieu Lecarme wrote: On the other hand, a Lucy with C for persistence and parsing, and Lua for filters and other fine configuration, could be great. Who is that Lucy you keep talking about? :P Cheers, PA.

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 1:09 PM, Grant Ingersoll wrote: That implies the Lucy actually is under development... Perhaps they will take up work on Lucy... Lulu has nothing to do with Lucy... goes back to something at high school or something...

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 11:39 AM, Mathieu Lecarme wrote: For me, Lua is just a glue between C coded object, a super config file. Like used in lighttpd or WoW. This is a rather narrow view of what one could do with Lua. For example, this wiki engine is written exclusively in Lua: http://al

SOC: Lulu, a Lua implementation of Lucene

2008-02-28 Thread Petite Abeille
A proposal for a Lua entry for the "Google Summer of Code" '08: lu·lu (lū'lū) n. Slang. A remarkable person, object, or idea. A very attractive or seductive looking woman. A Lua implementation of Lucene. Skimpy details below: http://svr225.stepx.com:3388/lulu http://lua-users.org/wiki/Goog

[OT][ANN] Nanoki

2008-02-13 Thread Petite Abeille
[Not even remotely related to Lucene, Java, Apache or anything] Nanoki, a sweet little wiki engine implemented in Lua [1]. http://alt.textdrive.com/nanoki/ Online demo: http://svr225.stepx.com:3388/nanoki Kind regards, PA. [1] http://www.lua.org/about.html

Re: Indexing accented characters, then searching by any form

2008-02-11 Thread Petite Abeille
On Feb 11, 2008, at 4:00 PM, Cesar Ronchese wrote: For example: Indexed word: usuário Terms typed by the user, to find the word above: usuário or usuario or usuãrio, etc. If you feel ambitious, you can try something along the lines of Sean M. Burke's Unidecode!: http://interglacial.com/~s
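
Whatever transliteration is chosen, the key point is to apply the same folding both when indexing and when querying. The toy inverted index below illustrates that with a plain Unicode-decomposition fold; it is not Unidecode and not a Lucene analyzer, and the names `fold` and `FoldingIndex` are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fold(term: str) -> str:
    """Lowercase a term and drop its combining accent marks."""
    decomposed = unicodedata.normalize("NFD", term.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


class FoldingIndex:
    """Toy inverted index that folds terms at index time and at query time."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        # Index the folded form of every whitespace-separated word.
        for word in text.split():
            self._postings[fold(word)].add(doc_id)

    def search(self, word):
        # Fold the query the same way before looking it up.
        return set(self._postings.get(fold(word), set()))


if __name__ == "__main__":
    index = FoldingIndex()
    index.add(1, "usuário")
    for query in ("usuário", "usuario", "usuãrio"):
        print(query, "->", index.search(query))
```

All three spellings from the question resolve to the same folded term, so each query finds the document that was indexed as "usuário".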

Re: Indexing Wikipedia dumps

2007-12-13 Thread Petite Abeille
On Dec 13, 2007, at 8:39 AM, Dawid Weiss wrote: Just incidentally -- do you know of something that would parse the wikipedia markup (to plain text, for example)? If you find out, let us know :) You may want to check the partial ANTLR grammar for Wikitext: http://www.mediawiki.org/wiki/User