from:"Tommaso Teofili"

Re: Sentence classification with Lucene

2025-02-19 Thread Tommaso Teofili

Hi, if you have 30 classes with 10 samples per class, I'd say that's not an optimal distribution. Apart from that, you may use one of the text classifiers from lucene-classification [1]; is something like this what you had in mind? Alternatively you can also do things outside of Lucene and use Luce

Re: ANN search current state

2020-07-17 Thread Tommaso Teofili

Would it make sense to create a separate Lucene module for ANN search? We could then experiment with the different approaches and compare them across the same benchmarks. On Thu, 16 Jul 2020 at 23:14, Ali Akhtar wrote: > I’m a bit of a layman in this area, but if we are talking about formats fo

Re: Optimizing a boolean query for 100s of term clauses

2020-06-25 Thread Tommaso Teofili

Hi Alex, I worked on a similar problem directly in Lucene (within the Anserini toolkit) using LSH fingerprints of tokenized feature vector values. You can find code at [1] and some information on the Anserini documentation page [2] and in a short preprint [3]. As a side note my current thinking is

Re: [VOTE] Lucene logo contest

2020-06-17 Thread Tommaso Teofili

PMC vote: option C (current) On Wed, 17 Jun 2020 at 07:58, Ignacio Vera Sequeiros wrote: > PMC vote: option A > > On Wed, Jun 17, 2020 at 7:36 AM Jeroen Lauwers > wrote: > > > A. Definitely. > > > > Sent from my phone > > > > > On 17 Jun 2020 at 03:46, Jason Gerlowski > > wrote the

Re: German decompounding/tokenization with Lucene?

2017-09-16 Thread Tommaso Teofili

+1, some time ago I also used the decompounder mentioned by Dawid and was satisfied back then. Regards, Tommaso On Sat, 16 Sep 2017 at 09:29, Dawid Weiss wrote: > Hi Mike. Search lucene dev archives. I did write a decompounder with Daniel > Naber. The quality was not ideal but

Re: Using POS payloads for chunking

2017-06-14 Thread Tommaso Teofili

I think it'd be interesting to also investigate using TypeAttribute [1] together with TypeTokenFilter [2]. Regards, Tommaso [1] : https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/analysis/tokenattributes/TypeAttribute.html [2] : https://lucene.apache.org/core/6_5_0/analyzers-common/org

Re: Possible to cause documents to be contiguous after forceMerge?

2016-11-16 Thread Tommaso Teofili

Improved locality of "near" documents could be used to avoid loading some segments during the retrieval phase for certain use cases (e.g. spatial search). On Wed, 16 Nov 2016 at 09:45, Ishan Chattopadhyaya < ichattopadhy...@gmail.com> wrote: http://shaierera.blogspot.com/2013/04/

Re: POS tagging in Lucene

2016-10-19 Thread Tommaso Teofili

I think it might be helpful to handle POS tags as TypeAttributes so that the input and output texts would be cleaner and you can still filter and retrieve tokens by type (e.g. with TypeTokenFilter). My 2 cents, Tommaso On Wed, 19 Oct 2016 at 11:56, Niki Pavlopoulou wrote: > Hi Ste

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-08 Thread Tommaso Teofili

can > follow > > up :) > > Let's see simple one first. :-) Why don't we consider adding Analyzer > parameter > to assignClass()? > > koji > > > (14/03/07 17:18), Tommaso Teofili wrote: > >> cool Koji, thanks a lot for sharing. >> Some

Re: [blog post] Comparing Document Classification Functions of Lucene and Mahout

2014-03-07 Thread Tommaso Teofili

cool Koji, thanks a lot for sharing. Some useful points / suggestions come out of it, let's see if we can follow up :) Regards, Tommaso 2014-03-07 3:30 GMT+01:00 Koji Sekiguchi : > Hello, > > I just posted an article on Comparing Document Classification Functions > of Lucene and Mahout. > > > h

Re: [blog post] Automatically Acquiring Synonym Knowledge from Wikipedia

2013-05-28 Thread Tommaso Teofili

2013/5/29 Koji Sekiguchi > Hi Rajesh, > > Thanks! > I'm planning to open an NLP tool kit for Lucene, and the tool kit will > include > the following synonym library. > sounds nice, looking forward to it. Tommaso > > koji > > > (13/05/28 14:12), Rajesh Nikam wrote: > >> Hello Koji, >> >> This

Re: Reg Lucene Naive Bayesian classifier.

2013-01-15 Thread Tommaso Teofili

2013/1/15 VIGNESH S > Hi All, > > Thanks for your replies.. > > Actually I am trying to classify the email mail data in to categories > and also spam mails .. I have tried clustering but it is not useful > since we can not control categories. > > I am looking for a light weight implementation whi

Re: Help needed Regarding classification of Text Data using Lucene..

2013-01-09 Thread Tommaso Teofili

Hi, you can have a look at the (early stage) Lucene classification module on trunk [1], see also a brief introduction given at last ApacheCon EU [2]. Hope this helps, Tommaso [1] : http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/classification/ [2] : http://www.slideshare.net/teofili/tex

Re: ANN: UweSays Query Operator

2012-11-20 Thread Tommaso Teofili

that's nice! Tommaso 2012/11/19 Uwe Schindler > Lol! > > Many thanks for this support! > > Uwes > > > > Otis Gospodnetic wrote: > > >Hi, > > > >Quick announcement for Uwe & Friends. > > > >UweSays is now a super-duper-special query operator over on > >http://search-lucene.com/ . Now whenev

Re: Lucene index on NFS

2012-10-02 Thread Tommaso Teofili

Ok, that saves you from concurrency issues, but in my experience it is just much slower than a local file system, so NFS can still be used, but with some tradeoff in performance. My 2 cents, Tommaso 2012/10/2 Jong Kim > The setup is I have a home-grown server process that has exclusive access > to the

Re: Custom Payload Analyzer and Query

2012-02-07 Thread Tommaso Teofili

2012/2/6 Ian Lea > Not sure if you got an answer to this or not. Don't recall seeing one > and gmail threading says not. > > > Is the use of payloads I've described appropriate? > > Sounds OK to me, although I'm not sure why you can't store the > metadata as a Document Field. > > > Can I exclude

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Tommaso Teofili

[X] ASF Mirrors (linked in our release announcements or via the Lucene website) [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [] I/we build them from source via an SVN/Git checkout. [] Other (someone in your company mirrors them internally or via a downstream project) 2011


17 matches
