Hi,
if you have 30 classes with 10 samples per class, I'd say that's not an
optimal distribution.
Apart from that, you may use one of the text classifiers from
lucene-classification [1], is anything like this what you had in mind?
Alternatively you can also do things outside of Lucene and use Luce
would it make sense to create a separate Lucene module for ANN search?
we could then experiment with the different approaches and compare them
across the same benchmarks.
On Thu, 16 Jul 2020 at 23:14, Ali Akhtar wrote:
> I’m a bit of a layman in this area, but if we are talking about formats fo
hi Alex,
I worked on a similar problem directly on Lucene (within the Anserini
toolkit) using LSH fingerprints of tokenized feature vector values.
You can find code at [1] and some information on the Anserini documentation
page [2] and in a short preprint [3].
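To give a rough idea of what "LSH fingerprints of tokenized feature vector
values" can look like, here is a minimal Python sketch. It is not the Anserini
code: the function names, the rounding precision, the token format and the
number of hashes are all assumptions made up for illustration, and plain
MinHash is used as the LSH scheme.

```python
import hashlib


def tokenize_vector(vec, decimals=1):
    """Turn each (position, rounded value) pair into a text token, e.g. '0_0.1'."""
    return [f"{i}_{round(v, decimals)}" for i, v in enumerate(vec)]


def _hash(token, seed):
    # Deterministic 64-bit hash of a token for a given seed.
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def minhash_fingerprint(tokens, num_hashes=32):
    """MinHash signature: for each seed, keep the smallest hash over all tokens."""
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


a = [0.12, 0.48, 0.91, 0.33]
b = [0.13, 0.52, 0.88, 0.31]  # close to a: rounds to the same tokens
c = [0.9, 0.1, 0.2, 0.7]      # far from a: shares no token with it

sig_a, sig_b, sig_c = (minhash_fingerprint(tokenize_vector(v)) for v in (a, b, c))
print(estimated_similarity(sig_a, sig_b))
print(estimated_similarity(sig_a, sig_c))
```

Vectors that round to the same tokens get the same fingerprint, so the
fingerprint values can be indexed and matched like ordinary terms. The
rounding precision controls how coarse the buckets are.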
As a side note my current thinking is
PMC vote: option C (current)
On Wed, 17 Jun 2020 at 07:58, Ignacio Vera Sequeiros
wrote:
> PMC vote: option A
>
> On Wed, Jun 17, 2020 at 7:36 AM Jeroen Lauwers
> wrote:
>
> > A. Definitely.
> >
> > Sent from my phone
> >
> > > On 17 Jun 2020 at 03:46, Jason Gerlowski
> > wrote
+1, some time ago I also used the decompounder mentioned by Dawid and was
satisfied back then.
Regards,
Tommaso
On Sat, 16 Sep 2017 at 09:29, Dawid Weiss
wrote:
> Hi Mike. Search lucene dev archives. I did write a decompounder with Daniel
> Naber. The quality was not ideal but
I think it'd be interesting to also investigate using TypeAttribute [1]
together with TypeTokenFilter [2].
Regards,
Tommaso
[1] :
https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/analysis/tokenattributes/TypeAttribute.html
[2] :
https://lucene.apache.org/core/6_5_0/analyzers-common/org
improved locality of "near" documents could be used to avoid loading some
segments during the retrieval phase for certain use cases (e.g. spatial
search).
On Wed, 16 Nov 2016 at 09:45, Ishan Chattopadhyaya <
ichattopadhy...@gmail.com> wrote:
http://shaierera.blogspot.com/2013/04/
I think it might be helpful to handle POS tags as TypeAttributes so that
the input and output texts would be cleaner and you can still filter and
retrieve tokens by type (e.g. with TypeTokenFilter).
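As a language-neutral illustration of the idea (this is not Lucene code, and
the Token class, the filter_by_type helper and the POS tags below are
hypothetical names chosen for the example), keeping the tag next to the term
rather than inside it could be sketched like this:

```python
from typing import NamedTuple


class Token(NamedTuple):
    term: str        # the text itself stays clean, with no tag glued to it
    token_type: str  # e.g. a POS tag, carried alongside the term


def filter_by_type(tokens, types, keep=True):
    """Keep (or drop, if keep=False) the tokens whose type is in `types`."""
    wanted = set(types)
    return [t for t in tokens if (t.token_type in wanted) == keep]


tagged = [
    Token("lucene", "NN"),
    Token("indexes", "VBZ"),
    Token("large", "JJ"),
    Token("collections", "NNS"),
]

nouns = filter_by_type(tagged, {"NN", "NNS"})
no_nouns = filter_by_type(tagged, {"NN", "NNS"}, keep=False)
print([t.term for t in nouns])
print([t.term for t in no_nouns])
```

The terms are never rewritten to carry the tag, so the output text is the
same as the input text, while selection by type remains possible.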
My 2 cents,
Tommaso
On Wed, 19 Oct 2016 at 11:56, Niki Pavlopoulou
wrote:
> Hi Ste
can
> follow
> > up :)
>
> Let's see simple one first. :-) Why don't we consider adding Analyzer
> parameter
> to assignClass()?
>
> koji
>
>
> (14/03/07 17:18), Tommaso Teofili wrote:
>
>> cool Koji, thanks a lot for sharing.
>> Some
cool Koji, thanks a lot for sharing.
Some useful points / suggestions come out of it, let's see if we can follow
up :)
Regards,
Tommaso
2014-03-07 3:30 GMT+01:00 Koji Sekiguchi:
> Hello,
>
> I just posted an article on Comparing Document Classification Functions
> of Lucene and Mahout.
>
>
> h
2013/5/29 Koji Sekiguchi
> Hi Rajesh,
>
> Thanks!
> I'm planning to open an NLP tool kit for Lucene, and the tool kit will
> include
> the following synonym library.
>
sounds nice, looking forward to it.
Tommaso
>
> koji
>
>
> (13/05/28 14:12), Rajesh Nikam wrote:
>
>> Hello Koji,
>>
>> This
2013/1/15 VIGNESH S
> Hi All,
>
> Thanks for your replies..
>
> Actually I am trying to classify email data into categories
> and also spam mails. I have tried clustering, but it is not useful
> since we cannot control the categories.
>
> I am looking for a light weight implementation whi
Hi,
you can have a look at the (early stage) Lucene classification module on
trunk [1], see also a brief introduction given at last ApacheCon EU [2].
Hope this helps,
Tommaso
[1] :
http://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/classification/
[2] :
http://www.slideshare.net/teofili/tex
that's nice!
Tommaso
2012/11/19 Uwe Schindler
> Lol!
>
> Many thanks for this support!
>
> Uwes
>
>
>
> Otis Gospodnetic wrote:
>
> >Hi,
> >
> >Quick announcement for Uwe & Friends.
> >
> >UweSays is now a super-duper-special query operator over on
> >http://search-lucene.com/ . Now whenev
OK, that saves you from concurrency issues, but in my experience NFS is just
much slower than a local file system, so it can still be used, but with some
tradeoff in performance.
My 2 cents,
Tommaso
2012/10/2 Jong Kim
> The setup is I have a home-grown server process that has exclusive access
> to the
2012/2/6 Ian Lea
> Not sure if you got an answer to this or not. Don't recall seeing one
> and gmail threading says not.
>
> > Is the use of payloads I've described appropriate?
>
> Sounds OK to me, although I'm not sure why you can't store the
> metadata as a Document Field.
>
> > Can I exclude
[X] ASF Mirrors (linked in our release announcements or via the Lucene
website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a
downstream project)
2011