> Do you use the POS tagger at query time, or just at index time? 

I have the POS tagger pipeline ready, but nothing is done yet on the Solr
side. Right now I am wondering how to use it, and I am still looking for a
relevant implementation.

I guess having the POS information ready before indexing gives the
flexibility to test multiple scenarios.

In the case of acronyms, one possible way is indeed to treat the acronyms in
the user query as NOUNs and, on the index side, to only keep the acronyms
that are tagged as NOUNs (i.e. detect acronyms within the text, look up their
POS, and remove them when they are not NOUNs).
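
For illustration, a minimal sketch of that index-side filtering, assuming
spaCy with a French model and a naive regex definition of "acronym" (both
are assumptions on my side, not what I actually have in place):

    # Sketch only: keep acronym-looking tokens only when the POS tagger
    # sees them as nouns. Assumes spaCy's fr_core_news_sm model and a
    # naive regex for acronyms -- placeholders, not the real pipeline.
    import re
    import spacy

    nlp = spacy.load("fr_core_news_sm")
    ACRONYM_RE = re.compile(r"^[A-Z]{2,6}$")  # naive: 2-6 uppercase letters

    def filter_acronyms(text: str) -> str:
        """Drop acronym-looking tokens that are not tagged NOUN/PROPN."""
        doc = nlp(text)
        kept = []
        for token in doc:
            if ACRONYM_RE.match(token.text) and token.pos_ not in ("NOUN", "PROPN"):
                continue  # acronym used as something else: drop it
            kept.append(token.text)
        return " ".join(kept)

The same function could in principle run on the query string as well, since
the assumption is that acronyms in a query should be nouns anyway.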

I definitely prefer the pre-processing approach for this over creating
dedicated Solr analyzers, because my context is batch processing; it also
simplifies testing and debugging, while offering a large panel of NLP tools
to work with.
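
As a rough illustration of what I mean by pre-processing, here is a sketch
that enriches each document with a POS-filtered field before it is sent to
Solr. The core name, URL and field names are hypothetical, and it reuses the
filter_acronyms function sketched above:

    # Batch pre-processing sketch: add a POS-filtered field, then index the
    # documents through Solr's JSON update endpoint. Names are placeholders.
    import json
    import requests

    SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"

    def enrich(doc: dict) -> dict:
        # 'text_nouns' holds the text with non-noun acronyms removed
        doc["text_nouns"] = filter_acronyms(doc["text"])
        return doc

    docs = [{"id": "1", "text": "Le SAMU intervient sur place."}]
    requests.post(
        SOLR_UPDATE_URL,
        data=json.dumps([enrich(d) for d in docs]),
        headers={"Content-Type": "application/json"},
    )

This keeps all the NLP logic outside Solr, so the same pipeline can be
re-run, tested and debugged independently of the index.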

On Fri, Oct 25, 2019 at 04:09:29PM +0000, Audrey Lorberfeld - 
audrey.lorberf...@ibm.com wrote:
> Nicolas,
> 
> Do you use the POS tagger at query time, or just at index time? 
> 
> We are thinking of using it to filter the tokens we will eventually perform 
> ML on. Basically, we have a bunch of acronyms in our corpus. However, many 
> departments use the same acronyms but expand those acronyms to different 
> things. Eventually, we are thinking of using ML on our index to determine 
> which expansion is meant by a particular query according to the context we 
> find in certain documents. However, since we don't want to run ML on all 
> tokens in a query, and since we think that acronyms are usually the nouns in 
> a multi-token query, we want to only feed nouns to the ML model (TBD).
> 
> Does that make sense? So, we'd want both an index-side POS tagger (could be 
> slow), and also a query-side POS tagger (must be fast).
> 
> -- 
> Audrey Lorberfeld
> Data Scientist, w3 Search
> IBM
> audrey.lorberf...@ibm.com
>  
> 
> On 10/25/19, 11:57 AM, "Nicolas Paris" <nicolas.pa...@riseup.net> wrote:
> 
>     Also, we are using the Stanford POS tagger for French. The processing
>     time is mitigated by the spark-corenlp package, which distributes the
>     process over multiple nodes.
>     
>     I am also interested in the way you use POS information within Solr
>     queries, or Solr fields.
>     
>     Thanks,
>     On Fri, Oct 25, 2019 at 10:42:43AM -0400, David Hastings wrote:
>     > ah, yeah it's not the fastest but it proved to be the best for my
>     > purposes, I use it to pre-process data before indexing, to apply more
>     > metadata to the documents in a separate field(s)
>     > 
>     > On Fri, Oct 25, 2019 at 10:40 AM Audrey Lorberfeld -
>     > audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
>     > 
>     > > No, I meant for part-of-speech tagging. But that's interesting that
>     > > you use StanfordNLP. I've read that it's very slow, so we are
>     > > concerned that it might not work for us at query-time. Do you use it
>     > > at query-time, or just index-time?
>     > >
>     > > --
>     > > Audrey Lorberfeld
>     > > Data Scientist, w3 Search
>     > > IBM
>     > > audrey.lorberf...@ibm.com
>     > >
>     > >
>     > > On 10/25/19, 10:30 AM, "David Hastings" <hastings.recurs...@gmail.com>
>     > > wrote:
>     > >
>     > >     Do you mean for entity extraction?
>     > >     I make a LOT of use of the Stanford NLP project, and get out the
>     > >     entities and use them for different purposes in Solr
>     > >     -Dave
>     > >
>     > >     On Fri, Oct 25, 2019 at 10:16 AM Audrey Lorberfeld -
>     > >     audrey.lorberf...@ibm.com <audrey.lorberf...@ibm.com> wrote:
>     > >
>     > >     > Hi All,
>     > >     >
>     > >     > Does anyone use a POS tagger with their Solr instance other than
>     > >     > OpenNLP’s? We are considering OpenNLP, SpaCy, and Watson.
>     > >     >
>     > >     > Thanks!
>     > >     >
>     > >     > --
>     > >     > Audrey Lorberfeld
>     > >     > Data Scientist, w3 Search
>     > >     > IBM
>     > >     > audrey.lorberf...@ibm.com
>     > >     >
>     > >     >
>     > >
>     > >
>     > >
>     
>     -- 
>     nicolas
>     
> 

-- 
nicolas
