Hi everyone,
I need to install a plugin to extract locations (country/state/city) from free-text documents. Any professional advice? Does OpenNLP really do the job? Is it English-only? US-only? Or does it cover place names worldwide? Could someone help me with this: installation, configuration, model training, etc.?

Please help.
Kind regards,
Christian

Christian Fotache
Tel: 0728.297.207
Fax: 0351.411.570

From: Upayavira <u...@odoko.co.uk>
To: solr-user@lucene.apache.org
Sent: Tuesday, November 3, 2015 12:13 PM
Subject: Re: language plugin

Looking at the code, this is not going to work without modifications to Solr (or at least a custom component). The atomic update code is closely embedded in the Solr DistributedUpdateProcessor, which expands the atomic update into a full document and then posts it to the shards.

You need to do the update expansion before your lang detect processor, but there is no gap between them.

From my reading of the code, you could create an AtomicUpdateProcessor that simply expands updates, and insert that before the LangDetectUpdateProcessor.

Upayavira

On Tue, Nov 3, 2015, at 06:38 AM, Chaushu, Shani wrote:
> Hi
> When I make an atomic update (a "set" on a field), whether on the content field or on another field, the language field becomes "generic". In other words, detection doesn't work on a set operation, only on the initial insert. Even if the language was detected the first time, it becomes "generic" after the update.
> Any idea?
>
> The chain is:
>
> <updateRequestProcessorChain name="aa_chain">
>   <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>     <str name="langid.fl">title,content,text</str>
>     <str name="langid.langField">language_t</str>
>     <str name="langid.langsField">language_all_t</str>
>     <str name="langid.fallback">generic</str>
>     <str name="langid.overwrite">false</str>
>     <str name="langid.threshold">0.8</str>
>   </processor>
>   <processor class="solr.LogUpdateProcessorFactory" />
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> Thanks,
> Shani
>
> -----Original Message-----
> From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> Sent: Thursday, October 29, 2015 17:04
> To: solr-user@lucene.apache.org
> Subject: Re: language plugin
>
> Are you trying to do an atomic update without the content field? If so, it sounds like Solr needs an enhancement (bug fix?) so that language detection would be skipped if the input field is not present. Or maybe that could be an option.
>
> -- Jack Krupansky
>
> On Thu, Oct 29, 2015 at 3:25 AM, Chaushu, Shani <shani.chau...@intel.com> wrote:
>
> > Hi,
> > I'm using the Solr language detection plugin on a field named "content" (Solr 4.10, plugin LangDetectLanguageIdentifierUpdateProcessorFactory).
> > When I index for the first time it works fine, but if I set one field again (regardless of whether it's the content field or not), the language falls back to the default. If I'm setting another field, I would like the language to stay the way it was before, and I don't want to insert all the content again. Is there an option to configure the plugin so that it won't recalculate the language? (Setting langid.overwrite to false didn't work.)
> >
> > Thanks,
> > Shani
> >
> > ---------------------------------------------------------------------
> > Intel Electronics Ltd.
> >
> > This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
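Upayavira's suggestion (expand the atomic update into the full document before language detection runs), combined with Jack's idea of skipping detection when none of the langid.fl fields are touched, can be illustrated with a plain-Java sketch. It deliberately uses no Solr classes: the class name, the expand and process methods, and the toy detector are all hypothetical, documents are modeled as Map<String, Object>, and only the atomic "set" operation is handled. A real implementation would be a custom update processor that loads the stored document from the index, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not Solr API: documents are plain maps and only the
// atomic "set" operation is modeled.
final class AtomicExpandThenDetect {

    // Fields that feed detection; mirrors langid.fl in the chain above.
    static final List<String> LANGID_FIELDS = List.of("title", "content", "text");
    // Mirror langid.langField and langid.fallback.
    static final String LANG_FIELD = "language_t";
    static final String FALLBACK = "generic";

    // Expands an atomic update into a full document: copies the stored
    // document, then applies {"set": value} operations (or plain values).
    static Map<String, Object> expand(Map<String, Object> stored, Map<String, Object> update) {
        Map<String, Object> merged = new HashMap<>(stored);
        for (Map.Entry<String, Object> e : update.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Map && ((Map<?, ?>) v).containsKey("set")) {
                merged.put(e.getKey(), ((Map<?, ?>) v).get("set"));
            } else {
                merged.put(e.getKey(), v);
            }
        }
        return merged;
    }

    // Runs detection on the expanded document, but only if the update touched
    // a langid field (or no language is stored yet); otherwise keeps the
    // previously stored language. A null detector result means "use fallback".
    static Map<String, Object> process(Map<String, Object> stored,
                                       Map<String, Object> update,
                                       Function<String, String> detector) {
        Map<String, Object> doc = expand(stored, update);
        boolean touchesLangFields = LANGID_FIELDS.stream().anyMatch(update::containsKey);
        if (touchesLangFields || !doc.containsKey(LANG_FIELD)) {
            StringBuilder text = new StringBuilder();
            for (String f : LANGID_FIELDS) {
                Object v = doc.get(f);
                if (v != null) {
                    text.append(v).append(' ');
                }
            }
            String lang = detector.apply(text.toString().trim());
            doc.put(LANG_FIELD, lang != null ? lang : FALLBACK);
        }
        return doc;
    }

    public static void main(String[] args) {
        // Toy detector for the demo only; a real setup would call an actual
        // language-detection library here.
        Function<String, String> toyDetector = t -> t.contains("the") ? "en" : null;

        Map<String, Object> stored = new HashMap<>();
        stored.put("id", "doc1");
        stored.put("content", "the quick brown fox");
        stored.put(LANG_FIELD, "en");

        // Atomic update that sets only an unrelated field.
        Map<String, Object> update = new HashMap<>();
        update.put("id", "doc1");
        update.put("category", Map.of("set", "news"));

        Map<String, Object> result = process(stored, update, toyDetector);
        System.out.println(result.get(LANG_FIELD) + " " + result.get("category"));
    }
}
```

Running main prints "en news": the update that only sets "category" keeps the stored language_t value instead of dropping to "generic", while an update that sets "content" would trigger detection on the merged document.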