Can you extract names, locations, etc. using OpenNLP in a plain, standalone
Java program?

If yes, here are two separate options:

1) Use http://searchhub.org/2012/02/14/indexing-with-solrj/ as an example:
write your own indexing code and integrate your NER code into it. You have
full control here, and no Solr plugins are involved.

2) Use 'Implementing a conditional copyField' from
http://wiki.apache.org/solr/UpdateRequestProcessor
as an example and integrate your NER code into it.


Please note that these are separate ways to enrich your incoming documents;
choose either (1) or (2).
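
Either way, the enrichment step itself has the same shape: read the content
field, run NER over it, and add one field per entity type. Below is a minimal
sketch of that step in plain Java. It is only an illustration under stated
assumptions: EntityExtractor, DictionaryExtractor and EnrichmentSketch are
hypothetical names, the dictionary lookup is a toy stand-in for a real OpenNLP
name finder, the field names "person" and "city" are made up, and a plain Map
stands in for the Solr document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Enrichment step: copies extracted entity names into new document fields. */
class EnrichmentSketch {

    /** Reads "content", adds one multi-valued field per entity type, returns the same map. */
    static Map<String, Object> enrich(Map<String, Object> doc, EntityExtractor extractor) {
        Object content = doc.get("content");
        if (content == null) {
            return doc; // nothing to enrich
        }
        Map<String, List<String>> entities = extractor.extract(content.toString());
        for (Map.Entry<String, List<String>> entry : entities.entrySet()) {
            doc.put(entry.getKey(), entry.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> dictionary = new LinkedHashMap<>();
        dictionary.put("Alice", "person");
        dictionary.put("Paris", "city");

        // A plain Map stands in for the document that would be sent to Solr.
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("content", "Alice moved to Paris. Alice likes search engines.");

        System.out.println(enrich(doc, new DictionaryExtractor(dictionary)));
    }
}

/** Hypothetical abstraction; a real implementation would wrap an OpenNLP name finder. */
interface EntityExtractor {
    /** Returns entity type (e.g. "person") mapped to the distinct names found in the text. */
    Map<String, List<String>> extract(String text);
}

/** Toy extractor that looks tokens up in a fixed dictionary. This is NOT OpenNLP. */
class DictionaryExtractor implements EntityExtractor {
    private final Map<String, String> typeByName; // name -> entity type

    DictionaryExtractor(Map<String, String> typeByName) {
        this.typeByName = typeByName;
    }

    @Override
    public Map<String, List<String>> extract(String text) {
        Map<String, List<String>> found = new LinkedHashMap<>();
        // Split on anything that is not a letter; good enough for a sketch.
        for (String token : text.split("[^\\p{L}]+")) {
            String type = typeByName.get(token);
            if (type == null) {
                continue;
            }
            List<String> names = found.computeIfAbsent(type, k -> new ArrayList<>());
            if (!names.contains(token)) {
                names.add(token); // keep each name once per field
            }
        }
        return found;
    }
}
```

With option (1), you would copy the enriched fields into a SolrInputDocument
and send it with SolrJ. With option (2), roughly the same logic would live in
the processAdd method of your UpdateRequestProcessor subclass. In both cases
the toy extractor would be replaced by real NER code.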



On Tuesday, June 3, 2014 3:30 PM, Vivekanand Ittigi <vi...@biginfolabs.com> 
wrote:
Okay, but I didn't understand what you said. Can you please elaborate?

Thanks,
Vivek





On Tue, Jun 3, 2014 at 5:36 PM, Ahmet Arslan <iori...@yahoo.com> wrote:

> Hi Vivekanand,
>
> I have never used UIMA with Solr before.
>
> Personally, I think it would take more time to learn how to configure and
> use the UIMA components.
>
>
> If you are familiar with Java, write a class that extends
> UpdateRequestProcessor (and a matching UpdateRequestProcessorFactory). Use
> OpenNLP for NER and add the new fields (organisation, city, person name,
> etc.) to your document. This phase is usually called 'enrichment'.
>
> Does that make sense?
>
>
>
> On Tuesday, June 3, 2014 2:57 PM, Vivekanand Ittigi <vi...@biginfolabs.com>
> wrote:
> Hi Ahmet,
>
> I followed what you suggested:
> https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. But how
> can I achieve my goal? I mean extracting only the names of organizations or
> persons from the content field.
>
> I guess I'm almost there, but something is missing. Please guide me.
>
> Thanks,
> Vivek
>
>
>
>
>
> On Tue, Jun 3, 2014 at 2:50 PM, Vivekanand Ittigi <vi...@biginfolabs.com>
> wrote:
>
> > I can't describe the entire goal, but one of the tasks is like this: we
> > have big documents (websites, PDFs, etc.) indexed into Solr.
> > Let's say <field name=content> stores the contents of the document.
> > All I want to do is pick the names of persons and places from it using
> > OpenNLP or some other means.
> >
> > Those names should be reflected in Solr itself.
> >
> > Thanks,
> > Vivek
> >
> >
> > On Tue, Jun 3, 2014 at 1:33 PM, Ahmet Arslan <iori...@yahoo.com> wrote:
> >
> >> Hi,
> >>
> >> Please tell us what you are trying to do in a new thread: your
> >> high-level goal. There may be other tools besides OpenNLP, such as
> >> Apache Stanbol ( https://stanbol.apache.org ).
> >>
> >>
> >>
> >> On Tuesday, June 3, 2014 8:31 AM, Vivekanand Ittigi <
> >> vi...@biginfolabs.com> wrote:
> >>
> >>
> >>
> >> We'll surely look into UIMA integration.
> >>
> >> But before moving on, is this ( https://wiki.apache.org/solr/OpenNLP ) the
> >> only link we've got for the integration? Isn't there any other article or
> >> link that may help us fix this problem?
> >>
> >> Thanks,
> >> Vivek
> >>
> >>
> >>
> >>
> >> On Tue, Jun 3, 2014 at 2:50 AM, Ahmet Arslan <iori...@yahoo.com> wrote:
> >>
> >> >Hi,
> >> >
> >> >I believe I answered it. Let me retry.
> >> >
> >> >There is no committed code for OpenNLP. There is an open ticket with
> >> >patches, and they may not work with the current trunk.
> >> >
> >> >Confluence is the official documentation. The wiki is maintained by the
> >> >community, which means it can describe uncommitted features such as
> >> >this one: https://wiki.apache.org/solr/OpenNLP
> >> >
> >> >What I am suggesting is that you have a look at
> >> >https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> >> >
> >> >
> >> >and search for how to use OpenNLP inside UIMA. Maybe LUCENE-2899 is
> >> >already doable with solr-uima. I am adding Tommaso (sorry for this, but
> >> >we need an authoritative answer here) to clarify this.
> >> >
> >> >
> >> >Also consider indexing with SolrJ and doing the OpenNLP enrichment
> >> >outside Solr: use OpenNLP from plain Java, enrich your documents, and
> >> >index them with SolrJ. You don't have to do everything inside Solr as
> >> >Solr plugins.
> >> >
> >> >Hope this helps,
> >> >
> >> >Ahmet
> >> >
> >> >
> >> >
> >> >On Monday, June 2, 2014 11:15 PM, Vivekanand Ittigi <
> >> vi...@biginfolabs.com> wrote:
> >> >Thanks, I will check the JIRA, but you didn't answer my first
> >> >question. Is there no way to integrate Solr with OpenNLP? Or is there
> >> >any committed code that I can go ahead with?
> >> >
> >> >Thanks,
> >> >Vivek
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >On Mon, Jun 2, 2014 at 10:30 PM, Ahmet Arslan <iori...@yahoo.com>
> wrote:
> >> >
> >> >> Hi,
> >> >>
> >> >> Here is the JIRA issue:
> >> >> https://issues.apache.org/jira/browse/LUCENE-2899
> >> >>
> >> >>
> >> >> Anyone can create an account.
> >> >>
> >> >> I haven't used UIMA myself and I have little knowledge about it, but
> >> >> I believe it is possible to use OpenNLP inside UIMA.
> >> >> You need to dig into the UIMA documentation.
> >> >>
> >> >> Solr UIMA integration already exists; that's why I asked whether your
> >> >> requirement is possible with UIMA or not. I don't know the answer
> >> >> myself.
> >> >>
> >> >> Ahmet
> >> >>
> >> >>
> >> >>
> >> >> On Monday, June 2, 2014 7:42 PM, Vivekanand Ittigi <
> >> vi...@biginfolabs.com>
> >> >> wrote:
> >> >> Hi Arslan,
> >> >>
> >> >> If not the uncommitted code, then which code should be used to
> >> >> integrate?
> >> >>
> >> >> If I have to comment on my problems, which JIRA issue should I use and
> >> >> how do I post there?
> >> >>
> >> >> And why are you suggesting UIMA integration? My requirement is
> >> >> integrating with OpenNLP. Do you mean we can do all the activities
> >> >> through UIMA that we do using OpenNLP, like name finder, location
> >> >> finder, etc.?
> >> >>
> >> >> Thanks,
> >> >> Vivek
> >> >>
> >> >>
> >> >>
> >> >>
> >> >>
> >> >> On Mon, Jun 2, 2014 at 8:40 PM, Ahmet Arslan <iori...@yahoo.com.invalid>
> >> >> wrote:
> >> >>
> >> >> > Hi,
> >> >> >
> >> >> > Uncommitted code can have this kind of problem. It is not guaranteed
> >> >> > to work with the latest trunk.
> >> >> >
> >> >> > You could comment on the problem you face in the JIRA ticket.
> >> >> >
> >> >> > By the way, maybe you are after something that is doable with the
> >> >> > already committed UIMA code?
> >> >> >
> >> >> > https://cwiki.apache.org/confluence/display/solr/UIMA+Integration
> >> >> >
> >> >> > Ahmet
> >> >> >
> >> >> >
> >> >> >
> >> >> > On Monday, June 2, 2014 5:07 PM, Vivekanand Ittigi <
> >> >> vi...@biginfolabs.com>
> >> >> > wrote:
> >> >> > I followed this link to integrate:
> >> >> > https://wiki.apache.org/solr/OpenNLP
> >> >> >
> >> >> > Installation
> >> >> >
> >> >> > For English language testing: Until LUCENE-2899 is committed:
> >> >> >
> >> >> >     1. pull the latest trunk or 4.0 branch
> >> >> >     2. apply the latest LUCENE-2899 patch
> >> >> >     3. do 'ant compile'
> >> >> >     cd solr/contrib/opennlp/src/test-files/training
> >> >> >     .
> >> >> >     .
> >> >> >     .
> >> >> > I followed the first two steps but got the following error while
> >> >> > executing the third step:
> >> >> >
> >> >> > common.compile-core:
> >> >> >     [javac] Compiling 10 source files to /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/build/analysis/opennlp/classes/java
> >> >> >     [javac] warning: [path] bad path element "/home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/lib/jwnl-1.3.3.jar": no such file or directory
> >> >> >     [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/FilterPayloadsFilter.java:43: error: cannot find symbol
> >> >> >     [javac]     super(Version.LUCENE_44, input);
> >> >> >     [javac]                  ^
> >> >> >     [javac]   symbol:   variable LUCENE_44
> >> >> >     [javac]   location: class Version
> >> >> >     [javac] /home/biginfolabs/solrtest/solr-lucene-trunk3/lucene/analysis/opennlp/src/java/org/apache/lucene/analysis/opennlp/OpenNLPTokenizer.java:56: error: no suitable constructor found for Tokenizer(Reader)
> >> >> >     [javac]     super(input);
> >> >> >     [javac]     ^
> >> >> >     [javac]     constructor Tokenizer.Tokenizer(AttributeFactory) is not applicable
> >> >> >     [javac]       (actual argument Reader cannot be converted to AttributeFactory by method invocation conversion)
> >> >> >     [javac]     constructor Tokenizer.Tokenizer() is not applicable
> >> >> >     [javac]       (actual and formal argument lists differ in length)
> >> >> >     [javac] 2 errors
> >> >> >     [javac] 1 warning
> >> >> >
> >> >> > I'm really stuck on how to get past this step. I have wasted all my
> >> >> > time trying to fix this but couldn't make any progress. Could someone
> >> >> > please help me?
> >> >> >
> >> >> > Thanks,
> >> >> > Vivek
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >
> >>
> >
> >
>
>
