Re: Language Detection for the data

2018-12-12 Thread Karl Wright
Hi Nikita, This is occurring because en_GB does not have a translations file. It's a warning and the code falls back to using en_US. Karl
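Karl's explanation describes a simple fallback. If no translations file exists for the requested locale, a warning is logged and en_US is used instead. The sketch below is a hypothetical Python model of that behavior, not ManifoldCF's actual (Java) code. The set of available locales and the function name `resolve_bundle` are assumptions for illustration.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("i18n_sketch")

# Hypothetical: locales assumed to ship a translations file.
AVAILABLE_BUNDLES = {"en_US", "ja_JP"}

# Locale used whenever the requested one has no translations file.
FALLBACK_LOCALE = "en_US"


def resolve_bundle(requested: str, available: set = AVAILABLE_BUNDLES) -> str:
    """Return the locale whose translations file would be loaded.

    A missing bundle is not fatal: it is logged as a warning and the
    fallback locale is used instead.
    """
    if requested in available:
        return requested
    logger.warning(
        "Missing resource bundle for locale %s; falling back to %s",
        requested,
        FALLBACK_LOCALE,
    )
    return FALLBACK_LOCALE


if __name__ == "__main__":
    print(resolve_bundle("en_GB"))  # logs a warning, then prints en_US
```

Under this model, a missing bundle changes only which locale's text is used; the lookup itself still succeeds.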

Re: Language Detection for the data

2018-12-12 Thread Nikita Ahuja
Hi Karl, Thanks for the suggestion; the language of the data and content is now being detected. However, there is one issue while ingesting the records into the ElasticSearch index, and it is recorded in the log file as: ERROR 2018-12-11T19:19:37,637 (qtp348148678-561) - Missing resource bundle

Re: Language Detection for the data

2018-11-21 Thread Nikita Ahuja
Hi all, Thanks for the timely replies. My main concern is language detection for the .doc, .pdf, or any other files present in the repository. As I understand it, the Tika transformation provides this functionality, but it produces no output for the language of the documents.
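Nikita's question is about attaching a language to the text extracted from .doc or .pdf files. The hypothetical Python sketch below shows the general shape of such a step (extracted text in, a language code or nothing out) using a deliberately simple stopword heuristic. It is a stand-in for illustration, not how Tika or OpenNLP detects languages, and the word lists and the `detect_language` function are assumptions.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical, deliberately tiny stopword lists. Real detectors rely on
# statistical profiles trained on large corpora, not short word lists.
STOPWORDS = {
    "en": {"the", "and", "is", "of", "to", "in", "it", "that"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "les", "et", "est", "un", "une", "des"},
}


def detect_language(text: str, min_hits: int = 2) -> Optional[str]:
    """Guess a language code by counting stopword occurrences.

    Returns None when no language reaches `min_hits`, e.g. for empty
    or very short text.
    """
    # Lowercase and split into word tokens (Unicode-aware).
    tokens = re.findall(r"\w+", text.lower())
    # Score each language by how many tokens are in its stopword list.
    scores = Counter(
        {lang: sum(tok in words for tok in tokens) for lang, words in STOPWORDS.items()}
    )
    lang, hits = scores.most_common(1)[0]
    return lang if hits >= min_hits else None


if __name__ == "__main__":
    print(detect_language("Der Bericht ist fertig und die Datei ist nicht leer"))  # de
```

Returning None rather than guessing lets a pipeline leave the language field empty when the evidence is too thin.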

Re: Language Detection for the data

2018-11-21 Thread Furkan KAMACI
Hi Nikita, First of all, OpenNLP is a transformation connector in ManifoldCF and should be enabled by default. It extracts named entities (people, locations, and organizations) from documents. You need to download trained models to run the OpenNLP connector. You can check here for that purpose:

Re: Language Detection for the data

2018-11-21 Thread Karl Wright
Hi Nikita, Can you be more specific when you say "OpenNLP is not working"? All that this connector does is integrate OpenNLP as a ManifoldCF transformer. It uses a specific directory to deliver the models that OpenNLP uses to match and extract content from documents. Thus, you can provide any

Language Detection for the data

2018-11-20 Thread Nikita Ahuja
Hi, I have a query about detecting the language of the records/data that are going to be ingested by the output connector. I am using the OpenNLP connector for detection as per the user documentation, but it is not working properly. Please suggest whether NLP has to be used, and if so, how