Re: Problem in running DUCC Job for Arabic Language

2018-11-06 Thread Jaroslaw Cwiklik
Forgot to mention that if you have a shared file system, the best practice is not to serialize your content (SOFA) from the JD to the service. Instead, in the CR, add to the CAS a path to the file containing the Subject of Analysis, and have the CM in the pipeline read the content from the shared file system. -jerr
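A minimal sketch of the idea above, using only the Java standard library rather than the UIMA API. The class and method names (`SharedFsSketch`, `referenceFor`, `loadText`) are hypothetical stand-ins for what a CR and a CM might do; how the path is actually carried in the CAS is not shown in this thread and is left out here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SharedFsSketch {
    // Hypothetical CR side: hand over only the location of the document,
    // not its text, so the text itself is never serialized in transit.
    static String referenceFor(Path document) {
        return document.toAbsolutePath().toString();
    }

    // Hypothetical CM side: read the file from the shared file system and
    // decode it with an explicit charset instead of the platform default.
    static String loadText(String reference) throws IOException {
        byte[] raw = Files.readAllBytes(Paths.get(reference));
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Five Arabic letters (U+0645 U+0631 U+062D U+0628 U+0627), written
        // as escapes so the source file encoding does not matter.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        Path tmp = Files.createTempFile("doc", ".txt");
        Files.write(tmp, arabic.getBytes(StandardCharsets.UTF_8));

        String reference = referenceFor(tmp);
        System.out.println(loadText(reference).equals(arabic));
        Files.delete(tmp);
    }
}
```

Decoding with an explicit `StandardCharsets.UTF_8` means the result does not depend on the LANG or `file.encoding` setting of the process that reads the file.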

Re: Problem in running DUCC Job for Arabic Language

2018-11-06 Thread Jaroslaw Cwiklik
Can you try setting -Dfile.encoding=ISO-8859-1 for the service (job) process, and -Djavax.servlet.request.encoding=ISO-8859-1 -Dfile.encoding=ISO-8859-1 for the JD process? The JD actually uses a Jetty web server to serve service requests over HTTP. I went as far as extracting Jetty server code from J

Re: Problem in running DUCC Job for Arabic Language

2018-07-06 Thread rohit14csu173
Yes, if I run the AE as a DUCC UIMA-AS service and send it CASes from a UIMA-AS client, it works fine. In fact, the environment (i.e. the LANG argument) is the same for the UIMA-AS service and the DUCC job: Environ[3] = LANG=en_IN. And if I change it to LANG=ar, then while getting the data coming in JD the Arabic text

Re: Problem in running DUCC Job for Arabic Language

2018-07-05 Thread Eddie Epstein
So if you run the AE as a DUCC UIMA-AS service and send it CASes from some UIMA-AS client, it works OK? The full environment for every process that DUCC launches is available via ducc-mon under the Specification or Registry tab for that job, managed reservation, or service. Please see if the LANG

Re: Problem in running DUCC Job for Arabic Language

2018-07-05 Thread rohit14csu173
Hey, Yeah, you got it right: the first snippet runs in the CR before the data goes into the CAS, and the second snippet is in the first annotator, or analysis engine (AE), of my aggregate descriptor. I am pretty sure this is an issue with the CAS used by DUCC, because if I use a DUCC service in which we are sup

Re: Problem in running DUCC Job for Arabic Language

2018-07-03 Thread Eddie Epstein
Rohit,
> Before sending the data into jcas, if I force-encode it:
>
> String content2 = null;
> content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
> jcas.setDocumentText(content2);
Where is this code, in the job CR?
> And when I get to my first annotator, I force-decode it:
>

Re: Problem in running DUCC Job for Arabic Language

2018-07-02 Thread rohit14csu173
Hey Eddie,
Before sending the data into jcas, if I force-encode it:
    String content2 = null;
    content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
    jcas.setDocumentText(content2);
And when I get to my first annotator, I force-decode it:
    String content = null;
    content = new String(jcas.g
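For reference, a self-contained sketch of the trick described above, with the UIMA calls left out. The `wrap` step mirrors the quoted encode line. The `unwrap` step is an assumption, since the decode line is cut off in this archive; it is simply the inverse operation. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Smuggle {
    // Mirrors the quoted "force encode": take the UTF-8 bytes of the text and
    // reinterpret each byte as one ISO-8859-1 character (all <= U+00FF).
    static String wrap(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Assumed inverse (the original decode line is truncated): turn each
    // character back into its byte, then decode the bytes as UTF-8.
    static String unwrap(String wrapped) {
        return new String(wrapped.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Five Arabic letters, two UTF-8 bytes each.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";
        String wrapped = wrap(arabic);

        System.out.println(wrapped.length());               // 10
        System.out.println(unwrap(wrapped).equals(arabic)); // true
    }
}
```

The round trip is lossless in Java because ISO-8859-1 maps every byte value to exactly one character and back. It hides the underlying encoding problem rather than fixing it: any component between the two steps sees mojibake, not Arabic.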

Re: Problem in running DUCC Job for Arabic Language

2018-06-18 Thread Eddie Epstein
Hi Rohit, In a DUCC job the CAS created by the user's CR in the Job Driver is serialized into cas.xmi format, transported to the Job Process where it is deserialized and given to the user's analytics. Likely the problem is in CAS serialization or deserialization, perhaps due to the active LANG environme
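To illustrate the general failure mode (not DUCC's actual serialization code, which is not shown in this thread), the sketch below encodes Arabic text with a charset that cannot represent it. The class name `LossyEncodingDemo` and its `roundTrip` helper are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class LossyEncodingDemo {
    // Encode text to bytes and decode it again with the same charset.
    // String.getBytes(Charset) silently substitutes the charset's replacement
    // byte for characters it cannot map; for ISO-8859-1 that byte is '?'.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Five Arabic letters, written as escapes.
        String arabic = "\u0645\u0631\u062D\u0628\u0627";

        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // true
        System.out.println(roundTrip(arabic, StandardCharsets.ISO_8859_1));           // ?????

        // The charset a JVM falls back to when none is given explicitly;
        // it can depend on LANG and -Dfile.encoding.
        System.out.println(Charset.defaultCharset());
    }
}
```

If any step between the CR and the first annotator converts text to bytes through a default charset that lacks Arabic, each Arabic character would come out as a single '?', which matches the reported symptom.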

Problem in running DUCC Job for Arabic Language

2018-06-13 Thread Rohit yadav
Hey, I use DUCC for the English language and it works without any problem. But lately I tried deploying a job for the Arabic language, and all the Arabic text is replaced by '?' (question mark). I am extracting data from Accumulo, and after processing I send it to ES6. When I checked the lo