Can you try setting -Dfile.encoding=ISO-8859-1 for the service (job) process, and -Djavax.servlet.request.encoding=ISO-8859-1 -Dfile.encoding=ISO-8859-1 for the JD process?
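These flags matter because every `String.getBytes()` or `new String(byte[])` call that omits a charset falls back to the JVM default charset. On JVMs of this era (before JDK 18 made UTF-8 the default), that default comes from `-Dfile.encoding` if it is set, and otherwise from the process locale (LANG). The standalone sketch below passes charsets explicitly, so it behaves the same everywhere. The class and method names are made up for illustration. It shows that encoding Arabic with a charset that cannot represent it, such as US-ASCII, silently turns each letter into `?`. That could plausibly explain the ??? reported further down the thread when LANG is changed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Encode the text with the given charset and decode it again.
    // String.getBytes(Charset) replaces characters the charset cannot
    // represent with its replacement bytes, which is '?' for US-ASCII.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The charset that no-argument getBytes() and new String(byte[]) use.
        System.out.println("Default Charset: " + Charset.defaultCharset());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));

        // First word of the Arabic test sentence, written as escapes so the
        // source file itself stays ASCII.
        String arabic = "\u0627\u0633\u062A\u0639\u0631\u0636";

        // UTF-8 can represent Arabic, so the text survives unchanged.
        System.out.println(roundTrip(arabic, StandardCharsets.UTF_8).equals(arabic)); // true

        // US-ASCII cannot, so every Arabic letter becomes '?'.
        System.out.println(roundTrip(arabic, StandardCharsets.US_ASCII)); // ??????
    }
}
```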
The JD actually uses the Jetty web server to serve service requests over HTTP. I went as far as extracting the Jetty server code from the JD into a simple HTTP server process, and extracting the HttpClient-related code from the service into a simple client process, so that I could test the two in isolation.

So on the server side I have:

    // UTF-8 bytes of the Arabic text reinterpreted as ISO-8859-1 chars
    String text = new String("استعرض المتحدث باسم قوات «التحالف العربي لدعم".getBytes("UTF-8"), "ISO-8859-1");
    response.setHeader("content-type", "text/xml");
    String body = marshall(text); // XStream serialization
    response.getWriter().write(body);

On the client side:

    System.out.println("Default Locale: " + Locale.getDefault());
    System.out.println("Default Charset: " + Charset.defaultCharset());
    System.out.println("file.encoding; " + System.getProperty("file.encoding"));
    HttpResponse response = httpClient.execute(postMethod);
    HttpEntity entity = response.getEntity();
    String content = EntityUtils.toString(entity);
    String result = (String) unmarshall(content); // XStream unmarshall
    String o = new String(result.getBytes()); // getBytes() and new String(byte[]) both use the JVM default charset
    System.out.println(o);

When I run with the above -D settings, the client console shows:

    Default Locale: en_US
    Default Charset: ISO-8859-1
    file.encoding; ISO-8859-1
    استعرض المتحدث باسم قوات «التحالف العربي لدعم

Without the -D settings I don't see the Arabic text; instead I see garbage on the console.

On Fri, Jul 6, 2018 at 3:00 AM rohit14csu...@ncuindia.edu <rohit14csu...@ncuindia.edu> wrote:
> Yes, if I run the AE as a DUCC UIMA-AS Service and send it CASes from a
> UIMA-AS client, it works fine.
> In fact the environment, i.e. the LANG argument, is the same for the
> UIMA-AS Service and the DUCC JOB.
>
> Environ[3] = LANG=en_IN
>
> And if I change it to LANG=ar, then by the time the data arrives in the JD
> the Arabic text has already been replaced with ??? (question marks), and
> the encoding of the data coming into the JD or CR shows as ASCII.
> I don't understand why this is happening.
>
> Best
> Rohit
>
>
> On 2018/07/05 13:35:11, Eddie Epstein <eaepst...@gmail.com> wrote:
> > So if you run the AE as a DUCC UIMA-AS service and send it CASes from
> > some UIMA-AS client, it works OK? The full environment for all processes
> > that DUCC launches is available via ducc-mon under the Specification or
> > Registry tab for that job, managed reservation, or service. Please see if
> > the LANG setting for the service is different from the LANG setting for
> > the job.
> >
> > One can also see the LANG setting for a Linux process ID by doing:
> >
> >     cat /proc/<pid>/environ
> >
> > The LANG to be used for a DUCC process can be set by adding "LANG=xxx"
> > to the --environment argument as needed.
> >
> > Thanks,
> > Eddie
> >
> > On Thu, Jul 5, 2018 at 6:47 AM, rohit14csu...@ncuindia.edu <rohit14csu...@ncuindia.edu> wrote:
> >
> > > Hey,
> > > Yeah, you got it right: the first snippet is in the CR, before the data
> > > goes into the CAS.
> > > And the second snippet is in the first annotator, or analysis engine
> > > (AE), of my Aggregate Descriptor.
> > > I am pretty sure this is an issue with the CAS used by DUCC, because if
> > > I use a DUCC service, where we send the CAS and receive the same CAS
> > > back from DUCC with added features, I get accurate results.
> > >
> > > The only problem comes when submitting a job, where the CAS is
> > > generated by DUCC.
> > > This could also be an issue with DUCC's environment (language), because
> > > the default language is English.
> > >
> > > Best Regards
> > > Rohit
> > >
> > > On 2018/07/03 13:11:50, Eddie Epstein <eaepst...@gmail.com> wrote:
> > > > Rohit,
> > > >
> > > > > Before sending the data into the JCas, if I force-encode it:
> > > > >
> > > > >     String content2 = null;
> > > > >     content2 = new String(content.getBytes("UTF-8"), "ISO-8859-1");
> > > > >     jcas.setDocumentText(content2);
> > > > >
> > > >
> > > > Where is this code, in the job CR?
> > > >
> > > > >
> > > > > And when I get to my first annotator, I force-decode it:
> > > > >
> > > > >     String content = null;
> > > > >     content = new String(jcas.getDocumentText().getBytes("ISO-8859-1"),
> > > > >             "UTF-8");
> > > > >
> > > >
> > > > And is this in the first annotator of the job process, i.e. the CM?
> > > >
> > > > Please be as specific as possible.
> > > >
> > > > Thanks,
> > > > Eddie
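The server snippet above and the CR/annotator snippets quoted in this thread all rely on the same trick. The UTF-8 bytes of the text are carried through as ISO-8859-1 characters and reinterpreted as UTF-8 at the other end. The standalone sketch below isolates that round trip using only JDK calls. `Latin1Tunnel`, `forceEncode` and `forceDecode` are hypothetical names for illustration, not DUCC or UIMA APIs. The sketch also shows the assumption the trick depends on: if any hop in between re-encodes the text with a charset that lacks the U+0080 to U+00FF range (US-ASCII here), the original bytes are lost.

```java
import java.nio.charset.StandardCharsets;

public class Latin1Tunnel {

    // "Force encode", as in the CR snippet: reinterpret the UTF-8 bytes of
    // the text as ISO-8859-1 characters. ISO-8859-1 maps each byte 0x00-0xFF
    // to exactly one char U+0000-U+00FF, so no information is lost here.
    static String forceEncode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // "Force decode", as in the annotator snippet: recover the original bytes
    // via ISO-8859-1, then decode them as UTF-8.
    static String forceDecode(String tunneled) {
        return new String(tunneled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First word of the Arabic test sentence, as escapes to keep the source ASCII.
        String original = "\u0627\u0633\u062A\u0639\u0631\u0636";
        String tunneled = forceEncode(original);

        // Each of these Arabic letters is 2 bytes in UTF-8, hence 2 Latin-1 chars.
        System.out.println(tunneled.length()); // 12
        System.out.println(forceDecode(tunneled).equals(original)); // true

        // If some hop re-encodes the tunneled text as US-ASCII, every char
        // (all are >= U+0080) becomes '?', and decoding cannot recover it.
        String damaged = new String(tunneled.getBytes(StandardCharsets.US_ASCII), StandardCharsets.US_ASCII);
        System.out.println(forceDecode(damaged).equals(original)); // false
    }
}
```

The tunnel therefore stays lossless only while every intermediate encode/decode step also uses ISO-8859-1, which is why it is sensitive to `file.encoding` and LANG. Passing an explicit charset such as UTF-8 at every `getBytes` and `new String` boundary, rather than relying on the JVM default, is the more common way to keep non-ASCII text intact without the tunnel.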