Radu and Vinh, thank you so much for putting your time into this!

Yes, I understand that the problem manifests because the "e" in Mexico is a 
double-byte char. But java is supposed to count the characters, not the bytes, 
and UTF-8 supports international characters.
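
To make the character-versus-byte distinction concrete, here is a minimal sketch. It is not code from this thread, and the class and method names are illustrative. It counts Unicode code points and UTF-8 bytes for the string in question, first with 'é' as the single code point U+00E9 and then in decomposed form ('e' followed by the combining acute accent U+0301). Whether the failing document actually contains the decomposed form is an assumption that would need to be checked against its exact bytes.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

public class LengthCheck {
    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Precomposed form: 'é' is the single code point U+00E9.
        String composed = "IDES M\u00e9xico, S.A. de C.V.";
        // Decomposed form: 'e' followed by the combining accent U+0301.
        String decomposed = Normalizer.normalize(composed, Normalizer.Form.NFD);

        System.out.println("composed:   " + codePoints(composed)
                + " code points, " + utf8Bytes(composed) + " bytes");
        System.out.println("decomposed: " + codePoints(decomposed)
                + " code points, " + utf8Bytes(decomposed) + " bytes");
    }
}
```

The composed form has 25 code points and 26 UTF-8 bytes; the decomposed form has 26 code points and 27 UTF-8 bytes. Both look identical on screen, so a length of 26 could come from either a byte count or a decomposed 'é'.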

Radu, since you can validate the document, it means that something else is 
different. 
1. What version of xmlbeans are you using? 
2. Should I be concerned about the xerces version? 
3. Could you please send me the exact code that you use to validate, so that 
I can try to isolate my problem further? 
4. How do I run the 'validate' utility of xmlbeans?
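
One way to isolate the problem, sketched here under an assumption that is not confirmed anywhere in this thread: the file's UTF-8 bytes may be decoded with a different charset before validation. A FileReader created without an explicit charset decodes with the JVM default charset, so the same code can behave differently on two machines. The class and method names below are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeCheck {
    // Decode raw bytes with the given charset and return the number of chars.
    static int decodedLength(byte[] raw, Charset cs) {
        return new String(raw, cs).length();
    }

    public static void main(String[] args) {
        // The bytes as they would sit in a file written as UTF-8.
        byte[] raw = "IDES M\u00e9xico, S.A. de C.V."
                .getBytes(StandardCharsets.UTF_8);

        System.out.println("JVM default charset:   " + Charset.defaultCharset());
        System.out.println("decoded as UTF-8:      "
                + decodedLength(raw, StandardCharsets.UTF_8));
        System.out.println("decoded as ISO-8859-1: "
                + decodedLength(raw, StandardCharsets.ISO_8859_1));
    }
}
```

Decoded as UTF-8 the string has 25 chars; decoded as ISO-8859-1 the two bytes of 'é' become two separate chars, giving 26. If the default charset printed on the failing machine is not UTF-8, reading the file through an InputStream, or through an InputStreamReader with an explicit UTF-8 charset, would be worth trying.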

I had been trying to solve this problem for a while before I posted to the 
forum, and now I am becoming really desperate.

Thank you guys so much for helping!
Elvira
 

-----Original Message-----
From: Radu Preotiuc-Pietro [mailto:[EMAIL PROTECTED] 
Sent: Monday, July 23, 2007 6:54 PM
To: [email protected]
Subject: RE: trouble validating UTF-8 document with international characters - please help!

Elvira,

I had initially run the schema and the document through the 'validate'
utility that ships with XmlBeans. Now that you mention it, I have also
tried your code: same result, the document validates.

Radu

On Mon, 2007-07-23 at 14:57 -0700, Vinh Nguyen (vinguye2) wrote:
> My guess is that the "e" in Mexico is a double-byte char.  So your XML 
> document should actually be UTF-16, not UTF-8.  In your db table, perhaps you 
> are using 16-bit chars, so your string would correctly appear to have 25 
> chars.  But in byte representation, it's actually 26 bytes = 26 chars.
>  
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Monday, July 23, 2007 2:45 PM
> To: [email protected]
> Subject: RE: trouble validating UTF-8 document with international characters - please help!
> 
> Radu,
> 
> You mean you tried the attached document with the attached schema, and the 
> same code as in the original question, and it worked with no errors? If you 
> open the attachment, can you go to line 148 and see the same line? Could you 
> send your test case back to me and I'll try to run it? Because when I run my 
> setup, no matter what I do, I cannot get around this error. Are you using the 
> latest release of xmlbeans? Mine is dated 6/12/2006.
> 
> BTW, this document is composed from table data, and the width of this 
> column is defined as 25. That is how the schema is constructed. 
> 
> Thanks for your help.
> Elvira
> 
> -----Original Message-----
> From: Radu Preotiuc-Pietro [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 23, 2007 5:26 PM
> To: [email protected]
> Subject: RE: trouble validating UTF-8 document with international characters - please help!
> 
> Sorry, my mistake, I did not notice the attachments.
> 
> However, it is not clear to me whether there are 25 or 26 characters in that 
> string. I am rusty on UTF-8 encoding rules and Unicode, but I am sure that 
> the 'é' character can be represented as either 'eacute' (Unicode 00E9) or a 
> composite character (I would imagine there are many ways to represent it in 
> this way). What XMLSchema says is 'count Unicode codepoints' as far as I can 
> tell.
> 
> So, while it is not impossible that there is a bug, I think the far more 
> likely possibility is that your document does not contain the characters you 
> think it contains. I also doubt that the attachment is the same document that 
> gives the error, because I have tried it and it works for me. So you would 
> have to do some additional investigation on this, ideally get the exact bytes 
> from that document.
> 
> Radu
> 
> On Mon, 2007-07-23 at 16:48 -0400, [EMAIL PROTECTED] wrote:
> > Sorry, my e-mail had files attached to it, and the xml file was the 
> > document in question. 
> > Anyway, the offending line is:
> > 
> >          <COMP_NAME>IDES México, S.A. de C.V.</COMP_NAME>
> > 
> > As you can see, there are 25 characters in this element, but xmlbeans 
> > thinks there are 26.
> > 
> > Thanks,
> > Elvira
> > 
> > -----Original Message-----
> > From: Radu Preotiuc-Pietro [mailto:[EMAIL PROTECTED]
> > Sent: Monday, July 23, 2007 4:33 PM
> > To: [email protected]
> > Subject: Re: trouble validating UTF-8 document with international characters - please help!
> > 
> > It would be more interesting to see the document in question at the 
> > line and column referenced in the error message: is that string longer 
> > than the declared maxLength facet?
> > 
> > (I wouldn't take XmlSpy as reference, since it is known to be
> > unreliable)
> > 
> > Radu
> > 
> > On Mon, 2007-07-23 at 15:57 -0400, [EMAIL PROTECTED] wrote:
> > > > Hello,
> > > > 
> > > > I am using XmlBeans to validate a document against its schema.
> > > > It works fine, except when international characters are used in the 
> > > > document.
> > > > 
> > > > For the attached document and the corresponding schema, the error 
> > > > message is as follows:
> > > > 
> > > > Node: COMP_NAME, Line: 148, Column: 10, Detail: string length (string) is 
> > > > greater than maxLength facet (26) for 25
> > > >   Document encoding is: null
> > > > 
> > > > The same document validates in XmlSpy. What am I missing? The document 
> > > > file was written as UTF-8. 
> > > > 
> > > > The code follows. Thanks so much for your help. 
> > > > 
> > >       private boolean xmlBeanValidate(File xmlFile, List sdocs) {
> > > 
> > >             XmlObject[] schemas = (XmlObject[]) sdocs.toArray(new 
> > > XmlObject[0]);
> > > 
> > >             SchemaTypeLoader sLoader;
> > > 
> > >             Collection compErrors = new ArrayList();
> > > 
> > >             XmlOptions schemaOptions = new XmlOptions();
> > > 
> > >             schemaOptions.setErrorListener(compErrors);
> > > 
> > >  
> > > 
> > >            try {
> > > 
> > >                   sLoader = XmlBeans.loadXsd(schemas, 
> > > schemaOptions);
> > > 
> > >             } catch (Exception e) {
> > > 
> > >                  if(compErrors.isEmpty() || !(e instanceof
> > > XmlException)) {
> > > 
> > >                         e.printStackTrace();
> > > 
> > >                   }
> > > 
> > >                   logError("Schema is invalid");
> > > 
> > >                  for (Iterator i = compErrors.iterator();
> > > i.hasNext();)
> > > 
> > >                         log(i.next().toString());
> > > 
> > >                  return false;
> > > 
> > >             }
> > > 
> > >  
> > > 
> > >             XmlObject xobj = null;
> > > 
> > >            try {
> > > 
> > >                   Reader sr = newFileReader(xmlFile);
> > > 
> > >                   XmlOptions opt = new XmlOptions();
> > > 
> > >                   opt.setCharacterEncoding("UTF-8");
> > > 
> > >                   opt.setLoadLineNumbers();
> > > 
> > >                   xobj = sLoader.parse(sr, null, opt);
> > > 
> > >             } catch (Exception e) {
> > > 
> > >                   logError("xml not loadable: " + e);
> > > 
> > >                   e.printStackTrace();
> > > 
> > >                  return false;
> > > 
> > >             }
> > > 
> > >  
> > > 
> > >             Collection errors = new ArrayList();
> > > 
> > >            if(xobj.schemaType() == XmlObject.type) {
> > > 
> > >                   logError("xml is NOT valid. Document type not 
> > > found.");
> > > 
> > >                  return false;
> > > 
> > >             } else if (xobj.validate(new 
> > > XmlOptions().setErrorListener(errors))){
> > > 
> > >                   log("Document validation completed 
> > > successfully.");
> > > 
> > >                  return true;
> > > 
> > >             }else {
> > > 
> > >                  for (Iterator it = errors.iterator(); 
> > > it.hasNext();) {
> > > 
> > >                         XmlError xmlError = (XmlError)it.next();
> > > 
> > >                     logError("  Node: " 
> > > 
> > > 
> > > +xmlError.getCursorLocation().getDomNode().getNodeName()
> > > 
> > >                               +", Line: " + xmlError.getLine()
> > > 
> > >                               +", Column: " + xmlError.getColumn()
> > > 
> > >                               +", Detail: " + 
> > > xmlError.getMessage());
> > > 
> > >                     logError("  Document encoding is: " 
> > > 
> > > 
> > > +xobj.documentProperties().getEncoding());
> > > 
> > >  
> > > 
> > >                   }
> > > 
> > >                  return false;
> > > 
> > >             }
> > > 
> > >       }
> > > 
> > >  
> > > 
> > > 
> > > --------------------------------------------------------------------
> > > - To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > For additional commands, e-mail: [EMAIL PROTECTED]
> > 
> > Notice:  This email message, together with any attachments, may contain 
> > information  of  BEA Systems,  Inc.,  its subsidiaries  and  affiliated 
> > entities,  that may be confidential,  proprietary,  copyrighted  and/or 
> > legally privileged, and is intended solely for the use of the individual or 
> > entity named in this message. If you are not the intended recipient, and 
> > have received this message in error, please immediately return this by 
> > email and then delete it.
> > 

