Cleaning XML - Unicode 0x0

2006-11-06 Thread Josh Nathanson
Hey all, I've got a script that uses to create an xml object from data taken from a db. Occasionally control characters sneak into the database and break the xml when I try to do the output. I've been able to use rereplace to "clean" the xml until today. I got the old "An invalid XML chara

Re: Cleaning XML - Unicode 0x0

2006-11-06 Thread Rob Wilkerson
On 11/6/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: > Hey all, > > I've got a script that uses to create an xml object from data taken > from a db. Occasionally control characters sneak into the database and > break the xml when I try to do the output. I've been able to use rereplace > to "cle

Re: Cleaning XML - Unicode 0x0

2006-11-06 Thread Josh Nathanson
> Don't know whether it'll work for you, but the regex I've used > successfully is REReplace ( mystring, '>\s*<', '><', 'ALL' ). It > clears any whitespace between tags themselves. No dice, apparently whitespace is not the problem...anyone else? -- Josh
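Rob's pattern only touches whitespace that sits between a ">" and the next "<". A NUL inside element text is never a candidate for removal, so this regex cannot fix the 0x0 error. The minimal sketch below uses Python's re.sub as a stand-in for CFML's REReplace. The sample string is made up, and the output shows the NUL surviving:

```python
import re

def collapse_tag_whitespace(xml_text: str) -> str:
    # Python stand-in for REReplace(mystring, '>\s*<', '><', 'ALL'):
    # delete whitespace sitting directly between a '>' and the next '<'.
    return re.sub(r">\s*<", "><", xml_text)

# Hypothetical sample: a NUL hidden inside element text.
dirty = "<order>\n  <name>Jo\x00sh</name>\n</order>"
cleaned = collapse_tag_whitespace(dirty)
print(repr(cleaned))  # '<order><name>Jo\x00sh</name></order>'
```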

Re: Cleaning XML - Unicode 0x0

2006-11-06 Thread Rob Wilkerson
On 11/6/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: > > Don't know whether it'll work for you, but the regex I've used > > successfully is REReplace ( mystring, '>\s*<', '><', 'ALL' ). It > > clears any whitespace between tags themselves. > > No dice, apparently whitespace is not the problem...a

Re: Cleaning XML - Unicode 0x0

2006-11-06 Thread Josh Nathanson
don't get why I can't get at that Unicode 0x0 character in the same way I can get at Unicode 0x5, 0xA, 0xF or whatever. -- Josh - Original Message - From: "Rob Wilkerson" <[EMAIL PROTECTED]> To: "CF-Talk" Sent: Monday, November 06, 2006 4:07 PM Sub

Re: Cleaning XML - Unicode 0x0

2006-11-06 Thread Rob Wilkerson
or whatever. > > -- Josh > > > - Original Message - > From: "Rob Wilkerson" <[EMAIL PROTECTED]> > To: "CF-Talk" > Sent: Monday, November 06, 2006 4:07 PM > Subject: Re: Cleaning XML - Unicode 0x0 > > > > On 11/6/06, Josh Nathans

Re: Cleaning XML - Unicode 0x0

2006-11-06 Thread Paul Hastings
Josh Nathanson wrote: > I just want to be able to clean the data on the way out of the database. I > don't get why I can't get at that Unicode 0x0 character in the same way I > can get at Unicode 0x5, 0xA, 0xF or whatever. how on earth does a NULL leak into your shopping cart data? maybe you sh

Re: Cleaning XML - Unicode 0x0

2006-11-07 Thread Josh Nathanson
Yup, that's the first thing I tried...when it didn't work I posted here hoping for further guidance. -- Josh - Original Message - From: "Rob Wilkerson" <[EMAIL PROTECTED]> To: "CF-Talk" Sent: Monday, November 06, 2006 6:28 PM Subject: Re: Cleaning

Re: Cleaning XML - Unicode 0x0

2006-11-07 Thread Rob Wilkerson
On 11/7/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: > Yup, that's the first thing I tried...when it didn't work I posted here > hoping for further guidance. Is the null character in your data or in the XML itself somehow? If the former, then I think CDATA may be the way to go. It's a good idea
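About the CDATA idea: XML 1.0's Char production excludes U+0000 everywhere in a document, and that includes CDATA sections. Wrapping a field in CDATA therefore cannot make a NUL legal. This fits with Rob's later "True" in reply to Josh's summary that CDATA is not helpful for control characters. The small Python sketch below uses the standard library's expat-based ElementTree parser as a stand-in for whatever consumes the output. The field name is hypothetical.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if Python's expat-based parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical field value containing a NUL, with and without CDATA.
plain = "<field>Jo\x00sh</field>"
wrapped = "<field><![CDATA[Jo\x00sh]]></field>"
print(parses(plain), parses(wrapped))  # False False
```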

Re: Cleaning XML - Unicode 0x0

2006-11-07 Thread Josh Nathanson
other option. I'll keep grinding on trying to regex the null character out of there and let the list know if I figure anything out. -- Josh - Original Message - From: "Rob Wilkerson" <[EMAIL PROTECTED]> To: "CF-Talk" Sent: Tuesday, November 07, 2006 9

Re: Cleaning XML - Unicode 0x0

2006-11-07 Thread Rob Wilkerson
On 11/7/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: > Thanks for your help Rob. I just don't know which field is the culprit as > far as the null character (there's no description field or anything obvious > like that), and I'm hesitant to CDATA every single field that's going into > the db, unl
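The culprit field is unknown, so one way to narrow it down is to scan every column of an offending record. The scan reports which columns contain C0 control characters other than tab, LF and CR, which are the only ones XML 1.0 permits. The Python sketch below assumes the record is available as a plain dict. The field names and values are hypothetical.

```python
def find_control_chars(row: dict) -> dict:
    """Map each string field to the C0 control code points it contains.
    Tab, LF and CR are skipped because XML 1.0 allows them."""
    report = {}
    for field, value in row.items():
        if not isinstance(value, str):
            continue  # numbers, dates, etc. cannot carry control characters
        bad = sorted({ord(ch) for ch in value if ord(ch) < 0x20 and ch not in "\t\n\r"})
        if bad:
            report[field] = [hex(cp) for cp in bad]
    return report

# Hypothetical order row; field names are made up for illustration.
row = {"first_name": "Josh", "address": "123 Main St\x00", "notes": "line1\nline2", "qty": 2}
print(find_control_chars(row))  # {'address': ['0x0']}
```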

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Josh Nathanson
y, November 07, 2006 10:19 AM Subject: Re: Cleaning XML - Unicode 0x0 > On 11/7/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: >> Thanks for your help Rob. I just don't know which field is the culprit >> as >> far as the null character (there's no description field

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Rob Wilkerson
On 11/7/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: > > Yes, it's non-scalable...but, since the data is not going into the database > as xml, just plain old form fields, I can't use CDATA on the way in anyway, > correct? I would have to run the same regex on each of the incoming form > fields tha
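Cleaning on the way in does not have to mean writing one regex call per field. A single loop over every submitted field works for any form. The Python sketch below assumes the submitted fields arrive as a dict, which is a hypothetical stand-in for the CFML FORM scope. It uses str.translate with a deletion table instead of a regex, and it strips only the C0 controls that XML 1.0 forbids.

```python
# Deletion table: every C0 control except tab (0x09), LF (0x0A) and CR (0x0D) maps to None.
_STRIP_C0 = {cp: None for cp in range(0x20) if cp not in (0x09, 0x0A, 0x0D)}

def clean_form_fields(form: dict) -> dict:
    """Return a copy of the submitted fields with forbidden C0 controls
    removed from every string value; non-strings pass through unchanged."""
    return {
        name: value.translate(_STRIP_C0) if isinstance(value, str) else value
        for name, value in form.items()
    }

# Hypothetical submitted fields.
form = {"email": "josh@example.com\x00", "comments": "Ship\tfast\r\n"}
print(clean_form_fields(form))  # {'email': 'josh@example.com', 'comments': 'Ship\tfast\r\n'}
```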

RE: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Matt Quackenbush
overs up the real issue, which is invalid data being entered into the database. Thanks, Matt -Original Message- From: Josh Nathanson [mailto:[EMAIL PROTECTED] Sent: Tuesday, November 07, 2006 1:14 PM To: CF-Talk Subject: Re: Cleaning XML - Unicode 0x0 SOLVED sorta OK, I added this

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Rob Wilkerson
On 11/7/06, Matt Quackenbush <[EMAIL PROTECTED]> wrote: > Josh, > > I think the point that Rob and others were making is that your data should > be validated and cleaned up BEFORE being inserted into the database - > whether it's inserted as XML or not is completely and utterly irrelevant. That's

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Josh Nathanson
From: "Rob Wilkerson" <[EMAIL PROTECTED]> To: "CF-Talk" Sent: Tuesday, November 07, 2006 2:04 PM Subject: Re: Cleaning XML - Unicode 0x0 SOLVED sorta > On 11/7/06, Matt Quackenbush <[EMAIL PROTECTED]> wrote: >> Josh, >> >> I think the

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Rob Wilkerson
On 11/7/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: > > Maybe I can summarize: > 1) CDATA is not helpful when encountering control characters. True. Does lead you to wonder, though, how they're sneaking in there. Folks don't just type in null characters... > 2) Thus, I have to use rereplace w

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Josh Nathanson
Hey Rob, > True. Does lead you to wonder, though, how they're sneaking in there. > Folks don't just type in null characters... It leads me to wonder all right!! Maybe forms autofill or something? > REReplace ( mystring, '[^\x00-\x7f]', '', 'ALL' ) > Again, it's a pretty broad brush, but it shou
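Read literally, the character class [^\x00-\x7f] matches everything outside U+0000 to U+007F. It deletes non-ASCII characters such as accented letters but leaves U+0000 itself in place. The Python sketch below applies the same pattern, with Python's re standing in for REReplace. It is an assumption, worth checking, that ColdFusion's regex engine reads the class the same way.

```python
import re

# Python stand-in for REReplace(mystring, '[^\x00-\x7f]', '', 'ALL'):
# delete every character OUTSIDE U+0000..U+007F, i.e. all non-ASCII.
ascii_only = re.compile(r"[^\x00-\x7f]")

# Hypothetical sample: an accented letter followed by a NUL.
sample = "Jos\u00e9\x00 Nathanson"
result = ascii_only.sub("", sample)
print(repr(result))  # 'Jos\x00 Nathanson'  (the accent is gone, the NUL is not)
```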

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Paul Hastings
Josh Nathanson wrote: > 3) If I did the rereplace on the way into the db, it still may not catch all > offending control characters. There may be a new one that isn't in the > regex yet. Additionally, I don't want to disrupt the shopper's checkout > process if at all possible. there can't be.
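Paul's point can be made concrete. XML 1.0 (Fifth Edition) defines the legal characters in a single Char production, so the set of characters to strip is fixed and cannot grow. The Python sketch below removes everything outside that production. The function name is hypothetical, and the ranges are transcribed from the spec.

```python
import re

# XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class matches every code point NOT in that list.
_NOT_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def strip_invalid_xml_chars(text: str) -> str:
    """Remove every character that XML 1.0 does not allow in a document."""
    return _NOT_XML_CHAR.sub("", text)

print(repr(strip_invalid_xml_chars("A\x00B\x05C\tD\uffffE")))  # 'ABC\tDE'
```

Because the allowed set comes from the spec rather than from a list of characters seen so far, a "new" control character cannot slip past this pattern.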

Re: Cleaning XML - Unicode 0x0 SOLVED sorta

2006-11-07 Thread Rob Wilkerson
On 11/7/06, Josh Nathanson <[EMAIL PROTECTED]> wrote: > Hey Rob, > > > True. Does lead you to wonder, though, how they're sneaking in there. > > Folks don't just type in null characters... > > It leads me to wonder all right!! Maybe forms autofill or something? I wish I had an answer for you. It