Hello,

In fact, an entity can hold more than a single character. It can be a string (less common) or a more complex structure (which I've never seen in real usage).
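To make this concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree. The document, the element names and the entity name "sw" are made up for illustration. It declares an entity that holds a whole string and checks that the parser hands back the same text however the content was written in the file:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity "sw" is declared in the internal DTD
# subset and holds a whole string, not just a single character.
doc = """<!DOCTYPE list [
  <!ENTITY sw "Sword">
]>
<list>
  <item>Sword</item>
  <item>&#83;word</item>
  <item>&#x53;word</item>
  <item>&sw;</item>
  <item><![CDATA[Sword]]></item>
</list>"""

root = ET.fromstring(doc)

# Each item was written differently, but the parser has already replaced
# character references, the entity and the CDATA section with plain text.
texts = [item.text for item in root.findall("item")]
print(texts)
```

Once parsed, nothing in the tree records which of the five spellings was used in the file.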
See http://msdn.microsoft.com/en-US/en-en/library/ms256483%28v=vs.110%29.aspx for examples.

To answer Greg: when you read an XML file, the parser builds an in-memory tree of elements (with their attributes) and text nodes (plus less common nodes: comments, processing instructions...). When a text node is parsed, every escape or entity is substituted with its final value, resulting in a "canonical" string. You cannot tell whether a character was written raw in the file or as an entity. So "Sword", "Sword", "Sword" and "<![CDATA[Sword]]>" result in the same text node. You don't have to care how the text was written in the file; you get the same final result.

URLs as used in HTTP (not HTML) have a different escaping system, percent-encoding: % followed by two hex digits (for example %20 for a space). Because it does not overlap with XML escaping, the two systems can easily be mixed. It could be used to escape spaces (%20), colons (%3A) and percent signs (%25) in gloss, lemma and morph, and it should allow any character to be represented in the content.

On Fri, Dec 12, 2014 at 08:01:31AM -0600, Greg Hellings wrote:
> If that's the case, how does it handle escaping <>? I believe entity
> replacement is after XML validation but before passing them to a
> transformer or such.
> On Dec 12, 2014 7:52 AM, "DM Smith" <dmsm...@crosswire.org> wrote:
>
> > Best I can recall:
> > Nope. An entity is merely an alternate way of specifying a character. The
> > XML parser is supposed to replace the entity with the corresponding code
> > point before the value is evaluated against the schema.
> >
> > On Dec 12, 2014, at 8:49 AM, Greg Hellings <greg.helli...@gmail.com>
> > wrote:
> >
> > It should be possible to escape any such characters with an XML entity, no?
> > On Dec 12, 2014 7:44 AM, "DM Smith" <dmsm...@crosswire.org> wrote:
> >
> >>
> >> > On Dec 12, 2014, at 8:26 AM, Peter Von Kaehne <ref...@gmx.net> wrote:
> >> >
> >> > Sent: Friday, 12 December 2014 at 13:16
> >> > From: "Troy A. Griffitts" <scr...@crosswire.org>
> >> >
> >> >> Not sure, but I thought we used optional prefixes to specify the kind
> >> >> of gloss if there are multiple, e.g.,
> >> >> gloss="en_US:18 wheeler en_UK:articulated lorry"
> >> >
> >> > Should there be an option to escape colons?
> >>
> >> IMHO:
> >> Yes.
> >>
> >> The definition of gloss in the schema is xs:string, not osisGenRegex.
> >> The former places no semantic on the content an allows for an empty
> >> string.
> >>
> >> If gloss should have a semantic, then it should be changed in the OSIS
> >> spec.
> >>
> >> The latter is used by lemma and morph and is specified as:
> >> ((((\p{L}|\p{N}|_)+)(\.(\p{L}|\p{N}|_))*:)?([^:\s])+)
> >> which basically is work:value.
> >> If I read this right it does not allow for : to be escaped. I know we
> >> allow lemma=“x:a y:b” but I don’t see that this allows for the pattern to
> >> be repeated, separated by spaces.
> >>
> >> The pattern would need to change ([^:\s])+ to (\\:|[^:\s])+ [ not
> >> tested ]
> >>
> >> In His Service,
> >> DM
> >> _______________________________________________
> >> sword-devel mailing list: sword-devel@crosswire.org
> >> http://www.crosswire.org/mailman/listinfo/sword-devel
> >> Instructions to unsubscribe/change your settings at above page
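PS: coming back to the %<hex> escaping suggested for gloss, lemma and morph, here is a minimal sketch of how it could work in Python. The helper names escape_value and unescape_value, and the third gloss with its "x" prefix, are invented for this example. Nothing like this exists in the OSIS spec or in SWORD today; it is only an illustration of the idea.

```python
from urllib.parse import unquote

def escape_value(text: str) -> str:
    """Hypothetical helper: escape only percent, space and colon."""
    # '%' must be handled first, otherwise the '%' of the escapes
    # introduced afterwards would be escaped a second time.
    return text.replace("%", "%25").replace(" ", "%20").replace(":", "%3A")

def unescape_value(text: str) -> str:
    """Hypothetical helper: reverse escape_value (decodes any %XX)."""
    return unquote(text)

# Made-up glosses; the last one contains a space, a colon and a percent.
glosses = {
    "en_US": "18 wheeler",
    "en_UK": "articulated lorry",
    "x": "ratio 1:2 at 50%",
}

# Build a prefix:value list separated by spaces, as in the thread.
attr = " ".join(f"{prefix}:{escape_value(value)}"
                for prefix, value in glosses.items())
print(attr)

# Reading it back: split on spaces, then on the first colon only.
decoded = {}
for item in attr.split(" "):
    prefix, _, value = item.partition(":")
    decoded[prefix] = unescape_value(value)
```

With this scheme a space or colon inside a value can no longer be confused with the separators, and the XML parser never needs to know about it, since the % escapes are ordinary characters as far as XML is concerned.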