On second look, I think the issue is clearer. The regex you have is:

  [\x30-\x39\x41-\x46\x61-\x66]+?(?=\x54)

Those hex values are all ASCII characters, and could be rewritten like
so:

  [0-9A-Fa-f]+?(?=T)

So your regex will only match data that consists of those ASCII
characters (the hex digits 0-9, A-F, a-f) followed by the letter T. But
I suspect your data isn't ASCII; it's actual binary data that could be
anything. Since your data doesn't contain those ASCII characters, your
pattern fails to match and the matched length is considered zero. The
parser then decodes 39 bytes of data for the Message element, with the
initial bytes being binary data followed by the beginning of the ASCII
string.

So the schema needs to be modified to either use a different regex or
use some other method to determine where the data ends and the message
begins. To me, it seems odd to have a binary format where the length of
binary data is just some amount until it finds the letter 'T', so I
would think a better description would exist. That said, such a regex
would look like this:

  [^T]+

- Steve

On 11/19/18 12:50 PM, Steve Lawrence wrote:
> Roger,
>
> I am unable to reproduce this issue. I've created a TDML file at the
> below link, which defines a schema and a test case with sample input
> data and expected infoset, based on your description.
>
> https://gist.github.com/stevedlawrence/c4051386c4ed58279dbcae1e75d08218
>
> This can be tested with:
>
>   daffodil test -i hexPattern.tml
>
> And I get the output:
>
>   [Fail] hexPattern
>   Failure Information:
>   Left over data. Consumed 408 bit(s) with 16 bit(s) remaining.
>
>   Total: 1, Pass: 0, Fail: 1, Not Found: 0
>
> So it fails, but it fails because the schema does not consume the
> trailing PE, so that's expected. The actual infoset does match the
> expected infoset.
>
> Maybe your input data is different or there is some other property you
> have defined in dfdl:format that is changing the behavior?
>
> Thanks,
> - Steve
>
> On 11/17/18 10:54 AM, Costello, Roger L. wrote:
>> Hello DFDL Community,
>>
>> Within my input is this:
>>
>> - a series of bytes
>> - then the string: "This program cannot be run in DOS mode."
>> - then another series of bytes until arriving at this string: "PE"
>>
>> I figured that for the first series of bytes I would use xs:hexBinary whose
>> length ends when getting to "T" (hex 54)
>>
>> <xs:element name="Instructions_in_hex"
>>             type="xs:hexBinary"
>>             dfdl:lengthKind="pattern"
>>             dfdl:lengthPattern="[\x30-\x39\x41-\x46\x61-\x66]+?(?=\x54)" />
>>
>> The next item is a string of length 39
>>
>> <xs:element name="Message"
>>             type="xs:string"
>>             dfdl:lengthUnits="characters"
>>             dfdl:lengthKind="explicit"
>>             dfdl:length="39" />
>>
>> The last item is a series of hex digits whose length ends when getting to
>> "P" (hex 50)
>>
>> <xs:element name="Instructions_in_hex"
>>             type="xs:hexBinary"
>>             dfdl:lengthKind="pattern"
>>             dfdl:lengthPattern="[\x30-\x39\x41-\x46\x61-\x66]+?(?=\x50)" />
>>
>> At the bottom of this message is the complete set of declarations.
>>
>> Unfortunately, it doesn't work. The first <Instructions_in_hex> picks up
>> nothing. Then the <Message> element erroneously picks up a bunch of hex
>> digits and the first part of the string "This program cannot be run in DOS
>> mode.". Then it crashes.
>>
>> What am I doing wrong, please? /Roger
>>
>> <xs:element name="DOS_Stub">
>>     <xs:complexType>
>>         <xs:sequence>
>>             <xs:element name="Instructions_in_hex"
>>                         type="xs:hexBinary"
>>                         dfdl:lengthKind="pattern"
>>                         dfdl:lengthPattern="[\x30-\x39\x41-\x46\x61-\x66]+?(?=\x54)" />
>>             <xs:element name="Message"
>>                         type="xs:string"
>>                         dfdl:lengthUnits="characters"
>>                         dfdl:lengthKind="explicit"
>>                         dfdl:length="39" />
>>             <xs:element name="Instructions_in_hex"
>>                         type="xs:hexBinary"
>>                         dfdl:lengthKind="pattern"
>>                         dfdl:lengthPattern="[\x30-\x39\x41-\x46\x61-\x66]+?(?=\x50)" />
>>         </xs:sequence>
>>     </xs:complexType>
>> </xs:element>
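
For anyone who wants to check the regex behavior discussed at the top of
this reply outside of Daffodil, here is a minimal Python sketch. It is
only an illustration under some assumptions: the input bytes are made up
(not Roger's real data), Python's re module stands in for the regex
engine Daffodil uses, and each byte is treated as one character. The
names HEX_ESCAPED, ASCII_SPELLED, UP_TO_T and pattern_length are
hypothetical helpers, not part of any Daffodil API.

```python
import re

# Roger's original lengthPattern, and the same pattern spelled with ASCII literals.
HEX_ESCAPED = re.compile(rb"[\x30-\x39\x41-\x46\x61-\x66]+?(?=\x54)")
ASCII_SPELLED = re.compile(rb"[0-9A-Fa-f]+?(?=T)")
# The "anything up to the first T" alternative from the reply.
UP_TO_T = re.compile(rb"[^T]+")

MESSAGE = b"This program cannot be run in DOS mode."  # 39 bytes

# Hypothetical input: made-up binary bytes (none equal to 0x54), the message,
# a few more made-up bytes, then "PE".
PREFIX = bytes([0x0E, 0x1F, 0xBA, 0x0E, 0x00, 0xB4, 0x09, 0xCD])
DATA = PREFIX + MESSAGE + bytes([0x0D, 0x0A, 0x24, 0x00]) + b"PE"


def pattern_length(pattern, data):
    """Length matched at the start of data, or 0 when the pattern does not match."""
    m = pattern.match(data)
    return m.end() if m else 0


if __name__ == "__main__":
    # Both spellings accept ASCII hex-digit text that is followed by 'T' ...
    print(pattern_length(HEX_ESCAPED, b"4d5aT"), pattern_length(ASCII_SPELLED, b"4d5aT"))
    # ... but match nothing when the data starts with arbitrary binary bytes.
    first_len = pattern_length(HEX_ESCAPED, DATA)
    print("original pattern length:", first_len)
    # With a zero-length first element, the next 39 bytes are binary plus partial text.
    print("Message would get:", DATA[first_len:first_len + 39])
    # The negated class consumes everything up to the first 'T'.
    fixed_len = pattern_length(UP_TO_T, DATA)
    print("fixed pattern length:", fixed_len)
    print("Message would get:", DATA[fixed_len:fixed_len + 39])
```

Note that [^T]+ only works here because the made-up prefix contains no
0x54 byte. Real binary data can contain that byte anywhere, which is
why a length or offset taken from the format itself would be a more
robust way to find where the message begins.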
