Joe,

Forgive the length, but I'm likely to bump my head on this issue in the future, so here is a fuller-than-necessary explanation.
I started with the simplest regex that would capture the parens:

1. fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(.*\)")

1. Result:

<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic </fn:non-match>
  <fn:match>(UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171)</fn:match>
  <fn:non-match> </fn:non-match>
</fn:analyze-string-result>

OK, so what do we know about the desired matches? Digits plus (, ) with no spaces. Yes?

2. fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(\d+, \d+\)")

So I match paren plus digits, ", " (comma plus whitespace), digits plus paren.

2.
Result:

<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. </fn:non-match>
  <fn:match>(79, 171)</fn:match>
  <fn:non-match> </fn:non-match>
</fn:analyze-string-result>

I need to split the two numbers, and what better way to do that than alternative matching?

3. fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(\d+ | \d+\)")

3. Result:

<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79,</fn:non-match>
  <fn:match> 171)</fn:match>
  <fn:non-match> </fn:non-match>
</fn:analyze-string-result>

You're probably already laughing because you see my mistake, which I correct in #4:

4.
fn:analyze-string("On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) ", "\(\d+|\d+\)")

4. Result:

<fn:analyze-string-result xmlns:fn="http://www.w3.org/2005/xpath-functions">
  <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. </fn:non-match>
  <fn:match>(79</fn:match>
  <fn:non-match>,</fn:non-match>
  <fn:match> 171)</fn:match>
  <fn:non-match> </fn:non-match>
</fn:analyze-string-result>

The error was here: "\(\d+ | \d+\)". Its first alternative only matches an open paren, digits, and then a white space, whereas the number in question was followed by a comma with *no space* before it. Know thy data!

Examples created on BaseX.

BTW, I started from known good examples in XQuery Functions 3.1, verified that they worked, and then created the search strings.

Hope this helps!

Patrick

On 04/23/2018 12:22 PM, Joe Wicentowski wrote:
> Hi all,
>
> I have encountered an unexpected challenge constructing a regex for a
> pattern I am looking for. I am looking for numbers in parentheses.
> For example, in the following string:
>
> "On February 13, 1968, Secretary of State Dean Rusk sent a
> message to Israeli Foreign Minister Abba Eban calling upon Israel to
> endorse openly Resolution 242, and on May 13 President Johnson sent a
> letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
> urging him to seize the unique opportunity offered by the Jarring
> mission to achieve peace.
> (79, 171)"
>
> ... I would like to match "79" and "171" (but not "UAR" or "13" or
> "1968"). I have been trying to construct a regex for use with
> analyze-string to capture this pattern, but I have not been
> successful. I have tried the following:
>
> analyze-string($string, "(?:\()(?:(\d+)(?:, )?)+(?:\))")
>
> In other words, there are these 3 components:
>
> 1. (?:\() a non-capturing group consisting of an open parens,
> followed by
> 2. (?:(\d+)(?:, )?)+ one or more non-capturing groups consisting of
> (a number followed by an optional, non-matching comma-and-space),
> followed by
> 3. (?:\)) a non-capturing group consisting of a close parens
>
> I was expecting to get the following output:
>
> <fn:analyze-string-result
> xmlns:fn="http://www.w3.org/2005/xpath-functions">
> <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a
> message to Israeli Foreign Minister Abba Eban calling upon Israel to
> endorse openly Resolution 242, and on May 13 President Johnson sent a
> letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
> urging him to seize the unique opportunity offered by the Jarring
> mission to achieve peace. </fn:non-match>
> <fn:match>(<fn:group nr="1">79</fn:group>,
> <fn:group nr="1">171</fn:group>)</fn:match>
> </fn:analyze-string-result>
>
> However, the actual result is that the first number ("79") is skipped,
> and only the 2nd number ("171") is captured:
>
> <fn:analyze-string-result
> xmlns:fn="http://www.w3.org/2005/xpath-functions">
> <fn:non-match>On February 13, 1968, Secretary of State Dean Rusk sent a
> message to Israeli Foreign Minister Abba Eban calling upon Israel to
> endorse openly Resolution 242, and on May 13 President Johnson sent a
> letter to United Arab Republic (UAR) President Gamal Abdel Nasser,
> urging him to seize the unique opportunity offered by the Jarring
> mission to achieve peace.
> </fn:non-match>
> <fn:match>(79,
> <fn:group nr="1">171</fn:group>)</fn:match>
> </fn:analyze-string-result>
>
> What am I missing? Can anyone suggest a regex that is able to capture
> both numbers inside the parentheses? Or do I need to make a two-pass
> run through this, finding parenthetical text with a first
> analyze-string like "\(.+\)" and then looking inside its matches with
> a second analyze-string like "(\d+)(?:, )?"?
>
> Thanks,
> Joe
>
>
> _______________________________________________
> talk@x-query.com
> http://x-query.com/mailman/listinfo/talk

--
Patrick Durusau
patr...@durusau.net
Technical Advisory Board, OASIS (TAB)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)

Another Word For It (blog): http://tm.durusau.net
Homepage: http://www.durusau.net
Twitter: patrickDurusau
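
The two-pass idea raised at the end of the quoted question can be sketched roughly as follows. This is only an illustration, not something either poster ran: it assumes an XQuery 3.1 processor (the examples above were created on BaseX), and the variable names $string, $paren, and $num as well as both regexes are choices made for this sketch. The first pass finds parenthesized lists of numbers; the second pulls each number out of those matches. It sidesteps the repeated capturing group, which, as the quoted output suggests, reports only its last repetition.

```xquery
xquery version "3.1";

(: The sample string from the thread, including its trailing space. :)
let $string := "On February 13, 1968, Secretary of State Dean Rusk sent a message to Israeli Foreign Minister Abba Eban calling upon Israel to endorse openly Resolution 242, and on May 13 President Johnson sent a letter to United Arab Republic (UAR) President Gamal Abdel Nasser, urging him to seize the unique opportunity offered by the Jarring mission to achieve peace. (79, 171) "
(: Pass 1: an open paren, digits, zero or more ", digits", a close paren.
   "(UAR)" is skipped because a digit must follow the open paren. :)
for $paren in analyze-string($string, "\(\d+(?:, \d+)*\)")/fn:match
(: Pass 2: every run of digits inside one parenthesized match. :)
for $num in analyze-string(string($paren), "\d+")/fn:match
return string($num)
```

Evaluated against the sample string, this query returns the sequence ("79", "171"); "13", "1968", "242", and "UAR" are never considered because they do not sit inside a parenthesized number list.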