Kartik,

Looks like you're hitting this issue: https://issues.apache.org/jira/browse/PIG-2507

What version of Pig are you using? The issue is fixed in 0.11.2 and 0.12, so upgrading to either of those versions should make the problem go away.
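
For reference, here is a rough sketch of the kind of regex I have in mind, written as plain Java since Pig's REGEX_EXTRACT is built on java.util.regex. The class name B75Extractor and the sample input are made up for illustration. The pattern names the semicolon by its code point (003B) instead of typing it literally, and stops capturing just before the first semicolon after B75:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: shows the regex only.
final class B75Extractor {
    // The doubled backslash means the regex engine (not the Java compiler)
    // sees the escape for code point 003B, i.e. ';'. The pattern text
    // itself therefore contains no literal semicolon.
    static final String REGEX = "(B75[^\\u003B]*)";
    private static final Pattern PATTERN = Pattern.compile(REGEX);

    // Returns the text from the first "B75" up to, but not including, the
    // next semicolon. If no semicolon follows, the capture runs to the end
    // of the input. Returns null when the input is null or has no "B75".
    static String extract(String field) {
        if (field == null) {
            return null;
        }
        Matcher m = PATTERN.matcher(field);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up sample value for the test_data field.
        System.out.println(extract("foo=1;B75-abc 42;bar=2")); // prints B75-abc 42
    }
}
```

In a Pig script the same pattern would presumably go inside the quoted regex argument of REGEX_EXTRACT, but I have not verified that the escaped form gets past the parser on a version affected by PIG-2507, so treat that part as an assumption. The extract method is also roughly what the body of a custom UDF would do.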

If you're unable to upgrade for some reason, your best bet is to write a custom UDF. The general idea remains the same: write a regex to extract the appropriate substring and project that from the UDF.

Unmesha,

Start a new thread with your question so we don't pollute this thread for Kartik. Can you give some samples as well? I'm not sure I understood your question.

On Mon, May 12, 2014 at 3:05 AM, kartik manocha <[email protected]> wrote:

> Pradeep,
>
> Thanks for the pointers, but as I mentioned, I need to extract that
> string up to the semicolon, and that is where I'm facing issues.
>
> I need to print everything before the semicolon. That's causing pain
> because when I mention a semicolon in the regex, it is treated as the
> end of the statement & produces an error.
>
> Without the semicolon it works fine, but it produces the complete
> string starting with B75.
> e.g.
> B=foreach D generate REGEX_EXTRACT(test,'(B75.*)',1);
>
> Is there any way I can mention a semicolon in the above regex, so
> that it prints the string before it?
>
> Thanks,
> Kartik
>
> On Mon, May 12, 2014 at 2:03 PM, Pradeep Gollakota <[email protected]> wrote:
>
> > Check out
> > http://archive.cloudera.com/cdh/3/pig/piglatin_ref2.html#REGEX_EXTRACT
> >
> > This may suit your needs
> >
> > On Mon, May 12, 2014 at 12:16 AM, kartik manocha <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > I am new to Pig & facing an issue in filtering out a string from a
> > > field. The scenario is as follows.
> > >
> > > -> I am loading data with several fields; among those fields there
> > >    is a field called 'test_data'.
> > > -> There are a lot of things in this field. I want to filter out a
> > >    string from this field which starts with B75 & ends with a
> > >    semicolon.
> > > -> After taking this string out, I want to add it as a new field to
> > >    the existing bag which was loaded.
> > >
> > > I tried using the INDEXOF UDF, but that works for a single character
> > > only. However, when I tried using it for a single character, it
> > > returned () only instead of the index number. I was just testing, &
> > > by manually providing indexes to the SUBSTRING UDF, it was generating
> > > the string.
> > >
> > > But I am unable to get the position using the INDEXOF UDF, or maybe
> > > there is a better way of doing this.
> > >
> > > If you have any pointers / suggestions, please share.
> > >
> > > Thanks in advance.
> > >
> > > Best,
> > > Kartik
