A crafty trick would be to use streaming as a pre-processing step: buffer the lines of each record and only emit data once you see the </doc> end tag.
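A minimal sketch of that pre-processing step, assuming the "\0x01" in Richard's sample stands for the byte 0x01 (Ctrl-A, Hive's default field delimiter) at the end of every line. It buffers the lines between <doc> and </doc> and writes each record out as a single line, with the fields joined by 0x01. The script name below and the choice to reuse 0x01 as the output separator are assumptions. If you run it as a Hadoop Streaming mapper, it also assumes that no record straddles an input split boundary (e.g. one mapper per file).

```python
import sys
from typing import Iterable, Iterator

START_TAG = "<doc>"
END_TAG = "</doc>"
FIELD_SEP = "\x01"  # assumption: "\0x01" in the sample means the byte 0x01


def flatten_records(lines: Iterable[str]) -> Iterator[str]:
    """Buffer the lines between <doc> and </doc>; yield one line per record."""
    buffer = None  # None means we are currently outside a record
    for raw in lines:
        # Drop the trailing newline and the trailing 0x01 marker.
        line = raw.rstrip("\r\n").rstrip(FIELD_SEP)
        if line == START_TAG:
            buffer = []
        elif line == END_TAG:
            if buffer is not None:
                # Only emit once the end tag has been seen.
                yield FIELD_SEP.join(buffer)
            buffer = None
        elif buffer is not None and line:
            buffer.append(line)
        # Anything outside <doc>...</doc> is ignored.


def main() -> None:
    for record in flatten_records(sys.stdin):
        sys.stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

Saved as, say, a hypothetical `flatten_docs.py`, it can be run locally (`python flatten_docs.py < raw.txt > flat.txt`) or as the mapper of a map-only streaming job. Hive then sees one record per line, which fits its default newline record delimiter.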
On Tue, May 22, 2012 at 12:10 PM, Mark Grover <[email protected]> wrote:
> Hi Richard,
> What Bejoy said is correct. However, another way to get around it would be
> to pre-process your data between <doc> and </doc> so that it does not contain
> any newlines. Then, you should be able to treat that data as a string and
> parse it out relatively easily.
>
> Mark
>
> ----- Original Message -----
> From: "Bejoy Ks" <[email protected]>
> To: [email protected]
> Sent: Monday, May 21, 2012 7:22:58 AM
> Subject: Re: user define data format
>
> Hi Richard
>
> In Hive the default record delimiter is the newline character. In your
> sample data set, a single row/record is spread across multiple lines. AFAIK
> the only possible option here is to write a custom SerDe for your data.
>
> Regards
> Bejoy KS
>
> From: Richard <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, May 21, 2012 3:14 PM
> Subject: user define data format
>
> Hi, I want to use Hive on some data in the following format:
> <doc>\0x01
> field1=val1\0x01
> field2=val2\0x01
> ...
> </doc>\0x01
>
> The lines between <doc> and </doc> are a record. How should I define the
> table?
>
> Thanks.
> Richard
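To illustrate Mark's point that a flattened record can be treated as a string and parsed, here is a minimal sketch. It assumes each record has already been put on one line with its fields separated by the byte 0x01, and that every field has the key=value shape from Richard's sample. The separator constants and the function name are hypothetical.

```python
from typing import Dict

FIELD_SEP = "\x01"  # assumed separator between fields after flattening
KV_SEP = "="        # each field looks like key=value in the sample data


def parse_record(record: str) -> Dict[str, str]:
    """Split one flattened record into a {key: value} dict."""
    fields: Dict[str, str] = {}
    for item in record.split(FIELD_SEP):
        if not item:
            continue  # skip empty pieces, e.g. from a trailing separator
        # partition() splits on the first '=' only, so values may contain '='.
        key, _, value = item.partition(KV_SEP)
        fields[key] = value
    return fields
```

For example, `parse_record("field1=val1\x01field2=val2")` gives `{'field1': 'val1', 'field2': 'val2'}`. Inside a Hive query, the built-in `str_to_map(text, delimiter1, delimiter2)` function should perform a comparable split of a string column into a map, which avoids the need for a custom SerDe in this simple case.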
