A crafty trick would be to use streaming as a pre-processing step and
only emit a record once you see its end tag.
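
A minimal sketch of that idea in Python, not a tested job. It assumes the
trailing marker on each line is the single 0x01 control character and that
tab-separated output is acceptable; the names flatten, main, FIELD_SEP and
OUT_SEP are made up for illustration. Lines are buffered after <doc> and
one joined line is emitted only when </doc> shows up.

```python
import sys

FIELD_SEP = "\x01"  # assumption: each input line ends with the 0x01 control character
OUT_SEP = "\t"      # assumption: tab-separated output is what the Hive table expects


def flatten(lines):
    """Yield one joined line per <doc>...</doc> block found in lines."""
    record = None  # None means we are currently outside a <doc> block
    for raw in lines:
        # Drop the line ending, then the trailing 0x01 marker, then stray spaces.
        line = raw.rstrip("\r\n").rstrip(FIELD_SEP).strip()
        if line == "<doc>":
            record = []            # start buffering a new record
        elif line == "</doc>":
            if record is not None:
                yield OUT_SEP.join(record)  # emit only once the end tag is seen
            record = None
        elif record is not None and line:
            record.append(line)    # a field=value line inside the block


def main():
    """Read raw lines from stdin and print one flattened record per line."""
    for row in flatten(sys.stdin):
        print(row)
```

Saved as a script that calls main() at the bottom, this could be the
streaming step. A block that never reaches </doc> is silently dropped here,
which may or may not be what you want. One caveat: if a record can straddle
an input split boundary, the script would need to see the whole file (or run
before the data is loaded) to avoid cutting that record in two.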

On Tue, May 22, 2012 at 12:10 PM, Mark Grover <[email protected]> wrote:
> Hi Richard,
> What Bejoy said is correct. However, another way to get around it would be
> to pre-process your data so that the text between <doc> and </doc> does not
> contain any newlines. Then you should be able to treat each record as a
> single string and parse it out relatively easily.
>
> Mark
>
>
> ----- Original Message -----
> From: "Bejoy Ks" <[email protected]>
> To: [email protected]
> Sent: Monday, May 21, 2012 7:22:58 AM
> Subject: Re: user define data format
>
>
>
> Hi Richard
>
>
> In Hive the default record delimiter is the newline character. In your
> sample data set, a single row/record is spread across multiple lines. AFAIK
> the only possible option here is to write a custom SerDe for your data.
>
>
> Regards
> Bejoy KS
>
>
>
>
>
> From: Richard <[email protected]>
> To: "[email protected]" <[email protected]>
> Sent: Monday, May 21, 2012 3:14 PM
> Subject: user define data format
>
>
>
> Hi, I want to use Hive on some data in the following format:
> <doc>\0x01
> field1=val1\0x01
> field2=val2\0x01
> ...
> </doc>\0x01
>
> The lines between <doc> and </doc> form one record. How should I define the
> table?
>
> thanks.
> Richard
>
>
>
>
