Unfortunately I've realised that boundscript.describe doesn't return a
string. It returns void and prints to stdout instead. This means I have to go
through the rather painful process of launching a separate Python process that
calls boundscript.describe, and then capturing that process's stdout in
order to obtain the schema. I don't know why it doesn't return a string.
Maybe there is an easier way that I am missing. If anyone has ideas
for a more elegant solution, I would be happy to develop it and
contribute the code.
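
For reference, here is a minimal sketch of the subprocess approach described
above. It is not the real thing: the child process is a stand-in that prints a
hard-coded, made-up schema line in the format that describe prints, so that the
sketch runs without Pig. In the real setup the child would be a script that
binds the Pig script and calls boundscript.describe. The names CHILD_SOURCE,
capture_describe_output and field_positions are mine, and the parser assumes a
flat schema with no nested bags or tuples.

```python
import re
import subprocess
import sys

# Stand-in for the child script (assumption). The real child would bind the
# Pig script and call boundscript.describe; this one only prints a made-up
# schema line so the sketch can run without Pig installed.
CHILD_SOURCE = 'print("records: {user_id: chararray,age: int,score: double}")'


def capture_describe_output(child_source=CHILD_SOURCE):
    """Run the child in a separate interpreter and return what it printed."""
    return subprocess.check_output(
        [sys.executable, "-c", child_source],
        universal_newlines=True,  # return text rather than bytes
    )


def field_positions(describe_output):
    """Map field names to column numbers. Handles flat schemas only."""
    match = re.search(r"\{(.*)\}", describe_output, re.DOTALL)
    if match is None:
        raise ValueError("no schema found in: %r" % describe_output)
    positions = {}
    for index, field in enumerate(match.group(1).split(",")):
        # Each field looks like "name: type"; keep only the name.
        name = field.split(":", 1)[0].strip()
        positions[name] = index
    return positions


if __name__ == "__main__":
    schema_text = capture_describe_output()
    print(field_positions(schema_text))
```

The resulting name-to-position mapping (or the raw schema string) could then be
passed into the UDF call as a parameter, so the UDF looks columns up by name
instead of using hard-coded column numbers.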

Martin

On 15 November 2012 20:20, Jonathan Coveney <jcove...@gmail.com> wrote:

> Martin,
>
> That is a reasonable workaround. Even in Java UDFs, you can't directly
> access fields by name. Tuples are indexed only by number. Using the Schema
> is how I would do it.
>
>
> 2012/11/14 Martin Goodson <mar...@qubitproducts.com>
>
> > Sorry to reply to my own post, but I've found a workaround that I
> > thought I should put here:
> >
> > 1. Use embedded Pig.
> > 2. Access the schema with boundscript.describe().
> > 3. Pass the schema as a parameter into the UDF call.
> >
> > Thanks
> > Martin
> >
> >
> >
> >
> > On 14 November 2012 16:17, Martin Goodson <mar...@qubitproducts.com>
> > wrote:
> >
> > > I normally deal with very large tuples with many fields. It's a pain to
> > > deal with these in Python UDFs since I can't figure out a way to pass
> > > schemas into the UDF. I have to hard-code the column numbers in the UDFs,
> > > which is a maintenance nightmare.
> > >
> > > It seems that Java UDFs receive the full tuple in their exec methods so
> > > that the correct fields can be identified, whereas Python UDFs only receive
> > > list objects (with field names stripped). Is there any way to get the
> > > behaviour of Python UDFs to conform to the Java behaviour?
> > >
> > >
> > > Thanks for any ideas
> > > Martin
> > >
> > >
> >
>
