Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-09-04 Thread Lauren Blau
I think I finally found the culprit. There is a load like this:

    a = load '/foobar' using CustomJsonLoader('baz') as (m:map[]); -- loading an untyped map

then there is a flatten,

    a1 = foreach a generate a#'id' as id: chararray, flatten(a#'listvals') as (listvals: map[]); -- another untyped map a

Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-09-04 Thread Lauren Blau
unfortunately, I can't put together an example without sharing the custom jsonloader and data. But I've worked around this by explicitly storing and reloading the data. But it sounds like you have it backwards in your attempt to be sneaky. The data actually is an int and should be sorted numericall

Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-08-31 Thread Dmitriy Ryaboy
I tried to reproduce this and haven't been able to -- all my devious attempts to get something that is actually a string to show up as an int in "describe" wind up in class cast exceptions and blown up jobs (not devious enough, clearly). Can you put together an example that reproduces the iss

Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-08-31 Thread Віталій Тимчишин
I'd try to describe the original schema as varchar and then cast during the order by, e.g. order relation by (char)orderkey1; If Pig does not accept a cast in order, try adding an additional foreach with the cast. A last resort could be a UDF that does the cast. 2012/8/31 Lauren Blau > Could this be a problem with
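
A minimal sketch of the "additional foreach with cast" idea from the message above, not a confirmed fix for this thread. The input path 'input.tsv', the use of PigStorage, and the choice to declare orderkey1 as chararray and cast it to int are assumptions made for illustration; the relation and field names follow the original post. The original script passes the sorted bag to MyUDF; here the bag is emitted directly so the sketch does not depend on an unregistered UDF.

```pig
-- Hypothetical input: tab-separated rows of key1, key2, orderkey1, val.
-- orderkey1 is declared as chararray on load (assumption), then cast explicitly.
relation = load 'input.tsv' using PigStorage()
           as (key1:chararray, key2:int, orderkey1:chararray, val:chararray);

-- Extra foreach that performs the cast before any ordering happens.
casted = foreach relation generate key1, key2, (int)orderkey1 as orderkey1, val;

groupbykey = group casted by (key1, key2);

result = foreach groupbykey {
    -- orderkey1 is an int here, so the nested order compares numerically.
    sorted = order casted by orderkey1;
    generate flatten(group), sorted;
};

dump result;
```

Running describe on casted before the group is a quick way to check that orderkey1 is reported as int at that point in the script.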

Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-08-31 Thread Lauren Blau
Could this be a problem with the original read of the data? It is stored in JSON format and read with a custom JSON loader. If I save the results of the loader to a file using PigStorage and then run the same script reading from that file, the sort is done numerically. I've had other pig script pro

Re: wrong sort order (lexical vs numeric) in a nested foreach

2012-08-30 Thread Lauren Blau
sorry, premature email :-).

    relation = key1 ,key2,orderkey1,val; //schema is (chararray,int,int,chararray);
    groupbykey = group relation by (key1,key2);
    foreach groupbykey {
        sorted = order relation by orderkey1;
        generate flatten($0), MyUDF(sorted);
    }

I notice that when the 'sorted' value