Sergey created PIG-3625:
---------------------------

             Summary: Strange schema transformation and API path to get nested 
objects
                 Key: PIG-3625
                 URL: https://issues.apache.org/jira/browse/PIG-3625
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.10.1
         Environment: CDH 4.4
            Reporter: Sergey


Hi, here is a part fo my script:
{code}
describe groupedNotMatchedSaleItems;
/*
groupedNotMatchedSaleItems:     {
                                                                group: long,
                                                                
notMatchedSaleItems: {(sale_id: long,sale_item_id: long)}
                                                        }
*/
describe groupedFlatSales;
groupedFlatSales:                       {
                                                                group: long,
                                                                flatSales: 
{(npl_id: long,block_id: int,is_napoleon: int,rec_cnt: int,recs: 
chararray,item_id: long,shop_id: int,internal_id: int,catalog_category_id: 
long,sale_item_id: long,sale_id: long,price: int,count: int)}
                                                        }

describe projectedRecsOf2ndLevel;
/*
projectedRecsOf2ndLevel:        {sale_id: long,sale_item_id: long,npl_id: 
long,recs: chararray}
*/

cogroupedSalesNotMatched = COGROUP groupedFlatSales            by group,
                                   groupedNotMatchedSaleItems  by group,
                                   projectedRecsOf2ndLevel     by sale_id;

describe cogroupedSalesNotMatched;
/*
cogroupedSalesNotMatched: {
                                                group: long,

                                                groupedFlatSales: {
                                                        (
                                                                group: long,
                                                                flatSales: 
{(npl_id: long,block_id: int,is_napoleon: int,rec_cnt: int,recs: 
chararray,item_id: long,shop_id: int,internal_id: int,catalog_category_id: 
long,sale_item_id: long,sale_id: long,price: int,count: int)}
                                                        )
                                                },

                                                groupedNotMatchedSaleItems: {
                                                        (
                                                                group: long,
                                                                
notMatchedSaleItems: {(sale_id: long,sale_item_id: long)}
                                                        )
                                                },

                                                projectedRecsOf2ndLevel: {
                                                        (sale_id: 
long,sale_item_id: long,npl_id: long,recs: chararray)
                                                        
*/

secondLevelRecommendations = FOREACH cogroupedSalesNotMatched{
                                GENERATE 
NplRecSecondLevelMatcher(groupedNotMatchedSaleItems.notMatchedSaleItems,
                                                                  
groupedFlatSales.flatSales,
                                                                  
projectedRecsOf2ndLevel);
                            }
{code}
NplRecSecondLevelMatcher is a Java UDF
Input shema inside UDF is:
{code}
{
        {
                (
                    notMatchedSaleItems:{(sale_id: long,sale_item_id: long)}
                )
        },
        {
                (
                    flatSales:{(npl_id: long,block_id: int,is_napoleon: 
int,rec_cnt: int,recs: chararray,item_id: long,shop_id: int,internal_id: 
int,catalog_category_id: long,sale_item_id: long,sale_id: long,price: 
int,count: int)}
                )
        },

                projectedRecsOf2ndLevel: {(sale_id: long,sale_item_id: 
long,npl_id: long,recs: chararray)}
}
{code}
Why is it so strage for notMatchedSaleItems and flatSales?
I have to write this strage code to get access to notMatchedSaleItems bag:
{code}
/**
It's a groovy
@param input is an input tuple for the UDF
@param bagName is a bag name in schema. data-fu lib is used.
 def getInputBag(Tuple input, String bagName){
        def bag = getBag(input, bagName)
        (bag.iterator().next() as Tuple).get(0) as DataBag
    }
*/
{code}
I supposed that 
{code}
(DataBag)udfInputTuple.get(0) should return the bag with "notMatchedSaleItems"
{code}

Why my input is wrapped with these bags and tuples?



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)

Reply via email to