Try hadoop fs -getmerge.

D
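The -getmerge suggestion pulls every file in an HDFS directory down into a single local file, which is one way to stitch a header onto the part files. A minimal local sketch of the same idea (all paths are illustrative; plain cat on local files stands in for the HDFS commands, since the merge is just an ordered concatenation — whether a given Hadoop version's -getmerge picks up dot-files like .pig_header varies, so prepending the header explicitly is the safe route):

```shell
# On a real cluster, the equivalent commands would look like:
#   hadoop fs -getmerge pigdbck/output/top10advperimpfileh5 /tmp/top10adv.csv
#   hadoop fs -cat $OUTPUT/.pig_header $OUTPUT/part* > /tmp/top10adv.csv
# Local stand-in: build a mock Pig output directory, then concatenate
# the header file followed by the part files, in order.
mkdir -p /tmp/pig_demo
printf 'AdvertiserID\tImpressions\n' > /tmp/pig_demo/.pig_header
printf '101\t5\n' > /tmp/pig_demo/part-r-00000
printf '102\t7\n' > /tmp/pig_demo/part-r-00001
cat /tmp/pig_demo/.pig_header /tmp/pig_demo/part-r-* > /tmp/pig_demo/top10adv.csv
cat /tmp/pig_demo/top10adv.csv
```

Note that the shell redirection writes to a local path; redirecting into an HDFS path does not work, so the merged file has to be copied back with hadoop fs -put if it is needed on the cluster.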
On Thu, May 26, 2011 at 11:20 AM, Subhramanian, Deepak <[email protected]> wrote:
> I tried using the fs -cat and sh -cat functions to combine the header and
> output file into a new file, but it is not working. Does Hadoop give an
> option to combine two files into a new file in a Pig script?
>
> This is the command I used at the end of the Pig script:
>
> STORE out3 INTO '$OUTPUT' USING
> org.apache.pig.piggybank.storage.PigStorageSchema();
>
> sh -cat $OUTPUT/.pig_header $OUTPUT/part* > $OUTPUT/top10adv.csv
>
> hadoop fs -ls pigdbck/output/top10advperimpfileh5
> Found 4 items
> -rw-r--r--   1 root supergroup   30 2011-05-26 17:52 /user/root/pigdbck/output/top10advperimpfileh5/.pig_header
> -rw-r--r--   1 root supergroup  361 2011-05-26 17:52 /user/root/pigdbck/output/top10advperimpfileh5/.pig_schema
> drwxr-xr-x   - root supergroup    0 2011-05-26 17:51 /user/root/pigdbck/output/top10advperimpfileh5/_logs
> -rw-r--r--   1 root supergroup  117 2011-05-26 17:52 /user/root/pigdbck/output/top10advperimpfileh5/part-r-00000
>
> On 26 May 2011 12:02, Subhramanian, Deepak <[email protected]> wrote:
>> I thought any Java class extension was a UDF. Thanks, Dmitriy, for
>> clarifying. Yes, I meant extending StoreFunc. I guess I will use
>> PigStorageSchema for the time being, as I am tight on my deadlines, and use
>> cat to concatenate the header. I hadn't realized that we can use cat
>> directly in the Pig script, which is why I thought of extending StoreFunc.
>> Thanks, Alan, for your inputs.
>>
>> I will have to read more on how the output part files are created on HDFS,
>> so that I can combine all the part files at the end of the Pig script into a
>> final output if the file size is very big.
>>
>> On 25 May 2011 21:22, Dmitriy Ryaboy <[email protected]> wrote:
>>> Still not clear on how you expect a UDF to help. Normally when we say
>>> UDFs, we mean functions that work on individual tuples. They don't have
>>> anything to do with how you store data.
>>> You probably mean StoreFunc; since in this case you want a StoreFunc
>>> that messes with the file format, as opposed to writing a side file
>>> like PigStorageSchema does, you'll need to go pretty deep -- write a
>>> whole StoreFunc + OutputFormat + RecordWriter stack.
>>>
>>> On Wed, May 25, 2011 at 12:51 PM, Subhramanian, Deepak
>>> <[email protected]> wrote:
>>>> Thanks for the inputs. I am looking for a UDF which I can use to store
>>>> the headers in the Pig output file.
>>>>
>>>> On 25 May 2011 18:30, Dmitriy Ryaboy <[email protected]> wrote:
>>>>> Can you explain what UDF you are looking for?
>>>>> The intended usage for the .pig_header file is to cat it:
>>>>>
>>>>> hadoop fs -cat myresults/.pig_header myresults/part*
>>>>>
>>>>> (which drops the header right on top of your data).
>>>>>
>>>>> We don't want to put the header inside the data files because that can
>>>>> break subsequent processing.
>>>>>
>>>>> As for the names of the fields, that's a Pig feature; it's there for
>>>>> disambiguation. If you don't like it, you can rename the fields:
>>>>> FLATTEN(aggregated) AS (advertiserId, Advertiser, OrderId, ....)
>>>>>
>>>>> D
>>>>>
>>>>> On Wed, May 25, 2011 at 9:00 AM, Subhramanian, Deepak
>>>>> <[email protected]> wrote:
>>>>>> Hi, I just realized that it is creating a .pig_header file in the same
>>>>>> output directory. I guess I need to create a new UDF. Also, if I am
>>>>>> grouping, it is appending the tag aggregated::group:: to the header
>>>>>> column. Is FLATTEN not supposed to remove the group?
>>>>>> cat .pig_header
>>>>>> aggregated::group::AdvertiserID null::Advertiser
>>>>>> aggregated::group::OrderID aggregated::group::AdID
>>>>>> aggregated::group::CreativeID aggregated::group::CreativeVersion
>>>>>> aggregated::group::CreativeSizeID aggregated::group::SiteID
>>>>>> aggregated::group::PageID aggregated::group::Keyword
>>>>>> aggregated::Impressions
>>>>>>
>>>>>> On 25 May 2011 16:48, Subhramanian, Deepak
>>>>>> <[email protected]> wrote:
>>>>>>> I tried PigStorageSchema. For some reason it doesn't create the
>>>>>>> headers. Is it because I am loading the data using another UDF?
>>>>>>>
>>>>>>> This is the command I used in the Pig script:
>>>>>>>
>>>>>>> STORE out INTO '$OUTPUT' USING
>>>>>>> org.apache.pig.piggybank.storage.PigStorageSchema();
>>>>>>>
>>>>>>> Thanks, Deepak
>>>>>>>
>>>>>>> On 25 May 2011 16:13, Dmitriy Ryaboy <[email protected]> wrote:
>>>>>>>> You can try PigStorageSchema from the piggybank.
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: "Subhramanian, Deepak" <[email protected]>
>>>>>>>> To: [email protected]
>>>>>>>> Sent: 5/25/2011 5:28 AM
>>>>>>>> Subject: Storing Headers in Pig Output File
>>>>>>>>
>>>>>>>> Is there a way to store the headers (titles of each column) using the
>>>>>>>> STORE command in a Pig script (STORE out3 INTO '$OUTPUT' USING
>>>>>>>> PigStorage();)? Right now it stores only the data. Somewhere I read
>>>>>>>> that in Pig 0.8 it stores the header with a map-reduce option. Do we
>>>>>>>> have to supply extra parameters?
>>>>>>>> Thanks, Deepak
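A minimal Pig Latin sketch tying the two suggestions in the thread together: rename the fields produced by FLATTEN so the aggregated::group:: prefixes disappear from the header, then store with PigStorageSchema so the generated .pig_header side file carries the clean names. The relation names (grouped, out3) and the field list here are illustrative, not from a tested script:

```pig
-- Hypothetical relations; the AS clause renames the flattened group
-- fields so the header is not prefixed with aggregated::group::
out3 = FOREACH grouped GENERATE
           FLATTEN(group) AS (AdvertiserID, OrderID, AdID),
           COUNT(records) AS Impressions;

-- PigStorageSchema (from the piggybank) writes a .pig_header side
-- file with these field names next to the part files
STORE out3 INTO '$OUTPUT'
    USING org.apache.pig.piggybank.storage.PigStorageSchema();
```

The header can then be dropped on top of the data with hadoop fs -cat $OUTPUT/.pig_header $OUTPUT/part*, as suggested earlier in the thread.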
