Re: add column number using pig UDF

2015-04-27 Thread Shahab Yunus
Just to add one thing from my experience (and please feel free to correct me if I am wrong): even when you increase the limit, it can't go above 3000. Regards, Shahab On Mon, Apr 27, 2015 at 1:21 PM, Daniel Dai da...@hortonworks.com wrote: This is the known issue with rank implementation

Re: split data into multiple files

2015-01-06 Thread Shahab Yunus
Have you looked at the SPLIT operator in Pig? Does that help? http://pig.apache.org/docs/r0.12.0/basic.html#SPLIT Regards, Shahab On Tue, Jan 6, 2015 at 8:51 AM, Jumsheed jumsh...@gmail.com wrote: Hi, I have a file with data in below format, A abcdefghijklmnop abcdefghijklmnop

Re: Counters Limit Exceeded

2014-10-16 Thread Shahab Yunus
Do you have access to the Yarn or MR UI? It can be done from there. 'Job Counters Limit' is the name of the configuration there (mapreduce.job.counters.max). This is in Yarn. Regards, Shahab On Thu, Oct 16, 2014 at 5:44 PM, Rodrigo Ferreira web...@gmail.com wrote: And how can I do that? I'm

Re: Problem in understanding UDF COUNT

2014-07-22 Thread Shahab Yunus
. The bag c contains the groups and not the bag b. Thanks. On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Have you seen this documentation and blog? http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/ http://pig.apache.org/docs

Re: Problem in understanding UDF COUNT

2014-07-21 Thread Shahab Yunus
Have you seen this documentation and blog? http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/ http://pig.apache.org/docs/r0.9.2/func.html#count They explain this in detail. Regards, Shahab On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal dobhalashish...@gmail.com wrote: a

Re: Problem in understanding UDF COUNT

2014-07-21 Thread Shahab Yunus
as the argument in the COUNT(b) function. The bag c contains the groups and not the bag b. Thanks. On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Have you seen this documentation and blog? http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig

Re: Using the fs command in pig

2014-03-20 Thread Shahab Yunus
Not an exact solution, but an idea: you can first use the STRSPLIT function in Pig to parse your variable 'path'. http://pig.apache.org/docs/r0.11.1/func.html#strsplit Then, for each split path, you can use the STORE command. http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Store+vs.+Dump

Re: Running a Pig Jar (from hadoop)

2014-02-16 Thread Shahab Yunus
I know this does not directly answer your question, but I am curious: have you looked at PigRunner? Regards, Shahab On Sun, Feb 16, 2014 at 9:16 AM, Jay Vyas jayunit...@gmail.com wrote: Hi pig: What is the common idiom for executing a Java application which runs pig commands using the direct

Re: XML files and Sequencefile

2013-10-23 Thread Shahab Yunus
Have you looked at SequenceFileLoader for Pig? http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html Regards, Shahab On Wed, Oct 23, 2013 at 3:30 PM, Sameer Tilak ssti...@live.com wrote: Hi There, I have a lot of small (~0.5 MB to 3 MB) XML files

Re: Use HDFS file as input when running pig with local mode

2013-10-11 Thread Shahab Yunus
Running Pig in local mode means using the local hdfs and m/r framework. There is no separate concept of local mode for Pig apart from the hdfs and m/r framework on which it runs. Regards, Shahab On Fri, Oct 11, 2013 at 10:36 PM, Gordon Wang gw...@gopivotal.com wrote: Hi, Pig will not load any hadoop

Re: Convert bytearray/chararray to MAP

2013-09-30 Thread Shahab Yunus
Have you looked at the TOMAP built-in function? http://pig.apache.org/docs/r0.11.1/func.html#tomap You can also write your own UDF if the logic becomes complex or you want more control over it. Regards, Shahab On Sun, Sep 29, 2013 at 11:14 PM, sonia gehlot sonia.geh...@gmail.com wrote: Hi, I have

Re: self join in pig

2013-09-15 Thread Shahab Yunus
You need to load your data twice and then use it as any other join. Self-join is just like any other join to Pig. Regards, Shahab On Sun, Sep 15, 2013 at 1:37 PM, Raj hadoop raj.had...@gmail.com wrote: Hi, I need to subtract row2 value from row1. how to get it in Pig? Thanks Rajesh

Re: Splitting by unique values in a relation

2013-09-15 Thread Shahab Yunus
Correction in my earlier comment. The following statement that I wrote was wrong: 'Won't SPLIT always give you 2 relations?' It is basically what Praveenesh himself mentioned i.e. a pre-defined/known number of relations/splits. Regards, Shahab On Sun, Sep 15, 2013 at 7:41 PM, praveenesh kumar

Re: Sort Order in HBase with Pig/Piglatin in Java

2013-09-13 Thread Shahab Yunus
But I used a LinkedHashMap instead. Do you know what's the better choice? TreeMap or LinkedHashMap? If you are asking from a functionality perspective, then the difference between them is that LinkedHashMap maintains the order in which items were entered in the map. So if they were entered in the
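
The ordering difference described in this reply can be seen in a small standalone Java sketch. The class name MapOrderDemo, its helper methods, and the sample row keys are hypothetical and chosen only for illustration; they are not taken from the thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {
    // Returns the keys of the given map in the order the map iterates them.
    public static List<String> keysInIterationOrder(Map<String, String> map) {
        return new ArrayList<>(map.keySet());
    }

    // Inserts three hypothetical row keys in a deliberately unsorted order.
    public static Map<String, String> fill(Map<String, String> map) {
        map.put("row3", "c");
        map.put("row1", "a");
        map.put("row2", "b");
        return map;
    }

    public static void main(String[] args) {
        // LinkedHashMap iterates in insertion order: [row3, row1, row2]
        System.out.println(keysInIterationOrder(fill(new LinkedHashMap<String, String>())));
        // TreeMap iterates in the natural (sorted) order of its keys: [row1, row2, row3]
        System.out.println(keysInIterationOrder(fill(new TreeMap<String, String>())));
    }
}
```

If the entries are already inserted in the desired order, LinkedHashMap preserves it; if they need to come out sorted by key regardless of insertion order, TreeMap does the sorting.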

Re: Multiple reduce no effect

2013-09-03 Thread Shahab Yunus
How are the keys distributed in your data? There is a chance that the 2 reducers are getting the bulk of your data because of a skewed key/data distribution. From the counters themselves, you can see that the 2 reducers have much higher values than the set of 14. Regards, Shahab On Tue, Sep
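
A rough Java sketch of how skewed keys can concentrate records on a few reducers. It uses the same formula as Hadoop's default HashPartitioner, but whether the job in this thread used that partitioner is an assumption; the class name SkewDemo and the sample keys are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SkewDemo {
    // Maps a key to a reducer index, mirroring Hadoop's HashPartitioner formula.
    public static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Counts how many records each reducer would receive for the given keys.
    public static int[] recordsPerReducer(List<String> keys, int numReducers) {
        int[] counts = new int[numReducers];
        for (String key : keys) {
            counts[partitionFor(key, numReducers)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical skewed input: one key dominates the data set.
        List<String> keys = Arrays.asList("us", "us", "us", "us", "us", "us", "de", "fr");
        // All records with the same key land on the same reducer,
        // so the reducer that owns "us" receives at least 6 of the 8 records.
        System.out.println(Arrays.toString(recordsPerReducer(keys, 4)));
    }
}
```

Because every record with the same key goes to the same reducer, adding more reducers does not help when a handful of keys carry most of the data.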

Re: error in processing data in pig

2013-08-04 Thread Shahab Yunus
In the input file, are the 2 sets of integers tab separated? The default delimiter in PigStorage is tab, and it seems the input file might not be properly formatted. Regards, Shahab On Sun, Aug 4, 2013 at 6:37 AM, sumit piparsania sumitpiparsa...@yahoo.com wrote: Hi, I am trying to process the

Re: error in processing data in pig

2013-08-04 Thread Shahab Yunus
:28 PM, sumit piparsania sumitpiparsa...@yahoo.com wrote: hi Shahab, Yes, the 2 sets of integers are tab separated. Still getting the error. What could be reason for this error. *From:* Shahab Yunus shahab.yu...@gmail.com *To:* user@pig.apache.org; sumit piparsania sumitpiparsa

Re: Is it safe to have static methods in Hadoop Framework

2013-07-25 Thread Shahab Yunus
If each job (its child tasks) is running in its own JVM then this should not be a problem. Regards, Shahab On Thu, Jul 25, 2013 at 2:46 PM, Huy Pham pha...@yahoo-inc.com wrote: Hi All, I am writing a class (called Parser) with a couple of static functions because I don't want millions of

Re: pig 0.8.1 - Iterating contents of a Bag

2013-07-23 Thread Shahab Yunus
Amit, have you looked into the TOBAG and TOTUPLE built-in UDFs? Are they not helpful? Regards, Shahab On Tue, Jul 23, 2013 at 5:46 PM, Amit am...@yahoo.com wrote: Hello, Based on your suggestion worked my way around using flatten. This is what I am doing now B = FOREACH A GENERATE

Re: dereferencing bag of map

2013-06-21 Thread Shahab Yunus
Have you tried flattening the bag first? On Fri, Jun 21, 2013 at 5:43 AM, Suresh Saggar s...@bsb.in wrote: Facing a similar challenge. Here X contains one column named 'metadata' of type bytearray. But the actual content is a JSON i.e. the value of metadata field is a JSON (keys as sId cId)

Re: Javascript UDF return Empty results

2013-06-18 Thread Shahab Yunus
Take a look at these. Do they help? http://pig.apache.org/docs/r0.11.0/func.html#add-duration http://pig.apache.org/docs/r0.11.1/api/index.html?org/apache/pig/piggybank/evaluation/datetime/diff/package-tree.html

Re: JsonStorage() fails to write .pig_schema to S3 after correctly writing alias to json files.

2013-06-08 Thread Shahab Yunus
I don't have a solution but just wondering what lib are you using for JsonStorage? I hope it doesn't have any issues in it. Regards, Shahab On Sat, Jun 8, 2013 at 3:26 PM, Alan Crosswell a...@crosswell.us wrote: Hello, I've been having trouble with JsonStorage(). First, since my Python UDF

Re: pig latin regex

2013-06-04 Thread Shahab Yunus
Yes, you can. You might need to tweak some of the glob expressions. Check this: http://stackoverflow.com/questions/12630584/load-multiple-files-in-pig Regards, Shahab On Tue, Jun 4, 2013 at 4:18 AM, Pedro Sá da Costa psdc1...@gmail.com wrote: Hi, When building a query in Pig latin, we can

Re: Logging error messages in Pig

2013-06-04 Thread Shahab Yunus
Was there an error in your pig script? Aren't only error messages being logged in that log file? Regards, Shahab On Tue, Jun 4, 2013 at 11:34 AM, Raj Hadoop hadoop...@yahoo.com wrote: I am running pig in local mode. I modified the pig.properties file as well to include as

Re: Exception (possibly) due to type casting between bytearray to charray in HBaseStorage

2013-05-31 Thread Shahab Yunus
-security.jar /opt/cloudera/parcels/CDH-version/lib/zookeeper/zookeeper-version.jar On 30 May 2013 20:14, Shahab Yunus shahab.yu...@gmail.com wrote: I am not explicitly registering any of these jars in the script. The cluster was setup through standard Cloudera installation (4.2.0). Should I

Re: Exception (possibly) due to type casting between bytearray to charray in HBaseStorage

2013-05-30 Thread Shahab Yunus
the hbase and zookeeper jar files in your pig script ? On 30 May 2013 06:24, Shahab Yunus shahab.yu...@gmail.com wrote: Hello, When loading data from a HBase table in Pig, using HBaseStorage, if I specify the type of the fields as chararray, I get an exception: 2013-05-29 16:18:56,557 INFO

Re: Count empty relation after filtering

2013-05-29 Thread Shahab Yunus
Try COUNT_STAR. -Shahab On Wed, May 29, 2013 at 9:55 AM, Marco Brinkmann marco.brinkm...@cope.io wrote: Hi everybody, I have a rather simple question and scenario, but still I could not find an answer in the documentation or in other resources: id, valid (1, false) (2, false) records =

Re: DBStorage incompatibility with other Storage in pig Script

2013-05-28 Thread Shahab Yunus
of pile up it into the batch and as auto-commit is true it will commit it immediately after execution. And this works for me. *Don't know the pros and cons of it.* Now will try for no_multiquery option. On Tue, May 28, 2013 at 7:33 AM, Shahab Yunus shahab.yu...@gmail.com wrote

Re: DBStorage incompatibility with other Storage in pig Script

2013-05-27 Thread Shahab Yunus
@Hardik, Try running your script with the 'no_multiquery' option: pig -no_multiquery myscript.pig Regards, Shahab On Mon, May 27, 2013 at 8:41 AM, Hardik Shah hardik.g...@gmail.com wrote: DBStorage is not working with other storage in pig script. means DBStorage is not working with

Re: Multiple STORE statements in one script

2013-05-09 Thread Shahab Yunus
guess it may work in cases where other tools don't. And probably it can provide some more useful functionality. On Wed, May 8, 2013 at 9:47 PM, Shahab Yunus shahab.yu...@gmail.com wrote: @Peter thanks for the tip. -no_multiquery flag works but I guess it will be a performance hit

Re: Multiple STORE statements in one script

2013-05-08 Thread Shahab Yunus
, It is possible to have multiple store statements, but I can't tell why you have nothing in the result. I recommend to split the task to the appropriate tools: store everything in HDFS and then run Sqoop to upload data to an RDBMS. Ruslan On Wed, May 8, 2013 at 6:11 PM, Shahab Yunus shahab.yu

Re: does pig support loop and branching now?

2013-05-07 Thread Shahab Yunus
Never mind, Jonathan, if you meant the same thing that Tom Wheeler just posted. Thanks :) On Tue, May 7, 2013 at 9:59 AM, Tom Wheeler tomwh...@gmail.com wrote: Neither are supported directly in Pig Latin, but it is possible to embed Pig in another language such as Java or Python to

Re: add days to the current date

2013-05-06 Thread Shahab Yunus
See http://pig.apache.org/docs/r0.11.0/func.html#datetime-functions Especially 'AddDuration'. Regards, Shahab On Mon, May 6, 2013 at 8:55 PM, soniya B soniya.bigd...@gmail.com wrote: Hi, How to add days to the current date in PIG? Is there any built-in function? Regards Soniya