Just to add one thing from my experience (and please feel free to correct
me if I am wrong): even when you increase the limit, it can't go above
3000.
Regards,
Shahab
On Mon, Apr 27, 2015 at 1:21 PM, Daniel Dai da...@hortonworks.com wrote:
This is a known issue with the rank implementation
Have you looked at the SPLIT operator in Pig? Does that help?
http://pig.apache.org/docs/r0.12.0/basic.html#SPLIT
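For reference, a minimal sketch of what SPLIT looks like. It assumes a hypothetical tab-separated file with an id and a score on each line; the file name, field names, output paths and the threshold of 50 are all made up for illustration and are not from the original question.

```pig
-- Hypothetical input: one record per line, tab-separated (id, score).
records = LOAD 'input.txt' USING PigStorage('\t') AS (id:int, score:int);

-- SPLIT sends each record to every relation whose condition it satisfies.
-- A record that matches no condition (e.g. a null score) is dropped.
SPLIT records INTO low IF score < 50, high IF score >= 50;

STORE low INTO 'out/low';
STORE high INTO 'out/high';
```

Note that the number of output relations is fixed when the script is written, so this only helps if the set of partitions is known in advance.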
Regards,
Shahab
On Tue, Jan 6, 2015 at 8:51 AM, Jumsheed jumsh...@gmail.com wrote:
Hi,
I have a file with data in the format below,
A
abcdefghijklmnop
abcdefghijklmnop
Do you have access to the YARN or MR UI? It can be done from there. 'Job
Counters Limit' is the name of the configuration there
(mapreduce.job.counters.max). This is in YARN.
Regards,
Shahab
On Thu, Oct 16, 2014 at 5:44 PM, Rodrigo Ferreira web...@gmail.com wrote:
And how can I do that? I'm
Have you seen this documentation and blog?
http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig/
http://pig.apache.org/docs/r0.9.2/func.html#count
They explain this in detail.
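In short, here is a small sketch of the pattern those pages describe. The file name and the field name 'word' are hypothetical; the aliases b and c only mirror the ones used in this thread.

```pig
-- Hypothetical input with a single chararray field per line.
b = LOAD 'words.txt' AS (word:chararray);

-- c has two fields: 'group' (the key) and a bag named 'b'
-- that holds all tuples of b sharing that key.
c = GROUP b BY word;

-- COUNT takes the inner bag b as its argument, not the grouped relation c.
d = FOREACH c GENERATE group AS word, COUNT(b) AS n;

DUMP d;
```

Running DESCRIBE c; shows the schema of the grouped relation, which makes it easier to see why the bag is called b.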
Regards,
Shahab
On Mon, Jul 21, 2014 at 8:44 AM, Ashish Dobhal dobhalashish...@gmail.com
wrote:
a
as the argument in
the COUNT(b) function.
The bag c contains the groups and not the bag b.
Thanks.
On Mon, Jul 21, 2014 at 6:21 PM, Shahab Yunus shahab.yu...@gmail.com
wrote:
Have you seen this documentation and blog?
http://squarecog.wordpress.com/2010/05/11/group-operator-in-apache-pig
Not an exact solution but an idea.
You can use the STRSPLIT function first in Pig to parse your variable
'path'.
http://pig.apache.org/docs/r0.11.1/func.html#strsplit
Then, for each split path, you can use the STORE command.
http://pig.apache.org/docs/r0.7.0/piglatin_ref1.html#Store+vs.+Dump
I know this does not directly answer your question, but I am curious: have
you looked at PigRunner?
Regards,
Shahab
On Sun, Feb 16, 2014 at 9:16 AM, Jay Vyas jayunit...@gmail.com wrote:
Hi pig:
What is the common idiom for executing a Java application which runs pig
commands using the direct
Have you looked at SequenceFileLoader for Pig?
http://pig.apache.org/docs/r0.11.0/api/org/apache/pig/piggybank/storage/SequenceFileLoader.html
Regards,
Shahab
On Wed, Oct 23, 2013 at 3:30 PM, Sameer Tilak ssti...@live.com wrote:
Hi There,
I have a lot of small (~0.5 MB to 3 MB) XML files
Running Pig in local mode means using the local file system and a local
M/R framework. There is no separate concept of local mode for Pig apart
from the file system and M/R framework on which it runs.
Regards,
Shahab
On Fri, Oct 11, 2013 at 10:36 PM, Gordon Wang gw...@gopivotal.com wrote:
Hi,
Pig will not load any hadoop
Have you looked at the TOMAP built-in function?
http://pig.apache.org/docs/r0.11.1/func.html#tomap
You can also write your own UDF if the logic becomes complex or you want
more control over it.
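A rough sketch of TOMAP, assuming a hypothetical tab-separated file with a name and a city per line (the file and field names are invented for illustration):

```pig
-- Hypothetical input: tab-separated (name, city).
people = LOAD 'people.txt' USING PigStorage('\t') AS (name:chararray, city:chararray);

-- TOMAP takes alternating key and value expressions; keys must be chararray.
m = FOREACH people GENERATE TOMAP('name', name, 'city', city) AS props;

-- Values are read back with the # operator.
cities = FOREACH m GENERATE props#'city' AS city;

DUMP cities;
```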
Regards,
Shahab
On Sun, Sep 29, 2013 at 11:14 PM, sonia gehlot sonia.geh...@gmail.com wrote:
Hi,
I have
You need to load your data twice and then join the two relations as you
would in any other join. To Pig, a self-join is just like any other join.
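Something along these lines, as a sketch only. The file name, the fields (id, parent_id, name) and the join condition are hypothetical and chosen just to show the shape of a self-join; they are not taken from your data.

```pig
-- Load the same (hypothetical) file under two different aliases.
a_rel = LOAD 'people.txt' USING PigStorage('\t') AS (id:int, parent_id:int, name:chararray);
b_rel = LOAD 'people.txt' USING PigStorage('\t') AS (id:int, parent_id:int, name:chararray);

-- Join one copy against the other.
joined = JOIN a_rel BY parent_id, b_rel BY id;

-- After a join, fields are disambiguated with the alias:: prefix.
result = FOREACH joined GENERATE a_rel::name AS child, b_rel::name AS parent;

DUMP result;
```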
Regards,
Shahab
On Sun, Sep 15, 2013 at 1:37 PM, Raj hadoop raj.had...@gmail.com wrote:
Hi,
I need to subtract the row2 value from row1. How can I do that in Pig?
Thanks
Rajesh
Correction to my earlier comment. The following statement that I wrote was
wrong:
'Won't SPLIT always give you 2 relations?'
SPLIT gives basically what Praveenesh himself mentioned, i.e. a
pre-defined/known number of relations/splits.
Regards,
Shahab
On Sun, Sep 15, 2013 at 7:41 PM, praveenesh kumar
But I used
a LinkedHashMap instead. Do you know which is the better choice? TreeMap
or LinkedHashMap?
If you are asking from a functionality perspective, the difference between
them is that LinkedHashMap maintains the order in which items were
entered into the map. So if they were entered in the
How are the keys distributed in your data? There is a chance that the
2 reducers are getting the bulk of your data because of a skewed key/data
distribution.
From the counters themselves, you can see that the 2 reducers have much
higher values than the set of 14.
Regards,
Shahab
On Tue, Sep
Are the 2 sets of integers in the input file tab separated? The default
delimiter in PigStorage is tab, so it seems the input file might not be
properly formatted.
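To illustrate what I mean, here is a small sketch. The file names and the field names x and y are hypothetical.

```pig
-- With no USING clause, Pig uses PigStorage, which splits each line on tab.
a = LOAD 'numbers.txt' AS (x:int, y:int);

-- If the file is actually separated by something else, such as a comma,
-- the delimiter has to be named explicitly.
b = LOAD 'numbers.csv' USING PigStorage(',') AS (x:int, y:int);

DUMP a;
DUMP b;
```

If the delimiter does not match the file, the whole line typically ends up in the first field and the remaining fields come back null.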
Regards,
Shahab
On Sun, Aug 4, 2013 at 6:37 AM, sumit piparsania
sumitpiparsa...@yahoo.com wrote:
Hi,
I am trying to process the
:28 PM, sumit piparsania sumitpiparsa...@yahoo.com
wrote:
hi Shahab,
Yes, the 2 sets of integers are tab separated. Still getting the error.
What could be the reason for this error?
*From:* Shahab Yunus shahab.yu...@gmail.com
*To:* user@pig.apache.org; sumit piparsania sumitpiparsa
If each job (its child tasks) is running in its own JVM then this should
not be a problem.
Regards,
Shahab
On Thu, Jul 25, 2013 at 2:46 PM, Huy Pham pha...@yahoo-inc.com wrote:
Hi All,
I am writing a class (called Parser) with a couple of static functions
because I don't want millions of
Amit, have you looked into the TOBAG and TOTUPLE built-in UDFs? Are they
not helpful?
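A quick sketch of both, assuming a hypothetical tab-separated file with three integer fields (the file and field names are made up):

```pig
-- Hypothetical input: tab-separated (f1, f2, f3).
a = LOAD 'data.txt' USING PigStorage('\t') AS (f1:int, f2:int, f3:int);

-- TOTUPLE packs expressions into one tuple; TOBAG packs them into a bag.
b = FOREACH a GENERATE TOTUPLE(f1, f2) AS pair, TOBAG(f1, f2, f3) AS vals;

-- FLATTEN on a bag produces one output row per element of the bag.
c = FOREACH a GENERATE f1, FLATTEN(TOBAG(f2, f3)) AS v;

DUMP b;
DUMP c;
```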
Regards,
Shahab
On Tue, Jul 23, 2013 at 5:46 PM, Amit am...@yahoo.com wrote:
Hello,
Based on your suggestion, I worked my way around it using FLATTEN.
This is what I am doing now:
B = FOREACH A GENERATE
Have you tried flattening the bag first?
On Fri, Jun 21, 2013 at 5:43 AM, Suresh Saggar s...@bsb.in wrote:
Facing a similar challenge. Here X contains one column named 'metadata' of
type bytearray. But the actual content is a JSON i.e. the value of metadata
field is a JSON (keys as sId cId)
Take a look at these. Do they help?
http://pig.apache.org/docs/r0.11.0/func.html#add-duration
http://pig.apache.org/docs/r0.11.1/api/index.html?org/apache/pig/piggybank/evaluation/datetime/diff/package-tree.html
I don't have a solution, but I am wondering which library you are using
for JsonStorage. I hope it doesn't have any issues.
Regards,
Shahab
On Sat, Jun 8, 2013 at 3:26 PM, Alan Crosswell a...@crosswell.us wrote:
Hello,
I've been having trouble with JsonStorage(). First, since my Python UDF
Yes, you can. You might need to tweak some of the glob expressions.
Check this:
http://stackoverflow.com/questions/12630584/load-multiple-files-in-pig
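As a sketch of the kind of glob I mean, assuming a hypothetical directory layout of /logs/2013/05/part-*.txt and /logs/2013/06/part-*.txt (the paths and field names are invented):

```pig
-- Braces list alternatives and * matches any run of characters,
-- so this single LOAD reads the matching files from both months.
logs = LOAD '/logs/2013/{05,06}/part-*.txt' USING PigStorage('\t')
       AS (ts:chararray, msg:chararray);

DUMP logs;
```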
Regards,
Shahab
On Tue, Jun 4, 2013 at 4:18 AM, Pedro Sá da Costa psdc1...@gmail.com wrote:
Hi,
When building a query in Pig Latin, we can
Was there an error in your Pig script? Isn't it the case that only error
messages are logged in that log file?
Regards,
Shahab
On Tue, Jun 4, 2013 at 11:34 AM, Raj Hadoop hadoop...@yahoo.com wrote:
I am running pig in local mode.
I modified the pig.properties file as well to include as
-security.jar
/opt/cloudera/parcels/CDH-version/lib/zookeeper/zookeeper-version.jar
On 30 May 2013 20:14, Shahab Yunus shahab.yu...@gmail.com wrote:
I am not explicitly registering any of these jars in the script. The
cluster was setup through standard Cloudera installation (4.2.0). Should
I
the hbase and zookeeper jar files in your pig script?
On 30 May 2013 06:24, Shahab Yunus shahab.yu...@gmail.com wrote:
Hello,
When loading data from a HBase table in Pig, using HBaseStorage, if I
specify the type of the fields as chararray, I get an exception:
2013-05-29 16:18:56,557 INFO
Try COUNT_STAR.
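A sketch of the difference, assuming a hypothetical comma-separated file with an id and a valid flag per line (I am guessing at the layout from your example, so treat the file name and schema as assumptions):

```pig
-- Hypothetical input: comma-separated (id, valid).
records = LOAD 'records.txt' USING PigStorage(',') AS (id:int, valid:chararray);

grouped = GROUP records ALL;

-- COUNT skips tuples whose first field is null;
-- COUNT_STAR counts every tuple in the bag.
counts = FOREACH grouped GENERATE COUNT(records) AS non_null_first,
                                  COUNT_STAR(records) AS total;

DUMP counts;
```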
-Shahab
On Wed, May 29, 2013 at 9:55 AM, Marco Brinkmann marco.brinkm...@cope.io wrote:
Hi everybody,
I have a rather simple question and scenario, but still I could not find an
answer in the documentation or in other resources:
id, valid
(1, false)
(2, false)
records =
of piling it up into
the batch, and as auto-commit is true it will commit immediately after
execution.
This works for me, though I don't know the pros and cons of it.
Now I will try the no_multiquery option.
On Tue, May 28, 2013 at 7:33 AM, Shahab Yunus shahab.yu...@gmail.com
wrote:
@Hardik,
Try running your script with the 'no_multiquery' option:
pig -no_multiquery myscript.pig
Regards,
Shahab
On Mon, May 27, 2013 at 8:41 AM, Hardik Shah hardik.g...@gmail.com wrote:
DBStorage is not working with other storage functions in a Pig script.
That means DBStorage is not working with
guess it may work in cases where other tools don't. And
probably it can provide some more useful functionality.
On Wed, May 8, 2013 at 9:47 PM, Shahab Yunus shahab.yu...@gmail.com
wrote:
@Peter, thanks for the tip. The -no_multiquery flag works, but I guess it
will be a performance hit
,
It is possible to have multiple store statements, but I can't tell why you
have nothing in the result.
I recommend splitting the task across the appropriate tools: store
everything in HDFS and then run Sqoop to upload the data to an RDBMS.
Ruslan
On Wed, May 8, 2013 at 6:11 PM, Shahab Yunus shahab.yu
Never mind, Jonathan, if you meant the same thing that Tom Wheeler just
posted. Thanks :)
On Tue, May 7, 2013 at 9:59 AM, Tom Wheeler tomwh...@gmail.com wrote:
Neither is supported directly in Pig Latin, but it is possible to embed
Pig in another language such as Java or Python to
See http://pig.apache.org/docs/r0.11.0/func.html#datetime-functions
Especially 'AddDuration'.
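A minimal sketch, assuming Pig 0.11 or later (where the DateTime type was introduced) and a hypothetical tab-separated file with an id and a date string such as 2013-05-06 per line. The file name, field names and the three-day offset are made up for illustration.

```pig
-- Hypothetical input: tab-separated (id, day), with day as yyyy-MM-dd.
events = LOAD 'events.txt' USING PigStorage('\t') AS (id:int, day:chararray);

-- ToDate parses the string into a DateTime.
-- 'P3D' is an ISO 8601 duration meaning three days.
shifted = FOREACH events GENERATE id,
          AddDuration(ToDate(day, 'yyyy-MM-dd'), 'P3D') AS three_days_later;

-- The same idea applied to the current time.
due = FOREACH events GENERATE id, AddDuration(CurrentTime(), 'P3D') AS due_date;

DUMP shifted;
DUMP due;
```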
Regards,
Shahab
On Mon, May 6, 2013 at 8:55 PM, soniya B soniya.bigd...@gmail.com wrote:
Hi,
How do I add days to the current date in Pig? Is there a built-in function?
Regards
Soniya