This question should be sent to u...@hive.apache.org.
Alan.
On Jul 16, 2013, at 3:23 AM, samir das mohapatra wrote:
Dear All,
Has anyone faced this issue:
while loading a huge dataset into a Hive table, Hive restricts me from querying
the same table.
I have set
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
search on cross matches
Alan.
On Sep 1, 2011, at 11:44 AM, Marc Sturlese wrote:
Hey there,
I would like to do the cross product of two data sets, neither of which fits in
memory. I've seen pig has the cross operation. Can
When I download the Pig 0.8.1 tarball I don't find any junit class files, just
a license file (which probably doesn't need to be there). If you build it, it
will pull those in via Ivy, but they are not in the tarball.
AFAIK it will work with any JUnit 4.x, but 4.5 is what we use in our testing.
to
move as quickly as possible. Is this strong enough for you Ben?
Alan.
On Sep 27, 2010, at 6:18 PM, Alan Gates wrote:
As directed in our vote to become a TLP, we (Pig's PMC) need to set
out bylaws for the project. I have put up a first proposal for these
bylaws at http://wiki.apache.org/pig
We keep tabs on projects we have worked on, are working on, and are
thinking of working on at http://wiki.apache.org/pig/PigJournal This
should give you some ideas for projects.
Alan.
On Sep 28, 2010, at 11:38 AM, yoomeosym...@yahoo.com wrote:
Kindly give a set of project on the above,
Are you loading them as tuples or maps? If you're loading them as
tuples then you should be able to say x.keyA.pA (which should return
vA). If you're loading them as maps then it would be x#'keyA'#'pA'
Alan.
On Sep 28, 2010, at 12:45 PM, rakesh kothari wrote:
Hi,
Is there a good way
-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com]
Sent: Monday, September 27, 2010 6:18 PM
To: pig-user@hadoop.apache.org
Subject: [DISCUSS] Apache Pig bylaws
As directed in our vote to become a TLP, we (Pig's PMC) need to set
out bylaws for the project. I have put up a first
Pig puts results between MR jobs into HDFS. Results from maps go into
local files (like any other MR job).
For results between MR jobs, you want them in HDFS where they will get
replicated. Else your next MR job will not have a sufficient number
of places it could be run, and you're much
On Sep 8, 2010, at 7:40 AM, Aditya Muralidharan wrote:
Hi,
Thanks for your great work on pig. I've been trying to use the code
from
pig 0.7.0, and the pig 0.8.0 branch to submit jobs to a hadoop 0.21.0
cluster. Submissions don't seem to work due to API
incompatibilities. I
found issue
Luan,
Pig keeps a list at http://wiki.apache.org/pig/PigJournal of all the
Pig projects we know of. Many of these are more project based, but
some could be turned into actual research. If you do choose one of
these, please let us know (over on pig-...@hadoop.apache.org) so we
can mark
On Aug 28, 2010, at 11:39 AM, Milind A Bhandarkar wrote:
+1 on the direction.
A few questions:
1. With Pig marching towards becoming a TLP at Apache, can Piggybank
become a full-fledged subproject (with its own releases and all)?
2. Or since the ultimate goal is to have a common UDF
Pig 0.7 runs on 20.x.
Alan.
On Aug 27, 2010, at 2:58 PM, Saurav Datta wrote:
Thanks Alan !
Will Pig 0.7.0 run on Hadoop 0.20.x ?
Or should we use any other Hadoop release ?
Regards,
Saurav
On Aug 27, 2010, at 2:50 PM, Alan Gates wrote:
Pig has not been tested with Hadoop 0.21, so I
With 15 +1 votes (14 from PMC members) the proposal passes. Thanks
for voting.
Owen, please push this to the Apache board for their consideration.
Alan.
On Aug 23, 2010, at 10:38 AM, Alan Gates wrote:
I propose that Pig become a top level Apache project.
The Pig development community has
Pig itself does not contain image processing primitives. But if you
write your image processing in a UDF, then Pig can be a great
framework for dealing with the parallelism, running it on Hadoop, etc.
Alan.
On Jul 26, 2010, at 11:56 AM, Ifeanyichukwu Osuji wrote:
Hi all,
At this point HBaseStorage is only a load function and not a store
function. If you're interested in taking it on, we'd love to have
someone extend it to be a store function as well.
Alan.
On Jul 22, 2010, at 2:05 PM, preethi vinayak sunny wrote:
Hi All,
This is my first mail in the
Pig has implemented map side merge joins in this way. If the storage
mechanism contains an index (e.g. Zebra) it can use it.
Alan.
On Jul 21, 2010, at 5:22 PM, Deem, Mike wrote:
We are planning to use Hadoop to run a number of recurring jobs that
involve map side joins.
Rather than
Here at Yahoo we use Oozie for managing large workflows (latest open
source edition at http://github.com/tucu00/oozie1 though they expect
to make another drop before the Hadoop summit). There are plans to
make Oozie a full open source project (instead of just making drops to
github).
On Jun 22, 2010, at 1:06 PM, Dmitriy Ryaboy wrote:
I think everyone has some sort of an ad-hoc system for building and
managing
these types of things. Seems like a prime candidate for some community
development -- we would all benefit from sharing a framework like
that, and
it should be
--
** addJobConf() is public, but not expected to be used by end-
users,
right? Several public methods here look like they need better
documentation, and the class itself could use a javadoc entry with
some
example uses.
On May 24, 2010, at 11:06 AM, Alan Gates wrote:
Scott,
I made an effort
Begin forwarded message:
From: Milind A Bhandarkar mili...@yahoo-inc.com
Date: May 31, 2010 9:16:38 PM PDT
To: common-u...@hadoop.apache.org common-u...@hadoop.apache.org,
mapreduce-u...@hadoop.apache.org mapreduce-
u...@hadoop.apache.org, gene...@hadoop.apache.org
Ancient history. Hadoop started as a subproject of Lucene.
Alan.
On Jun 17, 2010, at 10:22 PM, Otis Gospodnetic wrote:
Hello,
I've noticed people send emails to the following address:
hadoop-u...@lucene.apache.org
Why?
Is this supposed to be related to common-user@hadoop.apache.org
great if
C = JOIN A by id, B by id;
is alias for
C1 = COGROUP A by id, B by id;
C2 = filter C1 by not IsEmpty(A) and not IsEmpty(B);
C = foreach C2 generate FLATTEN(A), FLATTEN(B);
On Tue, Jun 8, 2010 at 12:03 PM, Alan Gates ga...@yahoo-inc.com
wrote:
Historically
C = JOIN A by a, B by a
was defined
That language is an instrument of human reason, and not merely a
medium for the expression of thought, is a truth generally admitted.
- George Boole, quoted in Iverson's Turing Award Lecture
- Original Message
From: Alan Gates ga...@yahoo-inc.com
To: pig-user@hadoop.apache.org
Sent
Begin forwarded message:
From: Giuseppe Maxia g.ma...@gmail.com
Date: May 31, 2010 5:44:08 AM PDT
To: databases-disc...@opensolaris.org, derby-u...@db.apache.org,
firebird-de...@lists.sourceforge.net, gene...@hadoop.apache.org,
hbase-u...@hadoop.apache.org,
In general mapside cogroups are not possible unless the underlying
storage mechanism can guarantee that all instances of the key you
are cogrouping on are in a single map instance. At this point only
Zebra can guarantee that. If you're interested I can give more
details on why join
At the Bay Area HUG on Wednesday someone (Eli I think, though I might
be remembering incorrectly) asked if there was a migration guide for
moving Pig load and store functions from 0.6 to 0.7. I said there was
but I couldn't remember if it had been posted yet or not. In fact it
had
No. Pig versions 0.5 and later are only compatible with Hadoop 0.20.
Alan.
On May 18, 2010, at 4:22 PM, Brian Donaldson wrote:
I get this error message:
2010-05-18 16:20:30,490 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting
to hadoop file system
The Travel Assistance Committee is now accepting applications for those
wanting to attend ApacheCon North America (NA) 2010, which is taking
place
between the 1st and 5th November in Atlanta.
The Travel Assistance Committee is looking for people who would like
to be
able to attend
On May 14, 2010, at 12:20 AM, Russell Jurney wrote:
Should I make a JIRA then submit the patch?
Yes.
Alan.
Check out the PigUnit patch in https://issues.apache.org/jira/browse/PIG-1404
and see if that will meet your needs.
Alan.
On Apr 29, 2010, at 9:28 AM, Corbin Hoenes wrote:
I see MiniCluster.java in the pig source code and want to do
something similar in my own tests; tried just copying
Can't we just change the built-in CONCAT to accept additional fields?
This would be totally backward compatible. I know it won't help now.
Alan.
On May 12, 2010, at 4:15 PM, Russell Jurney wrote:
The CONCAT in the oink project (LinkedIn's UDFs) does concatenation of
any number of string
On May 10, 2010, at 5:13 PM, Syed Wasti wrote:
I keep seeing this warning message while running my scripts, is this a
concern ? Any info please. How can I get rid of this ?
It is not a concern. There's no way for you to remove it. It's
caused by code in Hadoop complaining about the way Pig
You need to change your group to a cogroup so that both bags are in
your data stream. If you don't want to group bag b by the same keys
as a (that is, you want all of b available for each group of a) then
you can load b as a side file inside your udf.
Alan.
On Apr 30, 2010, at 4:32 AM,
What is the type of the field you are trying to take the average of?
Alan.
On Apr 29, 2010, at 11:10 AM, Katukuri, Jay wrote:
Hi,
I have encountered the following error in using pig's built in
function AVG.
ERROR 1045: Could not infer the matching function for
org.apache.pig.builtin.AVG
Use the PigServer interface from Java. This way your Pig Latin and
Java can be intermixed. You will be guaranteed that your middle Java
code will start immediately after PigServer finishes running the first
Pig Latin script.
Alan.
On Apr 26, 2010, at 7:41 PM, Katukuri, Jay wrote:
PigStorage doesn't have an escaping mechanism at the moment. You
could create a load function that extends PigStorage and adds escaping
for field delimiters.
Alan.
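A PigStorage subclass along these lines would mainly need per-line splitting that honors an escape character. As a hypothetical sketch in plain Java (not Pig's actual code; the class and method names here are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a line on a delimiter, treating a
// backslash-escaped delimiter as part of the field rather than a
// split point. This is the core per-line logic a PigStorage
// subclass could apply before building its tuple.
public class EscapedSplit {
    public static List<String> split(String line, char delim) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean escaped = false;
        for (char c : line.toCharArray()) {
            if (escaped) {            // previous char was '\': take c literally
                cur.append(c);
                escaped = false;
            } else if (c == '\\') {   // start of an escape sequence
                escaped = true;
            } else if (c == delim) {  // unescaped delimiter ends the field
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());   // last field has no trailing delimiter
        return fields;
    }
}
```

With a comma delimiter, the input `a\,b,c` would split into two fields, `a,b` and `c`.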
On Apr 23, 2010, at 7:28 PM, Toli Kuznets wrote:
Hi,
I'm trying to read in a comma-separated file with a simple command:
a
Unique identifiers are easy enough. Row ids (monotonically increasing
values) are impossible because of the parallel nature of map reduce.
If you just want to generate a unique identifier you can write a UDF
to wrap Java's UUID class (or use the new GenericInvoker UDF if you're
working
another block. This way I
can get
a guaranteed unique ID. (And it's probably faster and smaller this
way than
generating UUID)
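The two schemes discussed here, a wrapped java.util.UUID and a compact task-id-plus-counter composite, can be sketched in plain Java. This is an illustration only: IdGen and its bit layout are invented, and uniqueness of the composite scheme assumes each parallel task is handed a distinct task id (which Hadoop provides):

```java
import java.util.UUID;

// Sketch of the two unique-ID schemes discussed: a wrapped UUID,
// and a compact composite of task id plus a per-task counter.
public class IdGen {
    private final int taskId;   // assumed distinct per map/reduce task
    private long counter = 0;

    public IdGen(int taskId) { this.taskId = taskId; }

    // Composite id: task id in the high bits, local counter in the
    // low 40 bits. Unique across tasks because task ids never collide.
    public long nextId() {
        return ((long) taskId << 40) | counter++;
    }

    // The UUID approach: globally unique, but bigger and slower.
    public static String uuidId() {
        return UUID.randomUUID().toString();
    }
}
```

A Pig UDF would simply call one of these from its exec() method.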
Does pig use zookeeper to do anything? Can I connect to that one if
it does?
On Fri, Apr 23, 2010 at 12:08 PM, Alan Gates ga...@yahoo-inc.com
wrote:
Unique
No. It might be useful though. AFAIK no one monitors #pig.
Alan.
Unrelated question: does Pig have an IRC channel on Freenode? #pig seems to be
invite-only.
The grouping package in piggybank is left over from back when Pig
allowed users to define grouping functions (0.1). Functions like
these should go in evaluation.util.
However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem
Pig 0.6 works on Hadoop 0.20.x. There was never an official release
of Pig with Hadoop 0.19. Pig 0.4 was the last release to work on
0.18. There is a patch to convert this to 0.19 (see https://issues.apache.org/jira/browse/PIG-573)
. Since Pig uses the new map reduce APIs in 0.20 a
Take a look at https://issues.apache.org/jira/browse/PIG-200, the
perf-0.6.patch contains scripts to generate skewed and unskewed data.
Alan.
On Apr 15, 2010, at 5:16 PM, Radhika Parvathaneni wrote:
Dear Pig users,
Please assist in obtaining 2 skewed data sets and 2 non-skewed
I don't think this fits split's semantics.
split does not necessarily send a tuple to only one destination. So
for a split clause like:
split A into big if size > 100, really_big if size > 1000
big would contain all the records that really_big contains, as well as all
other records with size > 100
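The overlap can be illustrated in plain Java (a hypothetical stand-in for Pig's split, using two independent filters over the same input):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrates split's "multicast" semantics: each destination gets
// every record matching its own predicate, so destinations can
// overlap rather than partitioning the input.
public class SplitDemo {
    public static List<Integer> bySize(List<Integer> recs, int min) {
        return recs.stream().filter(s -> s > min).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> sizes = Arrays.asList(50, 500, 5000);
        List<Integer> big = bySize(sizes, 100);        // like: big if size > 100
        List<Integer> reallyBig = bySize(sizes, 1000); // like: really_big if size > 1000
        // big contains everything reallyBig contains, plus the 500 record
        System.out.println(big.containsAll(reallyBig));
    }
}
```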
On Apr 13, 2010, at 3:54 PM, Katukuri, Jay wrote:
Hello ,
I have few questions about the out-of-memory issues that I am
running into. If you could please answer them, that will be great.
I am using Pig 0.4.0 on Hadoop 0.18.3 in map reduce mode.
The data set is fairly huge (The whole data
I added a link to this on http://wiki.apache.org/pig/PigTools
Alan.
On Mar 29, 2010, at 2:51 PM, Dmitriy Ryaboy wrote:
Hi folks,
We (but mostly Kevin Weil) just open-sourced some of the code we use
at
Twitter to make working with Hadoop and Pig easier. Most of what is
currently included in
What you gave seems like it should work. But I'd try it as:
C = COGROUP A BY id, B BY id;
D = FILTER C BY COUNT(A) == 0;
E = FOREACH D GENERATE FLATTEN(B);
Alan.
On Mar 29, 2010, at 7:06 PM, Kent Shi wrote:
Hi,
I am trying to get the elements of B not in A. My code is like this
C = JOIN A
Since 0.5 Pig has run against Hadoop 0.20, and since 0.6 it has used
the new Hadoop APIs (available only in 20+). Reverting this would be
very difficult. There is a patch for Pig 0.4 that will make it run
against Hadoop 19 (https://issues.apache.org/jira/browse/PIG-573).
Alan.
On Mar
The UDF interface does not currently include the ability for a UDF to
indicate additional jars it would like to have packaged and sent along.
Alan.
On Mar 10, 2010, at 2:21 AM, Tamir Kamara wrote:
Hi,
Register is working fine but it means that the user needs to know
when it's
needed to
Which version of Pig is this? If it's trunk, then you should check
that you can run Hadoop on your machine, as it appears it
is not connecting to Hadoop. (As of version 0.7 Pig uses a local
instance of Hadoop in local mode.)
Alan.
On Mar 9, 2010, at 8:26 AM, Pavel Gutin
On Mar 12, 2010, at 10:36 AM, hc busy wrote:
Is there any work towards something like C languages '#include' in
Pig? My
large pig script is actually developed separately in several smaller
pig
files. Individually the pig files do not run because they depend on
previous
scripts, but
..
-D
On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates ga...@yahoo-inc.com
wrote:
On Mar 12, 2010, at 10:36 AM, hc busy wrote:
Is there any work towards something like C languages '#include' in
Pig? My
large pig script is actually developed separately in several
smaller pig
files
On Mar 4, 2010, at 10:19 AM, Dmitriy Ryaboy wrote:
Thanks to Gerrit and Bill who responded.
Unfortunately they said the exact opposite thing so we are still at an
impasse :-). Anyone else care to venture an opinion?
Cause if Alan and I have a committer fight, he'll win and y'all will
have to
Pig 0.6.0 is released. This release includes performance and memory
usage improvements, a new Accumulator interface for UDFs, and many bug
fixes. You can see the details of the release at http://hadoop.apache.org/pig/releases.html
Alan.
PigStorage (the loader you are using) creates all values as
bytearrays, which in Java is represented as a DataByteArray. So when
you get the id element of your map, it is a DataByteArray.
If all you really want to do is cast from bytearray to a long you
don't need a UDF for that.
On Feb 16, 2010, at 4:02 PM, Kelvin Moss wrote:
Thanks for the reply. Actually I have more than 10 keys in my map. I
tried the following in UDF and it seems to work
Long id;
if (m.get("id") != null) {
    id = Long.parseLong(m.get("id").toString());
}
This is correct
Done.
Alan.
On Feb 10, 2010, at 7:21 PM, Lars Francke wrote:
Hi,
I have a (hopefully) small request regarding JIRA. I quite like the
Road Map feature[1] but unfortunately it doesn't work correctly for
Pig as all versions (except 0.0.0) are set to Unreleased[2]. Would
anyone with the power to
You are not wrong. This is a feature we'd like to add but haven't
gotten to yet.
Alan.
On Feb 9, 2010, at 8:12 PM, prasenjit mukherjee wrote:
May be I was not clear enough on my problem. I would like to call
another pig-script from a pig-script. How can I do that.
As far as I understand,
5, 2010 at 2:46 PM, Alan Gates ga...@yahoo-inc.com
wrote:
Putting the jars on your classpath works as long as the classes
you need
are directly referenced in your script. So:
B = foreach A generate com.mycompany.myudf($0);
If myudf is in a jar somewhere in your classpath
This is a bug. Looking at the code EqualTo isn't implemented for
Tuple, even though it is defined in the functional spec ( http://wiki.apache.org/pig/PigTypesFunctionalSpec
) and referenced in the user manual. Please file a JIRA on this so
we can track it and get it fixed. In the
Answers inlined:
On Feb 2, 2010, at 3:15 AM, Guy Jeffery wrote:
Hi,
Hope this gets to the right list...
I'm fairly new to Pig, been playing around with it for a couple of
days.
Essentially I'm doing a bit of work to evaluate Pig and its ability to
simplify the use of Hadoop -
Before building in piggybank you need to do 'ant jar compile-test' at
the top level. From the error messages I'm guessing you didn't do that.
Alan.
On Jan 26, 2010, at 10:53 PM, felix gao wrote:
Hi all,
Just downloaded it and when following the instruction to build there
is
compilation
PIG_CLASSPATH=your_config_directory pig
Alan.
On Jan 27, 2010, at 11:54 AM, Aryeh Berkowitz wrote:
When I run Pig, I connect to the local file system, when I run (java
-cp pig-0.5.0-core.jar:$HADOOP_HOME/conf org.apache.pig.Main) I
connect to hdfs. It seems like Pig is not finding my
:
Hi Alan, I'm not quite sure what you mean. As shown in my pig
script, I have
stated to have 56 reducers for the group by task. And the number of
mappers is decided by hadoop. Is there any way to optimize my pig
script
further?
On 20 Jan 2010 19:07, Alan Gates ga...@yahoo-inc.com wrote
Are you setting parallel as Mirdul suggests? Or does your cluster
have a default parallelism set?
Alan.
On Jan 20, 2010, at 1:58 AM, Rob Stewart wrote:
Hi again,
The results have been produced. I can tell you that I made the
following
improvements:
1. Removed unnecessary words =
..:-)
Cheers,
/R
On 1/20/10 1:41 AM, Alan Gates ga...@yahoo-inc.com wrote:
Let me elaborate on what Rekha said. He's correct that Pig does it
automatically for order by. It has to sample the input to the order
by to decide how to distribute the keys. As part of this it notices
any skew and spreads
Mat,
This looks really nice. Are you okay with me posting it at http://wiki.apache.org/pig/PigTalksPapers
so other Pig users can benefit from it?
Alan.
On Jan 16, 2010, at 8:20 PM, Mat Kelcey wrote:
based on a talk i gave at work recently
hope it might help someone as an intro to pig
mat
Jeremy,
Usually the mails get bounced when the sender isn't a subscriber to
pig-user.
Usually we see this sit and wait behavior when other jobs are running
and there are no slots open on the cluster. If you see this behavior
again can you look at the job tracker GUI. It will tell you
Qui tacet consentit (silence implies consent).
No one has spoken up, so I think you're free to make the change.
Alan.
On Jan 6, 2010, at 8:14 AM, Jeff Zhang wrote:
Hi all,
I am currently working on a JIRA which will change the interface of
Tuple
and DataBag: PIG-1166 https://issues.apache.org/jira/browse/PIG-1166
That's correct. See Ying's comments near the bottom on how to make
the patches there work together.
Alan.
On Jan 15, 2010, at 6:30 PM, Matei Zaharia wrote:
Hi,
I'm interested in running the PigMix benchmark described at http://wiki.apache.org/pig/PigMix
to test some scheduling work in
Done.
Alan.
On Jan 13, 2010, at 10:01 PM, Theo Hultberg wrote:
Please do!
T#
On Thu, Jan 14, 2010 at 12:02 AM, Alan Gates ga...@yahoo-inc.com
wrote:
Theo,
This looks really interesting. Can I put a link to it on our page
for tools
use with Pig, http://wiki.apache.org/pig/PigTools
Rob,
Feel free to update the wiki with your findings. You don't have to be
a committer to change the wiki.
Alan.
On Jan 14, 2010, at 12:15 PM, Rob Stewart wrote:
Hello Dmitry!
I have it solved, it was just a bit of trial and error based on the
Hive bug
report/fix I found.
The report
I'm guessing that you want to set the width of the text to avoid the
issue where if you split by block, then all splits but the first will
have an unknown offset.
Most texts have natural divisions in them which I'm guessing you'll
want to respect anyway. In the Bible this would be the
The script you give below will run twice, once for the dump, and once
for the store. And dump is implemented as store plus cat. So I don't
think this will do what you want.
Alan.
On Dec 18, 2009, at 1:48 AM, prasenjit mukherjee wrote:
I am trying to figure out a way to identify the
In MR mode, the output of your UDFs will turn up in the logs of the
map and reduce tasks, not in the pig log. There is currently no
channel for pig to send log messages back from the cluster to your
machine to put the messages in the pig log.
Alan.
On Jan 5, 2010, at 7:02 AM, Vincent
Definitely.
Alan.
On Dec 8, 2009, at 3:12 PM, James Kebinger wrote:
Hi all, I realized a week or two ago that PigStorage(',') wasn't
adequate to
parse files that had commas embedded in properly CSV quoted fields.
I went ahead and built a CSV parser for pig 0.3 that deals with
embedded
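The core of such a parser is quote-aware field splitting. As a hypothetical plain-Java sketch (not the parser mentioned above; CsvSplit is an invented name), treating "" inside a quoted field as an escaped quote:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of CSV splitting that respects double-quoted fields:
// a comma inside quotes stays part of the field, and a doubled quote
// inside quotes is an escaped literal quote.
public class CsvSplit {
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"');   // "" -> literal quote
                        i++;
                    } else {
                        inQuotes = false;  // closing quote
                    }
                } else {
                    cur.append(c);         // commas here are data, not separators
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }
}
```

For example, the line `a,"b,c",d` splits into three fields: `a`, `b,c`, and `d`.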
On Nov 26, 2009, at 7:39 AM, Jeff Zhang wrote:
Hi all,
I'd like to know where the name Zebra comes from. Does it convey
the idea behind this metadata system, that the columnar storage format is
like the stripes on a zebra's skin?
Pretty much, yes. We've fallen into the habit of giving
Do you want to keep the distinct values separate by input, or mingle
them? The following script will keep them separate.
A = load 'students' as (name);
B = load 'employees' as (name);
C = cogroup A by name, B by name;
D = filter C by IsEmpty(A);
E = foreach D generate flatten(B);
store E into
On Nov 25, 2009, at 2:59 PM, Dmitriy Ryaboy wrote:
snip
This is a good use case that manages to expose a limitation of the
UDF APIs -- it
would be nice to output multiple records per processed tuple in
exec(), to
allow the kind of processing actual Pig operators sometimes do, with
buffering
HbaseStorage is broken in Pig 0.5.0, see https://issues.apache.org/jira/browse/PIG-970
The fix for that has been checked into trunk. You can either check
out from trunk and build to get that, or can check out from the 0.5.0
branch and then apply the patches in PIG-970 to that code base.
All,
Yahoo has a number of Hadoop development positions open. There are
engineering, architect, management, and QA positions all open. See http://developer.yahoo.net/blogs/hadoop/2009/11/updated_do_you_have_what_it_ta.html
for details.
Alan.
On Nov 12, 2009, at 2:49 PM, Scott Carey wrote:
Is it possible to have a script at least use the default configured
Hadoop value? Or is there a way to do that already?
If the user doesn't specify a parallelism Pig doesn't set a value in
JobConf for the reduce, which means it will pick up
Looks like it is missing from the distribution. You can see the file at
http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.5/conf/pig.properties?revision=815933&view=markup
You can also get it from svn.
Alan.
On Nov 13, 2009, at 9:01 AM, John Hayward wrote:
I downloaded hadoop 0.20.1
Filed https://issues.apache.org/jira/browse/PIG-1093 for this issue.
Alan.
On Nov 13, 2009, at 10:26 AM, Alan Gates wrote:
Looks like it is missing from the distribution. You can see the
file at
http://svn.apache.org/viewvc/hadoop/pig/branches/branch-0.5/conf/pig.properties?revision
I agree that it would be very useful to have a dynamic number of
reducers. However, I'm not sure how to accomplish it. MapReduce
requires that we set the number of reducers up front in JobConf, when
we submit the job. But we don't know the number of maps until
getSplits is called after
to the jobtracker.
No, it's a copy. Changes made in it don't end up affecting the job.
Alan.
ben
Alan Gates wrote:
I agree that it would be very useful to have a dynamic number of
reducers. However, I'm not sure how to accomplish it. MapReduce
requires that we set the number of reducers up
On Nov 8, 2009, at 7:08 AM, Rob Stewart wrote:
snip
So, Alan, you're correct, MapReduce, on its own does not provide me
with
loops, I have to wrap a loop around this MapReduce method
getAllChildren()
to get all children of john. When you say that I would have to wrap
Java
around Pig to
I'm not sure I understand your question, but it sounds like you want
to comingle data from two relations, X and Y without doing a join or
cross. Is that correct? If so, you can't do that. If you have a
script like:
X = load 'file_data';
Y = load 'tuple_data';
Z = do something with X and
On Oct 31, 2009, at 11:22 AM, Rob Stewart wrote:
snip
Map and reduce parallelism are controlled differently in Hadoop. Map
parallelism is controlled by the InputSplit. IS determines how
many maps to
start and which file blocks to assign to which maps. In the case
of PigMix,
both the MR
Check out LOWER in piggybank.
Alan.
On Oct 21, 2009, at 8:32 AM, Vincent Barat wrote:
Hello,
Quick question: is there a set of ready-to-use Pig UDF functions?
I'm looking for a TOLOWERCASE function...
Cheers,
What you propose below will result in all of the records for a given
group going to a single instance of sessions.pl.
Alan.
On Oct 13, 2009, at 4:04 PM, Paul B wrote:
I'm setting up a pig job that needs to stream a grouped set of data
to an
instance of a perl script. I need to ensure
It's dying when trying to write out the contents of the tuples that
are in the bag. What is the schema of the tuples inside the bag?
Alan.
On Oct 1, 2009, at 8:37 PM, miryala vignesh wrote:
Hi,
I was implementing the LoadFunc interface, in which getNext() returns a
tuple. I have a bag
/TableStorer.java in Pig's contrib directory.
Alan.
On Sep 9, 2009, at 6:20 PM, Liu Xianglong wrote:
Hi, Alan. I am interested in this store function. Would you mind
sending me some details?
--
From: Alan Gates ga...@yahoo-inc.com
Sent: Thursday
Kevin,
Please take a look at the proposal for reworking load and store
functions that was posted a couple of days ago and see if it will
address your issues with plugability of load functions.
http://wiki.apache.org/pig/LoadStoreRedesignProposal
Alan.
On Sep 14, 2009, at 8:58 AM, Kevin
I don't know much about Nutch, or its format. If it is not a text
format separated by some single character value (such as comma, tab,
etc.) you'll need to write a load function to read it and parse it
into Pig tuples. You can find more info on writing load functions at
Our plan is to switch Pig to Hadoop 0.20 as soon as Hadoop 0.20.1 is
released, because there's some features in that release we would like
to have. Last I knew 0.20.1 was in the vote phase to be released.
Integration with hbase 0.20 will need someone to pick it up and work
on it. I am
Pig uses jline to do command line editing in grunt, so it supports
whatever jline supports.
Alan.
On Sep 11, 2009, at 7:46 AM, Vincent BARAT wrote:
Ashutosh Chauhan wrote:
Do you mean using ^D to kill grunt and return to OS shell ?
No, just ^d to delete the character just after the
I do not know if there is a general hbase load/import tool. That
would be a good question for the hbase-user list.
Right now Pig does not have a store function to write data into
hbase. It is possible to write such a function. If you are
interested I can send you specific details on how
How large are the records in your file? Do you expect any single
record to be in the multi-megabyte size?
Have you tried decompressing the file and reading it to see if the
issue is the compression?
Alan.
On Sep 8, 2009, at 7:58 AM, Irfan Mohammed wrote:
Hi,
I am trying to load a large
In other mails you're using Pig's multi-query feature to group the
same data different ways. Is that the same thing you're wanting to do
here, or something different?
Alan.
On Sep 3, 2009, at 1:08 PM, zaki rahaman wrote:
I have a set of logfiles that I'm parsing and analyzing using Pig in