You have the wrong Hadoop on your classpath, or you did not recompile
against Hadoop 2.
On Thu, Dec 12, 2013 at 12:18 PM, qiaoresearcher
wrote:
> Everything was fine with hadoop 1.x and pig 0.11.
> Recently I installed hadoop 2.2.0 and pig 0.12.0, and ran a simple one
> line script: load som
on from any Java object type in the
> sequence file to pig types. See
> https://issues.apache.org/jira/browse/PIG-1777
>
> On Tue, Sep 24, 2013 at 5:22 AM, Dmitriy Ryaboy
> wrote:
> > I assume by scala you mean scalding?
> > If so, yeah, scalding should be much easier
I assume by scala you mean scalding?
If so, yeah, scalding should be much easier for working with custom data
types.
Pig doesn't handle generic "objects" well. You have to write converters to
and from, like the ones we created in ElephantBird for Protocol Buffers and
Thrift (and a bunch of writabl
That's actually the documented behavior:
https://pig.apache.org/docs/r0.10.0/func.html#count
There was some discussion about changing this:
https://issues.apache.org/jira/browse/PIG-1014
Patches gratefully accepted..
D
On Sat, Sep 14, 2013 at 12:01 AM, centerqi hu wrote:
> The sample.txt fil
Loaders and UDFs are all initialized at the compilation phase, so you can't
pass dynamically calculated values in (you can do some things by
pre-calculating constants like current time, etc, using variable binding
via the define keyword, but you are trying to do something far more fancy).
Moreover
Don't use CombinedFile InputFormat / Record Reader. Just let Pig do its
thing.
On Wed, Sep 18, 2013 at 9:08 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> I tried this
> http://pig.apache.org/docs/r0.8.1/cookbook.html#Combine+Small+Input+Files
>
> Test Job Details
> Input 7 Files * 51MB each
>
> HDFS Counters of
Also, if you run the pig job from a script rather than from the grunt shell,
the name is the name of the script (so, "pig foo.pig" names spawned jobs
foo.pig)
On Apr 15, 2013, at 5:50 PM, Bill Graham wrote:
> You can do this in your script as well:
>
> SET job.name 'my job';
>
>
>
>
> On
is a matrix including vertex id and its starting position
> > > >
> > > >
> > > > graph = load 'graph' using PigStorage() (vertex:int, follower:int) -
> > > > --load the graph file
> > > > vertex = COGROUP graph BY (vertex);
> > > &
' USING
> com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
> DUMP inputData;
>
>
> On Thu, Sep 27, 2012 at 8:48 AM, Dmitriy Ryaboy
> wrote:
>
> > Yep. It's just JsonLoader.
> > By default it works on top of whatever's returned by
Do you have any special properties set?
Like the pig.udf.profile one maybe..
D
On Thu, Apr 4, 2013 at 6:25 AM, Lauren Blau <
lauren.b...@digitalreasoning.com> wrote:
> I'm running a simple script to add a sequence_number to a relation, sort
> the result and store to a file:
>
> a0 = load '' usin
it's clear :)
>
> Thanks
> Best Regards...
>
>
> On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy
> wrote:
>
> > Hi Burakk,
> > The general idea of making graph processing easier is a good one. I'm not
> > sure what exactly you are proposing to d
Hi Burakk,
The general idea of making graph processing easier is a good one. I'm not
sure what exactly you are proposing to do, though. Could you be more
detailed about what you are thinking?
On Thu, Mar 28, 2013 at 1:28 PM, burakkk wrote:
> Hi,
> I might be a little bit late. I come up with a
Mike, have you tried adding logging to any EvalFunc methods that
communicate with Mongo to see which of them is calling it after finish() ?
Are you sure something else doesn't close Mongo connection for you?
On Fri, Mar 22, 2013 at 8:28 AM, Mike Sukmanowsky wrote:
> Bump - any thoughts?
>
>
> O
To explain what's going on:
-limit for HBaseStorage limits the number of rows returned from *each
region* in the hbase table. It's an optimization -- there is no way for the
LIMIT operator to be pushed down to the loader, so you can do it explicitly
if you know you only need a few rows and don't wa
jar)
> and put it in my run directory, but still got the same error message. My
> CLASSPATH is
>
> CLASSPATH=.:/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64/lib/dt.jar:.
>
> So it should look in the current directory, right?
>
> Thanks
> Dan
>
> -
11.0 is currently required.
On Tue, Mar 12, 2013 at 2:54 PM, Danfeng Li wrote:
> Thanks for the quick response, which guava version should I use?
>
> -Original Message-
> From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
> Sent: Tuesday, March 12, 2013 2:52 PM
> To: u
g bug
in S3 this thing tickles isn't being triggered).
On Tue, Mar 12, 2013 at 2:55 PM, Dmitriy Ryaboy wrote:
> Sounds like a bug in the S3 implementation of FileSystem? Does this happen
> with pig 0.10 or 0.11?
>
>
>
> On Mon, Mar 11, 2013 at 12:11 AM, Yang wrote:
>
Sounds like a bug in the S3 implementation of FileSystem? Does this happen
with pig 0.10 or 0.11?
On Mon, Mar 11, 2013 at 12:11 AM, Yang wrote:
> the following code gave null pointer exception
>
>
> ---
>
> rbl
How does LazySimpleSerde store data?
On Tue, Mar 12, 2013 at 11:17 AM, Shawn Hermans wrote:
> All,
> Is there an easy way to read Hive LazySimpleSerde encoded files in Pig? I
> did some research and found support for Hive's columnar format and for
> SequenceFiles, but did not see anything for L
Sounds like you have a bad (older? newer?) version of guava on the
classpath.
On Tue, Mar 12, 2013 at 2:50 PM, Danfeng Li wrote:
> When I try to run pig 0.12.0, I got the following error
>
> $ pig12 -param input="t" -param output="s" -c b224G_1.pig
> log4j:ERROR Could not find value for key lo
nity
development, we plan to contribute Parquet to the Apache Incubator when the
development is farther along.
Regards,
Nong Li, Julien Le Dem, Marcel Kornacker, Todd Lipcon, Dmitriy Ryaboy,
Jonathan Coveney, and friends.
Does the EB json loader with
elephantbird.jsonloader.nestedLoad = true
Work?
On Thu, Feb 28, 2013 at 10:44 AM, Eli Finkelshteyn
wrote:
>
> Hi Folks,
>
> I want to parse a string of complex JSON in Pig. Specifically, I want Pig
to understand my JSON array as a bag instead of as a single charar
Sounds odd. Can you send a complete script that reproduces the error
(include sample data and load statements).
On Thu, Feb 21, 2013 at 2:55 AM, Robert McCarthy <
robert.mark.mccar...@gmail.com> wrote:
> If I have some information in A, that contains dt_dt and platform, I want
> to store it in a
I don't think I've seen anyone write loaders for NetCDF, but there is no
reason one couldn't, as far as I know.
Just need to write a Hadoop InputFormat / RecordReader that implements the
format, and wrap a thin LoadFunc around it.
There is some basic documentation here :
https://pig.apache.org/do
Hi Jeff,
It does not sound like you need properties (or a configuration). It sounds
like you want to pass arguments to your LoadFunc. You can create a LoadFunc
that takes an arbitrary number of String arguments. For example, the
default loader, PigStorage, takes 2 arguments: the first is a delimite
I pulled together some of the highlights of the pig 0.11 release on the
Apache Pig blog (which now officially exists!):
https://blogs.apache.org/pig/
D
ach languages generate flatten(TOBAG(*));
* -- language_bag is a relation with three rows, ('en'), ('fr'), ('jp')
* }
*/
On Thu, Jan 24, 2013 at 1:03 PM, Dmitriy Ryaboy wrote:
>
> I have a loader that does exactly that. Let me see about dropping into
Elepha
I have a loader that does exactly that. Let me see about dropping into
Elephant-Bird.
On Thu, Jan 24, 2013 at 8:15 AM, Alan Gates wrote:
> I agree this would be useful for debugging, but I'd go about it a
> different way. Rather than add new syntax as you propose, it seems we
> could easily cr
"The udf (simple extends eval func) refers and reads a dictionary file of 6
MB for each input phrase."
Any reason to keep re-reading the dictionary instead of just reading it
once?
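The read-once pattern being suggested, sketched in Python (class and file contents are hypothetical stand-ins, not the poster's actual UDF): load the dictionary lazily on the first call and cache it for every later call in the same process.

```python
class PhraseDictionary:
    """Loads the dictionary once per process and caches it for all later lookups."""
    _cache = None
    _loads = 0  # counts how many times the file was actually read

    @classmethod
    def _load(cls):
        # Stand-in for reading the real 6 MB dictionary file.
        cls._loads += 1
        return {"hello": 1, "world": 2}

    @classmethod
    def lookup(cls, phrase):
        if cls._cache is None:           # only the first call pays the load cost
            cls._cache = cls._load()     # every later call reuses the cache
        return cls._cache.get(phrase)
```

With this shape, calling lookup() once per input phrase still reads the file only once per process, which is the point of the question above.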
D
On Sun, Jan 13, 2013 at 4:47 AM, Dipesh Kumar Singh
wrote:
> The udf (simple extends eval func) refers and reads
Yang,
Try MultiStorage:
https://pig.apache.org/docs/r0.8.1/api/org/apache/pig/piggybank/storage/MultiStorage.html
On Wed, Jan 9, 2013 at 2:37 PM, Yang wrote:
> let's say I have an input dataset, each row has 2 fields, the first field
> is a value among 100 possible values. I want to just split
Please see the list of editor plugins in
https://cwiki.apache.org/confluence/display/PIG/PigTools
D
On Mon, Dec 24, 2012 at 9:42 PM, Kshiva Kps wrote:
> Hi,
>
> Is there any PIG editors and where we can write 100 to 150 pig scripts
> I'm believing is not possible to do in CLI mode .
> Like ID
Tim, can you open a github issue with EB about compiling against 0.10?
I think this is an easy fix.
On Tue, Jan 8, 2013 at 9:38 AM, Alan Gates wrote:
> I would open a new JIRA, since 1914 is focussed on building an alternative
> that discovers schema, while you are wanting to improve the existi
Two back slashes?
On Thu, Jan 10, 2013 at 6:01 PM, Eli Finkelshteyn wrote:
> This wasn't a problem in 0.9.2, but in 0.10, when I try to access a key in
> a map that has a dollar sign in it, I get hammered with errors that I
> haven't defined the variable. Specifically:
>
>blah = FOREACH meh
Details:
https://github.com/kevinweil/elephant-bird/wiki/Elephant-Bird-Lucene
On Fri, Jan 4, 2013 at 7:55 AM, Bill Graham wrote:
> ElephantBird now has pig-lucene support:
>
>
> https://github.com/kevinweil/elephant-bird/blob/master/pig-lucene/src/main/java/com/twitter/elephantbird/pig/load/Luc
Try jstacking it a few times while it's running. Is it just sitting idly in
a sleep() ?
On Mon, Jan 7, 2013 at 11:56 AM, Cheolsoo Park wrote:
> Typo: it makes much sense to run them in cluster => it doesn't make much
> sense to run them in cluster.
>
> On Mon, Jan 7, 2013 at 11:55 AM, Cheolsoo P
Are you running in local mode?
Heap error implies the *local* JVM is running into trouble (so, either
you are doing the compute locally, or something odd is going on with
processing the script or collecting the results).
What is your java Xmx set to on the local (client) machine?
On Wed, Nov 28,
This should not work in versions of hadoop that support security for
fairly obvious reasons.
On Fri, Nov 30, 2012 at 5:52 PM, Prashant Kommireddi
wrote:
> Hi Miki,
>
> What version of hadoop are you on? I can confirm this works on 0.20.2 but
> never tried this on the newer versions.
>
> Try hadoo
Mike, it's done automatically -- the operator will just stop asking
the loader for more elements.
If you observe something to the contrary, please let us know!
On Wed, Nov 28, 2012 at 7:11 PM, Mike Drob wrote:
> Hello,
>
> According to https://issues.apache.org/jira/browse/PIG-1270 the execution
That sounds reasonable, I've run into the same problem. Do you mind
submitting a patch?
On Fri, Nov 16, 2012 at 12:48 PM, pablomar
wrote:
> hi all,
>
> I'm using Pig 0.9.2 (Apache Pig version 0.9.2-cdh4.0.1, precisely)
> I got a case today on which I needed to clean up some fields before
> proces
a.opts from -Xmx200m to -Xmx1024m . It seems it doesn't
> help. And that threshold value is still the same.
> when I monitor the java process by top command, it seems the setting of
> mapred.child.java.opts have NO influence on both VIRT and RES, it seems
> mapred.child.java.opts has be
Rather than increase memory, rewrite the script so it does not need so much
ram to begin with.
You can split on $2, group and generate what you need, then join things
back.
Hard to tell what exactly you are going for without schemas and expected
inputs/outputs.
If the hadoop configs are the same,
Could you provide sample data and script that would allow us to reproduce this?
Hive is faster at some things. Pig is faster at others. Both produce
correct results.
D
On Mon, Oct 22, 2012 at 11:22 AM, yogesh dhari wrote:
>
> Hi All,
>
> Is it true that Pig's JOIN operation is not so efficient a
10-19 11:06:57,382 [main] INFO org.apache.hadoop.ipc.Client -
> Retrying connect to server: localhost/127.0.0.1:9001. Already tried 1
> time(s).
> 2012-10-19 11:06:58,383 [main] INFO org.apache.hadoop.ipc.Client -
> Retrying connect to server: localhost/127.0.0.1:9001. Already tried 2
Some testing tips:
1) parametrize your load/store statements so that if you have to run
in hadoop mode, it's easy to switch to debug inputs / outputs (and
debug input/output loaders and storers). It's vastly preferable to
test in local mode when possible, since the iterations are so much
faster.
That's a Hadoop mapreduce feature, not a Pig feature, so that request
should go there.
Can't really do the _failure thing though, if you think about it --
programs can fail by crashing, in which case they might not be able to
write a file. Or maybe they are not crashing, but there is a problem
tal
again
> Martin
>
> On 18 October 2012 05:15, Dmitriy Ryaboy wrote:
>
>> Yeah that's a bug in FileLocalizer, apparently it assumes local or
>> hdfs, only. Could you file a jira?
>>
>> D
>>
>> On Sat, Oct 13, 2012 at 2:53 AM, Martin
odAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
>
>
> Thanks for taking a look. I will start looking into HCatalog too.
>
> Martin
>
>
> On 12 October 2012 18:56, Dmit
B = group A by ( name, date, url);
-- B now has 2 fields: "group" which is a tuple of (name, date, url)
and "A" which is a collection of tuples from A with the same
name-date-url
-- try "illustrate B" or "describe B" to see what that looks like
counts = foreach B generate flatten(group) as (name,
.
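The group-then-flatten shape described above can be emulated in plain Python (the sample rows are invented) to see what "group" and the bag of matching tuples look like:

```python
from collections import defaultdict

# Made-up rows of (name, date, url), standing in for relation A.
rows = [
    ("alice", "2012-10-01", "a.com"),
    ("alice", "2012-10-01", "a.com"),
    ("bob",   "2012-10-02", "b.com"),
]

# B = group A by (name, date, url): each key is the "group" tuple,
# each value is the bag of original tuples sharing that key.
bags = defaultdict(list)
for row in rows:
    bags[row].append(row)

# counts = foreach B generate flatten(group), COUNT(A):
counts = {key: len(bag) for key, bag in bags.items()}
```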
BinStorage is internal to pig, and you shouldn't use it unless you
really know what you are doing.
None of this is relevant to how pig optimizes queries.
D
On Sun, Oct 7, 2012 at 9:10 PM, Dmitriy Ryaboy wrote:
> Pig has multi-query execution optimization built-in. If you compute
> multiple r
I think it's trying to find the staging directory set in your
configuration, not finding it, and isn't able to create it.
depending on your configs, that could be in different places, but
usually it's looking under /tmp/mapred . Check permissions there.
D
On Mon, Oct 15, 2012 at 4:35 PM, lei tang
Sounds like however you wrote the data, it has some sort of a binary
delimiter. Figure out what that delimiter is, and tell PigStorage to
use it. For example:
my_data = load 'path/to/data' using PigStorage('\\u001');
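What PigStorage('\\u001') does per line is split on the 0x01 control byte; in Python terms (the sample line is invented):

```python
# A line whose fields are separated by the \u0001 control character.
line = "alice\x01new york\x0142\n"

# PigStorage with that delimiter splits each line into fields like this:
fields = line.rstrip("\n").split("\x01")
```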
D
On Thu, Oct 11, 2012 at 10:23 AM, yogesh dhari wrote:
>
> Hi All ,
>
> How t
The default partitioning algorithm is basically this:
reducer_id = key.hashCode() % num_reducers
If you are joining on values that all map to the same reducer_id using
this function, they will go to the same reducer. But if you have a
reasonable hash code distribution and a decent volume of uniqu
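As a rough Python sketch of that rule (Java's String.hashCode emulated by hand; note Hadoop's HashPartitioner additionally masks off the sign bit before taking the modulo):

```python
def java_string_hashcode(s):
    """Emulate Java's String.hashCode(): h = s[0]*31^(n-1) + ... + s[n-1], as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x100000000 if h >= 0x80000000 else h

def reducer_id(key, num_reducers):
    # Mask to a non-negative value, then modulo, as Hadoop's HashPartitioner does.
    return (java_string_hashcode(key) & 0x7FFFFFFF) % num_reducers
```

Keys with a reasonable spread of hash codes land on different reducers; join keys that all hash to the same value pile onto one.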
Martin,
Do you have the complete stack trace?
Generally, for Hive interop I recommend HCatalog; AllLoader is neat
but it's a 3rd party contrib and we don't really know it too well. I
can check out the error dump and see if there's anything obvious
though.
D
On Fri, Oct 12, 2012 at 8:48 AM, Martin
Yeah.. Joys of reflection.
Note that if you are writing algebraics against pig 0.11 you probably
want to extend AlgebraicEvalFunc -- that gives you the normal exec()
and the accumulative implementation for free.
D
On Wed, Oct 10, 2012 at 10:20 AM, Ugljesa Stojanovic wrote:
> Yeah i managed to fi
Pig has multi-query execution optimization built-in. If you compute
multiple relations in your script that share parent relations, those
parent relations will be computed only once. You don't have to do
anything to make that happen.
If you prefer to handle your own caching, you would have to handl
bucketing and partitioning is just setting the files up right. you can
do that explicitly.
Pig also lets you push down any filtering and projection into the
loader, as long as said loader is aware of how to deal with filters
and projections. Using any such loader will give you the benefits.
HCatLo
Hi Lei,
This is currently not supported.
However one can always create a new loadfunc and implement his own parsing
(perhaps by extending PigStorage and overriding the parsing bits).
D
On Fri, Sep 28, 2012 at 4:05 PM, lei tang wrote:
> Hi,
>
> Is it possible to use a regular expression as a del
ry I am bit hazy over here...
>
> On Fri, Sep 28, 2012 at 3:12 PM, Dmitriy Ryaboy
> wrote:
>
> > When you tried 2888, did you have pig.exec.mapPartAgg set to true,
> > and pig.exec.mapPartAgg.minReduction set to a low value (2 or 3)?
> >
> > You said you ap
atch and see if that makes any
> > difference..
> >
> > Thanks very much for responding
> >
> >
> >
> > On Tue, Aug 28, 2012 at 11:45 PM, Dmitriy Ryaboy wrote:
> >
> >> Couple of ideas:
> >>
> >> 1) do you need exact distinct counts? The
If someone figures this out all the way to working code, could you blog it?
:)
D
On Thu, Sep 27, 2012 at 10:54 AM, Rohini Palaniswamy <
rohini.adi...@gmail.com> wrote:
> Ray,
>In the frontend, you can do a new JobConf(HBaseConfiguration.create())
> and pass that to TableMapReduceUtil.initCred
With Pig 0.9 you can do this, though:
FOREACH html_pages GENERATE portal_id, (html matches 'some pattern' ? 1 :
0) as
wp_match:int;
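Pig's `matches` is a whole-string regex match (like Java's String.matches), so the bincond above behaves roughly like this Python sketch (pattern and inputs are made up):

```python
import re

def wp_match(html, pattern):
    # "matches" requires the WHOLE string to match, hence fullmatch, not search.
    return 1 if re.fullmatch(pattern, html) else 0
```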
On Thu, Sep 27, 2012 at 10:38 AM, Alan Gates wrote:
> In Pig 0.9 boolean was not yet a first class data type, so boolean types
> were not allowed in foreach stat
son in Pig, not that I
would recommend that).
D
On Wed, Sep 26, 2012 at 9:34 PM, Russell Jurney wrote:
> Does that work without lzo?
>
> Russell Jurney http://datasyndrome.com
>
> On Sep 26, 2012, at 9:00 PM, Dmitriy Ryaboy wrote:
>
> > Try asking Michael May on GitHub? This
Try asking Michael May on GitHub? This seems to be an issue with his Loader..
The JsonLoader in ElephantBird should work in this case if you turn on
nested parsing (
https://github.com/kevinweil/elephant-bird/blob/master/pig/src/main/java/com/twitter/elephantbird/pig/load/JsonLoader.java
)
D
On W
What are ".Z files"?
On Wed, Sep 19, 2012 at 12:22 AM, Srini wrote:
> Hello All,
>
> Is there any Build-in Load function for loading ".Z" files ?
>
> Regards,
> Srini
>
ndering if anyone had seen this in their
> scripts before.
>
>Brian
>
>
> On Sun, Sep 16, 2012 at 10:24 PM, Dmitriy Ryaboy
> wrote:
>
> > I just ran this very script three times using Pig 0.8 (svn revision
> > 1148107) on a set of 2.5 million
only supplies input to the mapper?
>
> When you are talking about downstream code from the loader that assumes that
> each tuple is a new Tuple, is there any code in Pig that assumes that or are
> you just talking about UDF's and other 3rd party libs that people write for
> Pi
rovide.
>
> brian
>
>
> On Sun, Sep 16, 2012 at 10:02 PM, Dmitriy Ryaboy wrote:
>
>> Brian, could you provide a complete script that reproduces the issue?
>> What version of pig are you on?
>>
>> Thanks,
>> -D
>>
>> On Sun, Sep 1
I am not sure why pushProjection doesn't solve your dilemma?
This is what we use in HBaseStorage, and ElephantBird uses in thrift
and protobuf loaders.
D
On Sun, Sep 16, 2012 at 8:11 PM, Jim Donofrio wrote:
> I guess a workaround could be to Base64 decode the pig.script property and
> look for A
Brian, could you provide a complete script that reproduces the issue?
What version of pig are you on?
Thanks,
-D
On Sun, Sep 16, 2012 at 8:15 PM, Brian Choi wrote:
> Yes - i saw this issue with SAMPLE() in multiple runs. The strangest thing
> about this is that it approaches the correct values f
>
> On 2012-9-16, at 5:05 PM, Haitao Yao wrote:
>
>> here's the explain result compressed.(The apache mail server does not allow
>> big attachments.)
>>
>>
>>
>> Haitao Yao
>> yao.e...@gmail.com
>> weibo: @haitao_yao
>> Skype: haita
I looked into this a while back -- trouble comes when something
downstream from the loader tries to collect inputs into a bag, and
doesn't do its own copies. One can easily argue that if someone wants
to do such collection, it should be their responsibility to ensure
they aren't just collecting the
Still would like to see the script or the explain plan..
D
On Sat, Sep 15, 2012 at 7:50 PM, Haitao Yao wrote:
> No, I also thought it is a mapper , but It surely is a reducer. all the
> mappers succeeded and the reducer failed.
>
>
>
> Haitao Yao
> yao.e...@gmail.com
> weibo: @haitao_yao
> Skyp
We tend to write protobuf or thrift definition for complex objects,
but that introduces severe latency into the development process.
I suppose you could try something like kryo (and create a
corresponding deserializer for EB).. the core of the problem is that
you need to carry around the schema, an
Wow, that's a fantastic presentation Adam!
Nice job on all the examples and slides.
D
On Sat, Sep 15, 2012 at 3:16 AM, Adam Kawa wrote:
> Hi All,
>
> I would like to share my slides from the presentation about Apache Pig
> that I gave at the 3rd meeting of WHUG (Warsaw Hadoop User Group) a
> cou
execute goal
> com.github.igor-petruk.protobuf:protobuf-maven-plugin:0.4:run (default) on project elephant-bird-core: Unable to find
> 'protoc' ->
> [Help 1]
> [ERROR]
>
> On Tue, Sep 11, 2012 at 4:24 PM, Mohit Anchlia wrote:
>
>> Thanks! I'll try it out.
Group, and pass the grouped sets to your batch-processing UDF?
so:
data:
id1 bucket1
id2 bucket2
id3 bucket2
id4 bucket1
bucketized = group data by bucket_id;
bucket1, { (id1, id4) }
bucket2, { (id2, id3) }
batch_processed = foreach bucketized generate MyUDF(data);
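The grouping step above, as a small Python sketch (my_udf is a hypothetical stand-in for the batch-processing UDF):

```python
from collections import defaultdict

data = [("id1", "bucket1"), ("id2", "bucket2"), ("id3", "bucket2"), ("id4", "bucket1")]

# bucketized = group data by bucket_id;
bucketized = defaultdict(list)
for record_id, bucket_id in data:
    bucketized[bucket_id].append(record_id)

def my_udf(ids):
    # Hypothetical batch processor: here it just counts the batch.
    return len(ids)

# batch_processed = foreach bucketized generate MyUDF(data);
batch_processed = {bucket: my_udf(ids) for bucket, ids in bucketized.items()}
```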
D
On Wed, Sep 12, 2012 at
Yup:
https://github.com/kevinweil/elephant-bird
D
On Tue, Sep 11, 2012 at 4:00 PM, Mohit Anchlia wrote:
> Is it the code that I checkout and build?
>
> On Tue, Sep 11, 2012 at 3:27 PM, Dmitriy Ryaboy wrote:
>
>> Try the one in Elephant-Bird.
>>
>> On Tue, S
Try the one in Elephant-Bird.
On Tue, Sep 11, 2012 at 11:22 AM, Mohit Anchlia wrote:
> Is there a way to read BytesWritable using sequence file loader from
> piggybank? If not then how should I go about implementing one?
Hi Thomas,
This isn't a complete answer, but take a look at mock.Storage that
Julien wrote to make testing easy:
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/mock/Storage.java
D
On Fri, Sep 7, 2012 at 6:34 AM, Thomas Schlosser
wrote:
> Hi all,
> does anybody know what th
Please take a look at Alek and Jimmy's paper on ML in Pig; there are
also a few presentations they did on this, here's one from the Hadoop
Summit:
https://speakerdeck.com/u/lintool/p/large-scale-machine-learning-at-twitter
Note also that Ted Dunning has taken some of the stuff we open-sourced
that
etting better.
> Should I continue merging?
>
>
> 2012/8/29 Dmitriy Ryaboy :
>> Can you try the same scans with a regular hbase mapreduce job? If you see
>> the same problem, it's an hbase issue. Otherwise, we need to see the script
>> and some facts about your ta
That's cause you used "group all" which groups everything into one
group, which by definition can only go to one reducer.
What if instead you group into some large-enough number of buckets?
A = LOAD 'records.txt' USING PigStorage('\t') AS (recordId:int);
A_PRIME = FOREACH A generate *, ROUND(RAN
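The idea being sketched above: instead of `group all` (everything on one reducer), tag each record with a random bucket so the work spreads over many reducers, then combine the per-bucket partial results. A Python analogue (bucket count and the final aggregation are illustrative, not taken from the thread):

```python
import random

records = list(range(1000))
num_buckets = 16
rng = random.Random(42)  # seeded so the sketch is deterministic

# Stage 1: the ROUND(RANDOM() * N) trick -- assign each record a random bucket.
buckets = {}
for r in records:
    buckets.setdefault(rng.randrange(num_buckets), []).append(r)

# Stage 2: aggregate each bucket independently (in parallel, on the cluster),
# then combine the small per-bucket partials into the final answer.
partial_counts = [len(rs) for rs in buckets.values()]
total = sum(partial_counts)
```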
You can also look at what Vertica did for their Pig connector:
https://github.com/vertica/Vertica-Hadoop-Connector/blob/master/pig-connector/com/vertica/pig/VerticaLoader.java
(it's apache licensed, so if you reuse any code, you have to indicate
the Vertica copyright and apache license in credits
I tried to reproduce this and haven't been able to -- all my devious
attempts to get something that is actually a string to show up as an
int in "describe" wind up in class cast exceptions and blown up jobs
(not devious enough, clearly).
Can you put together an example that reproduces the iss
Please take a look at your job tracker page. It will have a failed job, which
will have failed tasks, which will have more detailed error logs.
On Aug 28, 2012, at 5:52 PM, Mohit Anchlia wrote:
> I have this simple pig script but when I run I get:
>
> 2012-08-28 17:50:24,924 [main] INFO
> org
Couple of ideas:
1) do you need exact distinct counts? There are approximate distinct counting
approaches that may be appropriate and much more efficient.
2) can you try with pig-2888?
On Aug 28, 2012, at 1:35 PM, Deepak Tiwari wrote:
> Hi,
>
> I am processing huge dataset and need to aggrega
Can you try the same scans with a regular hbase mapreduce job? If you see the
same problem, it's an hbase issue. Otherwise, we need to see the script and
some facts about your table (how many regions, how many rows, how big a
cluster, is the small range all on one region server, etc)
On Aug 27,
I think you just want this:
filt = filter colors_in by $color_filter;
(no quotes)
D
On Mon, Aug 27, 2012 at 1:50 PM, Duckworth, Will
wrote:
> I am trying to use a parameter as the expression in a filter.
>
> Assuming:
>
> colors_in = load '$in_path' as (color:chararray);
> flt = filter colors_
It works.
Dan, pig should have printed out the name of a file it's logging
errors to. That file will have a more complete error trace. Can you
send that?
D
On Sat, Aug 25, 2012 at 5:43 PM, Subir S wrote:
> I think HBaseStorage does not work in this version of pig. There were
> few JIRAs, I cann
Yeah, these should be published to maven.
D
On Wed, Aug 15, 2012 at 3:49 AM, Віталій Тимчишин wrote:
> Hello.
>
> We are starting to use pig for our data analysis.
> To be exact, actual work will be performed by amazon elastic map reduce.
> That's why we are using 0.9.2 for now.
> Everything wor
This class is part of the Sun Java 6 JDK. What version of Java are
you running?
You should have something along the lines of
/usr/lib/jvm/java-6-openjdk/jre/lib/rhino.jar on your classpath.
Dmitriy
On Wed, Aug 15, 2012 at 9:46 AM, Russell Jurney
wrote:
> Cross posting in hopes a user has this
That would be quite handy I think.
D
On Thu, Aug 9, 2012 at 12:24 PM, Xavier Stevens wrote:
> Does anyone else think it would make sense to have all operators and
> functions listed on a single page somewhere as a reference? Right now they
> are split up over the "Pig Latin Basics" and "Built In
For CSV excel, check out
http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
D
>> Also, is PigStorage compatible with the quoting expected by excel
>> tab-delimited files? AIUI that would require quoting the values with
>> "value\tvalue" and escaping doub
You are talking about changing the way hadoop works; something like
this would be transparent to Pig.
Note that Hadoop Distributed Cache != "distributed memory cache".
I suppose you could replace the value of fs.file.impl from
org.apache.hadoop.fs.LocalFileSystem to something else.. might be
qui
Julien removed a dozen or so loader/storer instantiations.
That can do it if you do work in constructors.
D
On Fri, Aug 10, 2012 at 1:15 PM, Prashant Kommireddi
wrote:
> Thanks Chun.
>
> Jon, any idea what on 0.11 might have fixed it?
>
> On Thu, Aug 9, 2012 at 3:32 PM, Chun Yang
> wrote:
>
>> I
I'm just curious, why do you expect pig 0.10 tests to succeed on 0.9.2?
D
On Mon, Aug 6, 2012 at 6:57 AM, lulynn_2008 wrote:
> Hi All,
> I am running pig-0.10.0 e2e test with pig-0.9.2 and hadoop-1.0.3. There are
> 12 tests failed with "Sort check failed" error.
>
> I list Order_6 as a example
Sounds like your hbase conf is not on the classpath.
D
On Mon, Aug 6, 2012 at 11:31 AM, Mohit Anchlia wrote:
> I am trying to read records from HBase using HBaseStorage. When I execute
> simple load I get this error. I think I am missing some property, but I am
> running pig on the cluster where
Won't be able to make it. Would love to see what you guys come up with
about the UDFs.
D
On Mon, Jul 30, 2012 at 9:42 AM, Alan Gates wrote:
> Hortonworks will be hosting the next Pig Hackathon on August 24th.
> http://www.meetup.com/PigUser/events/75286212/
>
> The agenda:
>
> - Help newcomers
;> - Failed to produce result in: "file:/tmp/temp61624047/tmp1087576502"
>>>> 2012-07-25 17:20:36,107 [main] INFO
>>>> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>>>> - Failed!
>>>> 2012-07-25 17:20
Using the store expression you wrote should work. Dump is its own thing and
doesn't know anything about the format you store things in. To see files
created on hdfs, you can use cat.
On Jul 25, 2012, at 3:48 AM, wrote:
> Hi All,
>
> I am new to PIG, trying to store data in HDFS as comma sepa