How to contribute to Pig Wiki ?

2010-12-21 Thread Charles Gonçalves
Guys, I'm starting to use Pig (0.8) now and I went to the Pig Wiki for some guidelines and tutorials. I have already found some errors and have some suggestions to contribute. How (or to whom) could I send those comments? -- *Charles Ferreira Gonçalves * http://homepages.dcc.ufmg.br/~charles/ UFMG

Re: How to contribute to Pig Wiki ?

2010-12-21 Thread Charles Gonçalves
Ok, I will do that ... On Tue, Dec 21, 2010 at 4:24 PM, Daniel Dai wrote: > Thanks Charles! Everyone can edit Pig wiki. Just go to the wiki page, > register an account, and you can edit. We are looking forward to your > contribution! > > Daniel > > > Charles Go

Re: How to contribute to Pig Wiki ?

2010-12-22 Thread Charles Gonçalves
ou can edit. We are looking forward to your > contribution! > > Daniel > > > Charles Gonçalves wrote: > >> Guys, >> >> I'm starting to use pig (0.8) now and I went to Pig Wiki for some >> directives and tutorials. >> I already found some error

Re: How to contribute to Pig Wiki ?

2010-12-22 Thread Charles Gonçalves
> Thanks, > Thejas > > > > > On 12/22/10 4:43 AM, "Charles Gonçalves" wrote: > > Daniel, and when I found some issues in the http://pig.apache.org > documentation, > there is a way to report some issues in documentation!? > > On Tue, Dec 21, 2010 at 4:

Re: calling pig from a web app

2011-01-10 Thread Charles Gonçalves
I second the interest in this topic. I'll soon need to create a web interface for my marketing colleagues ... On Mon, Jan 10, 2011 at 10:06 PM, so...@dopeness.org wrote: > I'd be interested to hear people's experience / best practices for running > pig scripts on demand from a web app. What do

Re: Controlling the Pig/Hadoop Logging Level

2011-01-17 Thread Charles Gonçalves
Hi ... Another question related to logging. I can't see any of my UDFs' logging output when running Pig. Do I need to provide a specific log4j.xml using -log4jconf? Can someone help me? Thanks! On Tue, Dec 28, 2010 at 8:31 PM, Dmitriy Ryaboy wrote: > Andreas, from the command line you can us

Re: Pig 0.8.0 in Maven

2011-01-22 Thread Charles Gonçalves
Great news. On Fri, Jan 21, 2011 at 10:00 PM, Richard Ding wrote: > Good news. Pig 0.8.0 now is available through maven repository: > > http://repo1.maven.org/maven2/org/apache/pig/pig/0.8.0/ > > Thanks > -- Richard

UDF with parameterized constructor in DEFINE statement

2011-02-01 Thread Charles Gonçalves
Hi Guys, I have a UDF to which I want to pass a long holding a timestamp and get back a date formatted with the SimpleDateFormat class. I will pass the format string for the sdf object to the UDF constructor, and eventually the timezone if needed. So I made a class to do that but when I us
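For readers of this thread, the idea described in the message above can be sketched roughly as follows. This is only an illustration, not the class from the original post: the name `TimestampFormatter`, both constructors, and the UTC default are hypothetical, and the timestamp is assumed to be milliseconds since the Unix epoch. Both constructor parameters are strings because Pig's DEFINE passes constructor arguments as strings; in a real UDF this logic would live in a class extending Pig's `EvalFunc`, which is left out here to keep the sketch self-contained.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper illustrating a formatter configured through its constructor.
class TimestampFormatter {
    private final SimpleDateFormat sdf;

    // Format only; the UTC default is an assumption of this sketch.
    TimestampFormatter(String pattern) {
        this(pattern, "UTC");
    }

    // Format plus a time zone ID such as "UTC" or "GMT+02:00".
    TimestampFormatter(String pattern, String timeZoneId) {
        sdf = new SimpleDateFormat(pattern, Locale.ROOT);
        sdf.setTimeZone(TimeZone.getTimeZone(timeZoneId));
    }

    // Assumes the input is milliseconds since 1970-01-01T00:00:00Z.
    String format(long epochMillis) {
        return sdf.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        TimestampFormatter f = new TimestampFormatter("yyyy-MM-dd HH:mm:ss", "UTC");
        System.out.println(f.format(0L)); // prints 1970-01-01 00:00:00
    }
}
```

Building the SimpleDateFormat once in the constructor avoids re-parsing the pattern for every record; note that SimpleDateFormat is not thread-safe, so one instance should not be shared across threads.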

Re: UDF with parameterized constructor in DEFINE statement

2011-02-02 Thread Charles Gonçalves
> can just group by extime. In general if you get a parsing error that comes > before errors with the udf itself, as it will try and parse the whole thing > THEN make the job > > Sent via BlackBerry > > -Original Message- > From: Charles Gonçalves > Date: Tue, 1 Feb 2011

read concatened gzip files in Pig 0.8.0

2011-02-02 Thread Charles Gonçalves
Hi Guys, I noticed that concatenated gzipped files do not work on Hadoop: https://issues.apache.org/jira/browse/HADOOP-6335 So, has anyone run into this problem? Is there a workaround that I could apply in my load function? I would appreciate any help!

Re: read concatened gzip files in Pig 0.8.0

2011-02-03 Thread Charles Gonçalves
> Hey Charles, I think it's supposed to be working. Have you looked at > https://issues.apache.org/jira/browse/HADOOP-6835 ?? > > Renato M. > > 2011/2/2 Charles Gonçalves > > > Hi Guys, > > > > I noted that concatenated gziped files not work on

Re: Tuple Question

2011-02-03 Thread Charles Gonçalves
Jira Issue : https://issues.apache.org/jira/browse/PIG-1841 On Tue, Feb 1, 2011 at 8:59 PM, Daniel Dai wrote: > Oh, I am wrong. SIZE is the right UDF to use. The issue is caused by > TupleSize, as Eric points out a moment ago. > > Daniel > > > Dmitr

Re: Using a file packaged into a UDF jar?

2011-02-09 Thread Charles Gonçalves
The problem isn't with Pig; it is with the MaxMind lib. It requires a filename (I'm not so sure right now), but the problem is within the MaxMind constructor, to which you have to pass the reference to the .dat file. I ran into this problem and wasn't able to solve it in a better way than copying the f

Re: Unexpected data type -1 found in stream.

2011-02-10 Thread Charles Gonçalves
Jonathan, did you open the issue? I'm suffering from the same problem ... Any follow-up? On Wed, Jan 26, 2011 at 4:17 PM, Jonathan Coveney wrote: > I have never really had to raise a bug before, what should I do? Open a > ticket, attach the code and the description, and posit that it may be a

Using custom log4j (issue?)

2011-02-10 Thread Charles Gonçalves
Has anyone had problems using a custom log4j in Pig with the -4 option? I'm getting the following: Pig Stack Trace --- ERROR 2244: Job failed, hadoop does not return any error message org.apache.pig.backend.executionengine.ExecException: ERROR 2244: Job failed, hadoop does

Error when using an udf with filter statment

2011-02-10 Thread Charles Gonçalves
I'm just trying to do a breakdown of all my logs, but every time I use an operation like: FILTER alias BY some_udf(alias); I get a problem. First I got: ERROR 0: Scalar has more than one row in the output. : cfgmc@phoebe:~/workspace-java/MscPigScripts/scripts (121) 23:11:16 scripts:> pig -x lo

Re: Error when using an udf with filter statment

2011-02-11 Thread Charles Gonçalves
le value. > In your case you are doing that to a multi-row relation, and things blow > up. > > D > > On Thu, Feb 10, 2011 at 5:42 PM, Charles Gonçalves > wrote: > > I'm trying just to do a breakdown for all my logs but every time I use a > > operation like : >

PARALLEL INSIDE a nested foreach block / DEFAULT_PARALLEL not workin?!

2011-02-11 Thread Charles Gonçalves
Is it possible to use a PARALLEL clause inside a nested foreach block, as in: E = GROUP B ALL PARALLEL 100; edge_breakdown = FOREACH E { dist_cIps = DISTINCT B.cIp *PARALLEL X*; dist_sIps = DISTINCT B.sIp; urls_ok = FILTER B BY valid(url);

Re: PARALLEL INSIDE a nested foreach block / DEFAULT_PARALLEL not workin?!

2011-02-11 Thread Charles Gonçalves
wrote: > Possible, but it will be ignored. Anything done inside a nested foreach > block will be executed at the parallel level of the preceding group by. > > Alan. > > > On Feb 11, 2011, at 8:57 AM, Charles Gonçalves wrote: > > Is possible to use a parallel statment i

Re: Unexpected data type -1 found in stream.

2011-02-14 Thread Charles Gonçalves
upload your script and UDF! Hopefully having more vectors will help make it > easier to figure out what is going on. > > 2011/2/10 Charles Gonçalves > > > Jonathan, > > > > did you opened the issue? > > I'm suffering by the same problem ... > > > &

Re: How to find input file associated with failed map task?

2011-02-17 Thread Charles Gonçalves
Hi Scott, I also work with lots of gzipped files and I sometimes used to get the same error. I started checking the gzip files before processing them. In fact I check them immediately after I put them on HDFS. What I do is cat the gzip file and check it with gzip -t. For example, all the file
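The integrity check described above can also be done in code. What follows is a rough Java sketch of the same idea as piping a file through `gzip -t`: decompress the whole stream, discard the output, and treat any I/O error as a corrupt file. The class name `GzipCheck` and its methods are hypothetical; the check takes a plain `InputStream`, so the caller would supply a stream opened from HDFS or the local disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: a rough in-process equivalent of `gzip -t`.
class GzipCheck {
    // Returns true if the stream decompresses to the end without error.
    static boolean isValidGzip(InputStream raw) {
        byte[] buffer = new byte[8192];
        try (GZIPInputStream in = new GZIPInputStream(raw)) {
            while (in.read(buffer) != -1) {
                // Discard the data; only errors matter here.
            }
            return true;
        } catch (IOException e) {
            // Covers a bad header, a truncated stream, and a CRC mismatch.
            return false;
        }
    }

    // Test helper: gzip a byte array in memory.
    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] good = gzip("hello pig".getBytes(StandardCharsets.UTF_8));
        System.out.println(isValidGzip(new ByteArrayInputStream(good)));
    }
}
```

Running the check right after upload, as the message suggests, surfaces a corrupt file before a long job fails on it partway through.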

Quick question about Reading dirs

2011-02-17 Thread Charles Gonçalves
Guys, Does Pig read the _log directories from a script's output? What I want is to read a Pig output dir (or several) from Pig scripts. But I just want the part- files, not the .part-crc or _logs files. Thanks

Re: Quick question about Reading dirs

2011-02-17 Thread Charles Gonçalves
> > Directory names starting with underscores are ignored, but I am not certain > about .* files/directories. > > Amit > > > On 2/17/11 3:12 PM, "Charles Gonçalves" wrote: > > > Guys, > > > > Does Pig read the _log directories from an output scrip

Cases of Work using pig on Industry to cite on my MSc

2011-02-19 Thread Charles Gonçalves
Guys, I'm working on my MSc now, using Pig/Hadoop to process logs. I'm basically using it to do some characterization in a traffic analysis for some of the largest media groups in Brazil. One of my dissertation chapters will be on case studies where that environment (Pig/Hadoop) is needed du

Re: Cases of Work using pig on Industry to cite on my MSc

2011-02-19 Thread Charles Gonçalves
w.slideshare.net/rjurney/azkaban-pig-5057793 Chris Riccomini > presented Pig at LinkedIn here: > http://www.slideshare.net/hadoopusergroup/pig-at-linkedin > > On Sat, Feb 19, 2011 at 1:12 PM, Charles Gonçalves >wrote: > > > Guys, > > > > I'm working on my MS

Re: Cases of Work using pig on Industry to cite on my MSc

2011-02-20 Thread Charles Gonçalves
Presentations for those talks are posted to Yahoo's Hadoop > blog: > > http://developer.yahoo.com/blogs/hadoop/ > > > > Alan. > > > > > > On Feb 19, 2011, at 1:12 PM, Charles Gonçalves wrote: > > > > Guys, > >> > >> I'

Re: Reading Gzip Files

2011-02-21 Thread Charles Gonçalves
I'm not sure if it is the same problem. I wrote a custom loader and I had a problem reading compressed files too. I then noticed that in PigStorage the function getInputFormat was: public InputFormat getInputFormat() throws IOException { if(loadLocation.endsWith(".bz2") || loadLocation.end

Problem when executionengine.util.MapRedUtil combine input paths

2011-02-26 Thread Charles Gonçalves
I tried to process a large number of small files with Pig and I got a strange problem. 2011-02-27 00:00:58,746 [Thread-15] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : *43458* 2011-02-27 00:00:58,755 [Thread-15] INFO org.apache.pig.backend.hadoop.execut

Re: Problem when executionengine.util.MapRedUtil combine input paths

2011-02-28 Thread Charles Gonçalves
Pig combine small files into one > map, so it is possible you get less output files. This is not the problem. But thanks anyway! If that is your concern, you can try to disable split combine using > "-Dpig.splitCombination=false" > > Daniel > > > Charles Gonçalves wro

Re: Problem when executionengine.util.MapRedUtil combine input paths

2011-02-28 Thread Charles Gonçalves
number of input records being read? > Is it possible to see the counter in the history interface on JobTracker? I will run the jobs again to compare the counter, but my guess is probably not! -Thejas > > > > > On 2/28/11 10:57 AM, "Charles Gonçalves" wrote: > > I

Re: Problem when executionengine.util.MapRedUtil combine input paths

2011-02-28 Thread Charles Gonçalves
gzip concatenated files. On Mon, Feb 28, 2011 at 8:47 PM, Charles Gonçalves wrote: > > > On Mon, Feb 28, 2011 at 7:39 PM, Thejas M Nair wrote: > >> Hi Charles, >> Which load function are you using ? >> > I'm using a UD load function .. > > Is the default (

Re: Problem when executionengine.util.MapRedUtil combine input paths

2011-03-01 Thread Charles Gonçalves
r each file within one input split. So gzip > concatenation should not be the case. I am not sure what happen to your > script. If possible, give us more information (script, UDF, data, version). > > Daniel > > > > On 02/28/2011 05:40 PM, Charles Gonçalves wrote: > > Guys,

STDEV

2011-03-13 Thread Charles Gonçalves
Does anyone have a UDF for calculating standard deviation? There is none in the piggybank. It's no problem if it makes two passes through the data. Thanks
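As a reference point for this request, here is a minimal sketch of the two-pass calculation in plain Java: the first pass computes the mean, the second sums the squared deviations from it. The class name `TwoPassStdDev` is hypothetical and this is not piggybank code. It computes the population standard deviation (dividing by n), which is an assumption, since the message does not say whether the population or the sample form is wanted.

```java
// Hypothetical helper: two-pass population standard deviation.
class TwoPassStdDev {
    static double stdDev(double[] values) {
        int n = values.length;
        if (n == 0) {
            return Double.NaN; // undefined for empty input
        }

        // Pass 1: compute the mean.
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / n;

        // Pass 2: sum the squared deviations from the mean.
        double squaredDeviations = 0.0;
        for (double v : values) {
            double d = v - mean;
            squaredDeviations += d * d;
        }

        // Divide by n (population form); use n - 1 for the sample form.
        return Math.sqrt(squaredDeviations / n);
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println(stdDev(data)); // prints 2.0
    }
}
```

The two-pass form is numerically steadier than the single-pass sum-of-squares formula, at the cost of reading the data twice.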

pig's alike projects

2011-03-17 Thread Charles Gonçalves
Hi Guys, I read the Sawzall paper today and wonder whether there are any other systems like Pig and Sawzall. Does anyone know of other projects? Thanks

Re: pig's alike projects

2011-03-17 Thread Charles Gonçalves
> Hive developed by Facebook , > SCOPE and DryadLINQ by Microsoft , > Jaql by IBM > and I heard about something called ASTERIX/AQL but I dont know anything > about it > > Regards > > Baraa > > > On Fri, Mar 18, 2011 at 2:19 AM, Charles Gonçalves > wrote: >

Re: pig's alike projects

2011-03-17 Thread Charles Gonçalves
> hehehe you are right Dmitriy Ryaboy this is an essential part of my PhD > thesis and I'm still searching :) > Good Luck > > Baraa > > > On Fri, Mar 18, 2011 at 3:11 AM, Dmitriy Ryaboy wrote: > >> That sounds like a Master's Thesis in itself :) >> >>