I hit https://issues.apache.org/jira/browse/PIG-3512
On 24/03/2014 14:40, Vincent Barat wrote:
Hi,
Since I moved from Pig 0.10.0 to 0.11.0 or 0.12.0, the estimation
of the number of reducers no longer works.
My script:
A = load 'data';
B = group A by $0;
store B into 'out';
My data:
grunt> ls
hdfs://computation-master.dev.ubithere.com:9000/user/root/.staging dir
On Thu, Dec 26, 2013 at 3:35 AM, Vincent Barat vincent.ba...@gmail.com wrote:
Hi all and merry Christmas !
I generate a file using a Pig script embedded in a Java process and store
it using a BinStorage.
Then, I would like to read this file directly from another Java client,
but without starting a Pig script (i.e. only by using the Hadoop API and
Pig's BinStorage class).
A storage is associated with an InputFormat/OutputFormat as well as a
RecordReader/RecordWriter.
As for BinStorage, you can take a look at BinStorageRecordReader:
https://github.com/apache/pig/blob/trunk/src/org/apache/pig/impl/io/BinStorageRecordReader.java#L40
and pig.exec.reducers.max
might be useful:
https://issues.apache.org/jira/browse/PIG-1249
Norbert
On Tue, May 21, 2013 at 9:23 AM, Vincent Barat vincent.ba...@gmail.com wrote:
Thanks for your reply.
My goal is actually to AVOID using PARALLEL, to let PIG guess a good
number of reducers by itself.
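For reference, when PARALLEL is not given, Pig 0.8+ estimates the reducer count from the total input size. The sketch below is a simplified Python model of that heuristic, not Pig's actual code; the defaults shown (1 GB per reducer via pig.exec.reducers.bytes.per.reducer, a cap of 999 via pig.exec.reducers.max) are the usual ones but may differ by version:

```python
import math

def estimate_reducers(total_input_bytes,
                      bytes_per_reducer=1_000_000_000,  # pig.exec.reducers.bytes.per.reducer
                      max_reducers=999):                # pig.exec.reducers.max
    """Simplified model of Pig's input-size-based reducer estimation."""
    reducers = int(math.ceil(total_input_bytes / float(bytes_per_reducer)))
    return max(1, min(reducers, max_reducers))

# 5 GB of input with the defaults -> 5 reducers
print(estimate_reducers(5 * 10**9))
```

So if the estimate looks wrong after an upgrade, checking the job's recorded total input size against these two properties is a quick sanity test.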
Vincent Barat vincent.ba...@gmail.com wrote:
Hi,
I use this request to remove duplicated entries from a set of input files
(I cannot use DISTINCT since some fields can be different)
grp = GROUP alias BY key;
alias = FOREACH grp {
    record = LIMIT alias 1;
    GENERATE FLATTEN(record);
};
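The GROUP / LIMIT 1 / FLATTEN idiom above keeps exactly one record per key (an arbitrary one, since LIMIT gives no ordering guarantee). The same semantics, sketched in Python for illustration only; the field names are hypothetical:

```python
def dedup_by_key(records, key):
    """Keep one record per key value, like GROUP ... / LIMIT alias 1 / FLATTEN."""
    seen = {}
    for rec in records:
        seen.setdefault(rec[key], rec)  # keep the first record seen for each key
    return list(seen.values())

rows = [
    {"sid": "a", "imei": "1"},
    {"sid": "a", "imei": "2"},  # duplicate key 'a', dropped
    {"sid": "b", "imei": "3"},
]
print(dedup_by_key(rows, "sid"))  # two records remain, keys 'a' and 'b'
```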
Hi,
We are using HBaseStorage intensively to load data from tables
having more than 100 regions.
HBaseStorage generates 1 map per region, and since our cluster has 50
map slots, our PIG scripts start 50 maps reading data concurrently
from HBase.
The problem is that our HBase
--
Vincent Barat
CTO
vba...@capptain.com
www.capptain.com
Cell: +33 6 15
Hi,
I have a very simple script that tries to import a PIG file:
set pig.import.search.path '/tmp'
import 'event.pig';
Even if the file /tmp/event.pig exists, it cannot be found.
It seems that the function getImportScriptAsReader that deals with
the pig.import.search.path property is not even
Really no advice on this from anybody?
On 08/08/12 18:33, Vincent Barat wrote:
Hi,
I'm looking for a way to load data from a SQL database.
It seems that the SQLLoader (still mentioned in the Wiki) no longer
exists.
Sqoop seems to be a good solution, but it is not so convenient in
my use case (I need to deliver to my customers an easy-to-use PIG
environment allowing them to perform computations on data coming from
HBase, SQL and Hadoop files, if possible without having to deal with
workflow tools like Oozie).
What are your recommendations?
Cheers
--
Vincent
Thank you Bill, that should do the trick :-)
On 29/06/12 18:16, Bill Graham wrote:
There's support for this in the trunk FYI:
https://issues.apache.org/jira/browse/PIG-2600
On Fri, Jun 29, 2012 at 5:05 AM, Vincent Barat vincent.ba...@gmail.com wrote:
Hi folks,
I use replicated joins, and recently I encountered an issue: my
rightmost relation seems to become too big and, even if I don't get
any Java heap space error, the time it takes to finish the maps becomes
exponentially long (I cannot figure out why exactly).
Removing 'replicated' fixes the issue,
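For context, a replicated (fragment-replicate) join loads the rightmost relation entirely into each map task's memory and streams the left one against it, which is why it degrades sharply once the right side outgrows memory. A minimal Python sketch of the idea, for illustration only:

```python
from collections import defaultdict

def replicated_join(left, right, key):
    """Fragment-replicate join: hash the small (rightmost) relation in
    memory, then stream the large (left) relation against it."""
    table = defaultdict(list)
    for r in right:               # this whole relation must fit in memory
        table[r[key]].append(r)
    for l in left:                # streamed one record at a time
        for r in table.get(l[key], []):
            yield {**l, **r}

left = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
right = [{"k": 1, "b": "z"}]
print(list(replicated_join(left, right, "k")))  # one joined record, for k=1
```

The design trade-off: no shuffle or reduce phase is needed, at the cost of every mapper holding the whole right-hand relation.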
Hi,
I am trying to figure out why PIG uses so many ZooKeeper connections
(from the frontend machine) when using HBaseStorage().
I added a trace in the constructor of HBaseStorage()
I wrote a simple script loading an HBase table:
sessions = LOAD 'hbase://mytable' USING
Hi,
I investigated further and updated the issue to provide a very
easy way to reproduce it.
This seems to be an important regression in BinStorage():
https://issues.apache.org/jira/browse/PIG-2271
On 09/09/11 11:36, Vincent Barat wrote:
Issue reported:
https://issues.apache.org
:**chararray), but I
cannot find an explanation for that. I verified that such a conversion should
be valid on 0.9. Can you show me the script?
Daniel
On Tue, Aug 30, 2011 at 5:14 AM, Vincent Barat vincent.ba...@gmail.com wrote:
Hi,
I have experienced the same issue by loading the data from raw text
files (using PIG server in local mode and the regular PIG loader)
and from HBaseStorage.
The issue is exactly the same in both cases: each time a NULL string
is encountered, the cast to a data bag cannot be done.
On 23/08/11 20:28, Dmitriy Ryaboy wrote:
We should add merge join support to HBaseStorage, it should be able to do
that for joins on the table key.
It would be great!
Are your locids skewed? Have you tried using 'skewed' join for the last job?
Actually, if locations are small, you can
FYI, this was fixed by PIG-2193.
On 26/07/11 19:40, Vincent Barat wrote:
Hi,
I'm using PIG 0.8.1 with HBase 0.90 and the following script
sometimes returns an empty set, and sometimes works!
start_sessions = LOAD 'startSession' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage
(
props.setProperty("log4j.logger.org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher",
    "INFO");
props.setProperty("log4j.logger.org.apache.pig.tools.pigstats.PigStats",
    "INFO");
On 27/06/11 13:21, Vincent Barat wrote:
Thanks, but it is actually not what I'm looking for.
First, I use PIG from Java, not from the pig shell, so the -d
option cannot be used.
Hi,
Of all the requests I run using PIG 0.8.1, the heaviest one
is the following:
/* load session data from HBase */
start_sessions = load ... (start of sessions)
end_sessions = load ... (end of sessions)
location = load ... (session location)
info = load ... (session
So, I tried the exact same request but loading the data from HDFS
files (using the regular Pig loader): it works!
Here is the request loading from HDFS:
start_sessions = LOAD 'start_sessions' AS (sid:chararray,
infoid:chararray, imei:chararray, start:long);
end_sessions = LOAD
I've reported the issue here:
https://issues.apache.org/jira/browse/PIG-2193
Still investigating, but it seems so far that the FILTER clause makes
the HBase loader lose all fields that are not explicitly used in
the script.
I stripped down the request to:
start_sessions = LOAD
Hi,
I'd like to make PIG load only a subset of an HBase table, based on
the timestamp of the records, or on the key of the rows.
As an example, I'd like to load only records whose timestamp is greater
than N, or whose key matches some value.
I know that HBase can handle scanners that are highly optimized to
Thanks for the input, [3] is more related to timestamp storage,
anyway I added my 2 cents to the issue concerning loading by timestamp.
On 28/07/11 13:19, Norbert Burger wrote:
You can instruct HBaseStorage to load a subset of the rows using the -gt
and -lt options to HBaseStorage,
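The -gt/-lt options select a lexicographic row-key range, pushed down to the scanner. The selection logic amounts to the following Python sketch, with keys compared as byte strings the way HBase compares row keys:

```python
def in_key_range(row_key, gt=None, lt=None):
    """True if row_key falls strictly inside the (gt, lt) range, compared
    lexicographically on bytes, as HBase orders row keys."""
    if gt is not None and not row_key > gt:
        return False
    if lt is not None and not row_key < lt:
        return False
    return True

keys = [b"row10", b"row20", b"row30"]
print([k for k in keys if in_key_range(k, gt=b"row10", lt=b"row30")])
```

Note that because the comparison is lexicographic on bytes, numeric keys must be zero-padded (or binary-encoded) for the range to behave numerically.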
More info on this issue:
1- I use PIG 0.8.1 and HBase 0.90.3 and Hadoop 0.20-append
2- The issue can be reproduced with PIG trunk too
The script:
start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com'
USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
meta:infoid
One clarification: the HBase classes of the PIG trunk cannot be compiled
inside PIG 0.8.1, so I was unable to test whether a fix was introduced in
the latest version of these classes.
So 2- should be disregarded.
On 27/07/11 14:38, Vincent Barat wrote:
More info on this issue:
1- I use PIG 0.8.1 and HBase 0.90.3 and Hadoop 0.20-append
2- The issue can be reproduced with PIG trunk too
The script:
start_sessions = LOAD 'startSession.mde253811.preprod.ubithere.com' USING
working consistently?
Thanks,
Thejas
On 7/27/11 7:31 AM, Vincent Barat wrote:
I built the pig trunk with hbase 0.90.3 client lib (ant
-Dhbase.version=0.90.3) and the issue is still here.
It makes me think of an issue in the optimizer... Anyway, the
fact is that my request is not complex, so I
The behavior is not random.
The first dump is always empty, and the second always works.
I will try what you ask, and if I have more details, I will create a
JIRA issue.
Thanks.
On 27/07/11 16:59, Raghu Angadi wrote:
Vincent,
is the behavior random or the same each time?
Couple of
Hi,
I'm using PIG 0.8.1 with HBase 0.90 and the following script
sometimes returns an empty set, and sometimes works!
start_sessions = LOAD 'startSession' USING
org.apache.pig.backend.hadoop.hbase.HBaseStorage('meta:sid
meta:infoid meta:imei meta:timestamp') AS (sid:chararray,
case.
As a workaround, I created a symbolic link:
lrwxr-xr-x 1 vbarat wheel  13 27 jui 12:20 hadoop-${user.name}@ -> hadoop-vbarat
drwxr-xr-x 4 vbarat wheel 136 27 jui 12:20 hadoop-vbarat/
and it works.
On 17/05/11 17:02, Vincent Barat wrote:
Hi,
I'm trying to run PIG 0.8.1
Thanks, but it is actually not what I'm looking for.
First, I use PIG from Java, not from the pig shell, so the -d
option cannot be used.
Second, I want to be able to choose between which trace I want, and
which I don't, depending on the origin of the trace.
I guess log4j can be configured for
You're right, I tested pig 0.8.0 with hbase 0.20.6, and not 0.8.1
(very sorry).
On 24/05/11 07:44, Dmitriy Ryaboy wrote:
You couldn't have possibly tested with Pig 0.8.1 successfully, as it
does not work with HBase 0.20.6 at all. This issue should not show up
if you use Pig 0.8.1 and HBase
);
DUMP nbRows;
--
*Vincent BARAT, UBIKOD, CTO*
vba...@ubikod.com Mob +33 (0)6 15 41 15 18
UBIKOD Paris, c/o ESSEC VENTURES, Avenue Bernard Hirsch, 95021
Cergy-Pontoise cedex, FRANCE, Tel +33 (0)1 34 43 28 89
UBIKOD Rennes, 10 rue Duhamel, 35000 Rennes, FRANCE, Tel
combination is on.
Hi,
I'm trying to run PIG 0.8.1 in local mode from Java.
My code used to work with PIG 0.6.1.
Now, I got the following exception:
java.io.FileNotFoundException: File
file:/tmp/hadoop-vbarat/mapred/system/job_local_0001/job.xml does
not exist.
at
all activity tuples in order to save space before storing this rich_sessions in
a file.
Is there any way to do this?
Thanks for your help,
Hi,
I use PIG 0.6.0 through its Java API. I'm trying to move from 0.6.0
to 0.8.0, but I get the following error when I run my PIG jobs from
Java:
2011-02-08 16:59:38,244 | INFO | main | MapReduceLauncher | 0% complete
2011-02-08 16:59:39,170 | INFO | main | MapReduceLauncher | job
null