1)
2011-04-27 10:29:32,953 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_201104251150_0071
2011-04-27 10:29:32,954 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- More information at:
http://haisen11:50030/jobdetails.jsp?jobid=job_201104251150_0071
2011-04-27 10:29:52,654 [main] INFO
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_201104251150_0071 has failed! Stop running all dependent jobs

^^^ Look at the job error logs -- the jobdetails URL above links to the
individual task attempts, and their logs will have the actual error.

2)

generate $0, $2 -- there is no $2; you only loaded two columns ($0 and $1).
Those are the ones you're going to want -- see the sketch below.
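
For example, something like this (a minimal, untested sketch assuming the two
columns you loaded are the ones you want; adjust the positions to your data):

A = load '/passwd' using PigStorage(':');
B = foreach A generate $0 as id, $1 as value; -- only $0 and $1 exist here
dump B;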

3) loadKey, as the name implies, only applies to loading data, not to
storing it. It doesn't hurt anything to have it there, but it's not actually
doing anything.
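
For the store itself, something like this should be all you need (untested
sketch, reusing your table and column names; the first field of B becomes
the HBase row key and the second is written into cf:a):

store B into 'table2' using
    org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a');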


D

On Wed, Apr 27, 2011 at 1:38 AM, byambajargal <byambaa.0...@gmail.com> wrote:

> Hello
>
> I am using Pig version 0.8.0.
>
>
> A = load '/passwd' using PigStorage(':');
> B = foreach A generate $0 as id, $2 as value;
> dump B;
>
> The result of the first part is:
>
> (twilli,6259)
> (saamodt,6260)
> (hailu268,6261)
> (oddsen,6262)
> (neuhaus,6263)
> (zoila,6264)
> (elinmn,6265)
> (diego,6266)
> (fsudmann,6267)
> (yanliang,6268)
> (nestor,6269)
>
> As I understand it, the problem is in the second part:
>
>
> store B into 'table2' using
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a','-loadKey');
>
> I suspect the problem is with the row key; I am not sure how it manages
> the row key. What I want is for the first item to be the row key and the
> second item to be the column of the HBase table.
>
> When I run the query, I get the following output on my task tracker:
>
> grunt> A = load '/passwd' using PigStorage(':');B = foreach A generate $0
> as id, $2 as value;store B into 'table2' using
>  org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a','-loadKey');
> 2011-04-27 10:29:29,785 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
> script: UNKNOWN
> 2011-04-27 10:29:29,785 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
> pig.usenewlogicalplan is set to true. New logical plan will be used.
> 2011-04-27 10:29:29,913 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:zookeeper.version=3.3.3-1073969, built on 02/23/2011
> 22:27 GMT
> 2011-04-27 10:29:29,913 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:host.name=haisen10.ux.uis.no
> 2011-04-27 10:29:29,913 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:java.version=1.6.0_23
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:java.vendor=Sun Microsystems Inc.
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:java.home=/opt/jdk1.6.0_23/jre
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client
> environment:java.class.path=/etc/hbase/conf:/usr/lib/pig/bin/../conf:/opt/jdk/lib/tools.jar:/usr/lib/pig/bin/../pig-0.8.0-cdh3u0-core.jar:/usr/lib/pig/bin/../build/pig-*-SNAPSHOT.jar:/usr/lib/pig/bin/../lib/ant-contrib-1.0b3.jar:/usr/lib/pig/bin/../lib/automaton.jar:/usr/lib/pig/bin/../build/ivy/lib/Pig/*.jar:/usr/lib/hadoop/hadoop-core-0.20.2-cdh3u0.jar:/usr/lib/hadoop/lib/ant-contrib-1.0b3.jar:/usr/lib/hadoop/lib/aspectjrt-1.6.5.jar:/usr/lib/hadoop/lib/aspectjtools-1.6.5.jar:/usr/lib/hadoop/lib/commons-cli-1.2.jar:/usr/lib/hadoop/lib/commons-codec-1.4.jar:/usr/lib/hadoop/lib/commons-daemon-1.0.1.jar:/usr/lib/hadoop/lib/commons-el-1.0.jar:/usr/lib/hadoop/lib/commons-httpclient-3.0.1.jar:/usr/lib/hadoop/lib/commons-logging-1.0.4.jar:/usr/lib/hadoop/lib/commons-logging-api-1.0.4.jar:/usr/lib/hadoop/lib/commons-net-1.4.1.jar:/usr/lib/hadoop/lib/core-3.1.1.jar:/usr/lib/hadoop/lib/hadoop-fairscheduler-0.20.2-cdh3u0.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.jar:/usr/lib/hadoop/lib/hsqldb-1.8.0.10.LICENSE.txt:/usr/lib/hadoop/lib/jackson-core-asl-1.5.2.jar:/usr/lib/hadoop/lib/jackson-mapper-asl-1.5.2.jar:/usr/lib/hadoop/lib/jasper-compiler-5.5.12.jar:/usr/lib/hadoop/lib/jasper-runtime-5.5.12.jar:/usr/lib/hadoop/lib/jdiff:/usr/lib/hadoop/lib/jets3t-0.6.1.jar:/usr/lib/hadoop/lib/jetty-6.1.26.jar:/usr/lib/hadoop/lib/jetty-servlet-tester-6.1.26.jar:/usr/lib/hadoop/lib/jetty-util-6.1.26.jar:/usr/lib/hadoop/lib/jsch-0.1.42.jar:/usr/lib/hadoop/lib/jsp-2.1:/usr/lib/hadoop/lib/junit-4.5.jar:/usr/lib/hadoop/lib/kfs-0.2.2.jar:/usr/lib/hadoop/lib/kfs-0.2.LICENSE.txt:/usr/lib/hadoop/lib/log4j-1.2.15.jar:/usr/lib/hadoop/lib/mockito-all-1.8.2.jar:/usr/lib/hadoop/lib/oro-2.0.8.jar:/usr/lib/hadoop/lib/servlet-api-2.5-20081211.jar:/usr/lib/hadoop/lib/servlet-api-2.5-6.1.14.jar:/usr/lib/hadoop/lib/slf4j-api-1.4.3.jar:/usr/lib/hadoop/lib/slf4j-log4j12-1.4.3.jar:/usr/lib/hadoop/lib/xmlenc-0.52.jar:/etc/hbase/conf::/usr/lib/hadoop/conf
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client
> environment:java.library.path=/opt/jdk1.6.0_23/jre/lib/amd64/server:/opt/jdk1.6.0_23/jre/lib/amd64:/opt/jdk1.6.0_23/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:java.io.tmpdir=/tmp
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:java.compiler=<NA>
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:os.name=Linux
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:os.arch=amd64
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:os.version=2.6.18-194.32.1.el5.centos.plus
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:user.name=haisen
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:user.home=/home/ekstern/haisen
> 2011-04-27 10:29:29,914 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Client environment:user.dir=/import/br1raid6a1c1/haisen
> 2011-04-27 10:29:29,915 [main] INFO  org.apache.zookeeper.ZooKeeper -
> Initiating client connection, connectString=haisen11:2181
> sessionTimeout=180000 watcher=hconnection
> 2011-04-27 10:29:29,923 [main-SendThread()] INFO
>  org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> haisen11/152.94.1.130:2181
> 2011-04-27 10:29:29,926 [main-SendThread(haisen11:2181)] INFO
>  org.apache.zookeeper.ClientCnxn - Socket connection established to
> haisen11/152.94.1.130:2181, initiating session
> 2011-04-27 10:29:29,936 [main-SendThread(haisen11:2181)] INFO
>  org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> haisen11/152.94.1.130:2181, sessionid = 0x12f8c18a1340177, negotiated
> timeout = 40000
> 2011-04-27 10:29:29,972 [main] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Lookedup root region location,
> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@67f31652;
> hsa=haisen10.ux.uis.no:60020
> 2011-04-27 10:29:30,018 [main] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cached location for .META.,,1.1028785192 is haisen10.ux.uis.no:60020
> 2011-04-27 10:29:30,020 [main] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cache hit for row <> in tableName .META.: location server
> haisen10.ux.uis.no:60020, location region name .META.,,1.1028785192
> 2011-04-27 10:29:30,024 [main] DEBUG
> org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at
> row=table2,,00000000000000 for max=10 rows
> 2011-04-27 10:29:30,028 [main] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cached location for
> table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e. is
> haisen6.ux.uis.no:60020
> 2011-04-27 10:29:30,030 [main] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cache hit for row <> in tableName table2: location server
> haisen6.ux.uis.no:60020, location region name
> table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e.
> 2011-04-27 10:29:30,031 [main] INFO
>  org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table
> instance for table2
> 2011-04-27 10:29:30,068 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: B:
> Store(table2:org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a','-loadKey'))
> - scope-6 Operator Key: scope-6)
> 2011-04-27 10:29:30,085 [main] INFO
>  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler -
> File concatenation threshold: 100 optimistic? false
> 2011-04-27 10:29:30,122 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size before optimization: 1
> 2011-04-27 10:29:30,122 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
> - MR plan size after optimization: 1
> 2011-04-27 10:29:30,187 [main] INFO
>  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added
> to the job
> 2011-04-27 10:29:30,204 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
> 2011-04-27 10:29:31,684 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
> - Setting up single store job
> 2011-04-27 10:29:31,709 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 1 map-reduce job(s) waiting for submission.
> 2011-04-27 10:29:32,059 [Thread-7] INFO  org.apache.zookeeper.ZooKeeper -
> Initiating client connection, connectString=haisen11:2181
> sessionTimeout=180000 watcher=hconnection
> 2011-04-27 10:29:32,060 [Thread-7-SendThread()] INFO
>  org.apache.zookeeper.ClientCnxn - Opening socket connection to server
> haisen11/152.94.1.130:2181
> 2011-04-27 10:29:32,061 [Thread-7-SendThread(haisen11:2181)] INFO
>  org.apache.zookeeper.ClientCnxn - Socket connection established to
> haisen11/152.94.1.130:2181, initiating session
> 2011-04-27 10:29:32,063 [Thread-7-SendThread(haisen11:2181)] INFO
>  org.apache.zookeeper.ClientCnxn - Session establishment complete on server
> haisen11/152.94.1.130:2181, sessionid = 0x12f8c18a1340178, negotiated
> timeout = 40000
> 2011-04-27 10:29:32,070 [Thread-7] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Lookedup root region location,
> connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@1f248f2b;
> hsa=haisen10.ux.uis.no:60020
> 2011-04-27 10:29:32,074 [Thread-7] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cached location for .META.,,1.1028785192 is haisen10.ux.uis.no:60020
> 2011-04-27 10:29:32,074 [Thread-7] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cache hit for row <> in tableName .META.: location server
> haisen10.ux.uis.no:60020, location region name .META.,,1.1028785192
> 2011-04-27 10:29:32,076 [Thread-7] DEBUG
> org.apache.hadoop.hbase.client.MetaScanner - Scanning .META. starting at
> row=table2,,00000000000000 for max=10 rows
> 2011-04-27 10:29:32,080 [Thread-7] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cached location for
> table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e. is
> haisen6.ux.uis.no:60020
> 2011-04-27 10:29:32,081 [Thread-7] DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation
> - Cache hit for row <> in tableName table2: location server
> haisen6.ux.uis.no:60020, location region name
> table2,,1303809998908.0a8a5a1a398c449de8f29a2cf082f30e.
> 2011-04-27 10:29:32,082 [Thread-7] INFO
>  org.apache.hadoop.hbase.mapreduce.TableOutputFormat - Created table
> instance for table2
> 2011-04-27 10:29:32,102 [Thread-7] INFO
>  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths
> to process : 1
> 2011-04-27 10:29:32,102 [Thread-7] INFO
>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> paths to process : 1
> 2011-04-27 10:29:32,110 [Thread-7] INFO
>  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input
> paths (combined) to process : 1
> 2011-04-27 10:29:32,211 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 0% complete
> 2011-04-27 10:29:32,953 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - HadoopJobId: job_201104251150_0071
> 2011-04-27 10:29:32,954 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - More information at:
> http://haisen11:50030/jobdetails.jsp?jobid=job_201104251150_0071
> 2011-04-27 10:29:52,654 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - job job_201104251150_0071 has failed! Stop running all dependent jobs
> 2011-04-27 10:29:52,666 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - 100% complete
> 2011-04-27 10:29:52,674 [main] ERROR
> org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
> 2011-04-27 10:29:52,677 [main] INFO  org.apache.pig.tools.pigstats.PigStats
> - Script Statistics:
>
> HadoopVersion   PigVersion     UserId   StartedAt             FinishedAt            Features
> 0.20.2-cdh3u0   0.8.0-cdh3u0   haisen   2011-04-27 10:29:30   2011-04-27 10:29:52   UNKNOWN
>
> Failed!
>
> Failed Jobs:
> JobId   Alias   Feature Message Outputs
> job_201104251150_0071   A,B     MAP_ONLY        Message: Job failed! Error - NA        table2,
>
> Input(s):
> Failed to read data from "/passwd"
>
> Output(s):
> Failed to produce result in "table2"
>
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201104251150_0071
>
>
> 2011-04-27 10:29:52,677 [main] INFO
>  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
> - Failed!
>
>
> thank you
>
> Byambajargal
>
>
> On 4/27/11 06:07, Bill Graham wrote:
>
>> What version of Pig are you running and what errors are you seeing on
>> the task trackers?
>>
>> On Tue, Apr 26, 2011 at 4:46 AM, byambajargal <byambaa.0...@gmail.com> wrote:
>>
>>> Hello ...
>>> I have a question for you
>>>
>>> I am running the following Pig job, which reads from HDFS and simply
>>> stores into HBase. When I start the job, the first part works fine but
>>> the second part fails. Could you point me in the right direction for
>>> moving data from HDFS to HBase?
>>>
>>>
>>> A = load '/passwd' using PigStorage(':');
>>> B = foreach A generate $0 as id, $2 as value;
>>> dump B;
>>> store B into 'table2' using
>>>   org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:a', '-loadKey');
>>>
>>> thank you for your help
>>>
>>> Byambajargal
>>>
>>>
>>>
>>> On 4/25/11 18:26, Dmitriy Ryaboy wrote:
>>>
>>>> The first element of the relation you store must be the row key. You
>>>> aren't loading the row key, so load > store isn't working.
>>>> Try
>>>> my_data = LOAD 'hbase://table1' using
>>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1', '-loadKey');
>>>>
>>>> On Mon, Apr 25, 2011 at 5:32 AM, byambajargal <byambaa.0...@gmail.com> wrote:
>>>>
>>>>> Hello guys
>>>>>
>>>>> I am running the Cloudera distribution cdh3u0 on my cluster with Pig and
>>>>> HBase. I can read data from HBase using the following Pig query:
>>>>>
>>>>> my_data = LOAD 'hbase://table1' using
>>>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1');
>>>>> dump my_data;
>>>>>
>>>>> But when I try to store data into HBase the same way, the job fails:
>>>>>
>>>>> store my_data into 'hbase://table2' using
>>>>> org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:1');
>>>>>
>>>>> table1 and table2 have the same structure and the same column.
>>>>>
>>>>>
>>>>> The table I have:
>>>>>
>>>>> hbase(main):029:0* scan 'table1'
>>>>> ROW                 COLUMN+CELL
>>>>>  row1               column=cf:1, timestamp=1303731834050, value=value1
>>>>>  row2               column=cf:1, timestamp=1303731849901, value=value2
>>>>>  row3               column=cf:1, timestamp=1303731858637, value=value3
>>>>> 3 row(s) in 0.0470 seconds
>>>>>
>>>>>
>>>>> thanks
>>>>>
>>>>> Byambajargal
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>
