RE: custom UDF generates SpillableMemoryManager and task is killed

2014-01-26 Thread Yigitbasi, Nezih
Hi Davide, your UDF does a lot of intensive processing without reporting its progress. The EvalFunc class has a reporter field; please use it to report progress from your UDF (call reporter.progress()) so that Hadoop doesn't kill your task. Nezih -Original Message- From: Davide Br
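The advice above amounts to calling the reporter periodically from inside the long-running loop. Below is a minimal Python sketch of that pattern only. CountingReporter and process_records are hypothetical names invented for illustration; in a real Java UDF the object would be the reporter field of EvalFunc mentioned in the message, and the squaring is just a placeholder for the expensive work.

```python
class CountingReporter:
    """Hypothetical stand-in for the reporter object; it only counts calls."""

    def __init__(self):
        self.calls = 0

    def progress(self):
        # A real reporter would tell the framework the task is still alive.
        self.calls += 1


def process_records(records, reporter, report_every=1000):
    """Process records, signalling liveness every `report_every` records."""
    results = []
    for i, record in enumerate(records, start=1):
        results.append(record * record)  # placeholder for the expensive work
        if i % report_every == 0:
            reporter.progress()
    return results
```

Reporting every N records rather than on every record keeps the overhead small while still signalling well within the task timeout; the right N depends on how long each record takes.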

RE: Issue connecting to HBase using Pig's HBaseStorage: Unable to find region for my_table

2014-01-20 Thread Yigitbasi, Nezih
access the table/columnfamily via Python/Starbase/REST API: In [27]: table.insert('my-key-1', { 'bytes_per_hour_time_series': {'series': "test"}}) Out[27]: 200 In [29]: table.fetch('my-key-1') Out[29]: {'bytes_per_hour_time_series': {&

RE: Issue connecting to HBase using Pig's HBaseStorage: Unable to find region for my_table

2014-01-20 Thread Yigitbasi, Nezih
The log says " NoServerForRegionException: Unable to find region for bluecoat ". Are all region servers up & running? Also can you do any "put"s to this table through the hbase shell? -Original Message- From: Russell Jurney [mailto:russell.jur...@gmail.com] Sent: Monday, January 20, 201

RE: Pig UDF: XPath

2014-01-15 Thread Yigitbasi, Nezih
It seems you are trying to run your own UDF, not piggybank's XPath UDF. Can you post your Pig script? -Original Message- From: Sameer Tilak [mailto:ssti...@live.com] Sent: Wednesday, January 15, 2014 11:29 AM To: user@pig.apache.org Subject: Pig UDF: XPath Hi everyone

running a task exactly once

2014-01-14 Thread Yigitbasi, Nezih
Hi everyone, I have a Pig script that needs to run some initialization/setup logic at the beginning (for initializing a storage system), and this has to be done *exactly* once in my cluster. The problem is if it gets executed in multiple threads there is some risk of leaving the storage system in an in

Graph Builder 2.0 released!

2014-01-07 Thread Yigitbasi, Nezih
Hello everyone, we are excited to announce that version 2.0 (alpha) of the Intel(R) Graph Builder library has been released. With this release, Graph Builder aims to bring graph ETL support to the Pig scripting language. Please take a peek at Ted's blog on the new release and its enhancements: h

RE: listdir() python function is not working on hadoop

2013-12-06 Thread Yigitbasi, Nezih
re any way to read hdfs files one by one and passing to one function. On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih wrote: > I can call listdir to read from local filesystem in a python UDF. Did > you implement your function as a proper UDF? >

RE: listdir() python function is not working on hadoop

2013-12-05 Thread Yigitbasi, Nezih
I can call listdir to read from the local filesystem in a Python UDF. Did you implement your function as a proper UDF? From: Haider [haider.n...@gmail.com] Sent: Monday, December 02, 2013 5:22 AM To: user@pig.apache.org Subject: listdir() python function is not
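For reference, a minimal sketch of the kind of function body the reply describes. The name list_local_files is hypothetical, and the Pig registration and schema decorator are deliberately left out. Note that os.listdir only sees the filesystem of the machine running the task, so HDFS paths are not visible through it unless HDFS happens to be mounted locally.

```python
import os


def list_local_files(directory):
    """Return the sorted names of regular files in a local directory."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )
```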

weird classpath problem

2013-12-04 Thread Yigitbasi, Nezih
Hi everyone, I am having some weird classpath issues with a UDF that returns a custom tuple. My custom tuple has an arraylist of custom objects. It looks like: class MyTuple private ArrayList list; When the UDF is called, everything works fine: the tuples are created and the UDF returns succ

RE: problem with simple cpython udf

2013-11-07 Thread Yigitbasi, Nezih
Thanks to Jeremy Karn, we figured out that the problem was the name of the Python script. The script was named 'test.py', and apparently some other test.py was picked up from the Python path at runtime. Renaming the script fixed the problem. Nezih From: Yigitbasi, Nezih
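A quick way to check for this kind of name clash is to ask Python which file a module name resolves to before using it as a script name. The sketch below assumes Python 3.4 or later (importlib.util.find_spec does not exist in the Python 2.7 used in this thread, where importing the module and printing its __file__ attribute gives the same information); resolve_module_origin is a hypothetical helper name.

```python
import importlib.util


def resolve_module_origin(module_name):
    """Return the path a top-level module name resolves to, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # If this prints a path, a script named test.py may be shadowed by
    # (or may shadow) that module.
    print(resolve_module_origin("test"))
```

If the result is a path outside your own project, choosing a more specific script name avoids the ambiguity.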

problem with simple cpython udf

2013-11-07 Thread Yigitbasi, Nezih
Hi, I am having problems running a very simple cpython udf with Pig 0.12, Python 2.7.3, and Hadoop 1.2.1. I have the following cpython udf: from pig_util import outputSchema @outputSchema("as:int") def square(num): if num == None: return None return ((num) * (num)) And then in
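For readability, here is the UDF from the message laid out as a file. Two things are not in the original and are assumptions of this sketch: the try/except fallback, which substitutes a no-op decorator so the file also runs outside Pig where pig_util is unavailable, and the use of `is None` in place of `== None`.

```python
try:
    from pig_util import outputSchema  # available when the script runs under Pig
except ImportError:
    # Assumption for standalone testing only: a no-op replacement decorator.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("as:int")
def square(num):
    """Return num squared, passing nulls through unchanged."""
    if num is None:
        return None
    return num * num
```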