distributed cache + native lib

2014-09-24 Thread Adam Silberstein
Hey, I am trying to add a native lib file from HDFS to distributed cache so it can be loaded inside a UDF. This seems pretty standard and has been around for years, so I’m guessing I am overlooking something. Here’s my basic approach, please let me know if you spot the problem. 1. Put

Re: Apache Pig UDF and Distributed cache

2013-12-14 Thread Cheolsoo Park
Did you look at the stack trace in the Pig log file and Hadoop task log? On Wed, Dec 11, 2013 at 11:12 AM, Sameer Tilak wrote: > Hi All, > I am trying to use Distributed cache in my UDF. I have the following file > in HDFS that I want all my map functions to have available locally:

Apache Pig UDF and Distributed cache

2013-12-11 Thread Sameer Tilak
Hi All, I am trying to use Distributed cache in my UDF. I have the following file in HDFS that I want all my map functions to have available locally: hadoop dfs -ls /scratch/ -rw-r--r-- 1 userid supergroup size date time /scratch/id_lookup In my Pig script I pass it as a parameter

Re: Re: support for distributed cache archives

2013-11-07 Thread Jim Donofrio
Nov 1, 2013, at 10:31 PM, Jim Donofrio wrote: Any thoughts on this? On 10/22/2013 10:36 AM, Jim Donofrio wrote: JobControlCompiler.setupDistributedCache only calls DistributedCache.addCacheFile. Can you add support for adding archives in the distributed cache by calling DistributedCache.addCacheArc

Re: Pig Distributed Cache

2013-11-06 Thread burakkk
> > > > Note that the smaller relation that will be loaded into memory needs to > > be > > > specified second in the JOIN statement. > > > > > > Also keep in mind that HDFS doesn't perform well with lots of small > > files. > > > If yo

Re: Pig Distributed Cache

2013-11-05 Thread Pradeep Gollakota
> C = JOIN B BY 1, A BY 1 USING 'replicated'; > > dump C; > > > > Note that the smaller relation that will be loaded into memory needs to > be > > specified second in the JOIN statement. > > > > Also keep in mind that HDFS doesn't perform well

Re: Pig Distributed Cache

2013-11-05 Thread burakkk
might benefit from loading > that data into some database (e.g. HBase). > > > On Tue, Nov 5, 2013 at 7:29 AM, burakkk wrote: > > > Hi, > > I'm using Pig 0.8.1-cdh3u5. Is there any method to use distributed cache > > inside Pig? > > > > My problem is

Re: Pig Distributed Cache

2013-11-05 Thread Pradeep Gollakota
ign has (lots of) small files, you might benefit from loading that data into some database (e.g. HBase). On Tue, Nov 5, 2013 at 7:29 AM, burakkk wrote: > Hi, > I'm using Pig 0.8.1-cdh3u5. Is there any method to use distributed cache > inside Pig? > > My problem is that: I have l

Pig Distributed Cache

2013-11-05 Thread burakkk
Hi, I'm using Pig 0.8.1-cdh3u5. Is there any way to use the distributed cache inside Pig? My problem is this: I have lots of small files in HDFS. Let's say 10 files. Each file contains more than one row, but I need only one row. There isn't any relationship between them. S

Re: support for distributed cache archives

2013-11-04 Thread Alan Gates
Jim Donofrio wrote: >> JobControlCompiler.setupDistributedCache only calls >> DistributedCache.addCacheFile. Can you add support for adding archives >> in the distributed cache by calling DistributedCache.addCacheArchive >> based on a set of typical file extensions or by adding

Re: support for distributed cache archives

2013-11-02 Thread Jim Donofrio
Any thoughts on this? On 10/22/2013 10:36 AM, Jim Donofrio wrote: JobControlCompiler.setupDistributedCache only calls DistributedCache.addCacheFile. Can you add support for adding archives in the distributed cache by calling DistributedCache.addCacheArchive based on a set of typical file

support for distributed cache archives

2013-10-22 Thread Jim Donofrio
JobControlCompiler.setupDistributedCache only calls DistributedCache.addCacheFile. Can you add support for adding archives in the distributed cache by calling DistributedCache.addCacheArchive based on a set of typical file extensions or by adding a getCacheArchives() method to EvalFunc

Re: Adding files to distributed cache

2013-06-25 Thread Mark Wagner
Hi Phanish, EvalFuncs can implement getCacheFiles to register files that should be included in distributed cache: http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/EvalFunc.html#getCacheFiles() -Mark On Tue, Jun 25, 2013 at 11:37 AM, Phanish Lakkarasu < abhishek.do...@icloud.com>

Adding files to distributed cache

2013-06-25 Thread Phanish Lakkarasu
Hello, How can I add multiple files to the distributed cache in a Pig UDF? Regards Abhi

Re: Adding files to distributed cache

2013-06-25 Thread Prashant Kommireddi
Take a look here http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html under "Loading the Distributed Cache". On Tue, Jun 25, 2013 at 11:41 AM, abhishek wrote: > > > Hello, > > > How can I add multiple files to distributed cache in pig UDF. > > > > Regards > > Abhi >

Adding files to distributed cache

2013-06-25 Thread abhishek
> Hello, > How can I add multiple files to distributed cache in pig UDF. > > Regards > Abhi

Re: Re: Re: distributed cache

2012-11-16 Thread yingnan.ma
When I use the distributed cache, I found that when the file is larger than 100 MB or has more than 10 million records, the file cannot be cached in memory. I tried setting io.sort.mb to 200 MB, but it still does not work. Any suggestions would be welcome. Thank you! 2012-11

Re: Re: distributed cache

2012-11-14 Thread yingnan.ma
Thank you so much! Both the replicated join and the UDF that uses the distributed cache are useful for me; I have already done it. Thank you again. 2012-11-15 yingnan.ma From: Prashant Kommireddi Sent: 2012-11-15 03:52:09 To: user@pig.apache.org Cc: Subject: Re: distributed cache If it's for pur

Re: distributed cache

2012-11-14 Thread Prashant Kommireddi
If it's for purposes other than a Join, you could write a UDF to use distributed cache. Look at the section "Loading the Distributed Cache" http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html On Wed, Nov 14, 2012 at 11:44 AM, Ruslan Al-Fakikh wrote: > Maybe th

Re: distributed cache

2012-11-14 Thread Ruslan Al-Fakikh
Maybe this is what you are looking for: http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html see "Replicated join" On Tue, Nov 13, 2012 at 11:46 AM, yingnan.ma wrote: > Hi , > > I used the distributed cache in the hadoop though the "setup" and "

distributed cache

2012-11-12 Thread yingnan.ma
Hi, I used the distributed cache in Hadoop through "setup" and "static" to store a HashSet in memory. I am trying to use the distributed cache in Pig, but I don't know how to store a HashSet in memory; I can only cache the file in memory. Any advice wo

Re: Using Distributed Cache in PIG

2012-08-13 Thread Dmitriy Ryaboy
You are talking about changing the way hadoop works; something like this would be transparent to Pig. Note that Hadoop Distributed Cache != "distributed memory cache". I suppose you could replace the value of fs.file.impl from org.apache.hadoop.fs.LocalFileSystem to something else..

Using Distributed Cache in PIG

2012-08-13 Thread kapil bhosale
Hello, can we use the Distributed Cache to store intermediate results after the map phase so that they can be read from the cache in the reduce phase, in order to improve the performance of a MapReduce job? I found a paper regarding usage of cache in MapReduce, http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnu

Re: Distributed Cache in Pig0.7

2012-03-19 Thread felix gao
Felix > > > > On Fri, Mar 16, 2012 at 5:37 PM, Prashant Kommireddi < > prash1...@gmail.com > > >wrote: > > > > > Felix, > > > > > > 0.7 does not support distributed cache within Pig UDFs. Is there a > reason > > > you are using such a

Re: Distributed Cache in Pig0.7

2012-03-17 Thread Jonathan Coveney
respect to the UDFs need any migration work? > > Thanks, > > Felix > > On Fri, Mar 16, 2012 at 5:37 PM, Prashant Kommireddi >wrote: > > > Felix, > > > > 0.7 does not support distributed cache within Pig UDFs. Is there a reason > > you are using such an old ve

Re: Distributed Cache in Pig0.7

2012-03-16 Thread felix gao
On Fri, Mar 16, 2012 at 5:37 PM, Prashant Kommireddi wrote: > Felix, > > 0.7 does not support distributed cache within Pig UDFs. Is there a reason > you are using such an old version of Pig? > > 0.9 and later would support this for you. Alan's book has great info on > doing

Re: Distributed Cache in Pig0.7

2012-03-16 Thread Prashant Kommireddi
Felix, 0.7 does not support distributed cache within Pig UDFs. Is there a reason you are using such an old version of Pig? 0.9 and later would support this for you. Alan's book has great info on doing this http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html Thanks, Prashant O

Distributed Cache in Pig0.7

2012-03-16 Thread felix gao
I need to put a small shared file on the distributed cache so I can load it in my UDF in Pig 0.7. We are using Hadoop 0.20.2+228. I tried to run it using PIG_OPTS="-Dmapred.cache.archives=hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories#excludeCat