Hey,
I am trying to add a native lib file from HDFS to distributed cache so it can
be loaded inside a UDF. This seems pretty standard and has been around for
years, so I’m guessing I am overlooking something. Here’s my basic approach,
please let me know if you spot the problem.
1. Put
Did you look at the stack trace in the Pig log file and Hadoop task log?
On Wed, Dec 11, 2013 at 11:12 AM, Sameer Tilak wrote:
> Hi All,
> I am trying to use Distributed cache in my UDF. I have the following file
> in HDFS that I want all my map functions to have available locally:
Hi All,
I am trying to use Distributed cache in my UDF. I have the following file in
HDFS that I want all my map functions to have available locally:
hadoop dfs -ls /scratch/
-rw-r--r-- 1 userid supergroup size date time /scratch/id_lookup
In my Pig script I pass it as a parameter
Nov 1, 2013, at 10:31 PM, Jim Donofrio wrote:
Any thoughts on this?
On 10/22/2013 10:36 AM, Jim Donofrio wrote:
JobControlCompiler.setupDistributedCache only calls
DistributedCache.addCacheFile. Can you add support for adding archives
in the distributed cache by calling DistributedCache.addCacheArc
>
> > > Note that the smaller relation that will be loaded into memory needs to
> > be
> > > specified second in the JOIN statement.
> > >
> > > Also keep in mind that HDFS doesn't perform well with lots of small
> > files.
> > > If yo
> C = JOIN B BY 1, A BY 1 USING 'replicated';
> > dump C;
> >
> > Note that the smaller relation that will be loaded into memory needs to
> be
> > specified second in the JOIN statement.
> >
> > Also keep in mind that HDFS doesn't perform well
ght benefit from loading
> that data into some database (e.g. HBase).
>
>
> On Tue, Nov 5, 2013 at 7:29 AM, burakkk wrote:
>
> > Hi,
> > I'm using Pig 0.8.1-cdh3u5. Is there any method to use distributed cache
> > inside Pig?
> >
> > My problem is
ign has (lots of) small files, you might benefit from loading
that data into some database (e.g. HBase).
On Tue, Nov 5, 2013 at 7:29 AM, burakkk wrote:
> Hi,
> I'm using Pig 0.8.1-cdh3u5. Is there any method to use distributed cache
> inside Pig?
>
> My problem is that: I have l
Hi,
I'm using Pig 0.8.1-cdh3u5. Is there any method to use distributed cache
inside Pig?
My problem is this: I have lots of small files in HDFS, let's say 10 files.
Each file contains more than one row, but I need only one row from each.
There isn't any relationship between the files. S
Jim Donofrio wrote:
>> JobControlCompiler.setupDistributedCache only calls
>> DistributedCache.addCacheFile. Can you add support for adding archives
>> in the distributed cache by calling DistributedCache.addCacheArchive
>> based on a set of typical file extensions or by adding
Any thoughts on this?
On 10/22/2013 10:36 AM, Jim Donofrio wrote:
JobControlCompiler.setupDistributedCache only calls
DistributedCache.addCacheFile. Can you add support for adding archives
in the distributed cache by calling DistributedCache.addCacheArchive
based on a set of typical file
JobControlCompiler.setupDistributedCache only calls
DistributedCache.addCacheFile. Can you add support for adding archives
in the distributed cache by calling DistributedCache.addCacheArchive
based on a set of typical file extensions or by adding a
getCacheArchives() method to EvalFunc
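One way to picture the "typical file extensions" idea is a small check that decides whether a cache entry would go to DistributedCache.addCacheArchive or to DistributedCache.addCacheFile. This is only a sketch: the class CacheEntryKind, the method isArchive, and the extension list are hypothetical and are not part of Pig or Hadoop.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of Pig or Hadoop.
final class CacheEntryKind {
    // Assumed list of "typical" archive extensions; adjust to taste.
    private static final List<String> ARCHIVE_SUFFIXES =
            Arrays.asList(".zip", ".jar", ".tar", ".tgz", ".tar.gz");

    private CacheEntryKind() {}

    // Returns true when the entry looks like an archive, i.e. a candidate
    // for addCacheArchive rather than addCacheFile.
    static boolean isArchive(String cacheEntry) {
        // Entries may carry a "#symlink" fragment; only the path part matters.
        int hash = cacheEntry.indexOf('#');
        String path = hash >= 0 ? cacheEntry.substring(0, hash) : cacheEntry;
        String lower = path.toLowerCase(Locale.ROOT);
        for (String suffix : ARCHIVE_SUFFIXES) {
            if (lower.endsWith(suffix)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would branch on isArchive(entry) before registering the entry. Whether Pig should decide by extension or expose a separate method on EvalFunc is exactly the open question in the message above.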
Hi Phanish,
EvalFuncs can implement getCacheFiles to register files that should be
included in distributed cache:
http://pig.apache.org/docs/r0.11.1/api/org/apache/pig/EvalFunc.html#getCacheFiles()
-Mark
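For readers who want to see the shape of this, here is a minimal sketch of what a getCacheFiles implementation might return for several files. It assumes each entry has the form "HDFS path#local symlink name", as in the "Loading the Distributed Cache" section of Programming Pig. The class LookupUdfSketch is a hypothetical stand-in that compiles without Pig on the classpath; in a real UDF the method would be an override in a class extending org.apache.pig.EvalFunc. The second path in the usage note is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a UDF class; a real one would extend
// org.apache.pig.EvalFunc and mark getCacheFiles() with @Override.
class LookupUdfSketch {
    private final List<String> hdfsPaths;

    LookupUdfSketch(String... hdfsPaths) {
        this.hdfsPaths = Arrays.asList(hdfsPaths);
    }

    // One entry per file, each as "hdfsPath#symlinkName". The symlink name
    // is assumed here to be the last path component.
    public List<String> getCacheFiles() {
        List<String> entries = new ArrayList<>(hdfsPaths.size());
        for (String path : hdfsPaths) {
            String name = path.substring(path.lastIndexOf('/') + 1);
            entries.add(path + "#" + name);
        }
        return entries;
    }
}
```

With new LookupUdfSketch("/scratch/id_lookup", "/scratch/other_lookup"), the list holds two entries, one per file, so adding more files is just a matter of returning more strings.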
On Tue, Jun 25, 2013 at 11:37 AM, Phanish Lakkarasu <
abhishek.do...@icloud.com>
Hello,
How can I add multiple files to the distributed cache in a Pig UDF?
Regards
Abhi
Take a look here
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html under
"Loading the Distributed Cache".
On Tue, Jun 25, 2013 at 11:41 AM, abhishek wrote:
>
> > Hello,
>
> > How can I add multiple files to the distributed cache in a Pig UDF?
> >
> > Regards
> > Abhi
>
> Hello,
> How can I add multiple files to the distributed cache in a Pig UDF?
>
> Regards
> Abhi
When I use the distributed cache, I find that when the file is larger than
100 MB or has more than 10 million records, it cannot be cached in memory.
I tried setting io.sort.mb to 200 MB, but it still does not work.
Any suggestions would be welcome. Thank you!
2012-11
Thank you so much! Both the replicated join and a UDF that uses the
distributed cache are useful for me. I already have it working. Thank you again.
2012-11-15
yingnan.ma
From: Prashant Kommireddi
Sent: 2012-11-15 03:52:09
To: user@pig.apache.org
Cc:
Subject: Re: distributed cache
If it's for pur
If it's for purposes other than a Join, you could write a UDF to use
distributed cache. Look at the section "Loading the Distributed Cache"
http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html
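As a rough sketch of the UDF side of this, the usual pattern is to read the cached file from the task's working directory on the first call and keep the contents in a HashSet for later calls. The class CachedLookup and its method names are hypothetical, and the sketch avoids any Pig dependency; in a real UDF the lazy load would sit inside exec(), and the local name would be the symlink registered for the cached file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper; in a UDF this logic would live in exec().
class CachedLookup {
    private final String localName; // e.g. "./id_lookup", the symlink name
    private Set<String> ids;        // null until the first lookup

    CachedLookup(String localName) {
        this.localName = localName;
    }

    // Loads the file once, then answers membership checks from memory.
    boolean contains(String id) throws IOException {
        if (ids == null) {
            ids = load();
        }
        return ids.contains(id);
    }

    // Reads one id per line, skipping blank lines.
    private Set<String> load() throws IOException {
        Set<String> result = new HashSet<>();
        try (BufferedReader reader =
                 Files.newBufferedReader(Paths.get(localName), StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
        }
        return result;
    }
}
```

The set lives on the heap of each task, so this only suits files that fit comfortably in memory.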
On Wed, Nov 14, 2012 at 11:44 AM, Ruslan Al-Fakikh wrote:
> Maybe th
Maybe this is what you are looking for:
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
see "Replicated join"
On Tue, Nov 13, 2012 at 11:46 AM, yingnan.ma wrote:
> Hi ,
>
> I used the distributed cache in the hadoop though the "setup" and "
Hi,
I used the distributed cache in Hadoop through "setup" and "static" to
store a HashSet in memory.
I am trying to use the distributed cache in Pig, but I don't know how to
store a HashSet in memory; I can only cache the file.
Any advise wo
You are talking about changing the way hadoop works; something like
this would be transparent to Pig.
Note that Hadoop Distributed Cache != "distributed memory cache".
I suppose you could replace the value of fs.file.impl from
org.apache.hadoop.fs.LocalFileSystem to something else..
Hello
Can we use the distributed cache to store intermediate results after the map
phase, so that they can be read from the cache in the reduce phase and
improve the performance of the MapReduce job?
I found a paper on using a cache in MapReduce:
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnu
elix
> >
> > On Fri, Mar 16, 2012 at 5:37 PM, Prashant Kommireddi <
> prash1...@gmail.com
> > >wrote:
> >
> > > Felix,
> > >
> > > 0.7 does not support distributed cache within Pig UDFs. Is there a
> reason
> > > you are using such a
espect to the Udfs need any migration work?
>
> Thanks,
>
> Felix
>
> On Fri, Mar 16, 2012 at 5:37 PM, Prashant Kommireddi >wrote:
>
> > Felix,
> >
> > 0.7 does not support distributed cache within Pig UDFs. Is there a reason
> > you are using such an old ve
n Fri, Mar 16, 2012 at 5:37 PM, Prashant Kommireddi wrote:
> Felix,
>
> 0.7 does not support distributed cache within Pig UDFs. Is there a reason
> you are using such an old version of Pig?
>
> 0.9 and later would support this for you. Alan's book has great info on
> doing
Felix,
0.7 does not support distributed cache within Pig UDFs. Is there a reason
you are using such an old version of Pig?
0.9 and later would support this for you. Alan's book has great info on
doing this http://ofps.oreilly.com/titles/9781449302641/writing_udfs.html
Thanks,
Prashant
O
I need to put a small shared file in the distributed cache so I can load it in my
UDF in Pig 0.7. We are using Hadoop 0.20.2+228. I tried to run it using
PIG_OPTS="-Dmapred.cache.archives=hdfs://namenode.host:5001/user/gen/categories/exclude/2012-03-15/exclude-categories#excludeCat
29 matches