> > installed on RHEL. I am planning to load quite a few Petabytes of Data
> > onto HDFS.
> >
> > Which will be the fastest method to use and are there any projects around
> > Hadoop which can be used as well?
>
> Am I missing something?
>
> Thanks!
>
> -Tony
>
> -Original Message-
> From: Alejandro Abdelnur [mailto:t...@cloudera.com]
> Sent: Monday, July 02, 2012 11:40 AM
> To: common-user@hadoop.apache.org
> Subject: Re: hadoop security API (repost)
>
> ...r support in hbase if it is not there yet.
> Thanks.
>
> -Tony
>
Tony,
If you are doing a server app that interacts with the cluster on
behalf of different users (like Ooize, as you mentioned in your
email), then you should use the proxyuser capabilities of Hadoop.
* Configure user MYSERVERUSER as proxyuser in Hadoop core-site.xml (this
requires 2 property settings, hadoop.proxyuser.MYSERVERUSER.hosts and
hadoop.proxyuser.MYSERVERUSER.groups).
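A minimal sketch of the doAs pattern on the server side, assuming the server
process is already authenticated as MYSERVERUSER; the end-user name "joe" and
the path are placeholders:

import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
  public static void main(String[] args) throws Exception {
    final Configuration conf = new Configuration();
    // The server is logged in as MYSERVERUSER (e.g. from its keytab) and
    // impersonates the end user "joe" through Hadoop's proxyuser mechanism.
    UserGroupInformation ugi = UserGroupInformation.createProxyUser(
        "joe", UserGroupInformation.getLoginUser());
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        // Everything in here runs against the cluster as "joe".
        FileSystem fs = FileSystem.get(conf);
        fs.listStatus(new Path("/user/joe"));
        return null;
      }
    });
  }
}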
If you provision your user/group information via LDAP to all your nodes it
is not a nightmare.
On Thu, Jun 7, 2012 at 7:49 AM, Koert Kuipers wrote:
> thanks for your answer.
>
> so at a large place like say yahoo, or facebook, assuming they use
> kerberos, every analyst that uses hive has an acc
I give comma separated values in these settings ?
>
> Thanks,
> Praveenesh
>
> On Mon, Apr 2, 2012 at 5:52 PM, Alejandro Abdelnur wrote:
>
> > Praveenesh,
> >
> > If I'm not mistaken 0.20.205 does not support wildcards for the proxyuser
> > (hosts
Praveenesh,
If I'm not mistaken 0.20.205 does not support wildcards for the proxyuser
(hosts/groups) settings. You have to use explicit hosts/groups.
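For example (the user name, hosts and groups below are placeholders; each
pair goes inside a <property> element of core-site.xml on the cluster), the
explicit comma-separated form would look along these lines:

  hadoop.proxyuser.oozie.hosts  = oozieserver1.example.com,oozieserver2.example.com
  hadoop.proxyuser.oozie.groups = analysts,etl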
Thxs.
Alejandro
PS: please follow up this thread in the oozie-us...@incubator.apache.org
On Mon, Apr 2, 2012 at 2:15 PM, praveenesh kumar wrote:
You can use Oozie for that, you can write a workflow job that forks A
& B and then joins before C.
Thanks.
Alejandro
On Wed, Feb 15, 2012 at 11:23 AM, W.P. McNeill wrote:
> Say I have two Hadoop jobs, A and B, that can be run in parallel. I have
> another job, C, that takes the output of both A
Steven,
You could also look at HttpFSFileSystem in the hadoop-httpfs module; it is
quite simple and self-contained.
Cheers.
Alejandro
On Tue, Jan 31, 2012 at 8:37 PM, Harsh J wrote:
> To write a custom filesystem, extend on the FileSystem class.
>
> Depending on the scheme it is supposed to se
Rob,
Hadoop has a way to run Map tasks in multithreaded mode; look for the
MultithreadedMapRunner & MultithreadedMapper classes.
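A rough sketch with the new-API MultithreadedMapper (the mapper, key/value
types and thread count are just an example; the mapper must be thread-safe):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedJobSketch {
  // Trivial mapper; MultithreadedMapper runs several instances of it in parallel threads.
  public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(value, new LongWritable(1));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "multithreaded-map-example");
    job.setMapperClass(MultithreadedMapper.class);      // the multithreading wrapper
    MultithreadedMapper.setMapperClass(job, MyMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 8);      // 8 threads per map task
    // ... set input/output formats, paths and the reducer as usual, then submit.
  }
}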
Thanks.
Alejandro.
On Tue, Jan 31, 2012 at 7:51 AM, Rob Stewart wrote:
> Hi,
>
> I'm investigating the feasibility of a hybrid approach to parallel
> programming, by fu
Bill,
In addition you must call DistributedCache.createSymlink(configuration);
that should do it.
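For example (the HDFS path and link name are placeholders), a minimal sketch:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheArchiveSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    // The '#mydata' fragment names the symlink created in the task's working
    // directory; archives (zip/tar/jar) are unpacked on the task nodes.
    DistributedCache.addCacheArchive(new URI("hdfs:///tmp/mydata.zip#mydata"), conf);
    DistributedCache.createSymlink(conf);
    // Tasks can then read the unzipped content under ./mydata
  }
}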
Thxs.
Alejandro
On Mon, Jan 9, 2012 at 10:30 AM, W.P. McNeill wrote:
> I am trying to add a zip file to the distributed cache and have it unzipped
> on the task nodes with a softlink to the unzippe
[moving common-user@ to BCC]
Oozie is not HA yet. But it would be relatively easy to make it. It was
designed with that in mind, we even did a prototype.
Oozie consists of 2 services, a SQL database to store the Oozie jobs state
and a servlet container where Oozie app proper runs.
The solution f
Avi,
For Oozie related questions, please subscribe and use the
oozie-...@incubator.apache.org alias.
Thanks.
Alejandro
On Tue, Aug 30, 2011 at 2:28 AM, Avi Vaknin wrote:
> Hi All,
>
> First, I really enjoy writing you and I'm thankful for your help.
>
> I have Oozie installed on dedicated ser
[Moving thread to Oozie aliases and hadoop's alias to BCC]
Avi,
Currently you can have a cold standby solution.
An Oozie setup consists of 2 systems, a SQL DB (storing all Oozie jobs
state) and a servlet container (running Oozie proper). You need your DB to be
highly available. You need to have a s
https://issues.apache.org/jira/browse/HADOOP-7560
Thanks.
Alejandro
On Mon, Aug 22, 2011 at 3:42 PM, Tsz Wo Sze wrote:
> +1
> I believe HDFS-2178 is very close to being committed. Great work
> Alejandro!
>
> Nicholas
>
>
>
> ____
> From: Alejandro Abdelnur
> To
Hadoop developers,
Arun will be cutting a branch for Hadoop 0.23 as soon as the trunk has a
successful build.
I'd like Hoop (https://issues.apache.org/jira/browse/HDFS-2178) to be part
of 0.23 (Nicholas already looked at the code).
In addition, the Jersey utils in Hoop will be handy for
https://iss
Roger,
Or you can take a look at Hadoop's MultipleOutputs class.
Thanks.
Alejandro
On Tue, Jul 26, 2011 at 11:30 PM, Luca Pireddu wrote:
> On July 26, 2011 06:11:33 PM Roger Chen wrote:
> > Hi all,
> >
> > I am attempting to implement MultipleOutputFormat to write data to
> multiple
> > files
Why don't you put your native library in HDFS and use the DistributedCache
to make it available to the tasks? For example:
Copy 'foo.so' to 'hdfs://localhost:8020/tmp/foo.so', then add it to the job
distributed cache (the '#foo.so' fragment names the symlink created in the
task's working directory):
DistributedCache.addCacheFile(new URI("hdfs://localhost:8020/tmp/foo.so#foo.so"),
    jobConf);
DistributedCache.createSymlink(jobConf);
> I have acquired
> the TicketGrantingTicket from the Authentication Server and Service Ticket
> from the Ticket Granting Server. Now how do I authenticate myself with hadoop
> by sending the service ticket received from the Ticket Granting Server?
>
> Regards,
> Pikini
>
> On Wed, Jan 12, 201
If you kinit-ed successfully you are done.
The hadoop libraries will do the trick of authenticating the user against
Hadoop.
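In other words, assuming hadoop.security.authentication is set to kerberos in
the client configuration and kinit has been run, something like this (paths
and user name are placeholders) is all the code you need:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class KerberosClientSketch {
  public static void main(String[] args) throws Exception {
    // Loads core-site.xml/hdfs-site.xml from the classpath; with security on,
    // the client libraries pick up the Kerberos ticket cache created by kinit.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(new Path("data.txt"), new Path("/user/prabu/data.txt"));
    fs.copyToLocalFile(new Path("/user/prabu/data.txt"), new Path("data-copy.txt"));
  }
}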
Alejandro
On Thu, Jan 13, 2011 at 12:46 PM, Muruga Prabu M wrote:
> Hi,
>
> I have a Java program to upload and download files from the HDFS. I am
> using
> Hadoop with
oordinator jobs).
Regards.
Alejandro
On Tue, Dec 14, 2010 at 1:26 PM, edward choi wrote:
> Thanks for the tip. I took a look at it.
> Looks similar to Cascading I guess...?
> Anyway thanks for the info!!
>
> Ed
>
> 2010/12/8 Alejandro Abdelnur
>
> > Or, if you want
Or, if you want to do it in a reliable way you could use an Oozie
coordinator job.
On Wed, Dec 8, 2010 at 1:53 PM, edward choi wrote:
> My mistake. Come to think about it, you are right, I can just make an
> infinite loop inside the Hadoop application.
> Thanks for the reply.
>
> 2010/12/7 Harsh
The other approach, if the DR cluster is idle or has enough excess capacity,
would be running all the jobs on the input data in both clusters and performing
checksums on the outputs to ensure everything is consistent. And you could
take advantage of this and distribute ad hoc queries between the 2 clusters.
java.opts
> but looks like hadoop-0.20.2 ignores it.
>
> On which version have you seen it working?
>
> Regards,
> Vitaliy S
>
> On Tue, Oct 5, 2010 at 5:14 PM, Alejandro Abdelnur
> wrote:
> > The following 2 properties should work:
> >
> > mapred
The following 2 properties should work:
mapred.map.child.java.opts
mapred.reduce.child.java.opts
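For example, in the job configuration (heap sizes are just illustrative):

// In the driver, on the JobConf/Configuration used to submit the job:
conf.set("mapred.map.child.java.opts", "-Xmx512m");
conf.set("mapred.reduce.child.java.opts", "-Xmx1024m");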
Alejandro
On Tue, Oct 5, 2010 at 9:02 PM, Michael Segel wrote:
>
> Hi,
>
> You don't say which version of Hadoop you are using.
> Going from memory, I believe in the CDH3 release from Cloudera, the
Or you could try using MultiFileInputFormat for your MR job.
http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapred/MultiFileInputFormat.html
Alejandro
On Tue, Oct 5, 2010 at 4:55 PM, Harsh J wrote:
> 500 small files comprising one gigabyte? Perhaps you should try
> concat
Edward,
Yep, you should use the one from contrib/
Alejandro
On Tue, Oct 5, 2010 at 1:55 PM, edward choi wrote:
> Thanks, Tom.
> Didn't expect the author of THE BOOK would answer my question. Very
> surprised and honored :-)
> Just one more question if you don't mind.
> I read it on the Internet
And keep in mind that one split is not necessarily 1 file. That depends on the
InputFormat. For example, MultiFileInputFormat clubs together multiple
files in 1 split.
On Thu, Sep 23, 2010 at 3:16 PM, Greg Roelofs wrote:
> > Can a map task work on more than one input split?
>
> As far as I ca
Yes, you can do #1, but I wouldn't say it is practical. You can do #2
as well, as you suggest.
But, IMO, the best way is copying the JARs in HDFS and using DistributedCache.
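For example (the JAR path is a placeholder), after copying the jar to HDFS
(e.g. with 'hadoop fs -put mylib.jar /libs/'), during job setup:

// Adds the HDFS jar to the classpath of the map and reduce tasks.
DistributedCache.addFileToClassPath(new Path("/libs/mylib.jar"), conf);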
A
On Sun, Aug 29, 2010 at 1:29 PM, Mark wrote:
> How can I add jars to Hadoops classpath when running MapReduce jobs for
In Oozie we are working on MR/Pig job submission over HTTP.
On Thu, Jul 29, 2010 at 5:09 PM, Steve Loughran wrote:
> S. Venkatesh wrote:
>>
>> HDFS Proxy in contrib provides an HTTP interface over HDFS. It's not very
>> RESTful but we are working on a new version which will have a REST
>> API.
>>
>> AF
with the name,
> but that didn't do anything.
>
> Thanks,
> Adam
>
>
> On 6/28/10 6:17 PM, "Alejandro Abdelnur" wrote:
>
>> Check the MultipleOutputs class
>>
>> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/
>>
Check the MultipleOutputs class
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
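A rough old-API sketch (the named output "text" and the key/value types are
placeholders):

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class MultiOutReducer extends MapReduceBase
    implements Reducer<Text, LongWritable, Text, LongWritable> {

  private MultipleOutputs mos;

  public void configure(JobConf job) {
    mos = new MultipleOutputs(job);
  }

  public void reduce(Text key, Iterator<LongWritable> values,
      OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    // Send this record to the named output "text" (written to additional
    // output files prefixed with the named output).
    mos.getCollector("text", reporter).collect(key, values.next());
  }

  public void close() throws IOException {
    mos.close();
  }
}

// In the driver, declare the named output before submitting the job:
//   MultipleOutputs.addNamedOutput(jobConf, "text",
//       TextOutputFormat.class, Text.class, LongWritable.class);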
On Mon, Jun 28, 2010 at 5:31 PM, Adam Silberstein
wrote:
>
> Hi,
> I would like to run a hadoop job that write to multiple output files. I see
> a class called Mult
Also you can configure the job tracker to keep the RunningJob
information for completed jobs (available via the Hadoop Java API). There
is a config property that enables this, another that specifies the
location (it can be HDFS or local), and another that specifies for how
many hours you want to keep them.
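If I remember the property names right (please verify against the
mapred-default.xml of your version), they are along these lines; the values
below are only an example and go in the JobTracker's mapred-site.xml:

  mapred.job.tracker.persist.jobstatus.active = true
  mapred.job.tracker.persist.jobstatus.hours  = 24
  mapred.job.tracker.persist.jobstatus.dir    = /jobtracker/jobsInfo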
Using the MultipleOutputs (
http://hadoop.apache.org/common/docs/r0.19.0/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
) you can split data in different files in the outputdir.
After your job finishes you can move the files to different directories.
The benefit of doing this is that