Take a look at the JobClient API. You can use that to get the current
progress of a running job.
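A minimal sketch of polling progress with the old mapred JobClient; the job ID
shown is hypothetical, in practice you'd keep the ID from job submission:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class JobProgress {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        // Look the job up by its ID (hypothetical ID shown here).
        RunningJob job = client.getJob(JobID.forName("job_201204290001_0001"));
        // Progress values are floats between 0.0f and 1.0f.
        System.out.printf("map: %.0f%%, reduce: %.0f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
    }
}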
On Sunday, April 29, 2012, Ondřej Klimpera wrote:
Hello, I'd like to ask what the preferred way is of getting a running
job's progress from the Java application that executed it.
I'm using
+1 on Edward's comment.
The MapR comment was relevant and informative and the original poster never
said he was only interested in open source options.
On Sunday, April 22, 2012, Michael Segel wrote:
Gee Edward, what about putting a link to a company website or your blog in
your signature...
This is great Jason. One thing to add though is this line in your Pig
script:
SET mapred.map.tasks.speculative.execution false
Otherwise you're likely going to get duplicate writes into Accumulo.
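For a plain MapReduce job outside Pig, the same setting can be flipped on the
JobConf; a minimal sketch:

import org.apache.hadoop.mapred.JobConf;

public class NoSpeculation {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Same effect as the Pig SET line above: one attempt per map task,
        // so no duplicate speculative writes into Accumulo.
        conf.setMapSpeculativeExecution(false);
        // ... configure the rest of the job and submit with this conf ...
    }
}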
On Fri, Mar 2, 2012 at 5:48 AM, Jason
You might want to check out File Crusher:
http://www.jointhegrid.com/hadoop_filecrush/index.jsp
I've never used it, but it sounds like it could be helpful.
On Tue, Feb 21, 2012 at 10:25 AM, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi Mohit
AFAIK, XMLLoader in Pig won't be suited for
If you're able to put your data in directories named by date (i.e.
MMdd), you can take advantage of the fact that the HDFS client will
return directories in sorted order by name, which returns the most recent
dirs last. You can then cron a bash script that deletes all but the last N
dirs.
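A rough Java equivalent of that retention step, using the FileSystem API; the
parent path /data/daily and the keep count are assumptions:

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PruneOldDirs {
    public static void main(String[] args) throws Exception {
        final int keep = 7; // how many of the newest dirs to retain (assumed)
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical parent dir holding the date-named subdirectories.
        FileStatus[] dirs = fs.listStatus(new Path("/data/daily"));
        Arrays.sort(dirs); // FileStatus sorts by path, i.e. oldest date first
        for (int i = 0; i < dirs.length - keep; i++) {
            fs.delete(dirs[i].getPath(), true); // recursive delete
        }
    }
}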
There was a fairly long discussion on this topic at the beginning of the
year FYI:
http://search-hadoop.com/m/JvSQe2wNlY11
On Mon, Aug 15, 2011 at 9:00 PM, Chris Song sjh...@gmail.com wrote:
Why should Hadoop be built in Java?
For integrity and stability, it is good for Hadoop to be
Are you able to distcp folders that don't have special characters?
What are the versions of the two clusters, and are you running distcp on the
destination cluster if there's a mismatch? If there is, you'll need to use
hftp:
http://hadoop.apache.org/common/docs/current/distcp.html#cpver
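For example (hypothetical hostnames; run from the destination cluster, which
reads the source over hftp on the namenode's HTTP port):

hadoop distcp hftp://source-nn:50070/src/path hdfs://dest-nn/dest/path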
On Wed, May 18,
If you could share more specifics regarding just how it's not working
(i.e., job specifics, stack traces, how you're invoking it, etc), you
might get more assistance in troubleshooting.
On Wed, Apr 6, 2011 at 1:44 AM, Shuja Rehman shujamug...@gmail.com wrote:
-libjars is not working nor
param2
param3
but the program is still giving the error and does not find mylib.jar. Can
you confirm the syntax of the command?
Thanks
On Wed, Apr 6, 2011 at 8:29 PM, Bill Graham billgra...@gmail.com wrote:
If you could share more specifics regarding just how it's not working
(i.e., job specifics
Shuja, I haven't tried this, but from what I've read it seems you
could just add all your jars required by the Mapper and Reducer to
HDFS and then add them to the classpath in your run() method like
this:
DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
I think that's all you need.
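A rough sketch of how that might look inside a Tool's run() method (the jar
path is the hypothetical one from above; mapper/reducer setup is elided):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        JobConf job = new JobConf(getConf(), MyJob.class);
        // The jar must already exist in HDFS at this (hypothetical) path;
        // each task JVM then gets it on its classpath.
        DistributedCache.addFileToClassPath(new Path("/myapp/mylib.jar"), job);
        // ... set mapper/reducer classes and input/output paths here ...
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyJob(), args));
    }
}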
Unfortunately conf/collectors is used in two different ways in Chukwa,
each with a different syntax. This should really be fixed.
1. The script that starts the collectors looks at it for a list of
hostnames (no ports) to start collectors on. To start it just on one
host, set it to localhost.
2.
Yes, we run lightweight Chukwa agents and collectors only, using
Chukwa just as you describe. We've been doing so for over a year
without many issues. The code is fairly easy to extend when needed. We
rolled our own collector, agent and demux RPMs.
The monitoring piece of Chukwa is
Chukwa hasn't had a release since moving from Hadoop to the Incubator, so
there are no releases in the /incubator repos. Follow the link on the
Chukwa homepage to the downloads repos:
http://incubator.apache.org/chukwa/
http://www.apache.org/dyn/closer.cgi/hadoop/chukwa/chukwa-0.4.0
On Sun, Mar 20,
For the even lazier, you could give both the test data and the
expected output data. That way they know for sure if they got it
right. This also promotes a good testing best practice, which is to
assert against an expected set of results.
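In JUnit terms, that practice looks roughly like this (a toy word-count method
stands in for the exercise solution; all names are hypothetical):

import static org.junit.Assert.assertEquals;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.junit.Test;

public class WordCountTest {
    // Toy word count standing in for the solution under test.
    private List<String> wordCount(String input) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String w : input.split("\\s+")) {
            Integer c = counts.get(w);
            counts.put(w, c == null ? 1 : c + 1);
        }
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue());
        }
        return out;
    }

    @Test
    public void outputMatchesExpectedResults() {
        // The test data and the expected output are given together.
        List<String> expected = Arrays.asList("apple\t2", "pear\t1");
        assertEquals(expected, wordCount("apple pear apple"));
    }
}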
On Wed, Jan 5, 2011 at 9:19 AM, Mark Kerzner
We're using Chukwa to do steps a-d before writing summary data into MySQL.
Data is written into new directories every 5 minutes. Our MR jobs and data
load into MySQL take 5 minutes, so after a 5-minute window closes, we
typically have summary data from that interval in MySQL a few minutes later.
JIRA seems to be down FYI. Database errors are being returned:
*Cause: *
org.apache.commons.lang.exception.NestableRuntimeException:
com.atlassian.jira.exception.DataAccessException:
org.ofbiz.core.entity.GenericDataSourceException: Unable to establish a
connection with the database. (FATAL:
Chukwa also has a JMSAdaptor that can listen to an AMQ queue and stream the
messages to a collector to then be persisted as sequence files.
On Fri, Aug 20, 2010 at 3:29 AM, cliff palmer palmercl...@gmail.com wrote:
You may want to consider using something like the *nix tee command to save
Sorry to hijack the thread but I have a similar use case.
In a few months we're going to be moving colos. The new cluster will be the
same size as the current cluster and some downtime is acceptable. The
hostnames will be different. From what I've read in this thread it seems
like it would be
Ahh yes of course, distcp. Thanks!
On Tue, Aug 10, 2010 at 11:01 AM, Allen Wittenauer awittena...@linkedin.com
wrote:
On Aug 10, 2010, at 10:54 AM, Bill Graham wrote:
Is it correct to say that that would work fine? We have a replication
factor
of 2, so we'd be copying twice as much data
Your understanding of how Chukwa works is correct.
Hadoop by itself is a system that contains both the HDFS and MapReduce
systems. The other projects you list are all built upon Hadoop,
but you don't need them to run or to get value out of Hadoop by itself.
To run the Chukwa agent
Hi,
Check out Chukwa:
http://hadoop.apache.org/chukwa/docs/r0.3.0/design.html#Introduction
It allows you to run agents which tail log4j output and send the data to
collectors, which write the data to HDFS.
thanks,
Bill
On Wed, Apr 21, 2010 at 3:43 AM, Dhanya Aishwarya Palanisamy
Hi Utku,
We're using Chukwa to collect and aggregate data as you describe and so far
it's working well. Typically Chukwa collectors are deployed to all data
nodes, so there is no master write-bottleneck with this approach actually.
There have been discussions lately on the Chukwa list regarding
somehow need to connect to the namenode from the collectors.
Isn't that the case when trying to reach HDFS, or are the Chukwa collectors
writing to the local drives instead of HDFS?
Best,
Utku
On Mon, Mar 22, 2010 at 6:34 PM, Bill Graham billgra...@gmail.com wrote:
Hi Utku,
We're
Not sure if what you're asking is possible or not, but you could experiment
with these params to see if you could achieve a similar effect.
<property>
  <name>mapred.userlog.limit.kb</name>
  <value>0</value>
  <description>The maximum size of user-logs of each task in KB. 0 disables
  the cap.</description>
</property>
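If you'd rather experiment per-job than cluster-wide, the same parameter can
presumably also be set on the job's configuration; a minimal sketch:

import org.apache.hadoop.mapred.JobConf;

public class UserlogLimit {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Same parameter as the XML above; 0 disables the per-task log cap.
        conf.setInt("mapred.userlog.limit.kb", 0);
    }
}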
Hi,
This morning the namenode of my hadoop cluster shut itself down after the
logs/ directory had filled itself with job configs, log files and all the
other fun things Hadoop leaves there. It had been running for a few months.
I deleted all of the job configs and attempt log directories and
Typically companies will patent their IP as a defensive measure to protect
themselves from being sued, as has been pointed out already. Another
typical reason is to exercise the patent against companies that present a
challenge to their core business.
I would bet that unless you're making a
I don't know about the auditing tools, but I have seen files get randomly
deleted in dev setups when using Hadoop with the default hadoop.tmp.dir
setting, which is /tmp/hadoop-${user.name}.
On Thu, Nov 19, 2009 at 9:03 AM, Stas Oskin stas.os...@gmail.com wrote:
Hi.
I have a strange case
http://wiki.apache.org/hadoop/HowToContribute
Search for "Applying a patch" and you'll find this:
patch -p0 < cool_patch.patch
On Mon, Jul 27, 2009 at 2:33 PM, Gopal Gandhi gopal.gandhi2...@yahoo.comwrote:
I am going to apply a patch to hadoop (version 18.3). I searched on line
but could not
I ran into the same issue when using the default setting for dfs.data.dir,
which is under the /tmp directory. Files in this directory will be cleaned
out periodically as needed by the OS, which will break HDFS.
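A sketch of the corresponding hdfs-site.xml override; the path shown is just
an example of somewhere persistent:

<property>
  <name>dfs.data.dir</name>
  <value>/var/hadoop/dfs/data</value>
  <description>Keep DataNode block storage out of /tmp so the OS won't
  purge it.</description>
</property>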
On Thu, Jul 2, 2009 at 7:01 AM, Gross, Danny danny.gr...@spansion.comwrote:
Hello