Hi everyone,
Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat, which is supposed
to put each file from the input directory in a SEPARATE split, so that the number of
maps equals the number of input files. Yet what I get is that each split
contains the paths of multiple input files.
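For anyone hitting the same behaviour, one way around it is to override getSplits() so the hint is ignored entirely. A minimal sketch against the old mapred API (the class name and key/value types are illustrative, it assumes the configured input paths are files rather than directories, and you still need to supply a RecordReader):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MultiFileInputFormat;
import org.apache.hadoop.mapred.MultiFileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class OneFilePerSplitInputFormat extends MultiFileInputFormat<LongWritable, Text> {

  // Ignore the numSplits hint and emit exactly one MultiFileSplit per input path.
  @Override
  public InputSplit[] getSplits(JobConf job, int numSplits) throws IOException {
    Path[] paths = FileInputFormat.getInputPaths(job);
    InputSplit[] splits = new InputSplit[paths.length];
    for (int i = 0; i < paths.length; i++) {
      long len = paths[i].getFileSystem(job).getFileStatus(paths[i]).getLen();
      splits[i] = new MultiFileSplit(job, new Path[] { paths[i] }, new long[] { len });
    }
    return splits;
  }

  @Override
  public RecordReader<LongWritable, Text> getRecordReader(InputSplit split, JobConf job,
      Reporter reporter) throws IOException {
    // plug in a RecordReader suited to your file format
    throw new UnsupportedOperationException("supply a RecordReader for your data");
  }
}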
Exactly what I was looking for. Thanks
On 12/14/10 8:53 PM, 김영우 wrote:
Hi Mark,
You can use 'External table' in Hive.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL
A Hive external table does not move or delete files.
- Youngwoo
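To make that concrete, a minimal sketch of creating an external table over files already sitting in HDFS, issued through Hive's old JDBC driver (the host, port, and table layout are made up for illustration):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateExternalTable {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con = DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    // EXTERNAL + LOCATION: Hive only records metadata, so a later
    // DROP TABLE leaves the files under /user/mark/logs untouched
    stmt.executeQuery("CREATE EXTERNAL TABLE logs (line STRING) "
        + "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t' "
        + "LOCATION '/user/mark/logs'");
    con.close();
  }
}

(The old Hive JDBC driver implemented executeQuery() for most statements; execute() support varied by version.)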
Can someone explain what partitioning is and why it would be used, perhaps with an
example? Thanks
Hi Mark,
I think you will get more and better responses to this question on the
Hive mailing lists. (http://hive.apache.org/mailing_lists.html)
Regards,
Hari
On Wed, Dec 15, 2010 at 8:52 PM, Mark static.void@gmail.com wrote:
Can someone explain what partitioning is and why it would
On 09/12/10 03:40, Matthew John wrote:
Hi all,
Is there any valid Hadoop certification available? Something which adds
credibility to your Hadoop expertise.
Well, there's always providing enough patches to the code to get commit
rights :)
On 10/12/10 06:14, Amandeep Khurana wrote:
Mark,
Using EMR makes it very easy to start a cluster and add or reduce capacity as
and when required. There are certain optimizations that make EMR an
attractive choice compared to building out your own cluster. Using EMR
also ensures you are using a
On 10/12/10 09:08, Edward Choi wrote:
I was wrong. It wasn't because of the read-once-free policy. I tried again
with Java first, and this time it didn't work.
I searched Google and found the HttpClient you mentioned. It is the one
provided by Apache, right? I guess I will have to try
Hey, commit rights won't give you a nice-looking certificate, will they? ;)
On Wed, Dec 15, 2010 at 09:12, Steve Loughran ste...@apache.org wrote:
On 09/12/10 03:40, Matthew John wrote:
Hi all,
Is there any valid Hadoop certification available? Something which adds
credibility to your
But it would give you the right creds for people that you’d want to work for :)
James
On 2010-12-15, at 10:26 AM, Konstantin Boudnik wrote:
Hey, commit rights won't give you a nice-looking certificate, will they? ;)
On Wed, Dec 15, 2010 at 09:12, Steve Loughran ste...@apache.org wrote:
On
On 09/12/10 18:57, Aaron Eng wrote:
Pros:
- Easier to build out and tear down clusters vs. using physical machines in
a lab
- Easier to scale up and scale down a cluster as needed
Cons:
- Reliability. In my experience I've had machines die, had machines fail to
start up, had network outages
On 15/12/10 17:26, Konstantin Boudnik wrote:
Hey, commit rights won't give you a nice-looking certificate, will they? ;)
Depends on what Hudson says about the quality of your patches. I mean,
if every commit breaks the build, it soon becomes public
Hi,
What do the following two file system counters associated with a job
(and printed at the end of a job's execution) represent?
FILE_BYTES_READ and FILE_BYTES_WRITTEN
How are they different from HDFS_BYTES_READ and HDFS_BYTES_WRITTEN?
Thanks,
Abhishek
They represent the amount of data written to the physical disks on the slaves as
intermediate files, before or during the shuffle phase, whereas the HDFS bytes are
for the files written back into HDFS containing the data you wish to see.
J
On 2010-12-15, at 10:37 AM, abhishek sharma wrote:
Hi,
What do
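A small sketch of pulling those counters out of a finished job with the 0.20 mapred API; the group and counter names are the ones printed at the end of the job:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class CounterDump {
  public static void dump(JobConf conf) throws Exception {
    RunningJob job = JobClient.runJob(conf);  // blocks until the job finishes
    Counters counters = job.getCounters();
    // local intermediate bytes vs. HDFS bytes, per the explanation above
    long fileRead  = counters.findCounter("FileSystemCounters", "FILE_BYTES_READ").getCounter();
    long fileWrite = counters.findCounter("FileSystemCounters", "FILE_BYTES_WRITTEN").getCounter();
    long hdfsRead  = counters.findCounter("FileSystemCounters", "HDFS_BYTES_READ").getCounter();
    long hdfsWrite = counters.findCounter("FileSystemCounters", "HDFS_BYTES_WRITTEN").getCounter();
    System.out.println("local: " + fileRead + "/" + fileWrite + "  hdfs: " + hdfsRead + "/" + hdfsWrite);
  }
}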
On Wed, Dec 15, 2010 at 09:35, Steve Loughran ste...@apache.org wrote:
On 15/12/10 17:26, Konstantin Boudnik wrote:
Hey, commit rights won't give you a nice-looking certificate, will they? ;)
Depends on what Hudson says about the quality of your patches. I mean, if
every commit breaks the
If you would like the MR-1938 patch (see link below), "Ability for having user's
classes take precedence over the system classes for tasks' classpath", to
be included in the CDH3b4 release, please put in a vote on
https://issues.cloudera.org/browse/DISTRO-64.
The details about the fix are here:
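For what it's worth, once the patch is in, opting a job in might look like the sketch below. The property name is the one used on the Apache 1.x line (an assumption here), so a given CDH build may spell it differently:

import org.apache.hadoop.mapred.JobConf;

public class UserClasspathFirst {
  public static JobConf configure(Class<?> jobClass) {
    JobConf conf = new JobConf(jobClass);
    // assumed property name from MAPREDUCE-1938; verify against your build
    conf.setBoolean("mapreduce.user.classpath.first", true);
    return conf;
  }
}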
Hey Roger,
Thanks for the input. We're glad to see the community expressing their
priorities on our JIRA.
I noticed you also sent this to cdh-user, which is the more
appropriate list. CDH-specific discussion should be kept off the ASF
lists like common-user, which is meant for discussion about
Hi Roger,
Please use Cloudera's mailing list for communications regarding Cloudera
distributions.
Thanks
mahadev
On 12/15/10 10:43 AM, Roger Smith rogersmith1...@gmail.com wrote:
If you would like the MR-1938 patch (see link below), "Ability for having user's
classes take precedence over the
Got it.
On Wed, Dec 15, 2010 at 10:47 AM, Todd Lipcon t...@cloudera.com wrote:
Hey Roger,
Thanks for the input. We're glad to see the community expressing their
priorities on our JIRA.
I noticed you also sent this to cdh-user, which is the more
appropriate list. CDH-specific discussion
Apologies.
On Wed, Dec 15, 2010 at 10:48 AM, Mahadev Konar maha...@yahoo-inc.com wrote:
Hi Roger,
Please use Cloudera's mailing list for communications regarding Cloudera
distributions.
Thanks
mahadev
On 12/15/10 10:43 AM, Roger Smith rogersmith1...@gmail.com wrote:
If you would like
Actually, I just realized that numSplits can't really be set explicitly. Even
if I write numSplits = 5, it's just a hint.
Then how come MultiFileInputFormat claims to use MultiFileSplit to hold one
file per split?? Or is that also just a hint?
Maha
On Dec 15, 2010, at 2:13 AM, maha wrote:
Hi
On Dec 15, 2010, at 2:13 AM, maha wrote:
Hi everyone,
Using Hadoop-0.20.2, I'm trying to use MultiFileInputFormat, which is
supposed to put each file from the input directory in a SEPARATE split.
Is there some reason you don't just use a normal InputFormat with an
extremely high
On Dec 15, 2010, at 9:26 AM, Konstantin Boudnik wrote:
Hey, commit rights won't give you a nice-looking certificate, will they? ;)
Isn't that what Photoshop is for?
W. P.,
How are you running your Reducer? Is everything running in standalone mode
(all mappers/reducers in the same process as the launching application)? Or
are you running this in pseudo-distributed mode or on a remote cluster?
Depending on the application's configuration, log4j configuration
I'm running on a cluster. I'm trying to write to the log files on the
cluster machines, the ones that are visible through the jobtracker web
interface.
The log4j file I gave excerpts from is a central one for the cluster.
On Wed, Dec 15, 2010 at 1:38 PM, Aaron Kimball akimbal...@gmail.com wrote:
How is the central log4j file made available to the tasks? After you make
your changes to the configuration file, does it help if you restart the task
trackers?
You could also try setting the log level programmatically in your
setup(Context) method:

@Override
protected void setup(Context context) {
  org.apache.log4j.Logger.getLogger(getClass()).setLevel(org.apache.log4j.Level.DEBUG);
}
Hi Allen, and thanks for responding.
Your answer actually gave me another clue: I set numSplits = numFiles*100;
in myInputFormat and it worked :D ... Do you think there are side effects to
doing that?
Thank you,
Maha
On Dec 15, 2010, at 12:16 PM, Allen Wittenauer wrote:
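A guess at why the oversized hint works, assuming getSplits() packs files by average bytes per split (a paraphrase of the 0.20 logic, worth checking against the source): once the target bytes-per-split falls below the size of every input file, no split can hold two files. A toy illustration with made-up numbers:

// toy numbers only; illustrates the packing argument, not the real code
public class SplitHintDemo {
  public static void main(String[] args) {
    long totalBytes = 10L * 1024 * 1024;  // pretend total input size
    int numFiles = 1000;                  // pretend file count
    int hint = numFiles * 100;            // the oversized numSplits hint
    double targetBytesPerSplit = (double) totalBytes / hint;
    // ~105 bytes per split: smaller than any real input file, so no
    // split can pack two files and each file lands in its own split
    System.out.println("target bytes/split = " + targetBytesPerSplit);
  }
}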
Hi all,
I just want to know: is it possible for an iterator to be reused
repeatedly?
Shen
Hi,
I am trying to upgrade Hadoop. As part of this, I have set two environment
variables, NEW_HADOOP_INSTALL and OLD_HADOOP_INSTALL.
After this, I executed the following command:
% $NEW_HADOOP_INSTALL/bin/start-dfs.sh -upgrade
But the NameNode did not start, as it was throwing
sandeep wrote:
Hi,
I am trying to upgrade Hadoop. As part of this, I have set two environment
variables, NEW_HADOOP_INSTALL and OLD_HADOOP_INSTALL.
After this, I executed the following command:
% $NEW_HADOOP_INSTALL/bin/start-dfs.sh -upgrade
But the NameNode did not start, as it
I totally obey robots.txt since I am only fetching RSS feeds :-)
I implemented my crawler with HttpClient and it is working fine.
I often get "Cookie rejected" messages, but I am able to fetch news
articles anyway.
I guess the default java.net client is the stateful client you mentioned.
That clears up the confusion. Thanks.
There are just too many tools for Hadoop :-)
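On the HttpClient point above, a minimal fetch sketch with Commons HttpClient 3.x (assuming that's the Apache client in question; the error handling is illustrative only):

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.methods.GetMethod;

public class FeedFetcher {
  public static String fetch(String url) throws Exception {
    HttpClient client = new HttpClient();  // Commons HttpClient 3.x
    GetMethod get = new GetMethod(url);
    try {
      int status = client.executeMethod(get);
      if (status != 200) {
        throw new RuntimeException("unexpected HTTP status: " + status);
      }
      // fine for modest RSS payloads; stream the body for large responses
      return get.getResponseBodyAsString();
    } finally {
      get.releaseConnection();
    }
  }
}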
2010/12/14 Alejandro Abdelnur t...@cloudera.com
Ed,
Actually Oozie is quite different from Cascading.
* Cascading allows you to write 'queries' using a Java API and they get
translated into MR jobs.
* Oozie
This one doesn't seem too complex, even for a newbie like myself. Thanks!!!
2010/12/14 Ted Dunning tdunn...@maprtech.com
Or even simpler, try Azkaban: http://sna-projects.com/azkaban/
On Mon, Dec 13, 2010 at 9:26 PM, edward choi mp2...@gmail.com wrote:
Thanks for the tip. I took a look at
Thanks, Adarsh.
I have done the following for NEW_HADOOP_INSTALL (the new Hadoop
installation): I set the same values for dfs.name.dir and fs.checkpoint
that I had configured in OLD_HADOOP_INSTALL (the old Hadoop installation).
Now it is working.
Thanks
sandeep
The first recommendation (gluing all my command line apps) is what I am
currently using.
The other ones you mentioned are just out of my league right now, since I am
quite new to the Java world, not to mention JRuby, Groovy, Jython, etc.
But when I get comfortable with the environment and start to
Hi,
Does anyone know how to speed up DataNode decommissioning, and what are
all the configurations related to decommissioning?
How can the data transfer from the DataNode being decommissioned be sped up?
Thanks & Regards,
Sravan kumar.
sravankumar wrote:
Hi,
Does anyone know how to speed up DataNode decommissioning, and what are
all the configurations related to decommissioning?
How can the data transfer from the DataNode being decommissioned be sped up?
Thanks & Regards,
Sravan kumar.
You can use metasave (hadoop dfsadmin -metasave <filename>) to check where the
decommission speed is bottlenecked.
If the bottleneck is the speed of NameNode dispatch, you can tune
dfs.max-repl-streams to a larger number (the default is 2).
If there are many timed-out block replication tasks moving from the
pending-replication queue back to the needed-replication queue, you can
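For reference, the knob mentioned above would go in hdfs-site.xml on the NameNode; a sketch (the value is illustrative, and I'm assuming a NameNode restart is needed for it to take effect):

<property>
  <name>dfs.max-repl-streams</name>
  <value>8</value>  <!-- default 2; allows more concurrent replication streams -->
</property>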