Each bz2 file after merging is about 50 MB. The reducers take about 9
minutes.
Note: 'getmerge' is not an option. There isn't enough disk space to do a
getmerge on the local production box. Plus we need a scalable solution as
these files will get a lot bigger soon.
On Tue, Jul 30, 2013 at
Thanks, John. But I don't see an option to specify the # of output files.
How does Crush decide how many files to create? Is it only based on file
sizes?
On Wed, Jul 31, 2013 at 6:28 AM, John Meagher john.meag...@gmail.com wrote:
Here's a great tool for handling exactly that case:
So you are saying we would first do a 'hadoop count' to get the total # of
bytes for all files. Let's say that comes to: 1538684305
Default block size is: 128M
So, total # of blocks needed: 1538684305 / 131072 = 11740 (note: the
131072-byte divisor is 128 KB, not 128M; with true 128M blocks of
134217728 bytes the total would be about 12)
Max file blocks = 11740 / 50 (# of output files) = 234
Does this
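A minimal sketch of that arithmetic, assuming the byte total comes from
'hadoop count' and keeping the 131072-byte divisor used above; the class
and variable names are illustrative:

// Sketch of the block-budget arithmetic discussed above. Note that
// 131072 bytes is 128 KB; a true 128 MB block is 134217728 bytes.
public class CrushBlockMath {
    public static void main(String[] args) {
        long totalBytes  = 1538684305L;  // from 'hadoop count'
        long blockSize   = 131072L;      // divisor used in the thread
        int  outputFiles = 50;           // desired number of merged files

        long totalBlocks   = (totalBytes + blockSize - 1) / blockSize; // round up
        long maxFileBlocks = totalBlocks / outputFiles;

        System.out.println("total blocks:    " + totalBlocks);   // 11740
        System.out.println("max file blocks: " + maxFileBlocks); // 234
    }
}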
Hi Aditya, it's likely that your ZK quorum has not formed correctly, hence
the "ZooKeeperServer not running" message (at least 2 of the 3 servers need
to be talking). Try using the four letter words to verify (srvr).
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkCommands
Check the zk server logs for
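The four letter words are plain text over the ZooKeeper client port, so
'echo srvr | nc host 2181' is enough; here is a minimal Java sketch of the
same check, with a placeholder hostname and the default client port:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.Socket;

// Send the 'srvr' four-letter word to one ZooKeeper server and print
// the reply. A healthy quorum member reports a Mode line (leader or
// follower); repeat against each server in the ensemble.
public class ZkSrvrCheck {
    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket("zk-host-1", 2181)) {  // placeholder host
            s.getOutputStream().write("srvr".getBytes("US-ASCII"));
            s.getOutputStream().flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(s.getInputStream(), "US-ASCII"));
            for (String line; (line = in.readLine()) != null; ) {
                System.out.println(line);
            }
        }
    }
}

If fewer than two of the three servers answer as leader/follower, the
quorum has not formed.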
I opened a jira for tracking this issue:
https://issues.apache.org/jira/browse/HDFS-5046
2013/7/2 sam liu samliuhad...@gmail.com
Yes, the default replication factor is 3. However, in my case it's
strange: while the decommission hangs, I found that some blocks' expected
replica count is 3, but the
Hi,
I think there is some block synchronization issue in your hdfs cluster.
Frankly, I haven't faced this issue yet.
I believe you need to refresh your namenode fsimage to make it up to date
with your datanodes.
Thanks.
On Wed, Jul 31, 2013 at 6:16 AM, ch huang justlo...@gmail.com wrote:
thanks
Hi
I think it is important to make clear how the replica went missing.
Here is a scenario: the disk on your datanode broke down, or the
replica was just deleted, so the append failed.
Can you find a similar log in your cluster?
Sent from my iPhone
On 2013-7-31, at 15:01, Jitendra Yadav
Hi,
Please give me a solution.
HTTP ERROR 403
Problem accessing /cmf/process/146/logs. Reason:
Server returned HTTP response code: 500 for URL:
http://venkat.ops.cloudwick.com:9000/process/146-SolrInit/files/logs/stderr.log
The server declined access to the page or resource.
Do we have to
Hi,
I am getting this error:
13/07/31 09:29:41 INFO mapred.JobClient: Task Id :
attempt_201307102216_0270_m_02_2, Status : FAILED
java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
at
There seems to be some problem in the mapper logic. You need to have the
input match what your code expects, or update the code to handle cases like
having an odd number of words in a line.
Before getting the element a second time, you need to check whether the
tokenizer has more elements or not. If
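A minimal sketch of that guard, using the org.apache.hadoop.mapreduce API;
class and field names are illustrative, not the actual code from the job:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Check hasMoreTokens() before each nextToken() so a malformed line
// (empty, or only one word) is skipped instead of throwing
// java.util.NoSuchElementException.
public class WordValueMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tok = new StringTokenizer(value.toString());
        if (!tok.hasMoreTokens()) {
            return;                       // empty line
        }
        String word = tok.nextToken();
        if (!tok.hasMoreTokens()) {
            return;                       // line has only one token
        }
        try {
            int num = Integer.parseInt(tok.nextToken());
            context.write(new Text(word), new IntWritable(num));
        } catch (NumberFormatException e) {
            // second token was not an integer: skip the line
        }
    }
}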
Hi,
Thanks for responding.
How do I do that? (very new to Java)
There are just two words per line.
One is a word, the second is an integer.
Thanks
On Wed, Jul 31, 2013 at 11:20 AM, Devaraj k devara...@huawei.com wrote:
There seems to be some problem in the mapper logic. You need to have the
input
If you want to write a MapReduce job, you need to have basic knowledge of core
Java. You can find many resources on the internet for that.
If you face any problems related to Hadoop, you can ask here for help.
Thanks
Devaraj k
From: jamal sasha [mailto:jamalsha...@gmail.com]
Sent: 31 July
How many containers are you running per node?
On Jul 25, 2013, at 5:21 AM, Krishna Kishore Bonagiri write2kish...@gmail.com
wrote:
Hi Devaraj,
I used to run this application with the same number of containers
successfully on the previous version, i.e. hadoop-2.0.4-alpha. Is it failing with
Hi,
I wanted to use the incrCounter API to generate auto increment ids but the
problem is that it doesn't return the incremented value. Does anyone know
why this API does not return the incremented value? And whether it would be
possible to change it to return the incremented value?
Thanks
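Counters are buffered inside each task and aggregated by the framework, so
no global value exists at the moment incrCounter is called; the total is
only reliable once the job finishes. A minimal sketch of reading it at that
point with the old mapred API; the group and counter names are placeholders:

import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class CounterAfterJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CounterAfterJob.class);
        // ... set mapper/reducer and input/output paths here ...
        RunningJob job = JobClient.runJob(conf);  // blocks until the job ends

        // Read the aggregated value after completion.
        Counters counters = job.getCounters();
        long total = counters.findCounter("MyGroup", "MyCounter").getCounter();
        System.out.println("final counter value: " + total);
    }
}

This also means counters cannot hand out unique auto-increment ids across
tasks; they only give you job-level totals.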
As I said before, it is a per-file property, and the config can be
bypassed by clients that do not read the configs, apply a manual API
override, etc.
If you want to really define a hard maximum and catch such clients,
try setting dfs.replication.max to 2 at your NameNode.
On Thu, Aug 1, 2013 at
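A minimal example of the NameNode-side setting mentioned above, placed in
hdfs-site.xml (assuming you want to reject any request for more than 2
replicas):

<!-- hdfs-site.xml on the NameNode -->
<property>
  <name>dfs.replication.max</name>
  <value>2</value>
</property>

With this in place, a create or setReplication call asking for a higher
factor fails instead of going through.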
Hi Arun,
I was running on a single-node cluster, so all my 100+ containers were on a
single node. The problem went away when I increased YARN_HEAP_SIZE to
2GB.
Thanks,
Kishore
On Thu, Aug 1, 2013 at 5:01 AM, Arun C Murthy a...@hortonworks.com wrote:
How many containers are you running per