Hi Rob,
DFSInputStream: the InterfaceAudience for this class is private, so you should
not use this class directly. It mainly implements the actual core read
functionality, and it is a DFS-specific implementation only.
HdfsDataInputStream: the InterfaceAudience for this class is public and you
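In practice you never construct either stream yourself. A minimal sketch of the
usual pattern (Hadoop 2.x class names; the namenode URI and path are just
placeholders): open the file through FileSystem and, on HDFS, the returned
stream is typically the public HdfsDataInputStream wrapper with the
HDFS-specific extras.

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class OpenHdfsStream {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
    FSDataInputStream in = fs.open(new Path("/some/file"));  // placeholder path
    if (in instanceof HdfsDataInputStream) {
      HdfsDataInputStream hin = (HdfsDataInputStream) in;
      // HDFS-specific info that the generic FSDataInputStream does not expose
      System.out.println("visible length: " + hin.getVisibleLength());
    }
    in.close();
  }
}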
Hi,
I am using Hadoop 1.0.2 and have written a MapReduce job. I have a requirement
to process each whole file without splitting it, so I have written a new input
format that processes the file as a whole by overriding the isSplitable()
method. I have also created a new RecordReader implementation to
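A simplified sketch of what such an input format looks like with the new
mapreduce API (WholeFileRecordReader is a placeholder name for the custom
reader):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false;  // never split: each mapper sees one whole file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new WholeFileRecordReader();  // placeholder custom reader
  }
}

The record reader then hands the whole file content to the mapper as a single
key/value pair.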
Hi
What is the use case difference between:
- DFSInputStream and HdfsDataInputStream
- DFSOutputStream and HdfsDataOutputStream
When should one be preferred over the other? From the sources I see they have
similar functionality, only HdfsData*Stream "follows" Data*Stream instead
of *Stream. Also is DFS*S
I am also thinking about this for my current project, so here I share some of
my thoughts, though maybe some of them are not correct.
1) In my previous projects years ago, we stored a lot of data as plain text,
as at that time people thought big data could store all the data, no need to
worry abou
I don't know exactly what you are trying to do, but it seems like memory is
your bottleneck, and you think you have enough CPU resources, so you want to
use multiple threads to make better use of the CPU?
You can start multiple threads in your mapper if you think your mapper logic
is very CPU intensive
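One stock option worth looking at is MultithreadedMapper, which runs several
copies of your mapper logic in threads inside a single map task. A minimal
sketch of the driver setup (MyCpuHeavyMapper and the thread count are
placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

Configuration conf = new Configuration();
Job job = new Job(conf, "cpu-heavy-job");
job.setMapperClass(MultithreadedMapper.class);
// The real mapper logic runs in N threads inside each map task.
MultithreadedMapper.setMapperClass(job, MyCpuHeavyMapper.class);
MultithreadedMapper.setNumberOfThreads(job, 8);  // placeholder thread count

Note that the wrapped mapper has to be thread-safe for this to work.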
Hi Himanshu,
Changing the ratio is definitely a reasonable thing to do. The capacities
come from the mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations.
You can tweak these on your nodes to get your desired ratio.
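For example (the values below are just an illustration, pick whatever matches
your hardware), in mapred-site.xml on each tasktracker:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>6</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>

The tasktrackers need a restart to pick up the change.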
-Sandy
On Mon, Sep 30,
Hi,
Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map Task
Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting in a ratio
of 2.7. We run a wide variety of jobs and we want to increase the throughput.
My manual observation was that we hit the Mapper capacit
Sequence files are language neutral like Avro, yes, but I am not sure about
the support in other languages' libraries for processing seq files.
Thanks,
Rahul
On Mon, Sep 30, 2013 at 11:10 PM, Peyman Mohajerian wrote:
> It is not recommended to keep the data at rest in sequence format,
> because it is Java
Is there a build.xml available to use fault injection with Hadoop as this
tutorial says?
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html#Aspect_Example
I cannot find the jar file for org.apache.hadoop.fi.ProbabilityModel and
org.apache.hadoop.hdfs.se
It is not recommended to keep the data at rest in sequence format, because it
is Java specific and you cannot share it easily with other, non-Java systems;
it is, however, ideal for running map/reduce jobs. One approach would be to
bring all the data of different formats into HDFS as is and then convert
them t
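To make the "Java specific" point concrete, a minimal sketch of how a sequence
file is written through the Hadoop Java API (the path and the key/value types
are just examples); keys and values are Writables, which is what makes the
format awkward outside the JVM:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path path = new Path("/data/example.seq");  // placeholder path

// Key and value classes must be Hadoop Writables.
SequenceFile.Writer writer =
    SequenceFile.createWriter(fs, conf, path, LongWritable.class, Text.class);
try {
  writer.append(new LongWritable(1L), new Text("first record"));
} finally {
  writer.close();
}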
Hi Omkar,
I have a distributed application that I am trying to port to YARN. My
application does many things in multiple threads in parallel, and those
threads in turn run some executables (how many of them depends on some
business logic and is variable). Now I am trying to launch those
executabl
I would like to add new machines to my existing cluster, but they won't be
similar to the current nodes. There are two scenarios I'm thinking of:
1. What are the implications (besides initial load balancing) of adding a
new node to the cluster, if this node runs on a machine similar to all
other nodes
Thanks for your suggestions and replies.
I am still confused about this:
To create the list of tasks to run, the job scheduler first retrieves the input
splits computed by the JobClient from the shared filesystem (step 6).
My question:
Does the input split in the above statement refer to the p
I do not think these are the same issue; please correct me if I am wrong.
The SO link is about the SNN being unable to establish communication with the NN.
In my case I am unable to launch the NN itself.
The NPE issue is at the highlighted line, but I am not sure how to go about
resolving it
/** Add a node child to
For processing XML files, Hadoop comes with a class for this purpose called
StreamXmlRecordReader. You can use it by setting your input format to
StreamInputFormat and setting the stream.recordreader.class property to
org.apache.hadoop.streaming.StreamXmlRecordReader.
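A rough sketch of wiring that up in an old-API driver (the <record> begin/end
tags and the stream.recordreader.begin/end property names are assumptions to
check against your streaming jar; MyXmlDriver is a placeholder class):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.streaming.StreamInputFormat;

JobConf conf = new JobConf(MyXmlDriver.class);
conf.setInputFormat(StreamInputFormat.class);
conf.set("stream.recordreader.class",
         "org.apache.hadoop.streaming.StreamXmlRecordReader");
// Assumed delimiters: one logical record per <record>...</record> block.
conf.set("stream.recordreader.begin", "<record>");
conf.set("stream.recordreader.end", "</record>");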
For JSON files, an open-source
I use CDH 4.3.1. When I start the datanode, I get the errors below:
2013-09-26 17:57:07,803 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at
0.0.0.0:40075
2013-09-26 17:57:07,814 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-09-26 17:
Hi,
Try this link:
http://stackoverflow.com/questions/5490805/hadoop-nullpointerexcep
Thanks
Manoj
On Mon, Sep 30, 2013 at 1:03 PM, Ravi Shetye wrote:
> Can someone please help me with how to go about debugging this issue? The
> NN log has the following error stack
>
> 2013-09-30 07:28:42,768 I
sorry, just trying to cancel my mail
Hello,
the file format topic is still confusing me and I would appreciate it if you
could share your thoughts and experience with me.
From reading different books/articles/websites I understand that
- Sequence files (used frequently, but not only, for binary data),
- AVRO,
- RC (was developed to work
Can someone please help me with how to go about debugging this issue? The NN
log has the following error stack
2013-09-30 07:28:42,768 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system
started
2013-09-30 07:28:42,967 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAd