Mongo has the best out-of-the-box experience of anything, but can be limited
in terms of how far it will scale.
HBase is a bit tricky to manage if you don't have expertise in managing
Hadoop.
Neither is a great idea if your data objects can be as large as 10MB.
On Wed, May 23, 2012 at 8:30 AM, Brend
No. 2.0.0 will not have the same level of HA as MapR. Specifically, the
JobTracker hasn't been addressed and the NameNode issues have only been
partially addressed.
On May 22, 2012, at 8:08 AM, Martinus Martinus wrote:
> Hi Todd,
>
> Thanks for your answer. Is that will have the same capa
Or just people who find your disks at the second-hand shop.
http://www.wavy.com/dpp/news/military/tricare-beneficiaries'-data-stolen
On Fri, Jan 20, 2012 at 3:36 PM, Tim Broberg wrote:
> I guess the first question is the threat model: What kind of bad guy are
> you trying to keep out? Is Ukrai
MapR provides this out of the box in a completely Hadoop compatible
environment.
Doing this with straight Hadoop involves a fair bit of baling wire.
On Tue, Jan 3, 2012 at 1:10 PM, alo alt wrote:
> Hi Mac,
>
> hdfs has at the moment no solution for a complete backup- and restore
> process like
Joey is speaking precisely, but in an intentionally very limited way.
Apache HDFS, the file system that comes with Apache Hadoop, does not
support NFS.
On the other hand, maprfs, which is a part of the commercial MapR
distribution that is based on Apache Hadoop, does support NFS natively and
withou
It is a bit off topic, but maprfs is closely equivalent to HDFS except that
it provides the read-write and NFS semantics you are looking for.
Trying to shoe-horn HDFS into a job that it wasn't intended to do (i.e.
general file I/O) isn't a great idea. Better to use what it is good for.
On Mon, N
sizes are going bigger than MBs then it is
> not good to use Hbase for storage.
>
> Any Comments
>
> *From:* Ted Dunning [mailto:tdunn...@maprtech.com]
> *Sent:* Tuesday, November 22, 2011 11:43 AM
> *To:* hdfs-user@hadoop.apache.org
>
How big is that?
On Mon, Nov 21, 2011 at 9:26 PM, Stuti Awasthi wrote:
> Hi Ted,
>
> Well in my case document size can be big, which is not good to keep in
> HBase. So I rule out this option.
>
> Thanks
>
> *From:* Ted D
HDFS is a filesystem that is designed to support map-reduce computation.
As such, the semantics differ from what SVN or Git would want to have.
HBase provides versioned values. That might suffice for your needs.
On Mon, Nov 21, 2011 at 9:58 AM, Stuti Awasthi wrote:
> Do we have any support fr
ta, not 12GB. So
> about 1-in-72 such failures risks data loss, rather than 1-in-12. Which is
> still unacceptable, so use 3x replication! :-)
> --Matt
>
> On Mon, Nov 7, 2011 at 4:53 PM, Ted Dunning wrote:
>
>> 3x replication has two effects. One is reliability. Thi
e analysis for this usage, however.
On Tue, Nov 8, 2011 at 7:32 AM, Rita wrote:
> That's a good point. What if hdfs is used as an archive? We don't really use
> it for mapreduce, more for archival purposes.
>
>
> On Mon, Nov 7, 2011 at 7:53 PM, Ted Dunning wrote:
>
>> 3x re
By snapshots, I mean that you can freeze a copy of a portion of the
file system for later use as a backup or reference. By mirror, I mean that
a snapshot can be transported to another location in the same cluster or to
another cluster and the mirrored image will be updated atomically to the
ne
x replication on a 500tb cluster. No issues
> whatsoever. 3x is for super paranoid.
>
>
> On Mon, Nov 7, 2011 at 5:06 PM, Ted Dunning wrote:
>
>> Depending on which distribution and what your data center power limits
>> are you may save a lot of money by going with machi
Depending on which distribution and what your data center power limits are
you may save a lot of money by going with machines that have 12 x 2 or 3 TB
drives. With suitable engineering margins and 3x replication you can have
5 TB net data per node and 20 nodes per rack. If you want to go all cow
IDLs are nice, but old school systems like CORBA are death when you need to
change things.
Avro, protobufs and thrift are all miles better.
On Wed, Sep 21, 2011 at 1:59 PM, Koert Kuipers wrote:
> i would love an IDL, plus that modern serialization frameworks such as
> protobuf/thrift support v
2011/9/13 kang hua
> Hi Master:
> can you explain more detail --- "The only way to avoid this is to
> make the data much more cacheable and to have a viable cache coherency
> strategy. Cache coherency at the meta-data level is difficult. Cache
> coherency at the block level is also diffi
The namenode is already a serious bottleneck for meta-data updates. If you
allow some of the block map or meta-data to page out to disk, then the
bottleneck is going to get much worse.
The only way to avoid this is to make the data much more cacheable and to
have a viable cache coherency strategy
There is no way to do this for standard Apache Hadoop.
But other, otherwise Hadoop compatible, systems such as MapR do support this
operation.
Rather than push commercial systems on this mailing list, I would simply
recommend anybody who is curious to email me.
On Sat, Aug 27, 2011 at 12:07 PM,
Amen.
Without a solid hand-off, your system is going to be subject to all kinds of
failure modes.
On Wed, Aug 17, 2011 at 11:17 AM, David Rosenstrauch wrote:
> You really need to employ *some* method to reliably determine when a file
> is successfully uploaded, or you're going to wind up with a ver
HDFS is not a normal file system. Instead, it is highly optimized for
running map-reduce. As such, it uses replicated storage but imposes a
write-once model on files.
This probably makes it unsuitable as primary storage for VMs.
What you need is either a conventional networked storage device or if
First, it is virtually impossible to create 100 million files in HDFS
because the name node can't hold that many.
Secondly, file creation is bottlenecked by the name node, so the files that
you can create can't be created at more than about 1000 per second (and
achieving more than half that rate i
What version are you using?
On Thu, Apr 14, 2011 at 3:55 PM, Thanh Do wrote:
> Hi all,
>
> I have recently seen silent data loss in our system.
> Here is the case:
>
> 1. client appends to some block
> 2. for some reason, commitBlockSynchronization
> returns successfully with synclist = [] (
How large a cluster?
How large is each data-node? How much disk is devoted to hbase?
How does your HDFS data arrive? From one or a few machines in the cluster?
From outside the cluster?
On Thu, Mar 17, 2011 at 12:13 PM, Stuart Smith wrote:
> Parts of this may end up on the hbase list, but I
What do you mean by block? An HDFS chunk? Or a flushed write?
The answer depends a bit on which version of HDFS / Hadoop you are using.
With the append branches, things happen a lot more like what you expect.
Without that version, it is difficult to say what will happen.
Also, there are very
Take a look at http://opentsdb.net/ and see if it attacks your time series
problem in an interesting way for what you are doing.
Regarding your second comment, ZooKeeper actually makes it easier to install
HBase because it stabilizes the interactions between different components.
There is also an