Since there are millions of files (ranging from 1 MB to 15 MB each), I would
like to store them in a SequenceFile.  How do I store the location of each
of these files in HBase?

I see lots of blogs and books that talk about storing large files on HDFS
and keeping the file paths in HBase, but I don't see any real examples. I
was wondering whether anybody has implemented this in production.
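
For concreteness, the pattern I have in mind would look roughly like the
sketch below. The doc_index table, the "f" column family, and the container
path are placeholder names I made up, not from any existing project: each
small file is appended as one record to a SequenceFile on HDFS, and an HBase
row maps the document id to the container path plus the record's byte offset.

  import java.nio.file.Files;
  import java.nio.file.Paths;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.SequenceFile;
  import org.apache.hadoop.io.Text;

  public class SeqFileIndexer {
    public static void main(String[] args) throws Exception {
      String docId = args[0];                                   // e.g. "doc-000001"
      byte[] content = Files.readAllBytes(Paths.get(args[1]));  // the 1-15 MB payload

      Configuration conf = HBaseConfiguration.create();
      Path container = new Path("/data/docs/batch-0001.seq");   // placeholder path

      try (Connection conn = ConnectionFactory.createConnection(conf);
           Table index = conn.getTable(TableName.valueOf("doc_index"));
           SequenceFile.Writer writer = SequenceFile.createWriter(conf,
               SequenceFile.Writer.file(container),
               SequenceFile.Writer.keyClass(Text.class),
               SequenceFile.Writer.valueClass(BytesWritable.class))) {

        long offset = writer.getLength();   // start offset of this record in the container
        writer.append(new Text(docId), new BytesWritable(content));

        // Index row: docId -> container path + offset, for later random access
        Put put = new Put(Bytes.toBytes(docId));
        put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("path"),
            Bytes.toBytes(container.toString()));
        put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("offset"),
            Bytes.toBytes(offset));
        index.put(put);
      }
    }
  }

On the read path, the HBase row would give back the container path and
offset, so SequenceFile.Reader.seek(offset) followed by next(key, value)
retrieves the record without scanning the whole container. My assumption is
that getLength() lands on a record boundary, which holds for uncompressed
and record-compressed SequenceFiles but not for block compression, so that
would need verifying.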

Looking forward to a reply from the community experts.  Thanks.

Regards,
Arun

On Sun, Feb 21, 2016 at 10:30 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> For #1, please take a look
> at
> hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
>
> e.g. the following methods:
>
>   public DFSInputStream open(String src) throws IOException {
>
>   public HdfsDataOutputStream append(final String src, final int buffersize,
>       EnumSet<CreateFlag> flag, final Progressable progress,
>       final FileSystem.Statistics statistics) throws IOException {
>
>
> Cheers
>
> On Wed, Feb 17, 2016 at 3:40 PM, Arun Patel <arunp.bigd...@gmail.com>
> wrote:
>
> > I would like to store large documents (over 100 MB) on HDFS and insert
> > metadata in HBase.
> >
> > 1) Users will issue PUT and GET requests through the HBase REST API to
> > store and retrieve documents. In this case, how do I PUT and GET the
> > documents to/from HDFS? What are the recommended ways to store and access
> > documents on HDFS with optimum performance?
> >
> > Can you please share any sample code, or a GitHub project?
> >
> > 2) What are the performance issues I need to know about?
> >
> > Regards,
> > Arun
> >
>
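
To make the DFSClient pointer concrete: DFSClient is internal to HDFS, and
applications normally reach the same open/create/append calls through the
public FileSystem facade. Below is a rough sketch of the PUT and GET paths
behind a REST handler; HdfsDocStore and every name in it are placeholders of
mine, not an existing API.

  import java.io.IOException;
  import java.io.InputStream;
  import java.io.OutputStream;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IOUtils;

  public class HdfsDocStore {
    private final FileSystem fs;

    public HdfsDocStore(Configuration conf) throws IOException {
      this.fs = FileSystem.get(conf);
    }

    // PUT: stream the request body straight into a new HDFS file
    public void put(String hdfsPath, InputStream body) throws IOException {
      try (FSDataOutputStream out = fs.create(new Path(hdfsPath), true)) {
        IOUtils.copyBytes(body, out, 65536, false);  // 64 KB copy buffer
      }
    }

    // GET: stream the HDFS file back out to the response
    public void get(String hdfsPath, OutputStream response) throws IOException {
      try (FSDataInputStream in = fs.open(new Path(hdfsPath))) {
        IOUtils.copyBytes(in, response, 65536, false);
      }
    }
  }

Streaming through copyBytes keeps a 100 MB document out of heap memory. The
main performance concerns I would expect are NameNode pressure from many
small files (which is what the SequenceFile consolidation addresses) and
keeping documents of this size out of HBase cells themselves.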
