Hi Bing,

You should expect HBase to be slower in the general case:
1) it writes much more data (see the HBase data model), with extra column
qualifiers, timestamps and so on.
2) the data is written multiple times: once in the write-ahead log, then once
per replica on the datanodes, and so on.
3) there are inter-process and inter-machine calls on the critical path.
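
To give a feel for point 1, here is a rough sketch of the per-cell size. It assumes the classic KeyValue layout, in which every cell repeats the row key, family, qualifier, timestamp and type next to the value. The class and method names are made up for illustration, and the real on-disk size also depends on compression and block encoding.

```java
import java.nio.charset.StandardCharsets;

public class CellSizeEstimate {

    // Number of bytes in the UTF-8 encoding of s.
    static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Approximate serialized size of one cell, assuming the KeyValue layout:
    // 4 (key length) + 4 (value length) + key + value, where
    // key = 2 (row length) + row + 1 (family length) + family + qualifier
    //       + 8 (timestamp) + 1 (type).
    static long cellBytes(String row, String family, String qualifier, String value) {
        int key = 2 + len(row) + 1 + len(family) + len(qualifier) + 8 + 1;
        return 4 + 4 + key + len(value);
    }

    public static void main(String[] args) {
        // Hypothetical names, only to show how the payload compares to the total.
        String row = "HUB_HUB_NEIGHBOR_ROW_0123456789abcdef";
        String family = "neighbor";
        String qualifier = "hub_key";
        String value = "some-hub-key";
        System.out.println("value bytes: " + len(value));
        System.out.println("cell bytes:  " + cellBytes(row, family, qualifier, value));
    }
}
```

With long row keys and column names, the metadata can easily outweigh the value itself, and the same cell is then written again for the write-ahead log and for each replica.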

This is the cost of the atomicity, reliability and scalability features.
With these features in mind, HBase is reasonably fast at saving data on a
cluster.

In your specific case (without points 2 & 3 above), the performance
does seem to be very bad.

You should first look at:
- how much time is spent in the put vs. preparing the list
- is there garbage collection going on? Or even swapping?
- what's the size of your final ArrayList vs. the available memory?
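
A minimal sketch of how these checks could be measured from inside the JVM, using only the standard library. The class name, the helpers and the two stand-in phases are hypothetical; in the real code, the first phase would be the nested loops that build the list and the second would be the call to neighborTable.put(puts).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class PutProfiler {

    // Runs a task and returns its elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Total time in milliseconds the JVM has spent in garbage collection so far.
    static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Heap currently in use, in bytes.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        List<String> puts = new ArrayList<>();
        long[] sent = new long[1];
        long gcBefore = gcMillis();

        // Phase 1: stand-in for building the list of Put objects.
        long buildMs = timeMillis(() -> {
            for (int i = 0; i < 1_000_000; i++) {
                puts.add("row-" + i);
            }
        });

        // Phase 2: stand-in for the actual write, e.g. neighborTable.put(puts).
        long putMs = timeMillis(() -> {
            for (String p : puts) {
                sent[0] += p.length();
            }
        });

        System.out.println("build list: " + buildMs + " ms");
        System.out.println("put:        " + putMs + " ms");
        System.out.println("GC time:    " + (gcMillis() - gcBefore) + " ms");
        System.out.println("heap used:  " + usedHeapBytes() / (1024 * 1024) + " MB of "
                + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB max");
    }
}
```

Swapping is not visible from inside the JVM; it has to be checked with an operating system tool such as vmstat while the job is running.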

Cheers,

N.


On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <lbl...@gmail.com> wrote:

> Dear all,
>
> By the way, my HBase is in the pseudo-distributed mode. Thanks!
>
> Best regards,
> Bing
>
> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <lbl...@gmail.com> wrote:
>
> > Dear all,
> >
> > In my experience, HBase is very slow at saving data. Am I right?
> >
> > For example, today I needed to save the data in a HashMap to HBase. It took
> > more than three hours. However, saving the same HashMap to a file in text
> > format with a redirected System.out took only 4.5 seconds!
> >
> > Why is HBase so slow? Is it indexing?
> >
> > My code to save data in HBase is as follows. I think the code must be
> > correct.
> >
> >     ......
> >     public synchronized void AddVirtualOutgoingHHNeighbors(
> >             ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
> >             int timingScale)
> >     {
> >         List<Put> puts = new ArrayList<Put>();
> >
> >         String hhNeighborRowKey;
> >         Put hubKeyPut;
> >         Put groupKeyPut;
> >         Put topGroupKeyPut;
> >         Put timingScalePut;
> >         Put nodeKeyPut;
> >         Put hubNeighborTypePut;
> >
> >         for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry
> >                 : hhOutNeighborMap.entrySet())
> >         {
> >             for (Map.Entry<String, Set<String>> groupNeighborEntry
> >                     : sourceHubGroupNeighborEntry.getValue().entrySet())
> >             {
> >                 for (String neighborKey : groupNeighborEntry.getValue())
> >                 {
> >                     hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW
> >                         + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey()
> >                             + groupNeighborEntry.getKey() + timingScale + neighborKey);
> >
> >                     hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >                     hubKeyPut.add(
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> >                         Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> >                     puts.add(hubKeyPut);
> >
> >                     groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >                     groupKeyPut.add(
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> >                         Bytes.toBytes(groupNeighborEntry.getKey()));
> >                     puts.add(groupKeyPut);
> >
> >                     topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >                     topGroupKeyPut.add(
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> >                         Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
> >                     puts.add(topGroupKeyPut);
> >
> >                     timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >                     timingScalePut.add(
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> >                         Bytes.toBytes(timingScale));
> >                     puts.add(timingScalePut);
> >
> >                     nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >                     nodeKeyPut.add(
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> >                         Bytes.toBytes(neighborKey));
> >                     puts.add(nodeKeyPut);
> >
> >                     hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >                     hubNeighborTypePut.add(
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> >                         Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> >                     puts.add(hubNeighborTypePut);
> >                 }
> >             }
> >         }
> >
> >         try
> >         {
> >             this.neighborTable.put(puts);
> >         }
> >         catch (IOException e)
> >         {
> >             e.printStackTrace();
> >         }
> >     }
> >     ......
> >
> > Thanks so much!
> >
> > Best regards,
> > Bing
> >
>
