Re: Using SPARQL against HBase

2010-04-04 Thread Amandeep Khurana
Edward, I think for now we'll start with modeling how to store triples so that we can run real-time SPARQL queries on them, and later look at the Pregel model and how we can leverage it for bulk processing. The Bigtable data model doesn't lend itself directly to storing triples such that fast
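One possible layout (a sketch only, with hypothetical table and family names): subject as row key, a single column family for predicates, the predicate as column qualifier, and the object as cell value, so that subject-bound triple patterns become a single Get. With the 0.20-era Java client that looks roughly like:

  // Hypothetical layout: row = subject, family "p", qualifier = predicate, value = object.
  // Assumes a pre-created table 'triples' with column family 'p'.
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class TripleStore {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "triples");
      // Store the triple (alice, foaf:knows, bob) as one cell.
      Put put = new Put(Bytes.toBytes("http://example.org/alice"));    // subject
      put.add(Bytes.toBytes("p"),                                      // family
              Bytes.toBytes("http://xmlns.com/foaf/0.1/knows"),        // predicate
              Bytes.toBytes("http://example.org/bob"));                // object
      table.put(put);
      table.close();
    }
  }

Patterns bound on predicate or object would need companion index tables (e.g. POS and OSP orderings), and multi-valued predicates need the object folded into the qualifier or stored as cell versions, which is exactly where the direct fit breaks down.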

Re: Efficient mass deletes

2010-04-04 Thread Juhani Connolly
Currently it is just something I expect to run into problems with, as I am still some way from load testing, though I hope to get started on it soon. The MultiDelete implementation planned for 0.21 will certainly help a lot, though. Perhaps running an M/R job with a scan result as the input th
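For reference, a client-side stopgap along those lines is to feed a scan straight into batched Deletes. A minimal sketch, assuming a hypothetical "events" table and a client recent enough to expose HTable.delete(List<Delete>), i.e. the MultiDelete work mentioned above; on older clients this degrades to one RPC per row:

  // Delete every row a scan returns, shipping deletes in client-side batches.
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Delete;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;

  public class MassDelete {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "events");
      Scan scan = new Scan();
      scan.setCaching(1000);                    // pull rows from the server in large chunks
      ResultScanner scanner = table.getScanner(scan);
      List<Delete> batch = new ArrayList<Delete>();
      for (Result row : scanner) {
        batch.add(new Delete(row.getRow()));
        if (batch.size() >= 1000) {             // batch deletes instead of one RPC per row
          table.delete(batch);
          batch.clear();
        }
      }
      if (!batch.isEmpty()) table.delete(batch);
      scanner.close();
      table.close();
    }
  }

Adding a FirstKeyOnlyFilter to the scan keeps it from shipping cell values that are about to be deleted anyway.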

Re: Using SPARQL against HBase

2010-04-04 Thread Edward J. Yoon
Hi, I'm a proposer/sponsor of the Heart project. I have no doubt that RDF can be stored in HBase, because Google also stores linked data in their Bigtable. However, if you want to focus on large-scale (distributed) processing, I would recommend reading about Google's Pregel project (Google's graph computi

Re: Beginner question about querying records

2010-04-04 Thread Stack
2010/4/4 Onur AKTAS : > Thank you very much for your answers. I'm checking the document that you gave. > In short, unless massive traffic, massive data size, and massive scale are needed, stick with regular RDBMSs; then if we grow to terabytes of data to be queried, then we c

hbase mapreduce scan

2010-04-04 Thread Jürgen Jakobitsch
Hi, I'm totally new to HBase and MapReduce and could really use some pointers in the right direction for the following situation. I managed to run a basic MapReduce example, analogous to Export.java in the hbase.mapreduce package. What I need to achieve is the following: do a map/reduce scan
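A minimal sketch of such a job, modeled loosely on Export.java (the table name and output path are placeholders):

  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
  import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
  import org.apache.hadoop.hbase.mapreduce.TableMapper;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

  public class ScanJob {
    // Identity mapper: emits each scanned row unchanged, as Export does.
    static class RowMapper extends TableMapper<ImmutableBytesWritable, Result> {
      protected void map(ImmutableBytesWritable row, Result value, Context context)
          throws java.io.IOException, InterruptedException {
        context.write(row, value);
      }
    }

    public static void main(String[] args) throws Exception {
      Job job = new Job(new HBaseConfiguration(), "scan-job");
      job.setJarByClass(ScanJob.class);
      Scan scan = new Scan();                   // narrow with addFamily()/setFilter() as needed
      TableMapReduceUtil.initTableMapperJob("mytable", scan, RowMapper.class,
          ImmutableBytesWritable.class, Result.class, job);
      job.setNumReduceTasks(0);                 // map-only, like Export
      job.setOutputFormatClass(SequenceFileOutputFormat.class);
      SequenceFileOutputFormat.setOutputPath(job, new Path("/tmp/scan-out"));
      System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
  }

initTableMapperJob wires the Scan into the job's table input format, so the mappers receive one (row key, Result) pair per row the scan matches.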

Re: Performance of reading rows with a large number of columns

2010-04-04 Thread Jonathan Gray
It's likely not the actual deserialization itself but rather the time to read the entire row from HDFS. There are some optimizations that can be made here (using the block index to get all blocks for a row with a single HDFS read, TCP socket reuse, etc.). On Apr 3, 2010, at 11:35 AM, "Sammy Yu"
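Until optimizations like those land, a client can at least avoid paying for columns it doesn't need by narrowing the request. A small sketch (table, family, and qualifier names are made up):

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class NarrowRead {
    public static void main(String[] args) throws Exception {
      HTable table = new HTable(new HBaseConfiguration(), "wide_table");
      Get get = new Get(Bytes.toBytes("row-with-many-columns"));
      // Ask only for the one column we care about instead of the whole row.
      get.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("status"));
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("status"));
      System.out.println(value == null ? "(missing)" : Bytes.toString(value));
      table.close();
    }
  }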

RE: Beginner question about querying records

2010-04-04 Thread Onur AKTAS
Thank you very much for your answers. I'm checking the document that you gave. In short, unless massive traffic, massive data size, and massive scale are needed, stick with regular RDBMSs; then if we grow to terabytes of data to be queried, we can switch to NoSQL databases. Tha

Re: DFSClient errors during massive HBase load

2010-04-04 Thread Oded Rosen
Thanks, that seemed to help. Our jobs have been running without failures for the last 48 hours. On Thu, Apr 1, 2010 at 11:43 PM, Andrew Purtell wrote: > First: "ulimit: 1024" is fatal. You need to up the file descriptor limit to something like 32K. > See http://wiki.apache.org/hadoop/Hbase/Trou
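For reference, the usual fix is to raise the open-file (nofile) limit for the user that runs the DataNode and region server processes; on most Linux systems that means a line like the following in /etc/security/limits.conf (the "hadoop" user name is a placeholder):

  # /etc/security/limits.conf -- raise the open-file limit for the hadoop user
  hadoop  -  nofile  32768

then logging back in and confirming with ulimit -n.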