Newbie question: Rowkey design

2013-12-16 Thread Wilm Schumacher
Hi, I'm a newbie to hbase and have a question on the rowkey design, and I hope this question isn't too newbie-like for this list. I have a question which cannot be answered by knowledge of code but by experience with large databases, thus this mail. For the sake of explanation I create a small exam

Re: Newbie question: Rowkey design

2013-12-17 Thread Wilm Schumacher
read evenly across the nodes; if >> you design row key such that the records are spread evenly across the >> nodes, maybe it's not convenient to query or impossible to get the record >> through row key directly (say you have a random number as the row key's >

Re: Thrift getRows API with column filtering?

2014-11-22 Thread Wilm Schumacher
Hi, 1.) the plan of minimizing the number of column families is very wise 2.) which API are you using? thrift or thrift2? The original (or old) thrift api does not seem to support your plan. However, it is supported. See e.g. http://hbase.apache.org/book/thrift.html the thrift2-api seems to

Re: HBase entity relationship

2014-11-24 Thread Wilm Schumacher
Hi, perhaps I'm wrong, but this sounds a little bit "sql-ish" to me, relations by ids etc. Is there a hierarchy in the data? Is A some sort of "container" for B? Or are the connections arbitrary? Could you give an example where this relation of classes A and B fits or make a more verbose explan

Re: HBase entity relationship

2014-11-25 Thread Wilm Schumacher
Hi, thx for the example. That makes it easier to consider some options. In my opinion you have 3 basic options. 1.) leading source I As I assumed, "source" seems to be the leading concept. Every "job" has to have a "source". So you could pack the "jobs" in the "source". So you could make a c

Re: HBase entity relationship

2014-11-26 Thread Wilm Schumacher
On 26.11.2014 at 08:05, jatinpreet wrote: > I am curious to know if a hybrid of approaches 2 and 3 could be used. This > means having the rowkeys of jobs inside source row like in approach 2. And > having the parent source rowkey as a column in job row as in approach 3. If you just use "option

Re: Newbie Question about 37TB binary storage on HBase

2014-11-27 Thread Wilm Schumacher
Hi Aleks ;), On 27.11.2014 at 22:27, Aleks Laz wrote: > Our application is a nginx/php-fpm/postgresql Setup. > The target design is nginx + proxy features / php-fpm / $DB / $Storage. > > .) Can I mix HDFS /HBase for binary data storage and data analyzing? yes. hbase is perfect for that. For stor

Re: Newbie Question about 37TB binary storage on HBase

2014-11-27 Thread Wilm Schumacher
On 28.11.2014 at 00:32, Aleks Laz wrote: > What's the plan about the "MOB-extension"? https://issues.apache.org/jira/browse/HBASE-11339 > From development point of view I can build HBase with the "MOB-extension" > but from sysadmin point of view a 'package' (jar,zip, dep, rpm, ...) is > much > ea

Re: Newbie Question about 37TB binary storage on HBase

2014-11-28 Thread Wilm Schumacher
input and ideas. >> >> I will now step back and learn more about big data and big storage to >> be able to talk further. >> >> Cheers Aleks >> >> On 28-11-2014 at 01:20, Wilm Schumacher wrote: >> >> On 28.11.2014 at 00:32, Aleks Laz wrote: >>

Re: Cannot connect to Hbase via Java API

2014-12-17 Thread Wilm Schumacher
Could you please post the /etc/hosts ./conf/hbase-site.conf ./conf/regionservers ./log/hbase*regionsserver.log ? The error says that your regionserver is not running (or something happened with the server). This could mean that a) the regionserver never started b) the regionserver died c) the r

Re: Cannot connect to Hbase via Java API

2014-12-17 Thread Wilm Schumacher
d. > I'm using the HortonWorksStack and have a single machine, on which > runs the complete stack (no cluster). > > Hbase shell, Hive and Apache Phoenix works fine. > > BR Marco > > 2014-12-17 11:41 GMT+01:00 Wilm Schumacher : >> Could you please post the >>

Re: Cannot connect to Hbase via Java API

2014-12-17 Thread Wilm Schumacher
On 17.12.2014 at 14:29, Marco wrote: > Hi Wilm, > > the regionservers only contain the hostname. sounds good. > And yes, I've changed the > namehostname is consistent. > > ipV6 I could try but after searching for it, the issue seem to be > different (other exception). > > Also, I've installed

What's the best way to reduce to map or array?

2014-12-17 Thread Wilm Schumacher
Hi, if I can guarantee that the size of the array/map is reasonably small, what would be the best way to reduce to an array, arrayList, map or something like that? Something like TableReducer, but for an object in the memory of the main thread. I thought of using a TableReducer and scanning the r

Re: Cannot connect to Hbase via Java API

2014-12-17 Thread Wilm Schumacher
On 17.12.2014 at 15:27, Marco wrote: >> Tonight I will take a closer look at the sandbox (never >> used it before). Perhaps I'll find something. But there are some GB to >> download ;). > That would be cool :) Thx. *mumble curses* I cannot import the Hortonworks VM. VM-Import says the image is cor

Re: Cannot connect to Hbase via Java API

2014-12-18 Thread Wilm Schumacher
Hi, I just took a look into the hdp 2.2 sandbox, and unfortunately it was a waste of time and I went older ;). At the first boot, without me doing anything in the configs, zookeeper threw errors at startup and got killed (couldn't connect). However, ignoring this I started hbase, which hdp reco

Re: Design a datastore maintaining historical view of users.

2015-01-12 Thread Wilm Schumacher
Hi, I'm doing something comparable right now, but not with such a HUGE database O_o. 10 million results for such a query? This would mean that you have 100 million -> 1 billion customers ?!?! However: in my opinion with such a huge database HBase is a good fit. However, your data model should be changed

Re: Show last 10 (100/1000) events

2015-01-14 Thread Wilm Schumacher
Will the number of "last" events be much larger than 10 (100/1000)? If not, then I wouldn't bother with a real database after all and would hold the data in RAM. Either: * in an object in your "gateway" to hbase. E.g. a simple java ArrayList in your java server which serves the api to the web server

general question about datamodel => empty columns

2015-01-16 Thread Wilm Schumacher
Hi, I ran into a problem which I have encountered several times by now, and perhaps you can help me. What should I include in tables where just the qualifier is needed? E.g. in indexing you have to make the reference of the index either by columns, or by rows in the index table. But in this way, there

Re: http://stackoverflow.com/questions/28350940/cannot-start-standalone-instance-of-hbase

2015-02-06 Thread Wilm Schumacher
Hi, I'll try to help. First: you are obviously trying to run it on windows. Thus you need cygwin ( http://hbase.apache.org/cygwin.html ) You are trying to use the "git shell". I honestly never heard of that! What is that? Nevertheless: a shell normally comes with some programs which make the shell use

Re: http://stackoverflow.com/questions/28350940/cannot-start-standalone-instance-of-hbase

2015-02-06 Thread Wilm Schumacher
Dammit, Brian was faster ;). However, I'd REALLY REALLY recommend not using windows for running and testing hbase. Of course hbase is suitable to run on windows, but a real setup (cluster, kerberos, custom thrift ... all the fancy stuff) will be very ... challenging! The ssh configuration must b

Re: Fwd: data base design question

2015-02-12 Thread Wilm Schumacher
On 12.02.2015 at 21:12, Dima Spivak wrote: > Is there a better alternative than above options for one to many > relationships? you could use a column family in table 2 for that. table 1 result1 data:foo => bar result2 data:foo => baz result3 data:foo => bar result4 data:foo => baz table 2

Re: Fwd: data base design question

2015-02-12 Thread Wilm Schumacher
> > To get all results for an order, do a scan with startrow/endrow or > use a prefix filter with order_id as the prefix. > > Alok > > > On Thu, Feb 12, 2015 at 1:23 PM, Wilm Schumacher > wrote: >> On 12.02.2015 at 21:12, Dima Spivak wrote: >>> Is th
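The startrow/endrow idea quoted above can be illustrated without a cluster. This is only a sketch: it models a table as a sorted list of (rowkey, value) pairs, and the key layout `order_id:result_id` as well as the helper names `prefix_stop_row` and `prefix_scan` are assumptions made for illustration, not the HBase client API.

```python
from bisect import bisect_left

def prefix_stop_row(prefix: bytes) -> bytes:
    """Smallest key that sorts after every key starting with `prefix`.

    Returns b"" when no such key exists (prefix is empty or all 0xFF bytes).
    """
    p = bytearray(prefix)
    while p:
        if p[-1] != 0xFF:
            p[-1] += 1          # bump the last byte that can still be incremented
            return bytes(p)
        p.pop()                 # a trailing 0xFF cannot be bumped, drop it
    return b""

def prefix_scan(sorted_rows, prefix: bytes):
    """Return all (rowkey, value) pairs whose rowkey starts with `prefix`.

    `sorted_rows` must be sorted by rowkey, as rows in a region are.
    """
    keys = [k for k, _ in sorted_rows]
    start = bisect_left(keys, prefix)
    stop_row = prefix_stop_row(prefix)
    stop = bisect_left(keys, stop_row) if stop_row else len(keys)
    return sorted_rows[start:stop]

# Hypothetical rows: rowkey = order_id + ":" + result_id
rows = sorted([
    (b"order1:result1", b"bar"),
    (b"order1:result2", b"baz"),
    (b"order10:result9", b"qux"),
    (b"order2:result3", b"bar"),
])

# Including the ":" delimiter in the prefix keeps "order10" out of the scan.
hits = prefix_scan(rows, b"order1:")
```

The stop row is exclusive, which is why bumping the last byte of the prefix is enough to bound the scan.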

Re: Fwd: data base design question

2015-02-13 Thread Wilm Schumacher
Hi, On 13.02.2015 at 04:08, Jignesh Patel wrote: > How about Option 1: Create an embedded entity of results and store it as > list object inside order table as one of the column field. the problem is that an hbase cell value must be a byte array. Thus you have to convert the "list object" to a by
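A minimal sketch of the byte-array conversion mentioned above. JSON is an assumption here (the snippet does not say which serialization was proposed), and the helper names and sample data are hypothetical; in a real client the resulting bytes would be handed over as the cell value of a put.

```python
import json

def list_to_cell_value(results):
    """Serialize a list of results into bytes suitable for a single cell value."""
    return json.dumps(results, separators=(",", ":")).encode("utf-8")

def cell_value_to_list(raw):
    """Inverse of list_to_cell_value: decode the stored bytes back into a list."""
    return json.loads(raw.decode("utf-8"))

# Hypothetical embedded results of one order
results = [{"id": "result1", "foo": "bar"}, {"id": "result2", "foo": "baz"}]

raw = list_to_cell_value(results)
restored = cell_value_to_list(raw)
```

The trade-off of this layout is that every read or update of a single result has to decode and re-encode the whole list.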

Re: Streaming data to htable

2015-02-13 Thread Wilm Schumacher
On 13.02.2015 at 10:39, Sleiman Jneidi wrote: > I would go with second option, HtableInterface.put(List). The first > option sounds dodgy, where 5 minutes is a good time for things to go wrong > and you lose your data I agree with Sleiman. In my opinion the "multi put" option is the best plan. T

hbase as logging dump => design for mapred

2015-02-13 Thread Wilm Schumacher
Hi, I have a design question and I'm kind of stuck. I cannot find an easy solution, but I think there is one. The problem: consider you have an application where users can "open" an object. And then they can make an operation on that object. Or go further to another object. And now I want to make

Re: Standalone == Dev Only?

2015-03-06 Thread Wilm Schumacher
Hi, On 06.03.2015 at 19:18, Stack wrote: > Why not use an RDBMS then? When I first read the hbase documentation I also stumbled over the "only use for large datasets" or "standalone only in dev mode" etc. In my point of view there are some arguments against RDBMSs and for e.g. hbase, although w

Re: Regarding a doubt I am having for HBase

2015-03-12 Thread Wilm Schumacher
Hi, I would like to add a question: Why do you need the ID in the first place? The hash seems to be generated by another source, thus is immutable. But is this true for the ID, too? If not, why not use only the hash? Best wishes, Wilm On 10.03.2015 at 21:40, Alex Baranau wrote: > CCing HBase'

Re: Status of Huawei's 2' Indexing?

2015-03-16 Thread Wilm Schumacher
Hi, a cross post from the dev list. perhaps here more people have valuable hints or ideas. On 16.03.2015 at 18:46, Rose, Joseph wrote: > Alright, let’s see if I can get this discussion back on track. > > I have a sensibly defined table for patient data; its rowkey is simply > lastname:firstname,

Re: Status of Huawei's 2' Indexing?

2015-03-16 Thread Wilm Schumacher
Dammit. Sry for double post. I forgot something. On 16.03.2015 at 19:37, Wilm Schumacher wrote: > * First ... MacGyver your own index. > > That's not as complicated as it sounds. A very easy idea would be the > update within the CRUD operations on your data. Within a >

Re: What is the best database to handle large volume of data

2014-05-23 Thread Wilm Schumacher
Hi, your question is very general and hard to answer given the lack of essential information. However, based on my assumption on what you are trying to do I would recommend cassandra and materialized views for your portal (if the questions are pre-computable) and indices (if the questions are

hbase and hadoop (for "normal" hdfs) cluster together?

2014-07-31 Thread Wilm Schumacher
Hi, I have a "conceptional" question and would appreciate hints. My task is to save files to hdfs and to maintain some information about them in an hbase db and then serve both to the application. Per file I have around 50 rows with 10 columns (in 2 column families) in the tables, which have str

Re: hbase and hadoop (for "normal" hdfs) cluster together?

2014-07-31 Thread Wilm Schumacher
On 31.07.2014 at 18:08, Ted Yu wrote: > What's the read / write mix in your workload ? I would think around 1 put to 2-5 reads for the "hdfs files" (estimated) and 1 put to hundreds of reads in the hbase table So in short form: = for the files * number of puts ~ gets * "small" number of put

Re: hbase and hadoop (for "normal" hdfs) cluster together?

2014-07-31 Thread Wilm Schumacher
Hi, On 31.07.2014 at 20:28, Nick Dimiduk wrote: > What else will this cluster do? Are you planning to run MR against the data > here? The cluster does nothing other than this application. The application is the "hdfs part", and the "hbase part". And yes, I plan to run some MR jobs against the dat

hbase attack scenarios?

2014-08-05 Thread Wilm Schumacher
Hi, sry for asking a fundamental newbie question again :/. But after coding some applications using hbase I want to reconsider the security. Especially after today some (i.e. billions) e-mail addresses and hashes were stolen. So, my question is: what are the most prominent and general attack

Re: hbase attack scenarios?

2014-08-06 Thread Wilm Schumacher
On 06.08.2014 at 19:07, Andrew Purtell wrote: > We have no known vulnerabilities that equate to a SQL injection attack > vulnerability. However, as Esteban says you'd want to treat HBase like any > other datastore underpinning a production service and out of an abundance > of caution deploy it i

Re: Nested data structures examples for HBase

2014-09-09 Thread Wilm Schumacher
as stated above you can use JSON or something similar, which is always possible. However, if you have to do that very often (and I think you are, if you are using hbase ;) ), this could be a bad plan, because parsing JSON is expensive in terms of CPU. As I am relatively new to hbase (using it perhaps f

Re: Nested data structures examples for HBase

2014-09-10 Thread Wilm Schumacher
On 10.09.2014 at 17:33, Michael Segel wrote: > Because you really don’t want to do that since you need to keep the number of > CFs low. in my example the number of CFs is 1. So this is not a problem. Best wishes, Wilm

Re: Nested data structures examples for HBase

2014-09-10 Thread Wilm Schumacher
On 10.09.2014 at 22:25, Michael Segel wrote: > Ok, but here’s the thing… you extrapolate the design out… each column > with a subordinate record will get its own CF. I disagree. Not by the proposed design. You could do it with one CF. > Simple examples can go > very bad when you move to real li

Re: A use case for ttl deletion?

2014-09-26 Thread Wilm Schumacher
I wrote a cookie store for node.js using hbase. By this method the sessions are deleted "regularly" after a specific time during which nothing happens on a specific session. On 26.09.2014 at 17:20, yonghu wrote: > Hello, > > Can anyone give me a concrete use case for ttl deletions? I mean in which > situatio

Re: A use case for ttl deletion?

2014-09-26 Thread Wilm Schumacher
Hi, your mail got me thinking about a general answer. I think a good answer would be: all data that are only useful for a specific time AND are possibly generated infinitely for a finite number of users should have a ttl. OR when the space is very small compared to the number of users. An examp

Re: Archive Files

2014-10-18 Thread Wilm Schumacher
On 18.10.2014 at 12:35, Ravindranath Akila wrote: > Is there any approach HBASE can store archive like rarely used files on > cheap storage? hadoop directly is equipped for that. There are HAR files, map files and sequence files. If I understand correctly, sequence files are what you are searchin