Hi Kristoffer,

Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You could model your schema much like an O/R mapper and issue SQL queries through Phoenix for your filtering.
James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com

On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" wrote:

> Thanks for your help, Mike. Much appreciated.
>
> I don't store rows/columns in JSON format. The schema is exactly that of a specific Java class, where the row key is a unique object identifier with the class type encoded into it. Columns are the field names of the class and the values are those of the object instance.
>
> I did think about coprocessors, but the schema is discovered at runtime and I can't hard-code it.
>
> However, I still believe that filters might work. I had a look at SingleColumnValueFilter, and this filter is able to target specific column qualifiers with specific WritableByteArrayComparables.
>
> But list comparators are still missing... So I guess the only way is to write these comparators?
>
> Do you follow my reasoning? Will it work?
>
>
> On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel wrote:
>
>> Ok...
>>
>> If you want to do type checking and schema enforcement...
>>
>> You will need to do this as a coprocessor.
>>
>> The quick and dirty way (not recommended) would be to hard-code the schema into the coprocessor code.
>>
>> A better way: at startup, load up ZK to manage the set of known table schemas, which would be a map of column qualifier to data type. (If JSON, then you need to do a separate lookup to get the record's schema.)
>>
>> Then a single Java class that does the lookup and then handles the known data type comparators.
>>
>> Does this make sense? (Sorry, I was kinda thinking this out as I typed the response. But it should work.)
>>
>> At least it would be a design approach I would take. YMMV.
>>
>> Having said that, I expect someone to say it's a bad idea and that they have a better solution.
>>
>> HTH
>>
>> -Mike
>>
>> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren wrote:
>>
>>> I see your point.
>>> Everything is just bytes.
>>>
>>> However, the schema is known and every row is formatted according to this schema, although some columns may not exist, that is, no value exists for this property on this row.
>>>
>>> So if I'm able to apply these "typed comparators" to the right cell values, it may be possible? But I can't find a filter that targets specific columns.
>>>
>>> It seems like all filters scan every column/qualifier and there is no way of knowing which column is currently being evaluated?
>>>
>>>
>>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel wrote:
>>>
>>>> You have to remember that HBase doesn't enforce any sort of typing. That's why this can be difficult.
>>>>
>>>> You'd have to write a coprocessor to enforce a schema on a table. Even then, YMMV if you're writing JSON structures to a column, because while the contents of the structures could be the same, the actual strings could differ.
>>>>
>>>> HTH
>>>>
>>>> -Mike
>>>>
>>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren wrote:
>>>>
>>>>> I realize standard comparators cannot solve this.
>>>>>
>>>>> However, I do know the type of each column, so writing custom list comparators for boolean, char, byte, short, int, long, float and double seems quite straightforward.
>>>>>
>>>>> Long arrays, for example, are stored as a byte array with 8 bytes per item, so a comparator might look like this.
>>>>>
>>>>> import java.nio.ByteBuffer;
>>>>> import java.nio.charset.Charset;
>>>>> import java.util.ArrayList;
>>>>> import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;
>>>>>
>>>>> public class LongsComparator extends WritableByteArrayComparable {
>>>>>   private long val; // value to search for (set via a constructor, not shown)
>>>>>
>>>>>   public int compareTo(byte[] value, int offset, int length) {
>>>>>     long[] values = BytesUtils.toLongs(value, offset, length);
>>>>>     for (long longValue : values) {
>>>>>       if (longValue == val) {
>>>>>         return 0;
>>>>>       }
>>>>>     }
>>>>>     return 1;
>>>>>   }
>>>>> }
>>>>>
>>>>> // in BytesUtils (helper class, not shown in full)
>>>>> public static long[] toLongs(byte[] value, int offset, int length) {
>>>>>   int num = length / 8; // length already excludes the offset
>>>>>   long[] values = new long[num];
>>>>>   for (int i = 0; i < num; i++) {
>>>>>     values[i] = getLong(value, offset + i * 8); // read relative to offset
>>>>>   }
>>>>>   return values;
>>>>> }
>>>>>
>>>>>
>>>>> Strings are similar but would require a charset and a length for each string.
>>>>>
>>>>> public class StringsComparator extends WritableByteArrayComparable {
>>>>>   private String val; // value to search for (set via a constructor, not shown)
>>>>>
>>>>>   public int compareTo(byte[] value, int offset, int length) {
>>>>>     String[] values = BytesUtils.toStrings(value, offset, length);
>>>>>     for (String stringValue : values) {
>>>>>       if (val.equals(stringValue)) {
>>>>>         return 0;
>>>>>       }
>>>>>     }
>>>>>     return 1;
>>>>>   }
>>>>> }
>>>>>
>>>>> // in BytesUtils
>>>>> public static String[] toStrings(byte[] value, int offset, int length) {
>>>>>   ArrayList<String> values = new ArrayList<String>();
>>>>>   int idx = 0;
>>>>>   ByteBuffer buffer = ByteBuffer.wrap(value, offset, length); // starts at offset
>>>>>   while (idx < length) {
>>>>>     int size = buffer.getInt(); // 4-byte length prefix
>>>>>     byte[] bytes = new byte[size];
>>>>>     buffer.get(bytes);
>>>>>     values.add(new String(bytes, Charset.forName("UTF-8"))); // explicit charset (UTF-8 assumed)
>>>>>     idx += 4 + size;
>>>>>   }
>>>>>   return values.toArray(new String[values.size()]);
>>>>> }
>>>>>
>>>>>
>>>>> Am I on the right track, or am I overlooking some implementation details? I'm not really sure how to target each comparator to a specific column value.
>>>>>
>>>>>
>>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel wrote:
>>>>>
>>>>>> Not an easy task.
>>>>>>
>>>>>> You first need to determine how you want to store the data within a column and/or apply a type constraint to a column.
>>>>>>
>>>>>> Even if you use JSON records to store your data within a column, does an equality comparator exist? If not, you would have to write one. (I kinda think that one may already exist...)
>>>>>>
>>>>>>
>>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren wrote:
>>>>>>
>>>>>>> Hi
>>>>>>>
>>>>>>> I'm working with the standard filtering mechanism to scan for rows that have columns matching certain criteria.
>>>>>>>
>>>>>>> There are columns of numeric (integer and decimal) and string types. These columns are single- or multi-valued, like "1", "2", "1,2,3", "a", "b" or "a,b,c". I'm not sure what the separator would be in the case of list types. Maybe none?
>>>>>>>
>>>>>>> I would like to compose the following queries to filter out rows that do not match.
>>>>>>>
>>>>>>> - contains(String column, String value)
>>>>>>>   Single-valued column whose value String.contains() the provided value.
>>>>>>>
>>>>>>> - equal(String column, Object value)
>>>>>>>   Single-valued column whose value Object.equals() the provided value.
>>>>>>>   Value is either a string or a numeric type.
>>>>>>>
>>>>>>> - greaterThan(String column, java.lang.Number value)
>>>>>>>   Single-valued column whose value is > the provided numeric value.
>>>>>>>
>>>>>>> - in(String column, Object value...)
>>>>>>>   Multi-valued column whose values Object.equals() all of the provided values.
>>>>>>>   Values are of string or numeric types.
>>>>>>>
>>>>>>> How would I design a schema that can take advantage of the already existing filters and comparators to accomplish this?
>>>>>>>
>>>>>>> I already looked at the string and binary comparators but fail to see how to solve this in a clean way for multi-valued column values.
>>>>>>>
>>>>>>> I'm aware of custom filters but would like to avoid them if possible.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> -Kristoffer
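The list comparators proposed in the thread come down to decoding a cell's byte slice and testing membership. Below is a standalone sketch of that logic. It avoids HBase classes so it can be tried on its own, and a `compareTo(byte[] value, int offset, int length)` override could delegate to it. `ListCodec` and its methods are hypothetical names, and the encodings are assumptions: longs are consecutive 8-byte big-endian values, and strings are a 4-byte big-endian length prefix followed by UTF-8 bytes. Every read is relative to `offset`, because a comparator may be handed a slice of a larger buffer, which is why the signature carries an offset at all.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper for list-valued cells. Assumed encodings:
 * longs are consecutive 8-byte big-endian values; strings are a
 * 4-byte big-endian length followed by that many UTF-8 bytes.
 */
public final class ListCodec {
    private static final Charset UTF8 = Charset.forName("UTF-8");

    private ListCodec() {}

    /** Decodes the longs stored in value[offset, offset + length). */
    public static long[] toLongs(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length); // big-endian by default
        long[] values = new long[length / 8];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getLong();
        }
        return values;
    }

    /** Decodes the length-prefixed strings stored in value[offset, offset + length). */
    public static List<String> toStrings(byte[] value, int offset, int length) {
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        List<String> values = new ArrayList<String>();
        while (buffer.remaining() >= 4) {
            byte[] bytes = new byte[buffer.getInt()];
            buffer.get(bytes); // a corrupt length prefix makes this throw
            values.add(new String(bytes, UTF8));
        }
        return values;
    }

    /** Comparator-style result: 0 if the list contains target, 1 otherwise. */
    public static int containsLong(byte[] value, int offset, int length, long target) {
        for (long v : toLongs(value, offset, length)) {
            if (v == target) {
                return 0;
            }
        }
        return 1;
    }

    /** Comparator-style result: 0 if the list contains target, 1 otherwise. */
    public static int containsString(byte[] value, int offset, int length, String target) {
        return toStrings(value, offset, length).contains(target) ? 0 : 1;
    }

    /** Encodes longs in the assumed format (for writing cells or testing). */
    public static byte[] encodeLongs(long... values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 8);
        for (long v : values) {
            buffer.putLong(v);
        }
        return buffer.array();
    }

    /** Encodes strings in the assumed format (for writing cells or testing). */
    public static byte[] encodeStrings(String... values) {
        List<byte[]> encoded = new ArrayList<byte[]>();
        int total = 0;
        for (String s : values) {
            byte[] bytes = s.getBytes(UTF8);
            encoded.add(bytes);
            total += 4 + bytes.length;
        }
        ByteBuffer buffer = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            buffer.putInt(bytes.length).put(bytes);
        }
        return buffer.array();
    }
}
```

For example, with `byte[] cell = ListCodec.encodeLongs(1L, 2L, 3L);`, the call `ListCodec.containsLong(cell, 0, cell.length, 3L)` returns 0, and the same call with `7L` returns 1. A length prefix per string avoids the need to pick a separator character.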

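Mike suggested a single class that looks up each qualifier's type and dispatches to a typed comparison. That idea might look roughly like the sketch below, which also covers the scalar `contains`, `equal` and `greaterThan` queries from the original question. This is an illustration under assumptions, not an HBase API. `TypedColumnMatcher` and `ColumnType` are hypothetical names, the schema is a plain in-memory map rather than one kept in ZooKeeper, and numbers are assumed to be stored as 8-byte big-endian values. Decoding first matters because a byte-wise comparison such as HBase's `BinaryComparator` treats bytes as unsigned. As a result, a negative long like -1 (all 0xFF bytes) compares greater than any positive long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical schema-driven matcher: a map from column qualifier to data
 * type decides how a cell's bytes are decoded before comparing. The schema
 * is an in-memory map here; keeping it in ZooKeeper is not shown.
 */
public final class TypedColumnMatcher {
    public enum ColumnType { LONG, DOUBLE, STRING }

    private static final Charset UTF8 = Charset.forName("UTF-8");
    private final Map<String, ColumnType> schema;

    public TypedColumnMatcher(Map<String, ColumnType> schema) {
        this.schema = new HashMap<String, ColumnType>(schema);
    }

    /** Decodes value[offset, offset + length) using the qualifier's declared type. */
    public Object decode(String qualifier, byte[] value, int offset, int length) {
        ColumnType type = schema.get(qualifier);
        if (type == null) {
            throw new IllegalArgumentException("unknown column: " + qualifier);
        }
        ByteBuffer buffer = ByteBuffer.wrap(value, offset, length);
        switch (type) {
            case LONG:
                return buffer.getLong();   // assumes 8-byte big-endian
            case DOUBLE:
                return buffer.getDouble(); // assumes 8-byte IEEE 754, big-endian
            default:
                return new String(value, offset, length, UTF8);
        }
    }

    /** equal(column, value): typed equality rather than raw byte equality. */
    public boolean equal(String qualifier, byte[] value, int offset, int length, Object target) {
        Object decoded = decode(qualifier, value, offset, length);
        if (decoded instanceof Number && target instanceof Number) {
            return compareNumbers((Number) decoded, (Number) target) == 0;
        }
        return decoded.equals(target);
    }

    /** greaterThan(column, value): numeric comparison on the decoded value. */
    public boolean greaterThan(String qualifier, byte[] value, int offset, int length, Number target) {
        Object decoded = decode(qualifier, value, offset, length);
        if (!(decoded instanceof Number)) {
            throw new IllegalArgumentException("not a numeric column: " + qualifier);
        }
        return compareNumbers((Number) decoded, target) > 0;
    }

    /** contains(column, value): substring match on a string column. */
    public boolean contains(String qualifier, byte[] value, int offset, int length, String target) {
        Object decoded = decode(qualifier, value, offset, length);
        if (!(decoded instanceof String)) {
            throw new IllegalArgumentException("not a string column: " + qualifier);
        }
        return ((String) decoded).contains(target);
    }

    // Compare integral values exactly; fall back to double comparison otherwise.
    private static int compareNumbers(Number a, Number b) {
        if (isIntegral(a) && isIntegral(b)) {
            return Long.compare(a.longValue(), b.longValue());
        }
        return Double.compare(a.doubleValue(), b.doubleValue());
    }

    private static boolean isIntegral(Number n) {
        return n instanceof Long || n instanceof Integer
            || n instanceof Short || n instanceof Byte;
    }
}
```

Using this in a scan would still require a custom comparator or filter that calls it, which is the step the original question hoped to avoid. The sketch only shows the type lookup and dispatch.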