Re: Schema design for filters

James Taylor Thu, 27 Jun 2013 18:57:09 -0700

Hi Kristoffer,
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? You 
could model your schema much like an O/R mapper and issue SQL queries through 
Phoenix for your filtering.


James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com

On Jun 27, 2013, at 4:39 PM, "Kristoffer Sjögren" <[email protected]> wrote:

> Thanks for your help, Mike. Much appreciated.
> 
> I don't store rows/columns in JSON format. The schema is exactly that of a
> specific Java class, where the rowkey is a unique object identifier with
> the class type encoded into it. Columns are the field names of the class
> and the values are those of the object instance.
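A minimal sketch of one possible row layout for this schema. The thread does not give the exact rowkey encoding, so the layout here (UTF-8 class name, a 0x00 separator, then the object id as 8 big-endian bytes) and the class name `RowKeys` are assumptions. Putting the type first keeps every object of one class in a contiguous key range, so a scan can be bounded with a start and stop row instead of filtering every row in the table.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical rowkey layout: <class name, UTF-8> 0x00 <8-byte big-endian id>. */
final class RowKeys {
  private RowKeys() {}

  static byte[] rowKey(Class<?> type, long id) {
    byte[] typeBytes = type.getName().getBytes(StandardCharsets.UTF_8);
    return ByteBuffer.allocate(typeBytes.length + 1 + Long.BYTES)
        .put(typeBytes)
        .put((byte) 0x00) // separator; Java class names never contain NUL
        .putLong(id)      // ByteBuffer writes big-endian by default
        .array();
  }

  /** Inclusive start row covering every object of the given type. */
  static byte[] typeStartRow(Class<?> type) {
    byte[] typeBytes = type.getName().getBytes(StandardCharsets.UTF_8);
    return Arrays.copyOf(typeBytes, typeBytes.length + 1); // name followed by 0x00
  }

  /** Exclusive stop row: the same prefix with the separator raised to 0x01. */
  static byte[] typeStopRow(Class<?> type) {
    byte[] stop = typeStartRow(type);
    stop[stop.length - 1] = 0x01;
    return stop;
  }

  static String typeName(byte[] rowKey) {
    int sep = 0;
    while (rowKey[sep] != 0x00) {
      sep++;
    }
    return new String(rowKey, 0, sep, StandardCharsets.UTF_8);
  }

  static long id(byte[] rowKey) {
    return ByteBuffer.wrap(rowKey, rowKey.length - Long.BYTES, Long.BYTES).getLong();
  }
}
```

The 0x00 separator also keeps the range for a class such as `com.Foo` from picking up rows of `com.FooBar`.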
> 
> I did think about coprocessors, but the schema is discovered at runtime and
> I can't hard-code it.
> 
> However, I still believe that filters might work. I had a look
> at SingleColumnValueFilter, and this filter is able to target a specific
> column qualifier with a specific WritableByteArrayComparable.
> 
> But comparators for list values are still missing, so I guess the only way
> is to write these comparators myself?
> 
> Do you follow my reasoning? Will it work?
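For reference, in HBase 0.94 `SingleColumnValueFilter` is constructed from a family, a qualifier, a `CompareOp` and a `WritableByteArrayComparable`, and `setFilterIfMissing(true)` makes it drop rows that have no cell for that column. The standalone model below is plain Java, not the HBase API (`SingleColumnModel` and `ValueMatcher` are hypothetical names). It sketches those semantics with a list-valued comparator: only the named qualifier is inspected, and a comparator result of 0 counts as a match, as with `CompareOp.EQUAL`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Standalone model (not HBase code) of single-column value filtering:
 * only the named qualifier is compared, and a row that lacks that
 * qualifier is kept or dropped according to filterIfMissing.
 */
final class SingleColumnModel {
  /** Hypothetical stand-in for a byte-array comparator; 0 means "match". */
  interface ValueMatcher {
    int compareTo(byte[] value, int offset, int length);
  }

  private SingleColumnModel() {}

  static boolean keepRow(Map<String, byte[]> row, String qualifier,
                         ValueMatcher matcher, boolean filterIfMissing) {
    byte[] cell = row.get(qualifier);
    if (cell == null) {
      return !filterIfMissing; // no value for this property on this row
    }
    return matcher.compareTo(cell, 0, cell.length) == 0; // EQUAL semantics
  }

  /** Matches when a cell holding packed 8-byte longs contains {@code wanted}. */
  static ValueMatcher longsContaining(long wanted) {
    return (value, offset, length) -> {
      ByteBuffer buf = ByteBuffer.wrap(value, offset, length);
      while (buf.remaining() >= Long.BYTES) {
        if (buf.getLong() == wanted) {
          return 0;
        }
      }
      return 1;
    };
  }

  /** Returns the keys of the rows that pass, in the table's iteration order. */
  static List<String> scan(Map<String, Map<String, byte[]>> table, String qualifier,
                           ValueMatcher matcher, boolean filterIfMissing) {
    List<String> kept = new ArrayList<>();
    for (Map.Entry<String, Map<String, byte[]>> e : table.entrySet()) {
      if (keepRow(e.getValue(), qualifier, matcher, filterIfMissing)) {
        kept.add(e.getKey());
      }
    }
    return kept;
  }
}
```

In HBase itself, several such per-column conditions can be ANDed with a `FilterList` using `FilterList.Operator.MUST_PASS_ALL`.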
> 
> 
> 
> 
> On Fri, Jun 28, 2013 at 12:58 AM, Michael Segel <[email protected]> wrote:
> 
>> Ok...
>> 
>> If you want to do type checking and schema enforcement...
>> 
>> You will need to do this as a coprocessor.
>> 
>> The quick and dirty way (not recommended) would be to hard-code the
>> schema into the coprocessor code.
>> 
>> A better way: at startup, load ZooKeeper (ZK) with the set of known table
>> schemas, each of which would be a map of column qualifier to data type.
>> (If the values are JSON, you need a separate lookup to get each record's schema.)
>> 
>> Then write a single Java class that does the lookup and then dispatches to
>> the comparator for each known data type.
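A sketch of that single lookup-and-dispatch class, assuming values are written as fixed-width big-endian primitives (the byte order HBase's `Bytes.toBytes` uses for int, long and double). The schema is handed to the constructor rather than loaded from ZooKeeper, and `TypedColumnComparator`, `ColumnType` and the set of supported types are illustrative, not part of HBase. `isValid` is the kind of check a schema-enforcing coprocessor could run on each incoming value.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/**
 * One class that looks up a column's type in a schema map (qualifier -> type)
 * and compares raw cell bytes accordingly. The schema is passed to the
 * constructor; loading it from ZooKeeper is left out of this sketch.
 */
final class TypedColumnComparator {
  /** Hypothetical set of supported types and their fixed encoded widths. */
  enum ColumnType {
    INT(4), LONG(8), DOUBLE(8), BOOLEAN(1);

    final int width;

    ColumnType(int width) {
      this.width = width;
    }
  }

  private final Map<String, ColumnType> schema;

  TypedColumnComparator(Map<String, ColumnType> schema) {
    this.schema = new HashMap<>(schema);
  }

  /** Schema enforcement check: known column and correctly sized value. */
  boolean isValid(String qualifier, byte[] cell) {
    ColumnType type = schema.get(qualifier);
    return type != null && cell.length == type.width;
  }

  /** Like compareTo: negative, zero or positive, decoding both sides by type. */
  int compare(String qualifier, byte[] cell, byte[] operand) {
    ColumnType type = schema.get(qualifier);
    if (type == null) {
      throw new IllegalArgumentException("unknown column: " + qualifier);
    }
    ByteBuffer a = ByteBuffer.wrap(cell);
    ByteBuffer b = ByteBuffer.wrap(operand);
    switch (type) {
      case INT:
        return Integer.compare(a.getInt(), b.getInt());
      case LONG:
        return Long.compare(a.getLong(), b.getLong());
      case DOUBLE:
        return Double.compare(a.getDouble(), b.getDouble());
      case BOOLEAN:
        return Boolean.compare(a.get() != 0, b.get() != 0); // any non-zero byte is true
      default:
        throw new AssertionError(type);
    }
  }
}
```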
>> 
>> Does this make sense?
>> (Sorry, I was kind of thinking this out as I typed the response, but it
>> should work.)
>> 
>> At least it would be a design approach I would take. YMMV.
>> 
>> Having said that, I expect someone to say it's a bad idea and that they
>> have a better solution.
>> 
>> HTH
>> 
>> -Mike
>> 
>> On Jun 27, 2013, at 5:13 PM, Kristoffer Sjögren <[email protected]> wrote:
>> 
>>> I see your point. Everything is just bytes.
>>> 
>>> However, the schema is known and every row is formatted according to this
>>> schema, although some columns may be missing, that is, no value exists
>>> for that property on that row.
>>> 
>>> So if I'm able to apply these "typed comparators" to the right cell values,
>>> it may be possible? But I can't find a filter that targets specific
>>> columns.
>>> 
>>> It seems like all filters scan every column/qualifier, and there is no way
>>> of knowing which column is currently being evaluated?
>>> 
>>> 
>>> On Thu, Jun 27, 2013 at 11:51 PM, Michael Segel <[email protected]> wrote:
>>> 
>>>> You have to remember that HBase doesn't enforce any sort of typing.
>>>> That's why this can be difficult.
>>>> 
>>>> You'd have to write a coprocessor to enforce a schema on a table.
>>>> Even then, YMMV if you're writing JSON structures to a column, because
>>>> while the contents of the structures could be the same, the actual strings
>>>> could differ (for example, the same fields in a different order or with
>>>> different whitespace).
>>>> 
>>>> HTH
>>>> 
>>>> -Mike
>>>> 
>>>> On Jun 27, 2013, at 4:41 PM, Kristoffer Sjögren <[email protected]> wrote:
>>>> 
>>>>> I realize standard comparators cannot solve this.
>>>>> 
>>>>> However, I do know the type of each column, so writing custom list
>>>>> comparators for boolean, char, byte, short, int, long, float and double
>>>>> seems quite straightforward.
>>>>> 
>>>>> Long arrays, for example, are stored as a byte array with 8 bytes per
>>>>> item, so a comparator might look like this:
>>>>> 
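>>>>> import java.io.DataInput;
>>>>> import java.io.IOException;
>>>>> import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;
>>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>>
>>>>> public class LongsComparator extends WritableByteArrayComparable {
>>>>>  private long val; // the long to look for in the stored array
>>>>>  public LongsComparator() {} // nullary constructor, required by Writable
>>>>>  public LongsComparator(long val) {
>>>>>      super(Bytes.toBytes(val));
>>>>>      this.val = val;
>>>>>  }
>>>>>  @Override
>>>>>  public void readFields(DataInput in) throws IOException {
>>>>>      super.readFields(in);
>>>>>      this.val = Bytes.toLong(getValue()); // restore val after deserialization
>>>>>  }
>>>>>  @Override
>>>>>  public int compareTo(byte[] value, int offset, int length) {
>>>>>      // 0 means "match" when the filter uses CompareOp.EQUAL
>>>>>      for (long longValue : BytesUtils.toLongs(value, offset, length)) {
>>>>>          if (longValue == val) {
>>>>>              return 0;
>>>>>          }
>>>>>      }
>>>>>      return 1;
>>>>>  }
>>>>> }
>>>>>
>>>>> // length counts bytes starting at offset; each item is 8 big-endian bytes
>>>>> public static long[] toLongs(byte[] value, int offset, int length) {
>>>>>  int num = length / Bytes.SIZEOF_LONG;
>>>>>  long[] values = new long[num];
>>>>>  for (int i = 0; i < num; i++) {
>>>>>      values[i] = Bytes.toLong(value, offset + i * Bytes.SIZEOF_LONG);
>>>>>  }
>>>>>  return values;
>>>>> }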
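The writer has to produce exactly the layout the decoder expects. Below is a self-contained sketch of both directions (the class name `LongArrayCodec` is hypothetical), with each item stored as 8 big-endian bytes and `length` counted from `offset`, so that a cell embedded at a non-zero offset in a larger buffer still decodes correctly.

```java
import java.nio.ByteBuffer;

/** Hypothetical codec for a long[] packed as consecutive 8-byte big-endian values. */
final class LongArrayCodec {
  private LongArrayCodec() {}

  static byte[] fromLongs(long[] values) {
    ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
    for (long v : values) {
      buf.putLong(v); // ByteBuffer is big-endian by default
    }
    return buf.array();
  }

  static long[] toLongs(byte[] value, int offset, int length) {
    if (length % Long.BYTES != 0) {
      throw new IllegalArgumentException("length is not a multiple of 8: " + length);
    }
    ByteBuffer buf = ByteBuffer.wrap(value, offset, length); // reads start at offset
    long[] out = new long[length / Long.BYTES];
    for (int i = 0; i < out.length; i++) {
      out[i] = buf.getLong();
    }
    return out;
  }
}
```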
>>>>> 
>>>>> 
>>>>> Strings are similar but would require a charset and a length for each
>>>>> string.
>>>>> 
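>>>>> import java.io.DataInput;
>>>>> import java.io.IOException;
>>>>> import java.nio.ByteBuffer;
>>>>> import java.nio.charset.Charset;
>>>>> import java.util.ArrayList;
>>>>> import org.apache.hadoop.hbase.filter.WritableByteArrayComparable;
>>>>> import org.apache.hadoop.hbase.util.Bytes;
>>>>>
>>>>> public class StringsComparator extends WritableByteArrayComparable {
>>>>>  private String val; // the string to look for in the stored array
>>>>>  public StringsComparator() {} // nullary constructor, required by Writable
>>>>>  public StringsComparator(String val) {
>>>>>      super(Bytes.toBytes(val)); // Bytes.toBytes(String) encodes as UTF-8
>>>>>      this.val = val;
>>>>>  }
>>>>>  @Override
>>>>>  public void readFields(DataInput in) throws IOException {
>>>>>      super.readFields(in);
>>>>>      this.val = Bytes.toString(getValue()); // restore val after deserialization
>>>>>  }
>>>>>  @Override
>>>>>  public int compareTo(byte[] value, int offset, int length) {
>>>>>      String[] values = BytesUtils.toStrings(value, offset, length);
>>>>>      for (String stringValue : values) {
>>>>>          if (val.equals(stringValue)) {
>>>>>              return 0;
>>>>>          }
>>>>>      }
>>>>>      return 1;
>>>>>  }
>>>>> }
>>>>>
>>>>> // each item is a 4-byte big-endian length followed by that many UTF-8 bytes
>>>>> public static String[] toStrings(byte[] value, int offset, int length) {
>>>>>  ArrayList<String> values = new ArrayList<String>();
>>>>>  ByteBuffer buffer = ByteBuffer.wrap(value, offset, length); // limit = offset + length
>>>>>  while (buffer.hasRemaining()) {
>>>>>      byte[] bytes = new byte[buffer.getInt()];
>>>>>      buffer.get(bytes);
>>>>>      values.add(new String(bytes, Charset.forName("UTF-8")));
>>>>>  }
>>>>>  return values.toArray(new String[values.size()]);
>>>>> }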
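A matching writer for the string layout, sketched under the assumption of a 4-byte big-endian length prefix holding the UTF-8 byte count of each item (`StringArrayCodec` is a hypothetical name). Counting bytes rather than characters matters for non-ASCII text, where one char can take two or more bytes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical codec: each string is a 4-byte length followed by its UTF-8 bytes. */
final class StringArrayCodec {
  private StringArrayCodec() {}

  static byte[] fromStrings(String... values) {
    byte[][] encoded = new byte[values.length][];
    int total = 0;
    for (int i = 0; i < values.length; i++) {
      encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
      total += Integer.BYTES + encoded[i].length;
    }
    ByteBuffer buf = ByteBuffer.allocate(total);
    for (byte[] b : encoded) {
      buf.putInt(b.length); // byte count, not char count
      buf.put(b);
    }
    return buf.array();
  }

  static List<String> toStrings(byte[] value, int offset, int length) {
    ByteBuffer buf = ByteBuffer.wrap(value, offset, length); // limit = offset + length
    List<String> out = new ArrayList<>();
    while (buf.hasRemaining()) {
      byte[] b = new byte[buf.getInt()];
      buf.get(b);
      out.add(new String(b, StandardCharsets.UTF_8));
    }
    return out;
  }
}
```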
>>>>> 
>>>>> 
>>>>> Am I on the right track, or am I overlooking some implementation details?
>>>>> I'm not really sure how to target each comparator to a specific column
>>>>> value.
>>>>> 
>>>>> 
>>>>> On Thu, Jun 27, 2013 at 9:21 PM, Michael Segel <[email protected]> wrote:
>>>>> 
>>>>>> Not an easy task.
>>>>>> 
>>>>>> You first need to determine how you want to store the data within a
>>>>>> column and/or apply a type constraint to a column.
>>>>>> 
>>>>>> Even if you use JSON records to store your data within a column, does
>>>>>> an equality comparator exist? If not, you would have to write one.
>>>>>> (I kind of think that one may already exist...)
>>>>>> 
>>>>>> 
>>>>>> On Jun 27, 2013, at 12:59 PM, Kristoffer Sjögren <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi
>>>>>>> 
>>>>>>> I'm working with the standard filtering mechanism to scan for rows that
>>>>>>> have columns matching certain criteria.
>>>>>>> 
>>>>>>> There are columns of numeric (integer and decimal) and string types.
>>>>>>> These columns are single- or multi-valued, like "1", "2", "1,2,3", "a",
>>>>>>> "b" or "a,b,c". I'm not sure what the separator would be in the case of
>>>>>>> list types. Maybe none?
>>>>>>> 
>>>>>>> I would like to compose the following queries to filter out rows that
>>>>>>> do not match.
>>>>>>> 
>>>>>>> - contains(String column, String value)
>>>>>>> Single-valued column whose value String.contains() the provided value.
>>>>>>>
>>>>>>> - equal(String column, Object value)
>>>>>>> Single-valued column whose value Object.equals() the provided value.
>>>>>>> The value is either a string or a numeric type.
>>>>>>>
>>>>>>> - greaterThan(String column, java.lang.Number value)
>>>>>>> Single-valued column whose value is > the provided numeric value.
>>>>>>>
>>>>>>> - in(String column, Object value...)
>>>>>>> Multi-valued column whose values Object.equals() all of the provided
>>>>>>> values. Values are of string or numeric type.
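A `greaterThan` built on a byte-level comparator only works if the stored bytes sort in the same order as the numbers. HBase compares byte arrays as unsigned, lexicographic sequences, so text like "10" sorts before "9", and plain two's-complement longs put every negative value after every positive one. A common fix, sketched here for longs (the class name `OrderedLongs` is hypothetical), is to flip the sign bit before writing the big-endian bytes.

```java
import java.nio.ByteBuffer;

/**
 * Sketch of an order-preserving long encoding: after flipping the sign bit,
 * an unsigned lexicographic byte comparison agrees with numeric order.
 */
final class OrderedLongs {
  private OrderedLongs() {}

  /** Big-endian two's complement with the sign bit flipped. */
  static byte[] encode(long v) {
    return ByteBuffer.allocate(Long.BYTES).putLong(v ^ Long.MIN_VALUE).array();
  }

  static long decode(byte[] b) {
    return ByteBuffer.wrap(b).getLong() ^ Long.MIN_VALUE;
  }

  /** Unsigned lexicographic comparison, the order a binary comparator sees. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int x = a[i] & 0xff;
      int y = b[i] & 0xff;
      if (x != y) {
        return x - y;
      }
    }
    return a.length - b.length;
  }
}
```

Doubles need a different transform: flip only the sign bit of non-negative values and all bits of negative ones.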
>>>>>>> 
>>>>>>> How would I design a schema that can take advantage of the already
>>>>>>> existing filters and comparators to accomplish this?
>>>>>>> 
>>>>>>> I already looked at the string and binary comparators but fail to see
>>>>>>> how to solve this in a clean way for multi-valued column values.
>>>>>>> 
>>>>>>> I'm aware of custom filters but would like to avoid them if possible.
>>>>>>> 
>>>>>>> Cheers,
>>>>>>> -Kristoffer
>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>>
