I believe the typical case would be to set the iterator at the scan and major compaction scopes for the table. This would ensure that queries see the transformed result and that, eventually, all of the data is rewritten to the new schema (or you could force a major compaction and know definitively that it has been).

Also, since it hasn't been otherwise stated, using the TransformingIterator is on the fringes of "normal". Your life may be much simpler if you write a MapReduce job to rewrite your data. Implementing the iterator correctly is a little obscure (as you're noticing) and is not at all straightforward to debug. If it's reasonable to rewrite your data, that may be the easier solution IMO.

madhvi wrote:
Hi All,

If anyone has worked with the TransformingIterator, can you tell me whether
the iterator also makes the transformed changes in the Accumulo table, or
whether it only returns the transformed result at scan time? Can you provide
details on how to implement its abstract methods, and on their use and the
workflow of the iterator?

Thanks
Madhvi
On Wednesday 27 May 2015 05:38 PM, Andrew Wells wrote:
to implement that iterator.

Looks like you will only need replaceColumnFamily.

Per the Javadoc quoted below, it returns a new Key built from the original
key, with the column family taken from the Text argument. So build the new
column family as a Text object and pass it in.

On Wed, May 27, 2015 at 8:06 AM, Andrew Wells <awe...@clearedgeit.com> wrote:

    Looks like you want to override these methods:

    (From the TransformingIterator Javadoc:
    http://accumulo.apache.org/1.6/apidocs/org/apache/accumulo/core/iterators/user/TransformingIterator.html)

    protected Key replaceColumnFamily(Key originalKey,
        org.apache.hadoop.io.Text newColFam)
            Make a new key with all parts (including delete flag)
            coming from originalKey but use newColFam as the column
            family.

    protected Key replaceColumnQualifier(Key originalKey,
        org.apache.hadoop.io.Text newColQual)
            Make a new key with all parts (including delete flag)
            coming from originalKey but use newColQual as the column
            qualifier.

    protected Key replaceColumnVisibility(Key originalKey,
        org.apache.hadoop.io.Text newColVis)
            Make a new key with all parts (including delete flag)
            coming from originalKey but use newColVis as the column
            visibility.

    protected Key replaceKeyParts(Key originalKey,
        org.apache.hadoop.io.Text newColQual,
        org.apache.hadoop.io.Text newColVis)
            Make a new key with a column qualifier, and column
            visibility.

    protected Key replaceKeyParts(Key originalKey,
        org.apache.hadoop.io.Text newColFam,
        org.apache.hadoop.io.Text newColQual,
        org.apache.hadoop.io.Text newColVis)
            Make a new key with a column family, column qualifier,
            and column visibility.





    On Wed, May 27, 2015 at 7:40 AM, shweta.agrawal
    <shweta.agra...@orkash.com> wrote:

        Thanks for all the suggestions.

        I read about TransformingIterator and started implementing
        it. I extended the class and tried to override its abstract
        methods, but I can't work out where to put the code that
        changes the column family, or what that code should be.

        So please provide your suggestions.

        Thanks
        Shweta



        On Tuesday 26 May 2015 08:33 PM, Adam Fuchs wrote:
        This can also be done with a row-doesn't-fit-into-memory
        constraint. You won't need to hold the second column
        in-memory if your iterator tree deep copies, filters,
        transforms and merges. Exhibit A:

        [HeapIterator-derivative]
           |_________________________
           |                         \
        [transform-graph1-to-graph2]  \
           |                           \
        [column-family-graph1][all-but-column-family-graph1]

        With this design, you can subclass the HeapIterator, deep
        copy the source in the init method, wrap one of the copies in
        a custom transform iterator, and create an appropriate seek
        method. This is probably more on the advanced side of
        Accumulo programming, but it can be done.

        Adam
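
The merge idea in Adam's diagram can be tried out without Accumulo. The sketch below is only an illustration under stated assumptions: keys are modelled as plain "row/cf/cq" strings compared as Java Strings (real Accumulo keys compare row, family, qualifier, visibility and timestamp separately), the two sources are materialised as lists rather than streamed from deep-copied iterators, and the class and method names are invented for the example. It splits a sorted input into the graph1 entries and everything else, renames the first group, and merges the two sorted streams back together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeRenameDemo {
    // Merge two already-sorted lists; only the current head of each source is compared.
    static List<String> merge(List<String> left, List<String> right) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            if (left.get(i).compareTo(right.get(j)) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    // Hypothetical helper: rename column family `from` to `to` in sorted "row/cf/cq" keys.
    static List<String> renameFamily(List<String> sortedKeys, String from, String to) {
        List<String> renamed = new ArrayList<>(); // stands in for [column-family-graph1] plus the transform
        List<String> others = new ArrayList<>();  // stands in for [all-but-column-family-graph1]
        for (String key : sortedKeys) {
            String[] parts = key.split("/");
            if (parts[1].equals(from)) {
                parts[1] = to;
                renamed.add(String.join("/", parts));
            } else {
                others.add(key);
            }
        }
        // Each stream is still sorted on its own, so a merge restores global order.
        return merge(renamed, others);
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList(
            "r1/graph1/a", "r1/graph1x/a", "r1/zeta/a", "r2/graph1/a");
        System.out.println(renameFamily(table, "graph1", "graph2"));
    }
}
```

The renamed stream stays sorted because every entry in it has the same column family, so its relative order depends only on row and qualifier, which the rename does not touch. That is why this layout avoids buffering a whole row, at the cost of reading the source twice.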


        On Tue, May 26, 2015 at 8:59 AM, Eric Newton
        <eric.new...@gmail.com> wrote:

            Short answer: no.

            Long answer: maybe.

            You can write an iterator which will transform:

            row, cf1, cq, vis -> value

            into:

            row, cf2, cq, vis -> value

            And if you can do this while maintaining sort order, you
            can get your new ColumnFamily transformed during scans
            and compactions.

            But this bit about maintaining the sort order is more
            complex than it sounds.

            If you have the following:

            row, a, cq, vis -> value
            row, aa, cq, vis -> value


            And you want to transform cf "a" into cf "b":

            row, aa, cq, vis -> value
            row, b, cq, vis -> value


            Your iterator needs to hold the second column in memory,
            after transforming the first column.  Tablet server
            memory for holding Key/Values is not infinite.

            -Eric
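
Eric's sort-order example can be reproduced in a few lines of plain Java. This is a sketch under assumptions, not Accumulo code: keys are modelled as "row/cf/cq" strings ordered by String comparison, and the class and method names are made up for the illustration. It renames column family "a" to "b" while streaming the keys in their original order, then checks whether the output is still sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SortOrderDemo {
    // Rename the column family of each "row/cf/cq" key, emitting keys in input order
    // (no buffering or re-sorting, which is what a naive streaming transform would do).
    static List<String> renameInPlace(Iterable<String> sortedKeys, String from, String to) {
        List<String> out = new ArrayList<>();
        for (String key : sortedKeys) {
            String[] parts = key.split("/");
            if (parts[1].equals(from)) {
                parts[1] = to;
            }
            out.add(String.join("/", parts));
        }
        return out;
    }

    // True if each key is less than or equal to the one after it.
    static boolean isSorted(List<String> keys) {
        for (int i = 1; i < keys.size(); i++) {
            if (keys.get(i - 1).compareTo(keys.get(i)) > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The TreeSet plays the role of the sorted table: "a" sorts before "aa".
        TreeSet<String> table = new TreeSet<>(Arrays.asList("row/a/cq", "row/aa/cq"));
        List<String> streamed = renameInPlace(table, "a", "b");
        System.out.println(streamed + " sorted=" + isSorted(streamed));
    }
}
```

Running it prints `[row/b/cq, row/aa/cq] sorted=false`: the renamed key comes out first even though "b" sorts after "aa", so a correct iterator has to hold it back until the "aa" entry has been emitted.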

            On Tue, May 26, 2015 at 8:44 AM, shweta.agrawal
            <shweta.agra...@orkash.com> wrote:

                Hi,

                Is it possible in Accumulo to change the column
                family without rewriting all of the data?

                Suppose my column family is graph1, and I now want
                to rename it to graph2.
                Is that possible?

                Thanks
                Shweta







    --
    *Andrew George Wells*
    *Software Engineer*
    *awe...@clearedgeit.com*



