Tom: Can you pastebin the stack trace for the exception? It would be nice if you could show a snippet of your code too.
Thanks

> On Jun 15, 2016, at 8:24 AM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.INVALID> wrote:
>
> So I have a working prototype using just bulk puts on a table and using setCellVisibility as necessary. Now I'm trying to do it using HFile.
>
> Sorry Ram, I don't quite follow why the user doing the writing of the HFile has to be an admin/super user? Is that necessary to load HFiles?
>
> The use case is to hopefully have an application user (non-admin) performing the writes to an HBase table via a bulk load of an HFile, setting visibility labels on individual cells as necessary. Then business users who have been given the auth to view that label can see those cells, and others cannot.
>
> I've seen that it's possible to do this with MapReduce by setting the map output to be a Put (and thus calling setCellVisibility on the puts), but I'm struggling to do this with Spark, as I keep getting the exception that I can't cast a Put to a Cell.
>
> Cheers,
>
> Tom Ellis
> Consultant Developer – Excelian
> Data Lake | Financial Markets IT
> LLOYDS BANK COMMERCIAL BANKING
>
> E: tom.el...@lloydsbanking.com
> Website: www.lloydsbankcommercial.com
> Reduce printing. Lloyds Banking Group is helping to build the low carbon economy.
> Corporate Responsibility Report: www.lloydsbankinggroup-cr.com/downloads
>
> -----Original Message-----
> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
> Sent: 15 June 2016 12:31
> To: user@hbase.apache.org
> Subject: Re: Writing visibility labels with HFileOutputFormat2
>
> -- This email has reached the Bank via an external source --
>
> >>> We could I guess create multiple puts for cells in the same row with different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?
> This can be done.
> If you want different cells in the same row to have different labels, then it is better to create that many puts and call setCellVisibility on each of them. What type of overhead do you see here? In terms of the server processing them? If so, there should not be much overhead here, and also, adding different cells to every column in turn means you need every cell to be treated differently in terms of security, so it should be fine IMHO.
>
> Without doing put.setCellVisibility() there is no other way, I believe. One question regarding your use case: in the mail you mentioned the Spark job where you will create a bulk-loaded file. If that file is to have all the visibility-related information of all the cells, then the user doing this job should be an admin or super user, right? Why should a normal client user read through all the visibility cells, which may or may not be associated with that user?
>
> Thank you very much for testing and using this feature. Let us know your feedback and if you find any gaps here. Happy to help.
>
> Regards
> Ram
>
>> On Wed, Jun 15, 2016 at 4:09 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote:
>>
>> Hmm, is there no other way to set labels on individual cells where we don't have to give the client users system perms? For instance, client users can set the cell visibility on the entire put without having this (i.e. put.setCellVisibility(new CellVisibility("label"))) and the VisibilityController will check this.
>>
>> We could I guess create multiple puts for cells in the same row with different labels and use the setCellVisibility on each individual put/cell, but will this create additional overhead?
>>
>> Cheers,
>>
>> Tom Ellis
>>
>> -----Original Message-----
>> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
>> Sent: 15 June 2016 11:24
>> To: user@hbase.apache.org
>> Subject: Re: Writing visibility labels with HFileOutputFormat2
>>
>> The visibility expression resolver tries to scan the labels table, and the user using the resolver should have SYSTEM privileges, since the information being accessed is sensitive.
>>
>> Suppose, in your case above, you add the client user as an admin; then you should be able to scan the labels table.
>>
>> Regards
>> Ram
>>
>> On Wed, Jun 15, 2016 at 3:09 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote:
>>
>>> Yeah, thanks for this Ram. Although in my testing I have found that a client user attempting to use the visibility expression resolver doesn't seem to have the ability to scan the hbase:labels table for the full list of labels, and thus can't get the ordinals/tags to add to the cell. Does the client user attempting to use the VisibilityExpressionResolver need some special permissions?
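The permission question above can be made concrete with a toy model. The sketch below is plain Java, not HBase code: `LabelOrdinalDemo`, `resolve` and its naive splitting on `&`, `|`, `!` and parentheses are hypothetical. It assumes only what the two `hbase:labels` scans quoted next show: each row key encodes a label ordinal, the `f:\x00` cell holds the label name, the hbase user sees six labels, and the client user sees only `system`. A resolver working from the client user's view has no ordinal for `confidential`, so it cannot build the tag for a cell carrying that label.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only, NOT HBase's VisibilityExpressionResolver.
// It shows why a resolver that can read only part of hbase:labels cannot map
// label names to the ordinals that end up in the cell tags.
public class LabelOrdinalDemo {

    // Label -> ordinal, as seen by each user in the hbase:labels scans in this thread
    // (row \x00\x00\x00\x0N is ordinal N; its f:\x00 cell holds the label name).
    static final Map<String, Integer> SEEN_BY_HBASE_USER = Map.of(
            "system", 1, "protected", 2, "testtesttest", 3,
            "confidential", 4, "branch", 5, "group", 6);
    static final Map<String, Integer> SEEN_BY_CLIENT_USER = Map.of("system", 1);

    // Hypothetical helper: splits an expression such as "confidential&(branch|!group)"
    // on the operators & | ! and parentheses, then looks up each label's ordinal.
    static List<Integer> resolve(String expression, Map<String, Integer> visibleLabels) {
        List<Integer> ordinals = new ArrayList<>();
        for (String label : expression.split("[&|!()\\s]+")) {
            if (label.isEmpty()) {
                continue; // a leading operator or '(' yields an empty first token
            }
            Integer ordinal = visibleLabels.get(label);
            if (ordinal == null) {
                throw new IllegalArgumentException("No ordinal visible for label: " + label);
            }
            ordinals.add(ordinal);
        }
        return ordinals;
    }

    public static void main(String[] args) {
        System.out.println(resolve("confidential&(branch|!group)", SEEN_BY_HBASE_USER)); // [4, 5, 6]
        try {
            resolve("confidential", SEEN_BY_CLIENT_USER);
        } catch (IllegalArgumentException e) {
            System.out.println("client user: " + e.getMessage());
        }
    }
}
```

The toy throws on an unknown label so the gap is easy to see; it does not claim to reproduce how HBase's own resolver reports the problem.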
>>>
>>> Scan of hbase:labels by client user:
>>>
>>> hbase(main):003:0> scan 'hbase:labels'
>>> ROW                 COLUMN+CELL
>>>  \x00\x00\x00\x01   column=f:\x00, timestamp=1465216652662, value=system
>>> 1 row(s) in 0.0650 seconds
>>>
>>> Scan of hbase:labels by hbase user:
>>>
>>> hbase(main):001:0> scan 'hbase:labels'
>>> ROW                 COLUMN+CELL
>>>  \x00\x00\x00\x01   column=f:\x00, timestamp=1465216652662, value=system
>>>  \x00\x00\x00\x02   column=f:\x00, timestamp=1465216944935, value=protected
>>>  \x00\x00\x00\x02   column=f:hbase, timestamp=1465547138533, value=
>>>  \x00\x00\x00\x02   column=f:tom, timestamp=1465980236882, value=
>>>  \x00\x00\x00\x03   column=f:\x00, timestamp=1465500156667, value=testtesttest
>>>  \x00\x00\x00\x03   column=f:@hadoop, timestamp=1465980236967, value=
>>>  \x00\x00\x00\x03   column=f:hadoop, timestamp=1465547304610, value=
>>>  \x00\x00\x00\x03   column=f:hive, timestamp=1465501322616, value=
>>>  \x00\x00\x00\x04   column=f:\x00, timestamp=1465570719901, value=confidential
>>>  \x00\x00\x00\x05   column=f:\x00, timestamp=1465835047835, value=branch
>>>  \x00\x00\x00\x05   column=f:hdfs, timestamp=1465980237060, value=
>>>  \x00\x00\x00\x06   column=f:\x00, timestamp=1465980447307, value=group
>>>  \x00\x00\x00\x06   column=f:hdfs, timestamp=1465980454130, value=
>>> 6 row(s) in 0.7370 seconds
>>>
>>> Cheers,
>>>
>>> Tom Ellis
>>>
>>> -----Original Message-----
>>> From: Anoop John [mailto:anoop.hb...@gmail.com]
>>> Sent: 08 June 2016 11:58
>>> To: user@hbase.apache.org
>>> Subject: Re: Writing visibility labels with HFileOutputFormat2
>>>
>>> Thanks Ram. Yes, that seems the best way, as CellCreator is a publicly exposed class. Maybe we should explain this in the HBase book under the Visibility Labels section. Good to know you have a Visibility Labels based use case. Let us know in case of any trouble. Thanks.
>>>
>>> -Anoop-
>>>
>>> On Wed, Jun 8, 2016 at 1:43 PM, ramkrishna vasudevan <ramkrishna.s.vasude...@gmail.com> wrote:
>>>> Hi
>>>>
>>>> It can be done. See the class CellCreator, which is a public-facing interface. When your Spark job creates the Hadoop files that produce the HFileOutputFormat2 data, you can use the CellCreator to create your KeyValues and use CellCreator.getVisibilityExpressionResolver() to map your String visibility tags to the system-generated ordinals.
>>>>
>>>> For example, you can see how TextSortReducer works. I think this should help you solve your problem. Let us know if you need further information.
>>>>
>>>> Regards
>>>> Ram
>>>>
>>>> On Tue, Jun 7, 2016 at 3:58 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote:
>>>>
>>>>> Hi Ram,
>>>>>
>>>>> We're attempting to do it programmatically, so:
>>>>>
>>>>> The HFile is created by a Spark job using saveAsNewAPIHadoopFile, using ImmutableBytesWritable as the key (rowkey) with KeyValue as the value, and using the HFileOutputFormat2 format.
>>>>> This HFile is then loaded using the HBase client's LoadIncrementalHFiles.doBulkLoad.
>>>>>
>>>>> Is there a way to do this programmatically without using the ImportTsv tool? I was taking a look at VisibilityUtils.createVisibilityExpTags and maybe being able to just create the Tags myself that way (although it's obviously @InterfaceAudience.Private), but it seems that to use it I'd need to know the label ordinals client-side.
>>>>>
>>>>> Thanks for your help,
>>>>>
>>>>> Tom
>>>>>
>>>>> -----Original Message-----
>>>>> From: ramkrishna vasudevan [mailto:ramkrishna.s.vasude...@gmail.com]
>>>>> Sent: 07 June 2016 11:19
>>>>> To: user@hbase.apache.org
>>>>> Subject: Re: Writing visibility labels with HFileOutputFormat2
>>>>>
>>>>> Hi Ellis
>>>>>
>>>>> How are the HFileOutputFormat2 files created? Are you using the ImportTsv tool? If you are using the ImportTsv tool, then yes, there is a way to specify visibility tags while loading from the ImportTsv tool, and those visibility tags are also bulk loaded as HFiles.
>>>>>
>>>>> There is an attribute CELL_VISIBILITY_COLUMN_SPEC that can be used to indicate that the data will have visibility tags, and the tool will automatically parse the specified field as a visibility tag.
>>>>>
>>>>> In case you have access to the code, you can see the test case TestImportTSVWithVisibilityLabels to get an initial idea of how it is being done. If not, get back to us; happy to help.
>>>>>
>>>>> Regards
>>>>> Ram
>>>>>
>>>>> On Tue, Jun 7, 2016 at 3:36 PM, Ellis, Tom (Financial Markets IT) <tom.el...@lloydsbanking.com.invalid> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was wondering if it's possible/how to write Visibility Labels to an HFileOutputFormat2?
>>>>>> I believe Visibility Labels are just implemented as Tags, but with the normal way of writing them, via Mutation#setCellVisibility, they are formally written as Tags to the cells by the VisibilityController coprocessor, as we need to assert that the expression is valid for the labels configured.
>>>>>>
>>>>>> How can we add visibility labels to cells if we have a job that creates an HFile with HFileOutputFormat2, which is then subsequently loaded using LoadIncrementalHFiles?
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Tom Ellis
>>>>>>
>>>>>> Lloyds Banking Group plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. SC95000. Telephone: 0131 225 4555. Lloyds Bank plc. Registered Office: 25 Gresham Street, London EC2V 7HN. Registered in England and Wales no. 2065. Telephone 0207626 1500. Bank of Scotland plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. SC327000. Telephone: 03457 801 801. Cheltenham & Gloucester plc. Registered Office: Barnett Way, Gloucester GL4 3RL. Registered in England and Wales 2299428. Telephone: 0345 603 1637
>>>>>>
>>>>>> Lloyds Bank plc, Bank of Scotland plc are authorised by the Prudential Regulation Authority and regulated by the Financial Conduct Authority and Prudential Regulation Authority.
>>>>>>
>>>>>> Cheltenham & Gloucester plc is authorised and regulated by the Financial Conduct Authority.
>>>>>>
>>>>>> Halifax is a division of Bank of Scotland plc. Cheltenham & Gloucester Savings is a division of Lloyds Bank plc.
>>>>>>
>>>>>> HBOS plc. Registered Office: The Mound, Edinburgh EH1 1YZ. Registered in Scotland no. SC218813.
>>>>>>
>>>>>> This e-mail (including any attachments) is private and confidential and may contain privileged material. If you have received this e-mail in error, please notify the sender and delete it (including any attachments) immediately. You must not copy, distribute, disclose or use any of the information in it or any attachments. Telephone calls may be monitored or recorded.
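Ram's advice in the thread is to use separate Puts, each with its own setCellVisibility call, when cells in one row need different labels. Because the expression set with setCellVisibility applies to every cell in that Put, only cells with different expressions need separate Puts, so a row needs one Put per distinct expression rather than one per cell. The plain-Java sketch below models just that grouping step. `PutPerLabelDemo`, `CellSpec` and `planPuts` are hypothetical names, the sample row is made up (its labels come from the hbase:labels scan above), and in real HBase code each group would become a `Put` with `put.setCellVisibility(new CellVisibility(expression))`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of "one Put per distinct visibility expression".
// PutPerLabelDemo, CellSpec and planPuts are hypothetical; in HBase each group
// would become a Put carrying put.setCellVisibility(new CellVisibility(expr)).
public class PutPerLabelDemo {

    // One cell of a row: qualifier, value and the visibility expression it needs.
    static final class CellSpec {
        final String qualifier;
        final String value;
        final String expression;

        CellSpec(String qualifier, String value, String expression) {
            this.qualifier = qualifier;
            this.value = value;
            this.expression = expression;
        }
    }

    // Groups one row's cells by expression, keeping first-seen order.
    // Each map entry corresponds to one Put: cells sharing an expression can
    // share a Put because the Put's expression covers all of its cells.
    static Map<String, List<CellSpec>> planPuts(List<CellSpec> rowCells) {
        Map<String, List<CellSpec>> putsByExpression = new LinkedHashMap<>();
        for (CellSpec cell : rowCells) {
            putsByExpression
                    .computeIfAbsent(cell.expression, expr -> new ArrayList<>())
                    .add(cell);
        }
        return putsByExpression;
    }

    public static void main(String[] args) {
        // Made-up row; the labels are taken from the hbase:labels scan in this thread.
        List<CellSpec> row = List.of(
                new CellSpec("name", "Acme", "protected"),
                new CellSpec("limit", "1000000", "confidential"),
                new CellSpec("sector", "Energy", "protected"),
                new CellSpec("branchCode", "LDS01", "branch|group"));
        planPuts(row).forEach((expr, cells) ->
                System.out.println("Put with visibility '" + expr + "': "
                        + cells.size() + " cell(s)"));
    }
}
```

Keying on the raw expression string means `a&b` and `b&a` end up in different groups; normalising expressions first would merge them, but that step is left out of this sketch.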