On Wed, May 9, 2012 at 11:00 AM, Billie J Rinaldi <billie.j.rina...@ugov.gov> wrote:
> On Wednesday, May 9, 2012 10:31:46 AM, "Sean Pines" <spine...@gmail.com>
> wrote:
>> I have a use case that involves me removing a record from Accumulo
>> based on the Row ID and the Column Family.
>>
>> In the shell, I noticed the command "deletemany" which allows you to
>> specify column family/column qualifier. Is there an equivalent of this
>> in the Java API?
>>
>> In the Java API, I noticed the method:
>> deleteRows(String tableName, org.apache.hadoop.io.Text start,
>> org.apache.hadoop.io.Text end)
>> Delete rows between (start, end]
>>
>> However that only seems to work for deleting a range of RowIDs.
>>
>> I would also imagine that deleting rows is costly; is there a better
>> way to approach something like this?
>> The workaround I have for now is to just overwrite the row with an
>> empty string in the value field and ignore any entries that have that.
>> However this just leaves lingering rows for each "delete" and I'd like
>> to avoid that if at all possible.
>>
>> Thanks!
>
> Connector provides a createBatchDeleter method. You can set the range and
> columns for BatchDeleter just like you would with a Scanner. This is not an
> efficient operation (despite the current javadocs for BatchDeleter), but it
> works well if you're deleting a small number of entries. It scans for the
> affected key/value pairs, pulls them back to the client, then inserts
> deletion entries for each. The deleteRows method, on the other hand, is
> efficient because large ranges can just be dropped. If you want to delete a
> lot of things and deleteRows won't work for you, consider using a majc scope
> Filter that filters out what you don't want, compact the table, then remove
> the filter.
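As a rough illustration of the BatchDeleter approach Billie describes, something like the sketch below should work. This assumes the 1.4-era createBatchDeleter signature; the instance, zookeeper, credential, table, row, and family names are all placeholders:

    // Minimal sketch: delete all entries under one row ID / column family
    // pair using a BatchDeleter. Connection details and names below are
    // placeholders.
    import java.util.Collections;

    import org.apache.accumulo.core.client.BatchDeleter;
    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.data.Range;
    import org.apache.accumulo.core.security.Authorizations;
    import org.apache.hadoop.io.Text;

    public class DeleteByRowAndFamily {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("myInstance", "zkhost:2181")
            .getConnector("user", "secret".getBytes());

        // numQueryThreads=4, maxMemory=1MB, maxLatency=1s, maxWriteThreads=2
        BatchDeleter deleter = conn.createBatchDeleter("myTable",
            new Authorizations(), 4, 1000000L, 1000L, 2);

        // Scan only the row and column family to be removed ...
        deleter.setRanges(Collections.singleton(new Range(new Text("myRowId"))));
        deleter.fetchColumnFamily(new Text("myFamily"));

        // ... then write delete entries for every matching key/value pair.
        deleter.delete();
        deleter.close();
      }
    }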
If you use the filter option, you would probably want to put the filter at all scopes, flush, compact, and then remove the filter. Having the filter at the scan scope prevents users from seeing any of the data immediately; if the filter is only at the majc scope, users will still see the data in some parts of the table while the compaction is running. Having the filter at the minc scope filters out any data in memory when you flush, and having it at the majc scope filters the existing data on disk when you compact. A sketch of that sequence is below.
>
> Billie
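For illustration, here is a minimal sketch of that sequence using the 1.4-era TableOperations calls. MyPurgeFilter is a hypothetical Filter subclass (it would need to be on the tablet servers' classpath), and the iterator name, priority, and table name are placeholders:

    // Sketch of the filter-based mass delete: attach a Filter at all scopes,
    // flush, compact, then remove the filter.
    import java.util.EnumSet;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.IteratorSetting;
    import org.apache.accumulo.core.client.admin.TableOperations;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.iterators.Filter;
    import org.apache.accumulo.core.iterators.IteratorUtil.IteratorScope;

    public class PurgeWithFilter {

      // Hypothetical filter: drops everything in one column family, keeps the rest.
      public static class MyPurgeFilter extends Filter {
        @Override
        public boolean accept(Key k, Value v) {
          return !k.getColumnFamily().toString().equals("familyToPurge");
        }
      }

      public static void purge(Connector conn, String table) throws Exception {
        TableOperations ops = conn.tableOperations();
        IteratorSetting setting = new IteratorSetting(30, "purgeFilter", MyPurgeFilter.class);

        // scan scope hides the data immediately; minc/majc scopes make the
        // flush and compaction below rewrite the table without it
        ops.attachIterator(table, setting, EnumSet.allOf(IteratorScope.class));

        ops.flush(table, null, null, true);          // flush in-memory data (minc filter applies)
        ops.compact(table, null, null, true, true);  // rewrite files on disk (majc filter applies)

        ops.removeIterator(table, "purgeFilter", EnumSet.allOf(IteratorScope.class));
      }
    }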