Matt,

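This should help. Ask the input format to fetch only that column family, so
the tablet servers drop every other column before anything reaches your
mapper:

// Fetch only the 'cityofbirth' family; a null qualifier means every
// qualifier in that family. Family names are case-sensitive.
Collection<Pair<Text,Text>> cols = Collections.singleton(
    new Pair<Text,Text>(new Text("cityofbirth"), null));
AccumuloInputFormat.fetchColumns(job, cols);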
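
With the family restricted that way, the mapper no longer has to look at the family. It can emit each qualifier as its output key, and a reducer that writes every distinct key once produces the unique list. This is a similar emit-then-dedupe pattern to UniqueColumns. Below is a rough sketch of that flow in plain Java, so it runs without a cluster. Cell, map(), reduce() and the TreeMap standing in for the Hadoop shuffle are hypothetical stand-ins, not Accumulo or Hadoop API. The family check inside uniqueQualifiers() only emulates what fetchColumns does on the tablet servers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class UniqueQualifiersSketch {
    // Hypothetical stand-in for an Accumulo key (row, column family, column qualifier).
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Map step: emit the qualifier as the output key. The family is not
    // checked here because the input was already restricted to one family.
    static String map(Cell cell) {
        return cell.qualifier;
    }

    // Reduce step: invoked once per distinct key, so writing the key once
    // (ignoring how many rows emitted it) removes the duplicates.
    static String reduce(String key, List<String> rows) {
        return key;
    }

    static List<String> uniqueQualifiers(List<Cell> table, String family) {
        // Shuffle: group emitted keys. TreeMap keeps them sorted, much as the
        // Hadoop shuffle sorts keys before they reach the reducer.
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Cell cell : table) {
            // Emulates fetchColumns: cells from other families never reach map().
            if (!cell.family.equals(family)) {
                continue;
            }
            grouped.computeIfAbsent(map(cell), k -> new ArrayList<>()).add(cell.row);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            output.add(reduce(entry.getKey(), entry.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        // The example table from the original question, duplicate row included.
        List<Cell> table = List.of(
            new Cell("123", "cityofbirth", "LosAngeles"),
            new Cell("133", "cityofbirth", "Brisbane"),
            new Cell("222", "cityofbirth", "London"),
            new Cell("124", "cityofbirth", "London"),
            new Cell("124", "cityofbirth", "London"));
        System.out.println(uniqueQualifiers(table, "cityofbirth"));
        // prints [Brisbane, London, LosAngeles]
    }
}
```

Because the reduce step only writes the key, the same logic can also serve as a combiner to cut down how much data is shuffled.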



On Wed, Jan 15, 2014 at 7:29 PM, Dickson, Matt MR <
matt.dick...@defence.gov.au> wrote:

>  *UNOFFICIAL*
> Thanks Keith.  I've run a simple MR job based on the UniqueColumns
> example, but due to the size of the table it is taking a very long time.
> Is it possible to pre-filter the data that goes to the MR job by column
> family, e.g. only run the MR job on columns with the column family
> 'cityofbirth'?  I am currently going through every column in the table and
> checking the column family in the mapper, which is slow.
>
>
>
>  ------------------------------
> *From:* Keith Turner [mailto:ke...@deenlo.com]
> *Sent:* Wednesday, 15 January 2014 12:06
> *To:* user@accumulo.apache.org
>
> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>
>
>
>
> On Tue, Jan 14, 2014 at 6:06 PM, Dickson, Matt MR <
> matt.dick...@defence.gov.au> wrote:
>
>>  *UNOFFICIAL*
>> Just for simplicity: this is a one-off request for management, so I was
>> hoping to just scan via the shell and output to a file.
>>
>> If I need to do it via an MR job I can do it that way, and would be keen
>> to hear any suggestions.
>>
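
If a shell dump really is enough for a one-off, you can limit the shell's scan command to the family with its -c option (e.g. 'scan -c cityofbirth') and save the output to a file. You can then pull the distinct cities out of that file. This is a minimal sketch. It assumes each line of the dump looks like 'row family:qualifier [visibility]    value' and that rows and qualifiers contain no spaces. The class name is made up for illustration:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;

public class DistinctQualifiersFromScan {
    // Assumed line layout: "row family:qualifier [visibility]    value".
    // Returns the qualifier if the line belongs to the wanted family, else null.
    static String qualifierOf(String line, String family) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 2) {
            return null;
        }
        String column = fields[1];
        int colon = column.indexOf(':');
        if (colon < 0 || !column.substring(0, colon).equals(family)) {
            return null;
        }
        return column.substring(colon + 1);
    }

    // Collects each matching qualifier once; TreeSet drops duplicates and sorts.
    static Set<String> distinct(Iterable<String> lines, String family) {
        Set<String> unique = new TreeSet<>();
        for (String line : lines) {
            String qualifier = qualifierOf(line, family);
            if (qualifier != null) {
                unique.add(qualifier);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        BufferedReader in = new BufferedReader(
            new InputStreamReader(System.in, StandardCharsets.UTF_8));
        // Stream::iterator adapts the lines of standard input to an Iterable.
        for (String qualifier : distinct(in.lines()::iterator, "cityofbirth")) {
            System.out.println(qualifier);
        }
    }
}
```

Run it as 'java DistinctQualifiersFromScan < scan-output.txt'. It prints each city once, in sorted order.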
>
> You could modify the following example in 1.4 to suit your needs.
>
>
> src/examples/simple/src/main/java/org/apache/accumulo/examples/simple/mapreduce/UniqueColumns.java
>
>
>>
>>  ------------------------------
>> *From:* David Medinets [mailto:david.medin...@gmail.com]
>> *Sent:* Wednesday, 15 January 2014 09:36
>> *To:* accumulo-user
>> *Subject:* Re: List of unique qualifiers [SEC=UNOFFICIAL]
>>
>>   Why the restriction to the shell environment? A nice map-reduce job
>> would be ideal for this task.
>>
>>
>> On Tue, Jan 14, 2014 at 5:30 PM, Dickson, Matt MR <
>> matt.dick...@defence.gov.au> wrote:
>>
>>>  *UNOFFICIAL*
>>> Hi,
>>>
>>> I need to extract a list of unique qualifier values in a table from the
>>> Accumulo shell.  For every column there is a column family that identifies
>>> what the qualifier holds, e.g. 'cityofbirth'.  I would like a unique list
>>> of all cities that are listed in the qualifier under 'cityofbirth' across
>>> all rows.
>>>
>>> e.g., if I had a table with:
>>>
>>> Rowid    Family         Qual
>>> 123      cityofbirth    LosAngeles
>>> 133      cityofbirth    Brisbane
>>> 222      cityofbirth    London
>>> 124      cityofbirth    London
>>> 124      cityofbirth    London
>>>
>>> I want a list that is just:
>>> LosAngeles
>>> London
>>> Brisbane
>>>
>>> Any suggestions on how to achieve this from the shell would be great.
>>>
>>> Thanks in advance.
>>> Matt
>>>
>>>
>>>
>>>
>>
>>
>
