Gentle Reminder

On 21 August 2014 18:05, Sathyam <sathyam.dorasw...@gmail.com> wrote:

> Hi,
>
> I need to generate tokens from a URL so that I get both the hierarchical
> units of the URL and each individual entity as tokens.
> For example:
> *Given a URL:*
>
> http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz
>
> The tokens that I need are :
>
> *Hierarchical subsets of the URL*
>
> 1 http://
>
> 2 http://www.google.com/
>
> 3 http://www.google.com/abcd/
>
> 4 http://www.google.com/abcd/efgh/
>
> 5 http://www.google.com/abcd/efgh/ijkl/
>
> 6 http://www.google.com/abcd/efgh/ijkl/mnop.php
>
> *Individual elements in the path to the resource*
>
> 7 abcd
>
> 8 efgh
>
> 9 ijkl
>
> 10 mnop.php
>
> *Query Terms*
>
> 11 a=10
>
> 12 b=20
>
> 13 c=30
>
> *Fragment*
> 14 xyz
>
> This comes to a total of 14 tokens for the given URL.
> Basically, I need a URL analyzer that creates tokens for each of the
> categories marked in bold above, plus a separate token for the port (if
> one is present).
>
> I would like to know how this can be achieved with a single analyzer
> that combines the tokenizers and filters provided by Solr.
> I am also curious why an analyzer is restricted to only *one* tokenizer.
> I look forward to hearing the best way to get as close as possible to
> what I need.
>
> Thanks.
> --
> Sathyam Doraswamy
>
>
>
>
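
To make the expected output concrete, here is a minimal sketch in plain Python of the token set described above. It runs outside Solr and is not a Solr analyzer; the function name url_tokens is hypothetical and used only for illustration. It assumes the URL has a scheme and a host, and that query terms are separated by "&".

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Return the tokens described in the message above, in the same order."""
    parts = urlsplit(url)
    tokens = []

    # Hierarchical subsets: scheme, scheme + host, then one prefix per path level.
    prefix = parts.scheme + "://"
    tokens.append(prefix)
    prefix += parts.netloc + "/"
    tokens.append(prefix)

    segments = [s for s in parts.path.split("/") if s]
    for i, segment in enumerate(segments):
        is_last = i == len(segments) - 1
        # Keep the trailing slash on directories; the final resource has none
        # unless the original path ended with one.
        if is_last and not parts.path.endswith("/"):
            prefix += segment
        else:
            prefix += segment + "/"
        tokens.append(prefix)

    # Individual elements in the path to the resource.
    tokens.extend(segments)

    # Query terms, one token per key=value pair.
    tokens.extend(q for q in parts.query.split("&") if q)

    # Fragment, if any.
    if parts.fragment:
        tokens.append(parts.fragment)

    # Separate token for the port, only when one is given explicitly.
    if parts.port is not None:
        tokens.append(str(parts.port))

    return tokens


if __name__ == "__main__":
    example = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
    for number, token in enumerate(url_tokens(example), start=1):
        print(number, token)
```

For the example URL this yields the 14 tokens listed above, in the same order. What I am looking for is the closest equivalent inside a Solr analyzer chain.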


-- 
Sathyam Doraswamy
