Gentle Reminder
On 21 August 2014 18:05, Sathyam <sathyam.dorasw...@gmail.com> wrote:

> Hi,
>
> I needed to generate tokens out of a URL such that I am able to get
> hierarchical units of the URL as well as each individual entity as tokens.
> For example:
> *Given a URL:*
>
> http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz
>
> The tokens that I need are:
>
> *Hierarchical subsets of the URL*
>
> 1 http://
> 2 http://www.google.com/
> 3 http://www.google.com/abcd/
> 4 http://www.google.com/abcd/efgh/
> 5 http://www.google.com/abcd/efgh/ijkl/
> 6 http://www.google.com/abcd/efgh/ijkl/mnop.php
>
> *Individual elements in the path to the resource*
>
> 7 abcd
> 8 efgh
> 9 ijkl
> 10 mnop.php
>
> *Query Terms*
>
> 11 a=10
> 12 b=20
> 13 c=30
>
> *Fragment*
>
> 14 xyz
>
> This comes to a total of 14 tokens for the given URL.
> Basically a URL analyzer that creates tokens based on the categories
> mentioned in bold. Also a separate token for port (if mentioned).
>
> I would like to know how this can be achieved by using a single analyzer
> that uses a combination of the tokenizers and filters provided by Solr.
> Also curious to know why there is a restriction of only *one* tokenizer
> to be used in an analyzer.
> Looking forward to a response from your side telling the best possible way
> to achieve the closest to what I need.
>
> Thanks.
> --
> Sathyam Doraswamy

--
Sathyam Doraswamy
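
For reference, the token list described in the quoted message can be reproduced outside Solr with a short sketch. This is plain Python using the standard `urllib.parse` module, not a Solr analyzer, and it only illustrates the expected output. The function name `url_tokens`, the choice to keep the port inside the hierarchical prefixes, and the position of the separate port token at the end are assumptions made for this example.

```python
from urllib.parse import urlsplit


def url_tokens(url):
    """Return the tokens described above, in the order they are listed.

    Hypothetical helper for illustration only; it is not part of Solr.
    """
    parts = urlsplit(url)
    tokens = []

    # Hierarchical prefixes: scheme, then host, then one per path level.
    prefix = parts.scheme + "://"
    tokens.append(prefix)
    prefix += parts.netloc + "/"  # assumption: the port, if any, stays in the prefix
    tokens.append(prefix)

    segments = [s for s in parts.path.split("/") if s]
    for i, segment in enumerate(segments):
        prefix += segment
        # Directory levels keep a trailing slash; the final resource does not,
        # unless the original path itself ended with a slash.
        if i < len(segments) - 1 or parts.path.endswith("/"):
            prefix += "/"
        tokens.append(prefix)

    # Individual elements in the path to the resource.
    tokens.extend(segments)

    # Query terms, split on "&" and kept as raw key=value strings.
    tokens.extend(term for term in parts.query.split("&") if term)

    # Fragment, if present.
    if parts.fragment:
        tokens.append(parts.fragment)

    # Separate token for the port, if mentioned (placed last by assumption).
    if parts.port is not None:
        tokens.append(str(parts.port))

    return tokens


if __name__ == "__main__":
    example = "http://www.google.com/abcd/efgh/ijkl/mnop.php?a=10&b=20&c=30#xyz"
    for number, token in enumerate(url_tokens(example), start=1):
        print(number, token)
```

Run on the example URL, this yields the 14 tokens in the same order as the numbered list. An in-Solr solution would still need to express the same splitting rules through a tokenizer and filters.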