Removing HTML markup is not a trivial task, but luckily the Apache
   Solr team has already created additional analyzers for Lucene that do
   what I need (the analysis package in Solr has a lot of really good
   stuff in it).



   I will still need some help from the Neo team to understand how to use a
   specific analyzer instead of the default one...



   Thanks,



   Rick



   -------- Original Message --------
   Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring
   last word/term
   From: Morten Barklund <[1]mor...@barklund.dk>
   Date: Wed, September 15, 2010 12:29 pm
   To: Neo4j user discussions <[2]u...@lists.neo4j.org>
   Hi
   I might be overly simplistic here, but why not lowercase the text,
   remove HTML markup, then remove all non-word-or-space characters,
   store this as the stripped version of the text on the node (for
   de-indexing) and index this?
   /Barklund
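
The pre-processing suggested above (lowercase, strip markup, drop everything that is not a word character or whitespace) could be sketched roughly as follows. This is only an illustration using the Java standard library, not code from Neo4j, Lucene, or Solr; the class name `TextNormalizer` and the method `normalize` are made up for this example, and the regex-based tag removal is a naive assumption that will not handle entities such as `&amp;`, comments, or malformed HTML.

```java
import java.util.Locale;

// Hypothetical helper, not part of Neo4j, Lucene, or Solr.
final class TextNormalizer {
    private TextNormalizer() {}

    // Returns a lowercased, markup-free, punctuation-free version of the input.
    static String normalize(String html) {
        // 1. Lowercase with a fixed locale so results do not depend on the JVM default.
        String text = html.toLowerCase(Locale.ROOT);
        // 2. Naive tag removal: anything between '<' and '>' becomes a space.
        text = text.replaceAll("<[^>]*>", " ");
        // 3. Replace every character that is neither a word character nor whitespace.
        //    Note: \w is ASCII-only by default in Java, so accented letters are dropped too.
        //    A space is used instead of the empty string so adjacent words are not joined.
        text = text.replaceAll("[^\\w\\s]", " ");
        // 4. Trim and collapse runs of whitespace into single spaces.
        return text.trim().replaceAll("\\s+", " ");
    }
}
```

With this sketch, `TextNormalizer.normalize("<p>Cheese is good.</p>")` returns `cheese is good`, so the last term no longer carries the trailing dot. The normalized string is what would be stored on the node and passed to the index.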
   On Wed, Sep 15, 2010 at 18:07,
   <[3]rick.bullo...@burningskysoftware.com> wrote:
   > Actually, it seems like a deeper bug/design flaw in Lucene's
   > analyzer/tokenizer. The actual text is HTML text, with <p> and </p>
   > wrappers. Lucene somewhat randomly seems to treat the last two words
   > as a single token, and in other cases ignore it altogether. The dot
   > character screws it up even more, because even if it tokenizes with
   > the dot character, you can't query with it (or at least nothing gets
   > returned).
   >
   >
   >
   > Hmmm. I really don't want to have to write a tokenizer/analyzer if I
   > can avoid it. Seems like a LOT of work.
   >
   >
   >
   > Do you have any example code of a custom tokenizer/analyzer we could
   > start from?
   >
   >
   >
   > Thanks,
   >
   >
   >
   > Rick
   >
   > -------- Original Message --------
   > Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring
   > last word/term
   > From: Mattias Persson <[1][4]matt...@neotechnology.com>
   > Date: Wed, September 15, 2010 11:47 am
   > To: Neo4j user discussions <[2][5]u...@lists.neo4j.org>
   > Couldn't it be that sentences end with a dot... so "Cheese is good."
   > will index the words: ["Cheese", "is", "good."]? Observe that the last
   > word isn't "good", it's "good." with a dot. I know that has messed up
   > some searches for me at least. You could perhaps override the
   > implementation and instantiate an Analyzer/Tokenizer which gets rid
   > of such punctuation characters?
   > 2010/9/15 <[3][6]rick.bullo...@burningskysoftware.com>
   > > Using neo4j-index-1.1 and lucene-core-2.9.2, by the way.
   > >
   > >
   > >
   > >
   > >
   > > -------- Original Message --------
   > > Subject: Re: [Neo4j] Bug: LuceneFullTextQueryIndex service ignoring
   > > last word/term
   > > From: Mattias Persson <[1][4][7]matt...@neotechnology.com>
   > > Date: Wed, September 15, 2010 10:37 am
   > > To: Neo4j user discussions <[2][5][8]u...@lists.neo4j.org>
   > > That sounds weird. Look at the
   > > TestLuceneFulltextIndexService#testSimpleFulltext method; it queries
   > > for the last word and it seems to work.
   > > Could you provide more info on this?
   > > 2010/9/15 <[3][6][9]rick.bullo...@burningskysoftware.com>
   > > > I've noticed that when indexing full text, the last term/word is
   > > > always ignored. This is a major issue, but I'm not sure if it is in
   > > > the index utils or in Lucene itself.
   > > >
   > > >
   > > >
   > > > Any thoughts?
   > > >
   > > >
   > > >
   > > > Thanks,
   > > >
   > > >
   > > >
   > > > Rick
   > > > _______________________________________________
   > > > Neo4j mailing list
   > > > [4][7][10]u...@lists.neo4j.org
   > > > [5][8][11]https://lists.neo4j.org/mailman/listinfo/user
   > > >
   > > --
   > > Mattias Persson, [[6][9][12]matt...@neotechnology.com]
   > > Hacker, Neo Technology
   > > [7][10][13]www.neotechnology.com
   > --
   > Mattias Persson, [[24][27]matt...@neotechnology.com]
   > Hacker, Neo Technology
   > [25][28]www.neotechnology.com
   --
   Morten Barklund

References

   1. mailto:mor...@barklund.dk
   2. mailto:user@lists.neo4j.org
   3. mailto:rick.bullo...@burningskysoftware.com
   4. mailto:matt...@neotechnology.com
   5. mailto:user@lists.neo4j.org
   6. mailto:rick.bullo...@burningskysoftware.com
   7. mailto:matt...@neotechnology.com
   8. mailto:user@lists.neo4j.org
   9. mailto:rick.bullo...@burningskysoftware.com
  10. mailto:User@lists.neo4j.org
  11. https://lists.neo4j.org/mailman/listinfo/user
  12. mailto:matt...@neotechnology.com
  13. http://www.neotechnology.com/
  14. mailto:User@lists.neo4j.org
  15. https://lists.neo4j.org/mailman/listinfo/user
  16. mailto:matt...@neotechnology.com
  17. mailto:user@lists.neo4j.org
  18. mailto:rick.bullo...@burningskysoftware.com
  19. mailto:User@lists.neo4j.org
  20. https://lists.neo4j.org/mailman/listinfo/user
  21. mailto:matt...@neotechnology.com
  22. http://www.neotechnology.com/
  23. mailto:User@lists.neo4j.org
  24. https://lists.neo4j.org/mailman/listinfo/user
  25. mailto:User@lists.neo4j.org
  26. https://lists.neo4j.org/mailman/listinfo/user
  27. mailto:matt...@neotechnology.com
  28. http://www.neotechnology.com/
  29. mailto:User@lists.neo4j.org
  30. https://lists.neo4j.org/mailman/listinfo/user
  31. mailto:matt...@neotechnology.com
  32. mailto:user@lists.neo4j.org
  33. mailto:rick.bullo...@burningskysoftware.com
  34. mailto:matt...@neotechnology.com
  35. mailto:user@lists.neo4j.org
  36. mailto:rick.bullo...@burningskysoftware.com
  37. mailto:User@lists.neo4j.org
  38. https://lists.neo4j.org/mailman/listinfo/user
  39. mailto:matt...@neotechnology.com
  40. http://www.neotechnology.com/
  41. mailto:User@lists.neo4j.org
  42. https://lists.neo4j.org/mailman/listinfo/user
  43. mailto:matt...@neotechnology.com
  44. mailto:user@lists.neo4j.org
  45. mailto:rick.bullo...@burningskysoftware.com
  46. mailto:User@lists.neo4j.org
  47. https://lists.neo4j.org/mailman/listinfo/user
  48. mailto:matt...@neotechnology.com
  49. http://www.neotechnology.com/
  50. mailto:User@lists.neo4j.org
  51. https://lists.neo4j.org/mailman/listinfo/user
  52. mailto:User@lists.neo4j.org
  53. https://lists.neo4j.org/mailman/listinfo/user
  54. mailto:matt...@neotechnology.com
  55. http://www.neotechnology.com/
  56. mailto:User@lists.neo4j.org
  57. https://lists.neo4j.org/mailman/listinfo/user
  58. mailto:User@lists.neo4j.org
  59. https://lists.neo4j.org/mailman/listinfo/user
  60. mailto:User@lists.neo4j.org
  61. https://lists.neo4j.org/mailman/listinfo/user