Re: Solr / Tika Integration

2012-02-10 Thread Shairon Toledo
the suggestions are often unusable and the search does not work as expected. Does anyone have a suggestion for how to extract the content of PDFs containing soft hyphens without fragmenting it? Best, Dirk -- [ ]'s Shairon Toledo http://www.google.com/profiles/shairon.toledo
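
One possible post-processing step, sketched here only as an assumption: if the text extracted from the PDF still contains Unicode soft hyphens (U+00AD), they can be stripped before the text is sent to Solr, which rejoins words that were split across line breaks. The function name `remove_soft_hyphens` is hypothetical, and this is not a description of how Tika itself handles soft hyphens.

```python
import re

SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN

def remove_soft_hyphens(text: str) -> str:
    """Hypothetical cleanup applied to extracted text before indexing."""
    # A soft hyphen followed by a line break marks a word split across lines:
    # drop the hyphen and the break so the two halves are rejoined.
    text = re.sub(SOFT_HYPHEN + r"\s*\n\s*", "", text)
    # Any remaining soft hyphens are invisible hyphenation hints; remove them.
    return text.replace(SOFT_HYPHEN, "")
```

For example, `remove_soft_hyphens("Infor\u00ad\nmation retrieval")` gives `"Information retrieval"`, so the word is indexed whole rather than as two fragments.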

Phrase search issue with XMLPayload? Is it the best solution?

2010-01-04 Thread Shairon
I have a project that involves words extracted by OCR: each page has words, and each word has its geometry so that a highlight can be flashed on screen for the end user. I've been trying to represent this document structure as an XML document: <page num=1><term top='111' bottom='222' right='333' left='444'>foo</term><term
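
The page/term layout described above can be generated mechanically. The sketch below, assuming Python's standard `xml.etree.ElementTree`, builds one `<page>` element with one `<term>` per OCR word. The element and attribute names come from the post, while the helper `page_to_xml` and its input shape (word text plus a `(top, bottom, right, left)` tuple) are hypothetical.

```python
import xml.etree.ElementTree as ET

def page_to_xml(page_num, words):
    """Hypothetical helper: words is a list of (text, (top, bottom, right, left))."""
    page = ET.Element("page", num=str(page_num))
    for text, (top, bottom, right, left) in words:
        # One <term> per OCR word, carrying its bounding box as attributes.
        term = ET.SubElement(page, "term", top=str(top), bottom=str(bottom),
                             right=str(right), left=str(left))
        term.text = text
    return ET.tostring(page, encoding="unicode")
```

Calling `page_to_xml(1, [("foo", (111, 222, 333, 444))])` yields a `<page num="1">` element containing a single `<term>` for `foo`, matching the structure in the post.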

Non-linear structure for search and index documents

2009-04-08 Thread Shairon
Hi all, I need to index and search words extracted from PDF files, along with their coordinates and page numbers, so I have this structure:
- index the document id
- a document has many pages
- a page has many words
- a word has geometry [w,h,x,y] (within the page)
Is this possible with Solr? If yes,
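
Because a Solr document is essentially a flat set of fields, one common workaround (an assumption here, not the answer given in this thread) is to denormalize the document → page → word hierarchy into one flat record per word. The sketch below shows that flattening; the class names, the field names (`id`, `doc_id`, `page`, `text`, `w`, `h`, `x`, `y`), and the id scheme are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union

@dataclass
class Word:
    text: str
    w: int  # geometry within the page, as in [w,h,x,y]
    h: int
    x: int
    y: int

@dataclass
class Page:
    number: int
    words: List[Word] = field(default_factory=list)

@dataclass
class Document:
    doc_id: str
    pages: List[Page] = field(default_factory=list)

def flatten(doc: Document) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield one flat, hypothetical Solr-style record per word."""
    for page in doc.pages:
        for i, word in enumerate(page.words):
            yield {
                "id": f"{doc.doc_id}-p{page.number}-w{i}",  # unique key per word
                "doc_id": doc.doc_id,                       # parent document
                "page": page.number,                        # parent page
                "text": word.text,
                "w": word.w, "h": word.h, "x": word.x, "y": word.y,
            }
```

With this layout, a search hit returns the word's page and coordinates directly, and results can be grouped by `doc_id`. The trade-off is that phrase queries spanning several words no longer match within a single record.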