the suggestions are often unusable and the search does not
work as expected.
Does anyone have a suggestion for how to extract the content of a PDF
containing soft hyphens without fragmenting it?
Best
Dirk
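
One possible post-processing step, sketched here under the assumption that the PDF extractor passes the soft hyphen through as the Unicode character U+00AD (some extractors emit a plain "-" instead, which this does not handle). The function name `strip_soft_hyphens` is hypothetical and not part of any library:

```python
import re

# U+00AD SOFT HYPHEN: an invisible hyphenation hint that some PDF
# extractors leave inside the extracted text.
SOFT_HYPHEN = "\u00ad"


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens so that hyphenated words are indexed whole."""
    # Rejoin words that were split at a soft hyphen followed by a line break.
    text = re.sub(SOFT_HYPHEN + r"[ \t]*\r?\n[ \t]*", "", text)
    # Drop any remaining soft hyphens inside words.
    return text.replace(SOFT_HYPHEN, "")
```

Running the extracted text through a step like this before indexing would keep each word as a single token, provided the assumption about U+00AD holds for the extractor in use.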
--
[ ]'s
Shairon Toledo
http://www.google.com/profiles/shairon.toledo
I have a project that involves words extracted by OCR. Each page has words,
and each word has its geometry, used to blink a highlight for the end user.
I've been trying to represent this document structure in XML:
<document>
<page num="1">
<term top='111' bottom='222' right='333' left='444'>foo</term>
term
Hi all,
I need to index and search words extracted from PDF files, together with
their coordinates and page number, so I have this structure:
- index the document id
- a document has many pages
- a page has many words
- a word has a geometry [w, h, x, y] (relative to its page)
Is this possible with Solr?
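
To make the structure above concrete, here is a minimal sketch that flattens it into one record per word. The field names (`doc_id`, `page`, `word`, `x`, `y`, `w`, `h`) and the `id` scheme are assumptions chosen for illustration, not an established schema:

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # horizontal position inside the page
    y: float  # vertical position inside the page
    w: float  # width of the word's bounding box
    h: float  # height of the word's bounding box


def flatten(doc_id, pages):
    """Turn {page number: [Word, ...]} into a flat list of records.

    Each record carries the document id and page number, so a hit on a
    word can be traced back to its page and geometry.
    """
    records = []
    for page_num, words in sorted(pages.items()):
        for i, word in enumerate(words):
            records.append({
                "id": f"{doc_id}_p{page_num}_w{i}",  # hypothetical unique key
                "doc_id": doc_id,
                "page": page_num,
                "word": word.text,
                "x": word.x,
                "y": word.y,
                "w": word.w,
                "h": word.h,
            })
    return records
```

For example, `flatten("doc1", {1: [Word("foo", 444, 111, 10, 5)]})` yields a single record for the word "foo" on page 1.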
If yes,