There's some code using POI at
http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04809.html
/magnus
Luke Shannon wrote:
Hey All;
Anyone know a good API for parsing MS powerpoint files?
Luke
I've had some success with the code found at
http://www.mail-archive.com/poi-user@jakarta.apache.org/msg04809.html
together with POI.
Then there's OpenOffice, but I don't really think it is usable
in a production environment
/Magnus Johansson
> Hi,
>
> Does anyone know a go
Daniel Naber wrote:
On Friday 06 August 2004 13:28, Magnus Johansson wrote:
Splitting compound words can be done quite effectively simply by using
a large wordlist. I have done this for Swedish.
It is, however, difficult to get right for German. On the one hand there are
compounds in G
You could create a custom analyzer that splits compound words into their
parts. That is, applying the analyzer to the word "bergbahn" would yield
the terms "berg" and "bahn".
Splitting compound words can be done quite effectively simply by using
a large wordlist. I have done this for Swedish.
/magnus
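In case it helps, a wordlist-based splitter along the lines described above could look roughly like this. This is a minimal sketch, not the actual analyzer's code: the class name, the greedy longest-match strategy, and the handling of the Swedish linking "s" are my assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Minimal sketch of wordlist-based compound splitting.
// Class name and strategy are illustrative, not from the original code.
public class Decompounder {
    private final Set<String> wordlist;
    private final int minPartLength;

    public Decompounder(Set<String> wordlist, int minPartLength) {
        this.wordlist = wordlist;
        this.minPartLength = minPartLength;
    }

    // Greedily split a token into dictionary words; return the token
    // unchanged (as a single-element list) if no full split is found.
    public List<String> split(String token) {
        List<String> parts = new ArrayList<>();
        int pos = 0;
        while (pos < token.length()) {
            int end = -1;
            // Prefer the longest dictionary word starting at pos.
            for (int i = token.length(); i >= pos + minPartLength; i--) {
                if (wordlist.contains(token.substring(pos, i))) {
                    end = i;
                    break;
                }
            }
            if (end == -1) {
                // Swedish compounds often join parts with a linking "s"
                // (fotboll + s + match); skip it and retry once.
                if (token.charAt(pos) == 's') {
                    pos++;
                    continue;
                }
                return List.of(token); // no clean split, keep whole word
            }
            parts.add(token.substring(pos, end));
            pos = end;
        }
        return parts;
    }
}
```

Applied to "fotbollsmatch" with a wordlist containing "fotboll" and "match", this yields [fotboll, match]; words with no dictionary decomposition pass through unchanged.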
Yes I have tried it and it seems to work ok.
I haven't really used it in a production environment
however.
There was some code here
http://www.gzlinux.org/docs/category/dev/java/doc2txt.pdf
It is, however, not there anymore; the Google HTML version is
available at
http://66.102.9.104/search?q
be at a particular page
after an infinite time using random browsing according
to the probabilities found.
This probability is then used as a basis for ranking
results.
Magnus Johansson
> We all know Lucene's algorithm (thanks to open source :).
> Does anybody have a general idea of how Google
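The random-surfer model described above is PageRank. A toy power-iteration sketch is below; this is hypothetical illustration, not Google's implementation, and the 0.85 damping factor is just the commonly cited value.

```java
// Illustrative power-iteration PageRank for a tiny link graph.
// Just the random-surfer model: pr[i] converges to the probability
// of being at page i after infinitely long random browsing.
public class PageRank {
    // links[i] = indices of the pages that page i links to
    public static double[] rank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] pr = new double[n];
        java.util.Arrays.fill(pr, 1.0 / n); // start at a uniformly random page
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            java.util.Arrays.fill(next, (1.0 - damping) / n); // random jump
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    // dangling page: redistribute its mass evenly
                    for (int j = 0; j < n; j++) next[j] += damping * pr[i] / n;
                } else {
                    for (int j : links[i]) next[j] += damping * pr[i] / links[i].length;
                }
            }
            pr = next;
        }
        return pr;
    }
}
```

The resulting probabilities sum to 1 and are then used as a basis for ranking results, as described above.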
I would also like to recommend "Modern Information Retrieval"
by Ricardo Baeza-Yates
/magnus
Gerret Apelt writes:
Dror --
I just completed an introductory course in IR. I can recommend the
textbook we used: "Managing Gigabytes: Compressing and Indexing Documents
and Images". When I don't
unless you can keep
the documents in memory somehow.
Storing the other/non-inverted/normal/whatever index would be
expensive for indexing, but querying should be a lot faster than
having to re-index documents. That is preferable in our situation.
Peter
Magnus Johansson wrote:
Hi Peter
If t
Ok, here it is. It's part of a JSP that prints out all keywords in a
document.
/magnus
<%@ page import="org.apache.lucene.index.IndexReader,
org.apache.lucene.document.Document,
com.technohuman.search.language.SwedishAnalyzer,
java.io.StringReader,
Hi Peter
If the original document is available, you could extract keywords from
the document at query time. That is, when someone asks for documents
similar to document a, you re-analyze document a and, in combination
with statistics from the Lucene index, extract keywords from document
a that
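A rough sketch of that idea, assuming you already have term frequencies from re-analyzing document a and document frequencies from the index: plain tf-idf scoring, no actual Lucene API calls, and all names here are illustrative (in a real setup the document frequencies would come from IndexReader).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Sketch of query-time keyword extraction by tf-idf.
// termFreqs: counts from re-analyzing document a.
// docFreqs/numDocs: collection statistics from the index.
public class KeywordExtractor {
    public static List<String> topKeywords(Map<String, Integer> termFreqs,
                                           Map<String, Integer> docFreqs,
                                           int numDocs, int k) {
        return termFreqs.keySet().stream()
            // score = tf * log(N / (1 + df)); sort descending by score
            .sorted(Comparator.comparingDouble((String t) ->
                -termFreqs.get(t) * Math.log((double) numDocs
                    / (1 + docFreqs.getOrDefault(t, 0)))))
            .limit(k)
            .toList();
    }
}
```

Terms frequent in document a but rare in the collection score highest; very common words (which occur in almost every document) get a near-zero idf and drop out, so the surviving terms make a reasonable "more like this" query.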
Tatu Saloranta wrote:
On Wednesday 12 March 2003 01:19, Magnus Johansson wrote:
Well, the problem arise when a user enters a query with a compound word
and the compound word itself is not indexed, only one of its parts.
Yes, but neither is the compound word itself ever used in a query either
I agree with you that this might not be a problem. The user could be
instructed to reformulate his query. However, the behaviour for an
English index and a Swedish index would be different.
/magnus
Tatu Saloranta wrote:
On Tuesday 11 March 2003 03:05, Magnus Johansson wrote:
Hello
I have written
Hello
I have written an Analyzer for Swedish. Compound words are common in
Swedish, therefore my Analyzer tries to split the compound words
into their parts. For example the Swedish word fotbollsmatch (football
game) is split into fotboll and match.
However when I use my Analyzer with the QueryPar