Re: Beginner Question about Types and Parent-Child Definitions in Solr Schema.xml

2013-02-03 Thread Mattmann, Chris A (388J)
Hi Amir, One thing you may want to look at is the Apache OODT File Manager here. If you can model your content objects as Apache OODT "product types" you can then use the parent/child relationship model there to achieve what you are after. Then if you use the SolrDumperTool from the Apache OODT fi

Re: Which token filter can combine 2 terms into 1?

2012-12-27 Thread Mattmann, Chris A (388J)
Hi Guys, I also worked on a CombiningTokenFilter, see: https://issues.apache.org/jira/browse/LUCENE-3413 Patch has been up and available for a while. HTH! Cheers, Chris On 12/27/12 12:26 AM, "Dmitry Kan" wrote: >Hi, > >Have a look onto TokenFilter. Extending it will give you access to a >

Re: Too many Tika errors

2012-12-11 Thread Mattmann, Chris A (388J)
Hi there -- you may want to post this to the d...@tika.apache.org list. Cheers, Chris On 12/11/12 11:08 AM, "eShard" wrote: >I'm running Solr 4.0 on Tomcat 7.0.8 and I'm running the solr/example >single >core as well with manifoldcf v1.1 >I had everything working but then the crawler stops and

Re: Tika error

2012-12-07 Thread Mattmann, Chris A (388J)
Hi Arkadi, You may want to post this on the u...@tika.apache.org list -- looks like you are missing the universalchardetector library as part of your Solr Cell installation. Cheers, Chris On 12/6/12 12:02 AM, "Arkadi Colson" wrote: >Anybody an idea? > >Dec 5, 2012 3:52:32 PM org.apache.solr.c

Re: Geocoding with Solr

2012-07-29 Thread Mattmann, Chris A (388J)
Hi there, You may want to check out: SOLR-2073 Geonames.org UpdateProcessor for Spatial SOLR-2074 GeoRSS ResponseWriter SOLR-2075 SpatialQParserPlugin and HostIP adaptor SOLR-2076 Spatial example schema updates SOLR-2077 Spatial example solconfig updates SOLR-2079 Expose HttpServletRequest object

Re: Sorting by article title

2011-10-05 Thread Mattmann, Chris A (388J)
Hi, You can also check out LUCENE-3413 [1] and the CombiningFilter that I wrote and associated example. This lets you: 1. perform normal tokenization and analysis in your analysis chain 2. recombine the tokens at the end for sorting purposes HTH, Chris [1] https://issues.apache.org/jira/browse
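The two steps described above (analyze normally, then recombine the tokens into one sort key) can be sketched outside Lucene as follows. This is only an illustration of the idea in plain Python, not the CombiningFilter from LUCENE-3413; the function name sort_key, the stopword list, and the choice to join tokens with no separator are all assumptions made for the example.

```python
import re

# Hypothetical stopword list; a real analysis chain would configure its own.
STOPWORDS = frozenset({"the", "of", "a", "an"})


def sort_key(title, stopwords=STOPWORDS):
    """Build a single sortable token from a title.

    Step 1: normal analysis (lowercase, drop apostrophes, split on
            non-alphanumerics, remove stopwords).
    Step 2: recombine the surviving tokens into one string.
    """
    normalized = title.lower().replace("'", "")
    tokens = re.findall(r"[a-z0-9]+", normalized)
    kept = [t for t in tokens if t not in stopwords]
    # Joining without a separator is an assumption for this sketch.
    return "".join(kept)


titles = ["The Zoo of Los Angeles", "Children's Hospital Los Angeles"]
ordered = sorted(titles, key=sort_key)
```

Sorting on the recombined key orders titles by their analyzed form, so a leading "The" no longer affects the position of an entry.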

Re: Analyzers and sorting with a custom analysis chain

2011-09-02 Thread Mattmann, Chris A (388J)
On Sep 2, 2011, at 8:53 PM, Mattmann, Chris A (388J) wrote: > > I think in spelling this out though, I might have elaborated my problem. > Since > the method I call in the constructor for my CombiningFilter is > super(mergeStreamTokens(in)) > where mergeStreamTokens is

Re: Analyzers and sorting with a custom analysis chain

2011-09-02 Thread Mattmann, Chris A (388J)
Hi Yonik, On Sep 2, 2011, at 7:47 PM, Yonik Seeley wrote: > On Fri, Sep 2, 2011 at 10:26 PM, Mattmann, Chris A (388J) > wrote: >> I'm left with childrenshospitallosangeles as a single token resultant from >> the chain. >> So, when I go to sort the titles in Solr, I

Analyzers and sorting with a custom analysis chain

2011-09-02 Thread Mattmann, Chris A (388J)
Hi Everyone, I've got an Analysis question related to both Lucene and Solr (sorry for the cross posting). I've created a custom analysis chain as part of a field type for the title field in my schema representing Businesses. I've created an additional field called title_sort where I copied the orig

Re: Tika Jax-RS and DIH

2011-06-22 Thread Mattmann, Chris A (388J)
Hi Tod, On Jun 22, 2011, at 6:00 AM, Tod wrote: >> Mattmann, Chris A (388J jpl.nasa.gov> writes: >> >>>> >>>> Hi Jo, >>>> >>>> You may consider checking out Tika trunk, where we recently have a Tika >>>> JAX-RS >

Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-22 Thread Mattmann, Chris A (388J)
Glad it worked out! Cheers, Chris On Jun 22, 2011, at 5:14 AM, Surendra wrote: > Hi Chris ,Andreas > > I have upgraded to solr 3.2 ... everything seems fine now. I will have to > integrate this to my application and observe if any further issues...again > thanks for your patience and time... >

Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-21 Thread Mattmann, Chris A (388J)
Hi Surendra, Thanks. Besides replacing the tika-*-0.9.jar files, you also need to replace the dependency jar files for the other libs as well since they have been upgraded. It's also possible that b/c of API changes, Solr 1.4.1 won't work with Tika 0.9 without modifying the ExtractingRequestHan

Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-20 Thread Mattmann, Chris A (388J)
Hi Surendra, On Jun 20, 2011, at 4:59 AM, Surendra wrote: > Hey Chris > > I have added tika-core 0.9 and tika-parsers 0.9 to Solr1.4.1 (extraction/lib) > after building them using the source provided by TIKA. Now I have an issue > with > this. I am working with extracting PDF content using Solr

Re: GeoJSON Response Writer

2011-05-29 Thread Mattmann, Chris A (388J)
Hey Adam, I haven't done GeoJSON, but I did whip up a GeoRSS one, check it out here: https://issues.apache.org/jira/browse/SOLR-2074 Cheers, Chris On May 29, 2011, at 11:14 AM, Adam Estrada wrote: > All, > > Has anyone modified the current json response writer to include the GeoJSON > geospat

Re: [WKT] Spatial Searching

2011-03-29 Thread Mattmann, Chris A (388J)
ertise p a group of like minded individuals. Cheers, Chris > > ~ David > ________ > From: Mattmann, Chris A (388J) [chris.a.mattm...@jpl.nasa.gov] > Sent: Tuesday, March 29, 2011 1:00 AM > To: solr-user@lucene.apache.org > Cc: Adam Estrada &

Re: [WKT] Spatial Searching

2011-03-28 Thread Mattmann, Chris A (388J)
LGPL licenses and Apache aren't exactly compatible, see: http://www.apache.org/legal/3party.html#transition-examples-lgpl http://www.apache.org/legal/resolved.html#category-x In practice, this was the reason we started the SIS project. Cheers, Chris On Mar 28, 2011, at 11:16 AM, Smiley, David W

Re: Tika metadata extracted per supported document format?

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Andreas, > java -jar tika-app-0.9.jar --list-met-models > TikaMetadataKeys > PROTECTED > RESOURCE_NAME_KEY > TikaMimeKeys > MIME_TYPE_MAGIC > TIKA_MIME_FILE > > Both 0.8 and 0.9 give me the same list. Is that a configuration issue? Strange -- those are the only met models you're seeing liste

Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Mattmann, Chris A (388J)
Yep it's fixed in 0.9. Cheers, Chris On Feb 25, 2011, at 2:37 PM, Andreas Kemkes wrote: > According to the Tika release notes, it's fixed in 0.9. Haven't tried it > myself. > > A critical backwards incompatible bug in PDF parsing that was introduced in > Tika > 0.8 has been fixed. (TIKA-548

Re: Tika metadata extracted per supported document format?

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Andreas, In Tika 0.8+, you can run the --list-met-models command from tika-app: java -jar tika-app-<version>.jar --list-met-models And get a print out of the met keys that Tika supports. Some parsers add their own that aren't part of this met listing, but this is a relatively comprehensive list. Ch

Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-02-25 Thread Mattmann, Chris A (388J)
Hi Jo, You may consider checking out Tika trunk, where we recently have a Tika JAX-RS web service [1] committed as part of the tika-server module. You could probably wire DIH into it and accomplish the same thing. Cheers, Chris [1] https://issues.apache.org/jira/browse/TIKA-593 On Feb 24, 201

Re: [WKT] Spatial Searching

2011-02-08 Thread Mattmann, Chris A (388J)
+1 to David's patch from SOLR-2155. It would be great to implement. Great job using GDAL on converting the WKT Adam! Cheers, Chris On Feb 8, 2011, at 8:18 PM, Adam Estrada wrote: > I just came across a ~nudge post over in the SIS list on what the status is > for that project. This got me look

Re: What is the best protocol for data transfer rate HTTP or RMI?

2011-02-04 Thread Mattmann, Chris A (388J)
Hi Guys, It depends on what properties you're trying to maximize. I've done several studies of this over the years: http://sunset.usc.edu/~mattmann/pubs/MSST2006.pdf http://sunset.usc.edu/~mattmann/pubs/IWICSS07.pdf http://sunset.usc.edu/~mattmann/pubs/icse-shark08.pdf And if you're really bore

Re: Indexing FTP Documents through SOLR??

2011-01-23 Thread Mattmann, Chris A (388J)
I'd be happy to comment: A simple shell script doesn't provide URL filtering and control of how you crawl those documents on the local file system. Nutch has several levels of URL filtering based on regex, MIME type, and others. Also, if there are any outlinks in those local files that point to

[Call for Papers] ICSE Software Engineering for Cloud Computing (SECLOUD) Workshop

2011-01-20 Thread Mattmann, Chris A (388J)
(apologies for the cross posting) *** PLEASE NOTE - the deadline for submitting papers has been extended by 1 week to 1/28/2011! *** Please consider submitting a paper to the ICSE 2011 Software Engineering for Cloud Computing (SECLOUD) Workshop to be held Sunday, May 22, 2011, at the Hilton Ha

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Mattmann, Chris A (388J)
On Jan 18, 2011, at 2:24 PM, Glen Newton wrote: > Where do you get your Lucene/Solr downloads from? > > [] ASF Mirrors (linked in our release announcements or via the Lucene website) > > [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) > > [] I/we build them from source via a

Re: [Solr4.0] Release Date

2011-01-06 Thread Mattmann, Chris A (388J)
Hey Adam, The Solr/Lucene people decided that with versions post the Solr/Lucene merge that they would sync the Solr version with the Lucene version (meaning there is no 1.5 release or plans for it, which IMHO as I noted before is confusing). Here's a pointer to a former discussion thread about

[Call for Papers] ICSE Software Engineering for Cloud Computing (SECLOUD) Workshop

2011-01-03 Thread Mattmann, Chris A (388J)
(apologies for the cross posting) Please consider submitting a paper to the ICSE 2011 Software Engineering for Cloud Computing (SECLOUD) Workshop to be held Sunday, May 22, 2011, at the Hilton Hawaiian Village Resort in Waikiki, Honolulu, HI. This workshop focuses on identifying the grand chall

Re: SpatialTierQueryParserPlugin Loading Error

2010-12-28 Thread Mattmann, Chris A (388J)
Hi Adam, I cut a branch at Github of a forked Solr 1.5 that applied a bunch of patches that my student and I did in my CSCI 572 class at USC. The branch is here [1]. You can simply use Git to go grab that if you want and not worry about the patches either. I sent an email [2] and filed a bunch

FW: [Spatial] Geonames and extension to Spatial Solution for Solr

2010-08-24 Thread Mattmann, Chris A (388J)
Oops, forgot to include solr-user@ in the original email. FYI below... -- Forwarded Message From: "Mattmann, Chris A (388J)" Reply-To: Date: Tue, 24 Aug 2010 07:02:58 -0700 To: Subject: [Spatial] Geonames and extension to Spatial Solution for Solr Hi Folks, You may have notice

Re: How to do Spatial Search with Solr?

2010-08-22 Thread Mattmann, Chris A (388J)
Hi Savannah, Check out the patches I just threw up for SOLR-2073, SOLR-2074, SOLR-2075, SOLR-2076 and SOLR-2077. There's code in there to deal with Geonames.org data. There's more patches coming so hopefully it will be clearer as I add them. Thanks to W. Quach for leading the charge on these p

Re: Fun with Spatial (Haversine formula)

2010-08-20 Thread Mattmann, Chris A (388J)
It might have something to do with the source data and its spatial reference system. For example, if the data is in WGS84 then the haversine (great circle) distance precision gets worse the farther away two cities are from each other or for particular regions (e.g. further away from equator). Chee
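For reference, a minimal sketch of the haversine (great circle) distance discussed above. It treats the Earth as a perfect sphere, whereas WGS84 is an ellipsoid, which is one source of the discrepancy mentioned. The function name haversine_km and the mean radius constant are assumptions for this example, not anything taken from Solr's implementation.

```python
import math

# Mean Earth radius in kilometres; assumes a spherical Earth.
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


# Example: approximate coordinates for Los Angeles and New York.
distance = haversine_km(34.05, -118.25, 40.71, -74.01)
```

Comparing this result against an ellipsoidal (geodesic) calculation for the same pair of points is one way to see how much of an observed error comes from the spherical assumption.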

Re: send to list

2010-07-16 Thread Mattmann, Chris A (388J)
Hi Joe, Take a look at the Cartesian Grid work from Patrick O'Leary here [1]. It's not fully integrated with Solr and they are moving away from it, but it'll give you a good idea of how to get started and to go about doing this... HTH, Chris [1] http://www.nsshutdown.com/projects/lucene/whitep

FW: Tika in Action

2010-06-14 Thread Mattmann, Chris A (388J)
All, FYI, as SolrCell is built on top of Tika, some folks might be interested in this message I posted to the Tika lists. Thanks! Cheers, Chris -- Forwarded Message From: "Mattmann, Chris A (388J)" Reply-To: Date: Fri, 11 Jun 2010 19:07:24 -0700 To: Cc: Subject: Tika in

Re: schema.xml XSD/DTD

2010-05-05 Thread Mattmann, Chris A (388J)
Hey Guys, There is some work on SOLR-17 to track this. I put up a patch that's incomplete, based on the prior work done by Mike Baranczak and Hoss and others. I've been meaning to get back to it, but have been swamped. Contributions/updates welcome! Cheers, Chris [1] http://issues.apache.org/

Re: XSD for Solrv1.4

2010-04-15 Thread Mattmann, Chris A (388J)
Hi Stefan, I'm not sure about releasing one for 1.4, but I was making some progress on SOLR-17 [1] (and see the linked issues from there) on pushing this forward. It dropped off my radar for a while, but I'd be happy to pick it back up. Let me see what I can do. Thanks! Cheers, Chris [1]

Some help for folks trying to get new Solr/Lucene up in Eclipse

2010-04-05 Thread Mattmann, Chris A (388J)
Hey All, Just to save some folks some time in case you are trying to get new Lucene/Solr up and running in Eclipse. If you continue to get weird errors, e.g., in solr/src/test/TestConfig.java regarding org.w3c.dom.Node#getTextContent(), I found for me this error was caused by including the Tidy.jar

Re: wikipedia and teaching kids search engines

2010-03-24 Thread Mattmann, Chris A (388J)
Hey Erik, One thing to think about (and I'm no expert at middle school kids) would be to relate search somehow to a topic they are interested in. My 12 year old nephew loves the NBA, so if I were to talk to him about search, I would try and relate it to e.g., NBA.com, or understanding the differen

Re: PDFBox/Tika Performance Issues

2010-03-23 Thread Mattmann, Chris A (388J)
Hi Giovanni, The error that you're showing in your logs below indicates that this message signature: org.apache.solr.handler.ContentStreamLoader.load(Lorg/apache/solr/request/SolrQueryRequest;Lorg/apache/solr/response/SolrQueryResponse;Lorg/apache/solr/common/util/ContentStream;) doesn't match

Re: PDFBox/Tika Performance Issues

2010-03-19 Thread Mattmann, Chris A (388J)
that works... -Original Message----- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Friday, March 19, 2010 12:04 AM To: solr-user@lucene.apache.org Subject: Re: PDFBox/Tika Performance Issues Hi Giovanni, Let's try and isolate the problem. Can you try parsing th

Re: PDFBox/Tika Performance Issues

2010-03-18 Thread Mattmann, Chris A (388J)
3.6.jar poi-ooxml-3.6.jar poi-ooxml-schemas-3.6.jar poi-scratchpad-3.6.jar tagsoup-1.2.jar tika-core-0.7-SNAPSHOT.jar tika-parsers-0.7-SNAPSHOT.jar xercesImpl-2.8.1.jar xml-apis-1.0.b2.jar xmlbeans-2.3.0.jar -Original Message- From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.go

Re: PDFBox/Tika Performance Issues

2010-03-16 Thread Mattmann, Chris A (388J)
Performance Issues > > > > Thanks Chris! > > > > I'll try the patch. > > > > -Original Message- > > From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov] > > Sent: Tuesday, March 16, 2010 5:37 PM > > To: so

Re: PDFBox/Tika Performance Issues

2010-03-16 Thread Mattmann, Chris A (388J)
Guys, I think this is an issue with PDFBOX and the version that Tika 0.6 depends on. Tika 0.7-trunk upgraded to PDFBox 1.0.0 (see [1]), so it may include a fix for the problem you're seeing. See this discussion [2] on how to patch Tika to use the new PDFBox if you can't wait for the 0.7 release

Re: Indexing the latests MS Office documents

2010-01-03 Thread Mattmann, Chris A (388J)
Hi Roland, You probably want to send your email to tika-u...@lucene.apache.org. Best of luck! Cheers, Chris On 1/3/10 4:00 PM, "Roland Villemoes" wrote: > Hi All, > > Anyone who knows how to index the latest MS office documents like .docx and > .xlsx ? > > From searching it seems like Ti

Re: Requesting feedback on solr-spatial plugin

2009-12-30 Thread Mattmann, Chris A (388J)
Hi Mat, Taking a quick look at your code via the gitHub browser (and not having downloaded or run it, that's for later! :) ), it looks _very_ clean, and well commented. Bravo! If you get a chance and are interested in participating in the SOLR spatial effort, there are a few issues you could take

Re: UI for solr core admin?

2009-12-09 Thread Mattmann, Chris A (388J)
VwR in solrconfig, which I'll set up by > default in 1.5). It will even default to the template named after the > handler name, so all you have to do is &wt=velocity. > > Erik > > > > On Dec 10, 2009, at 7:33 AM, Mattmann, Chris A (388J) wrote: &

Re: UI for solr core admin?

2009-12-09 Thread Mattmann, Chris A (388J)
Hi Jason, Patches welcome, though! :) Cheers, Chris On 12/9/09 10:31 PM, "Shalin Shekhar Mangar" wrote: > On Thu, Dec 10, 2009 at 11:52 AM, Jason Rutherglen < > jason.rutherg...@gmail.com> wrote: > >> I assume there isn't one? Anything in the works? >> > > Nope. > > -- > Regards, > Shal

Re: why is XMLWriter declared as final?

2009-11-29 Thread Mattmann, Chris A (388J)
Hi Hoss, What I meant by the sentence is actually it's a good thing to work on it now because SOLR is in "dev" stage, and not in "pre-release" or "feature freeze" state, as indicated by the *-dev on the release #... Cheers, Chris On 11/29/09 3:55 PM, "Chris Hostetter" wrote: I don't really u

Re: why is XMLWriter declared as final?

2009-11-25 Thread Mattmann, Chris A (388J)
Hey Hoss, +1. I think we need to overhaul the whole API, even in light of the incremental progress I've been proposing and patching, etc., lately. I think it's good to do that incrementally, though, rather than all at once, especially considering SOLR is in 1.5-dev trunk stage atm. Cheers, Chr

Re: Newbie tips: migrating from mysql fulltext search / PHP integration

2009-11-15 Thread Mattmann, Chris A (388J)
WOW, +1!! Great job, PHP! Cheers, Chris On 11/15/09 10:13 PM, "Otis Gospodnetic" wrote: Hi, I'm not sure if you have a specific question there. But regarding "PHP integration" part, I just learned PHP now has native Solr (1.3 and 1.4) support: http://twitter.com/otisg/status/5757184282

Re: Response XML Deserializing

2009-10-27 Thread Mattmann, Chris A (388J)
Hi Thomas, If you check out SOLR-1516, I developed a custom response writer that simplifies this process. You basically have to implement an #emitDoc or an #emitDocList function in which you are handed the resultant o.a.l.Document List or o.a.l.Document object (on a per Document basis) and you