Hi Amir,
One thing you may want to look at is the Apache OODT File Manager here. If
you can model your content objects as Apache OODT "product types" you can
then use the parent/child relationship model there to achieve what you are
after. Then if you use the SolrDumperTool from the Apache OODT fi
Hi Guys,
I also worked on a CombiningTokenFilter, see:
https://issues.apache.org/jira/browse/LUCENE-3413
Patch has been up and available for a while.
HTH!
Cheers,
Chris
On 12/27/12 12:26 AM, "Dmitry Kan" wrote:
>Hi,
>
>Have a look at TokenFilter. Extending it will give you access to a
>
Hi there -- you may want to post this to the d...@tika.apache.org list.
Cheers,
Chris
On 12/11/12 11:08 AM, "eShard" wrote:
>I'm running Solr 4.0 on Tomcat 7.0.8 and I'm running the solr/example
>single core as well with manifoldcf v1.1
>I had everything working but then the crawler stops and
Hi Arkadi,
You may want to post this on the u...@tika.apache.org list -- looks like
you are missing the juniversalchardet library as part of your Solr
Cell installation.
Cheers,
Chris
On 12/6/12 12:02 AM, "Arkadi Colson" wrote:
>Anybody an idea?
>
>Dec 5, 2012 3:52:32 PM org.apache.solr.c
Hi there,
You may want to check out:
SOLR-2073 Geonames.org UpdateProcessor for Spatial
SOLR-2074 GeoRSS ResponseWriter
SOLR-2075 SpatialQParserPlugin and HostIP adaptor
SOLR-2076 Spatial example schema updates
SOLR-2077 Spatial example solrconfig updates
SOLR-2079 Expose HttpServletRequest object
Hi,
You can also check out LUCENE-3413 [1] and the CombiningFilter that
I wrote, along with the associated example. This lets you:
1. perform normal tokenization and analysis in your analysis chain
2. recombine the tokens at the end for sorting purposes
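The two steps above can be illustrated with a minimal, self-contained sketch. It is not the code attached to LUCENE-3413 and does not use Lucene's TokenFilter API; the class and method names (TitleSortKey, analyze, sortKey) are hypothetical, and the "analysis" is assumed to be just whitespace splitting, lowercasing, and stripping punctuation. It shows how a title such as "Children's Hospital Los Angeles" collapses into one sortable token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of "analyze normally, then recombine for sorting".
// Plain Java only; a real Lucene version would live inside a TokenFilter.
final class TitleSortKey {

    // Step 1: simplified analysis: split on whitespace, lowercase, and drop
    // every character that is not a letter or a decimal digit.
    static List<String> analyze(String title) {
        List<String> tokens = new ArrayList<>();
        for (String raw : title.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{Nd}]", "");
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Step 2: recombine the analyzed tokens into a single sort key.
    static String sortKey(String title) {
        return String.join("", analyze(title));
    }
}
```

Keeping the two steps separate means the per-token analysis can change without touching the recombination step.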
HTH,
Chris
[1] https://issues.apache.org/jira/browse
On Sep 2, 2011, at 8:53 PM, Mattmann, Chris A (388J) wrote:
>
> I think in spelling this out though, I might have elaborated my problem.
> Since the method I call in the constructor for my CombiningFilter is
> super(mergeStreamTokens(in))
> where mergeStreamTokens is
Hi Yonik,
On Sep 2, 2011, at 7:47 PM, Yonik Seeley wrote:
> On Fri, Sep 2, 2011 at 10:26 PM, Mattmann, Chris A (388J)
> wrote:
>> I'm left with childrenshospitallosangeles as a single token resultant from
>> the chain.
>> So, when I go to sort the titles in Solr, I
Hi Everyone,
I've got an Analysis question related to both Lucene and Solr (sorry for the
cross posting).
I've created a custom analysis chain as part of a field type for the title field
in my schema representing Businesses.
I've created an additional field called title_sort where I copied the orig
Hi Tod,
On Jun 22, 2011, at 6:00 AM, Tod wrote:
>> Mattmann, Chris A (388J jpl.nasa.gov> writes:
>>
>>>>
>>>> Hi Jo,
>>>>
>>>> You may consider checking out Tika trunk, where we recently have a Tika
>>>> JAX-RS
>
Glad it worked out!
Cheers,
Chris
On Jun 22, 2011, at 5:14 AM, Surendra wrote:
> Hi Chris, Andreas
>
> I have upgraded to solr 3.2 ... everything seems fine now. I will have to
> integrate this to my application and observe if any further issues...again
> thanks for your patience and time...
>
Hi Surendra,
Thanks. Besides replacing the tika-*-0.9.jar files, you also need to replace
the jar files for Tika's dependencies, since those libraries have been
upgraded as well. It's also possible that, because of API changes, Solr 1.4.1 won't work
with Tika 0.9 without modifying the ExtractingRequestHan
Hi Surendra,
On Jun 20, 2011, at 4:59 AM, Surendra wrote:
> Hey Chris
>
> I have added tika-core 0.9 and tika-parsers 0.9 to Solr 1.4.1 (extraction/lib)
> after building them using the source provided by TIKA. Now I have an issue
> with this. I am working with extracting PDF content using Solr
Hey Adam,
I haven't done GeoJSON, but I did whip up a GeoRSS one, check it out here:
https://issues.apache.org/jira/browse/SOLR-2074
Cheers,
Chris
On May 29, 2011, at 11:14 AM, Adam Estrada wrote:
> All,
>
> Has anyone modified the current json response writer to include the GeoJSON
> geospat
ertise p a group of like minded individuals.
Cheers,
Chris
>
> ~ David
> ________
> From: Mattmann, Chris A (388J) [chris.a.mattm...@jpl.nasa.gov]
> Sent: Tuesday, March 29, 2011 1:00 AM
> To: solr-user@lucene.apache.org
> Cc: Adam Estrada
The LGPL and the Apache License aren't exactly compatible; see:
http://www.apache.org/legal/3party.html#transition-examples-lgpl
http://www.apache.org/legal/resolved.html#category-x
In practice, this was the reason we started the SIS project.
Cheers,
Chris
On Mar 28, 2011, at 11:16 AM, Smiley, David W
Hi Andreas,
> java -jar tika-app-0.9.jar --list-met-models
> TikaMetadataKeys
> PROTECTED
> RESOURCE_NAME_KEY
> TikaMimeKeys
> MIME_TYPE_MAGIC
> TIKA_MIME_FILE
>
> Both 0.8 and 0.9 give me the same list. Is that a configuration issue?
Strange -- those are the only met models you're seeing liste
Yep it's fixed in 0.9.
Cheers,
Chris
On Feb 25, 2011, at 2:37 PM, Andreas Kemkes wrote:
> According to the Tika release notes, it's fixed in 0.9. Haven't tried it
> myself.
>
> A critical backwards incompatible bug in PDF parsing that was introduced in
> Tika
> 0.8 has been fixed. (TIKA-548
Hi Andreas,
In Tika 0.8+, you can run the --list-met-models command from tika-app:
java -jar tika-app-<version>.jar --list-met-models
and get a printout of the metadata keys that Tika supports. Some parsers add their
own keys that aren't part of this listing, but it is a relatively
comprehensive list.
Ch
Hi Jo,
You may consider checking out Tika trunk, where we recently committed a Tika JAX-RS
web service [1] as part of the tika-server module. You could probably
wire DIH into it and accomplish the same thing.
Cheers,
Chris
[1] https://issues.apache.org/jira/browse/TIKA-593
On Feb 24, 201
+1 to David's patch from SOLR-2155.
It would be great to implement. Great job using GDAL to convert the WKT, Adam!
Cheers,
Chris
On Feb 8, 2011, at 8:18 PM, Adam Estrada wrote:
> I just came across a ~nudge post over in the SIS list on what the status is
> for that project. This got me look
Hi Guys,
It depends on what properties you're trying to maximize. I've done several
studies of this over the years:
http://sunset.usc.edu/~mattmann/pubs/MSST2006.pdf
http://sunset.usc.edu/~mattmann/pubs/IWICSS07.pdf
http://sunset.usc.edu/~mattmann/pubs/icse-shark08.pdf
And if you're really bore
I'd be happy to comment:
A simple shell script doesn't provide URL filtering or control over how you
crawl those documents on the local file system. Nutch has several levels of URL
filtering based on regex, MIME type, and others. Also, if there are any
outlinks in those local files that point to
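To make the regex-based URL filtering point concrete, here is a hypothetical Java sketch of ordered accept/reject rules in the spirit of Nutch's regex URL filter: one rule per line, prefixed with + (accept) or - (reject), where the first matching rule wins. It is not Nutch's code; the class name, the comment syntax, and the reject-by-default behavior are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of ordered "+regex" / "-regex" URL filtering.
// Rules are checked in order; the first rule whose pattern is found anywhere
// in the URL decides. URLs that match no rule are rejected.
final class RegexUrlFilter {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Each non-blank, non-comment line starts with '+' or '-', followed by a regex.
    RegexUrlFilter(List<String> ruleLines) {
        for (String line : ruleLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue; // skip blank lines and comments
            }
            char sign = trimmed.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(trimmed.substring(1))));
        }
    }

    // Returns true if the URL should be crawled.
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }
}
```

For example, the rules "-\.(gif|jpg|zip)$", "+^file:///data/docs/", "-." would crawl PDFs under a (hypothetical) /data/docs/ tree while skipping images, archives, and everything else.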
(apologies for the cross posting)
*** PLEASE NOTE - the deadline for submitting papers has been extended by 1
week to 1/28/2011! ***
Please consider submitting a paper to the ICSE 2011 Software Engineering for
Cloud Computing (SECLOUD) Workshop to be held Sunday, May 22, 2011, at the
Hilton Ha
On Jan 18, 2011, at 2:24 PM, Glen Newton wrote:
> Where do you get your Lucene/Solr downloads from?
>
> [] ASF Mirrors (linked in our release announcements or via the Lucene website)
>
> [X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
>
> [] I/we build them from source via a
Hey Adam,
The Solr/Lucene developers decided that, for versions after the Solr/Lucene merge,
they would sync the Solr version with the Lucene version (meaning there is
no 1.5 release or plan for one, which, as I noted before, IMHO is confusing).
Here's a pointer to a previous discussion thread about
(apologies for the cross posting)
Please consider submitting a paper to the ICSE 2011 Software Engineering for
Cloud Computing (SECLOUD) Workshop to be held Sunday, May 22, 2011, at the
Hilton Hawaiian Village Resort in Waikiki, Honolulu, HI.
This workshop focuses on identifying the grand chall
Hi Adam,
I cut a branch on GitHub of a forked Solr 1.5 that applies a bunch of patches
that my student and I wrote in my CSCI 572 class at USC. The branch is here [1].
You can simply use Git to go grab that if you want and not worry about the
patches either. I sent an email [2] and filed a bunch
Oops, forgot to include solr-user@ in the original email. FYI below...
-- Forwarded Message
From: "Mattmann, Chris A (388J)"
Reply-To:
Date: Tue, 24 Aug 2010 07:02:58 -0700
To:
Subject: [Spatial] Geonames and extension to Spatial Solution for Solr
Hi Folks,
You may have notice
Hi Savannah,
Check out the patches I just threw up for SOLR-2073, SOLR-2074, SOLR-2075,
SOLR-2076 and SOLR-2077.
There's code in there to deal with Geonames.org data. There are more patches
coming, so hopefully it will be clearer as I add them. Thanks to W. Quach for
leading the charge on these p
It might have something to do with the source data and its spatial reference
system. For example, if the data is in WGS84, then the precision of the haversine
(great-circle) distance gets worse the farther apart two cities are, or in
particular regions (e.g., farther from the equator).
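For reference, a textbook haversine great-circle computation looks roughly like the Java sketch below. It is not Solr's implementation; the class name and the 6371.0 km mean Earth radius are assumptions. Because the formula treats the Earth as a sphere while WGS84 is an ellipsoid, its deviation from an ellipsoidal distance varies with how far apart the points are and where they lie, which is one plausible source of the discrepancies described above.

```java
// Great-circle distance via the haversine formula on a spherical Earth.
// Assumption: mean Earth radius of 6371.0 km. WGS84 is an ellipsoid, so this
// spherical model is itself an approximation.
final class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Latitudes and longitudes are in decimal degrees; result is in kilometers.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dPhi / 2) * Math.sin(dPhi / 2)
                + Math.cos(phi1) * Math.cos(phi2) * Math.sin(dLambda / 2) * Math.sin(dLambda / 2);
        // Clamp to guard against tiny floating-point overshoot above 1.0.
        double c = 2 * Math.asin(Math.sqrt(Math.min(1.0, a)));
        return EARTH_RADIUS_KM * c;
    }
}
```

A quarter of the equator, Haversine.distanceKm(0, 0, 0, 90), comes out to EARTH_RADIUS_KM * pi / 2, roughly 10,007.5 km.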
Chee
Hi Joe,
Take a look at the Cartesian Grid work from Patrick O'Leary here [1]. It's not
fully integrated with Solr and they are moving away from it, but it'll give you
a good idea of how to get started and how to go about doing this...
HTH,
Chris
[1] http://www.nsshutdown.com/projects/lucene/whitep
All, FYI, as SolrCell is built on top of Tika, some folks might be interested
in this message I posted to the Tika lists.
Thanks!
Cheers,
Chris
-- Forwarded Message
From: "Mattmann, Chris A (388J)"
Reply-To:
Date: Fri, 11 Jun 2010 19:07:24 -0700
To:
Cc:
Subject: Tika in
Hey Guys,
There is some work on SOLR-17 to track this. I put up a patch that's
incomplete, based on the prior work done by Mike Baranczak and Hoss and others.
I've been meaning to get back to it, but have been swamped.
Contributions/updates welcome!
Cheers,
Chris
[1] http://issues.apache.org/
Hi Stefan,
I'm not sure about releasing one for 1.4, but I was making some progress on
SOLR-17 [1] (and see the linked issues from there) on pushing this forward. It
dropped off my radar for a while, but I'd be happy to pick it back up.
Let me see what I can do.
Thanks!
Cheers,
Chris
[1]
Hey All,
Just to save some folks some time in case you are trying to get the new
Lucene/Solr up and running in Eclipse: if you continue to get weird errors,
e.g., in solr/src/test/TestConfig.java regarding
org.w3c.dom.Node#getTextContent(), I found that, for me, this error was caused by
including the Tidy.jar
Hey Erik,
One thing to think about (and I'm no expert on middle school kids) would be
to relate search somehow to a topic they are interested in. My 12-year-old
nephew loves the NBA, so if I were to talk to him about search, I would try
to relate it to, e.g., NBA.com, or understanding the differen
Hi Giovanni,
The error that you're showing in your logs below indicates that this method
signature:
org.apache.solr.handler.ContentStreamLoader.load(Lorg/apache/solr/request/SolrQueryRequest;Lorg/apache/solr/response/SolrQueryResponse;Lorg/apache/solr/common/util/ContentStream;)
doesn't match
that works...
-----Original Message-----
From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Friday, March 19, 2010 12:04 AM
To: solr-user@lucene.apache.org
Subject: Re: PDFBox/Tika Performance Issues
Hi Giovanni,
Let's try and isolate the problem. Can you try parsing th
3.6.jar
poi-ooxml-3.6.jar
poi-ooxml-schemas-3.6.jar
poi-scratchpad-3.6.jar
tagsoup-1.2.jar
tika-core-0.7-SNAPSHOT.jar
tika-parsers-0.7-SNAPSHOT.jar
xercesImpl-2.8.1.jar
xml-apis-1.0.b2.jar
xmlbeans-2.3.0.jar
-----Original Message-----
From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.go
Performance Issues
>
> Thanks Chris!
>
> I'll try the patch.
>
> -----Original Message-----
> From: Mattmann, Chris A (388J) [mailto:chris.a.mattm...@jpl.nasa.gov]
> Sent: Tuesday, March 16, 2010 5:37 PM
> To: so
Guys, I think this is an issue with PDFBox and the version that Tika 0.6
depends on. Tika 0.7-trunk upgraded to PDFBox 1.0.0 (see [1]), so it may
include a fix for the problem you're seeing.
See this discussion [2] on how to patch Tika to use the new PDFBox if you can't
wait for the 0.7 release
Hi Roland,
You probably want to send your email to tika-u...@lucene.apache.org.
Best of luck!
Cheers,
Chris
On 1/3/10 4:00 PM, "Roland Villemoes" wrote:
> Hi All,
>
> Anyone who knows how to index the latest MS office documents like .docx and
> .xlsx ?
>
> From searching it seems like Ti
Hi Mat,
Taking a quick look at your code via the GitHub browser (and not having
downloaded or run it, that's for later! :) ), it looks _very_ clean, and
well commented. Bravo!
If you get a chance and are interested in participating in the SOLR spatial
effort, there are a few issues you could take
VwR in solrconfig, which I'll set up by
> default in 1.5). It will even default to the template named after the
> handler name, so all you have to do is &wt=velocity.
>
> Erik
>
>
>
> On Dec 10, 2009, at 7:33 AM, Mattmann, Chris A (388J) wrote:
Hi Jason,
Patches welcome, though! :)
Cheers,
Chris
On 12/9/09 10:31 PM, "Shalin Shekhar Mangar" wrote:
> On Thu, Dec 10, 2009 at 11:52 AM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> I assume there isn't one? Anything in the works?
>>
>
> Nope.
>
> --
> Regards,
> Shal
Hi Hoss,
What I meant by that sentence is that it's actually a good thing to work on it now,
because SOLR is in the "dev" stage, not in a "pre-release" or "feature freeze"
state, as indicated by the *-dev on the release #...
Cheers,
Chris
On 11/29/09 3:55 PM, "Chris Hostetter" wrote:
I don't really u
Hey Hoss,
+1. I think we need to overhaul the whole API, even in light of the incremental
progress I've been proposing and patching, etc., lately.
I think it's good to do that incrementally, though, rather than all at once,
especially considering SOLR is at the 1.5-dev trunk stage atm.
Cheers,
Chr
WOW, +1!! Great job, PHP!
Cheers,
Chris
On 11/15/09 10:13 PM, "Otis Gospodnetic" wrote:
Hi,
I'm not sure if you have a specific question there.
But regarding the "PHP integration" part, I just learned PHP now has native Solr
(1.3 and 1.4) support:
http://twitter.com/otisg/status/5757184282
Hi Thomas,
If you check out SOLR-1516, you'll find a custom response writer I developed that
simplifies this process. You basically have to implement an #emitDoc or an
#emitDocList function in which you are handed the resultant o.a.l.Document List
or o.a.l.Document object (on a per Document basis) and you