So now, compared with when 'trie' came out, Solr has an INT field that IS
'trie', right?

And nothing date/timestamp-related has come out since, making 'trie'/INT the
field of choice for timestamps, right?

Seems like the fastest choice.

I will have to read up on it.

It seems my original choice to store Unix timestamps in my SQL database,
rather than native Postgres timestamps, will make everything easier between:
  PHP
  Symfony
  Postgres
  Solr

It's probably going to be a good idea to store two other fields in the search
index for display, 'date' and 'time'. That is, unless I make the user's
JavaScript generate the date and time from the Unix timestamp. Hmmm.
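A minimal Python sketch of the first option, deriving the display strings from the stored timestamp at index time instead of in the browser. The field names (timestamp, date, time, solr_date) are hypothetical, and rendering everything in UTC is an assumption; showing local time would need the user's zone.

```python
from datetime import datetime, timezone


def display_fields(ts: int) -> dict:
    """Derive hypothetical search-index fields from a Unix timestamp (seconds).

    Every value is rendered in UTC; converting to the user's local zone is
    left to the caller (or to the browser's JavaScript).
    """
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "timestamp": ts,                                 # int/tint field for range queries
        "date": dt.strftime("%Y-%m-%d"),                 # display-only date
        "time": dt.strftime("%H:%M:%S"),                 # display-only time (UTC)
        "solr_date": dt.strftime("%Y-%m-%dT%H:%M:%SZ"),  # Solr DateField syntax
    }


print(display_fields(1283961300))
```

Keeping the integer as the one source of truth means the date and time strings are display-only copies and never need to be queried.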
  
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life,
  otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php


--- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:

> From: Jonathan Rochkind <rochk...@jhu.edu>
> Subject: Re: How to import data with a different date format
> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> Date: Wednesday, September 8, 2010, 11:35 AM
> So the standard 'int' field in Solr 1.4 is a "trie based" field,
> although the example "int" type in the default schema.xml has its
> precisionStep set to 0, which means it's not really doing "trie"
> things. If you set the precisionStep to something greater than 0,
> as in the default example "tint" type, then it's really using
> 'trie' functionality. 'trie' functionality speeds up range queries
> by putting each value into 'buckets' (my own term), per the
> precision specified, so Solr has to do less work to grab all
> values within a certain range.
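To make the 'buckets' idea concrete, here is a simplified Python sketch of the terms a trie field indexes for one value. It is an illustration only, not Lucene's actual encoding (which also flips the sign bit and stores prefix-coded terms); the 32-bit width and the step of 8 used in the example are assumptions.

```python
def trie_terms(value: int, precision_step: int, bits: int = 32) -> list:
    """Simplified picture of what a trie field indexes for one value.

    Each (shift, prefix) pair is the value with its lowest `shift` bits
    dropped, so one coarse prefix term stands for a "bucket" of 2**shift
    neighbouring values. Assumes a non-negative value.
    """
    if precision_step <= 0:
        # Like a precisionStep of 0: only the exact, full-precision term.
        return [(0, value)]
    return [(shift, value >> shift) for shift in range(0, bits, precision_step)]


def bucket_bounds(shift: int, prefix: int) -> tuple:
    """Inclusive range of values covered by one prefix term."""
    return prefix << shift, ((prefix + 1) << shift) - 1


for shift, prefix in trie_terms(1283961300, 8):
    print(shift, prefix, bucket_bounds(shift, prefix))
```

A range query can then match a whole coarse bucket with a single term and only fall back to finer terms near the edges of the range. With a step of 0 there is just the full-precision term, so nothing is saved.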
> 
> That's all tint (non-zero-precision trie) does: speed up
> range queries. Your use case involves range queries, though,
> so it's worth investigating. If you use a string or other
> textual type for sorting or range queries, you need to make
> sure your values sort the way you want them to as strings.
> But yyyy-mm-dd will.
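A quick Python check of why yyyy-mm-dd works as a string: a range query on a string field compares terms character by character, so year-first, zero-padded dates order chronologically and MM/DD/YYYY dates do not. The sample dates are made up.

```python
def in_string_range(value: str, lo: str, hi: str) -> bool:
    """Inclusive range check by plain string comparison, which is how a range
    query over a string field orders ASCII terms like these."""
    return lo <= value <= hi


iso = ["2010-09-08", "2009-12-31", "2010-01-15"]  # made-up sample dates
us = ["09/08/2010", "12/31/2009", "01/15/2010"]   # the same dates as MM/DD/YYYY

print(sorted(iso))  # chronological
print(sorted(us))   # not chronological: the 2009 date sorts last
# A "2010 only" range on MM/DD/YYYY strings wrongly admits a 2009 date:
print(in_string_range("12/31/2009", "01/01/2010", "12/31/2010"))
```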
> 
> More on trie: 
> http://www.lucidimagination.com/blog/2009/05/13/exploring-lucene-and-solrs-trierange-capabilities/
> 
> I think there probably won't be much of a difference at
> query time between non-trie int and string, although I'm not
> sure, and it may depend on the nature of your data and
> queries.   Using a trie int will be faster
> for (and only for) range queries, if you have a lot of data.
> (There are some cases, depending on the data and the nature
> of your queries, where the overhead of a non-zero-precision
> trie may outweigh the hypothetical gain, but generally it's
> faster). 
> I don't think there should be any appreciable difference
> between how long a non-trie int or a string will take to
> index, at least as far as Solr is concerned. (If your app
> takes longer to prepare the documents for Solr in one format
> than in another, that's another story.) An actual trie
> (non-zero-precision) field theoretically has indexing-time
> overhead, but I doubt it would be noticeable unless you
> have a really, really lean, mean indexing setup where every
> microsecond counts.
> 
> Jonathan
> 
> Dennis Gearon wrote:
> > I'm doing something similar for dates/times/timestamps.
> > 
> > I'm actually trying to find which appointments (date/time from-and-to
> > pairs, i.e. timestamps) have 'now' within their range.
> > 
> > Fairly simple search of:
> > 
> >    What items have a start time BEFORE now, and an end time AFTER now?
> > 
> > My thoughts were to store:
> >   Unix timestamp BIGINTs (64 bit)
> >   "ISO_DATE ISO_TIME" strings
> > 
> > Which is going to be faster for:
> >    1/ Indexing?
> >    2/ Searching?
> > 
> > How does the 'tint' field mentioned below apply?
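The "now" search maps onto two open-ended range clauses. A minimal Python sketch, assuming hypothetical start_ts/end_ts int (or tint) fields holding Unix seconds; Solr range brackets are inclusive, so "before" and "after" here include equality.

```python
import time
from typing import Optional


def active_now_filter(now: Optional[int] = None,
                      start_field: str = "start_ts",
                      end_field: str = "end_ts") -> str:
    """Filter query for items whose [start, end] window contains `now`.

    start_ts/end_ts are hypothetical int/tint fields holding Unix seconds.
    """
    if now is None:
        now = int(time.time())
    # start <= now AND end >= now; '*' leaves the other end of each range open.
    return f"{start_field}:[* TO {now}] AND {end_field}:[{now} TO *]"


print(active_now_filter(1283961300))
```

If the fields were Solr date fields instead, the same filter could use date math, e.g. start_dt:[* TO NOW] AND end_dt:[NOW TO *] (again with hypothetical field names). Rounding "now", say to the minute, should also make repeated filters identical and therefore cacheable.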
> > 
> > --- On Wed, 9/8/10, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> > 
> >> From: Jonathan Rochkind <rochk...@jhu.edu>
> >> Subject: Re: How to import data with a different date format
> >> To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
> >> Date: Wednesday, September 8, 2010, 10:27 AM
> >> Just throwing it out there, I'd consider a different approach for an
> >> actual real app, although it might not be as quick to get up and
> >> running. (For quickly, yeah, I'd just store it as a string; more on
> >> that at the bottom.)
> >> 
> >> If none of your dates have times (they're all just full days), I'm not
> >> sure you really need the date type at all.
> >> 
> >> Convert the date to an integer number of days since the epoch. (Most
> >> languages will have a way to do this, but I don't know about pure
> >> XSLT.) Store _that_ in a 1.4 'int' field. On top of that, make it a
> >> "tint" (non-zero precisionStep) for faster range queries.
> >> 
> >> But now your actual interface will have to convert from "number of
> >> days since epoch" to a displayable date. (And if you allow user input,
> >> convert the input to number-of-days-since-epoch before making a range
> >> query or fq, but you'd have to do that anyway even with Solr dates;
> >> users aren't going to be entering W3CDate raw, I don't think.)
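A minimal Python sketch of the epoch-day approach: converting a date to the integer stored in the int/tint field, back again for display, and turning user-entered dates into an inclusive range query. The field name day_num and the MM/DD/YYYY input format are hypothetical.

```python
from datetime import date, datetime, timedelta

EPOCH = date(1970, 1, 1)


def to_epoch_days(d: date) -> int:
    """Whole days since 1970-01-01, the integer to store in an int/tint field."""
    return (d - EPOCH).days


def from_epoch_days(n: int) -> date:
    """Inverse conversion, for turning the stored integer back into a date."""
    return EPOCH + timedelta(days=n)


def user_range_to_fq(field: str, start_text: str, end_text: str,
                     fmt: str = "%m/%d/%Y") -> str:
    """Convert user-entered dates to an inclusive epoch-day range query."""
    lo = to_epoch_days(datetime.strptime(start_text, fmt).date())
    hi = to_epoch_days(datetime.strptime(end_text, fmt).date())
    return f"{field}:[{lo} TO {hi}]"


print(to_epoch_days(date(2010, 9, 8)))
print(from_epoch_days(14860).strftime("%m/%d/%Y"))
print(user_range_to_fq("day_num", "09/01/2010", "09/08/2010"))
```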
> >> 
> >> That is probably the most efficient way to have Solr handle it. Using
> >> an actual date field type gives you a lot more precision than you need,
> >> which is going to hurt performance on range queries. You can
> >> compensate for that with a trie date, sure, but if you don't really
> >> need that precision to begin with, why use it? The extra precision can
> >> also do unexpected things and make bugs easier to introduce: for range
> >> queries on that high-precision data, you need to make sure your start
> >> date has 00:00:00 set and your end date has 23:59:59 set to get what
> >> you probably expect. If you aren't going to use the extra precision, it
> >> makes everything a lot simpler not to use a date field.
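The midnight/end-of-day caveat, sketched in Python for a full-precision Solr date field. The field name is hypothetical, and the end bound uses 23:59:59.999 rather than 23:59:59 on the assumption that stored dates may carry fractional seconds (Solr's DateField syntax allows them). With whole-day integers there is no time of day to pin at all, which is the simplification being argued for.

```python
from datetime import date


def date_field_day_range(field: str, start: date, end: date) -> str:
    """Inclusive Solr range over whole days on a full-precision date field.

    The start is pinned to midnight and the end to the last millisecond of
    the final day; an end bound left at midnight would drop that day's
    documents.
    """
    return (f"{field}:[{start.isoformat()}T00:00:00Z TO "
            f"{end.isoformat()}T23:59:59.999Z]")


print(date_field_day_range("pub_date", date(2010, 9, 1), date(2010, 9, 8)))
```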
> >> 
> >> Alternately, for your "get this done quick" method, yeah, I'd just
> >> store it as a string. With a string in exactly the format you've
> >> specified, sorting and range queries won't work how you'd want. But if
> >> you can make it a string of the format "yyyy/mm/dd" instead (four-digit
> >> year, always two-digit month and day), then you can even sort and do
> >> range queries on your string dates. For the quick and dirty prototype,
> >> I'd just do that. In fact, while this might make range queries and
> >> sorting _slightly_ slower than if you use an int or a tint, it might
> >> really be good enough even for a real app (hey, it's what lots of
> >> people did before the trie-based fields existed).
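The quick-and-dirty string route as a minimal Python sketch: reformat MM/DD/YYYY into zero-padded yyyy/mm/dd before indexing. strptime also accepts unpadded input such as 9/8/2010 and raises ValueError on impossible dates, which doubles as a sanity check. The input values are made up.

```python
from datetime import datetime


def sortable_date_string(us_date: str) -> str:
    """Rewrite 'MM/DD/YYYY' as zero-padded 'yyyy/mm/dd' so string order matches date order."""
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y/%m/%d")


dates = ["12/31/2009", "9/8/2010", "01/15/2010"]  # made-up input values
print(sorted(sortable_date_string(d) for d in dates))
```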
> >> 
> >> Jonathan
> >> 
> >> Erick Erickson wrote:
> >>> I think Markus is spot-on given the fact that you have 2 days. Using a
> >>> string field is quickest.
> >>> 
> >>> However, if you absolutely MUST have functioning dates, there are
> >>> three options I can think of:
> >>> 1> Can you make your XSLT transform the dates? Confession: I'm
> >>>    XSLT-ignorant.
> >>> 2> Use DIH and DateFormatTransformer, see:
> >>>    http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer
> >>>    You can walk a directory importing all the XML files with
> >>>    FileDataSource.
> >>> 3> You could write a program to do this manually.
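Option 3 might look something like this Python sketch, run over the files the XSLT produces. It assumes Solr's <add><doc><field name=...> update format, a hypothetical field called date, and midnight UTC for every value, since the source dates carry no time or zone. The field would also need a date type in schema.xml.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def convert_dates(xml_text: str, field_name: str = "date") -> str:
    """Rewrite MM/DD/YYYY values of one field in a Solr <add><doc> document.

    `field_name` is hypothetical; use whatever name schema.xml declares.
    Midnight UTC is an assumption, since the source dates have no time or zone.
    """
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") == field_name and field.text:
            day = datetime.strptime(field.text.strip(), "%m/%d/%Y")
            field.text = day.strftime("%Y-%m-%dT00:00:00Z")  # DateField syntax
    return ET.tostring(root, encoding="unicode")


sample = ('<add><doc><field name="id">1</field>'
          '<field name="date">09/08/2010</field></doc></add>')
print(convert_dates(sample))
```

Walking a directory is then just a loop that reads each file, passes its text through convert_dates and writes the result back.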
> >>> 
> >>> But given the time constraints, I suspect your time would be better
> >>> spent doing the other stuff and just using string as per Markus. I
> >>> have no clue how SOLR-savvy you are, so pardon me if this is something
> >>> you already know, but lots of people trip up over the "string" field
> >>> type, which is NOT tokenized. You usually want "text" unless it's some
> >>> sort of ID.... So it might be worth doing some searching earlier rather
> >>> than later <G>....
> >>> 
> >>> Best
> >>> Erick
> >>> 
> >>> On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma <markus.jel...@buyways.nl> wrote:
> >>> 
> >>>> No. The DateField [1] will not accept it any other way. You could,
> >>>> however, fool your boss and dump your dates in an ordinary string
> >>>> field. But then you cannot use some of the nice date features.
> >>>> 
> >>>> 
> >>>> 
> >>>> [1]:
> >>>> http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html
> >>>> 
> >>>> -----Original message-----
> >>>> From: Rico Lelina <rlel...@yahoo.com>
> >>>> Sent: Wed 08-09-2010 17:36
> >>>> To: solr-user@lucene.apache.org;
> >>>> Subject: How to import data with a different date format
> >>>> Hi,
> >>>> 
> >>>> I am attempting to import some of our data into SOLR. I did it the
> >>>> quickest way I know because I literally only have 2 days to import
> >>>> the data and do some queries for a proof-of-concept.
> >>>> 
> >>>> So I have this data in XML format and I wrote a short XSLT script to
> >>>> convert it to the format in solr/example/exampledocs (except I
> >>>> retained the element names, so I had to modify schema.xml in the conf
> >>>> directory). So far so good: the import works and I can search the
> >>>> data. One of my immediate problems is that there is a date field with
> >>>> the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts
> >>>> only full date fields; everything seems to be mandatory, including
> >>>> the Z for Zulu/UTC time according to the doc. Is there a way to
> >>>> specify the date format?
> >>>> 
> >>>> Thanks very much.
> >>>> Rico