Alexandre:

Honestly, I looked for that, but I was in a rush, couldn't find it, and
thought I was remembering something _else_.

That's definitely a better approach, thanks! Perhaps this time I'll
remember....

Erick

On Mon, Sep 22, 2014 at 3:23 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> You could try - for your ideal scenario - creating an
> UpdateRequestProcessor (URP) chain that includes
> ParseDateFieldUpdateProcessorFactory:
>
> https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/ParseDateFieldUpdateProcessorFactory.html
>
> Notice that it was designed for the dynamic-field scenario, so by
> default it looks at every field and tries to parse it as a date. But its
> parent class has parameters for restricting it to specific fields:
>
> https://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html
>
> You can see an example in the schemaless config example:
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1584
>
> Just remember that when you are creating a URP chain:
> 1) You need to keep the two (or three) standard update request processors
> in the chain, not just your date one. The details are here:
> https://wiki.apache.org/solr/UpdateRequestProcessor . The example
> above uses three, to handle the SolrCloud case.
> 2) You need to refer to that chain in the request handler to make sure
> it is actually used:
>
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_0/solr/example/example-schemaless/solr/collection1/conf/solrconfig.xml#L1014
>
> I THINK this should work, and it would count as configuration, not
> customization, and definitely not programming.
>
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 22 September 2014 16:16, Manohar Kanuri <s...@kanuri.org> wrote:
> > Hello,
> >
> > I am a non-techie who decided to download and install Solr 5.0 to parse
> > data for my community activism. I got it installed and running, and updated
> > the example schema and installation with a bunch of CSV data. Then I went
> > back to deal with the first of the two kinds of fields I had deferred until
> > later: dates and location data.
> >
> > The CSV data file for January - August 2014 is about 650 MB, with about
> > 1.25 million records/rows. I split it into 5 pieces and changed MM/DD/YYYY
> > HH:MM:SS AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using
> > TextWrangler. That is the tool I know, and it is a step up from trying to
> > use the Mac Numbers spreadsheet, which does the conversion very easily but
> > would force me to break the file into pieces smaller than 25-30 MB. Random
> > fields can get updated months after the record was created, so I have to
> > find an easier way than breaking the CSV file into smaller bits and
> > reformatting manually. Each record/row has 4 date fields, so potentially
> > there are up to 5 million fields to be reformatted in 8 months' worth of
> > data.
> >
> > I did a Google search (I didn't see a Solr search page) of the mailing
> > list archives and the internet, but it seems my question is either too
> > simple or the answer is staring me in the face and I'm just missing it: Is
> > there a simple way to reformat the dates to Solr style in a 650 MB - 1 GB
> > CSV file? Or, ideally, to have the dates and times automatically
> > reformatted as the Solr index gets updated with the latest data (I recall
> > reading this was not possible)? Is there a widget/gadget/gizmo/script that
> > would do this?
> >
> > thanks,
> > manohar
>
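
For the "script" half of the question above, here is a minimal sketch of a
one-off converter that could serve as a stopgap outside Solr. It is not the
URP approach Alexandre describes, just a plain Python standard-library
script. Assumptions: the input dates look exactly like MM/DD/YYYY HH:MM:SS
AM/PM as described in the original message, the CSV has a header row, and
the column names in the usage comment are hypothetical placeholders. It also
appends "Z" without any time zone conversion, so it assumes the timestamps
are already UTC or that treating them as UTC is acceptable.

```python
import csv
from datetime import datetime

# Assumed input format, e.g. "01/31/2014 02:05:09 PM" (from the original message).
SOURCE_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# Output format Solr expects, e.g. "2014-01-31T14:05:09Z".
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_date(value):
    """Return value rewritten in Solr's date format.

    Empty, missing, or unparseable values are returned unchanged so that
    no data is silently lost.
    """
    if not value or not value.strip():
        return value
    try:
        parsed = datetime.strptime(value.strip(), SOURCE_FORMAT)
    except ValueError:
        return value
    return parsed.strftime(SOLR_FORMAT)


def convert_csv(src, dst, date_columns):
    """Copy CSV rows from src to dst, rewriting the named date columns.

    Rows are streamed one at a time, so the whole file is never held in
    memory and does not need to be split into pieces first.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for name in date_columns:
            row[name] = to_solr_date(row.get(name))
        writer.writerow(row)


# Usage (file names and column names are placeholders):
# with open("input.csv", newline="") as src, \
#         open("output.csv", "w", newline="") as dst:
#     convert_csv(src, dst, ["created_date", "closed_date"])
```

Because unparseable values pass through unchanged, it is worth spot-checking
the output for any rows that still contain the old format before indexing.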
