I think this'll help:

http://wiki.apache.org/solr/ScriptUpdateProcessor

Essentially, each time a document comes into Solr,
the script will get invoked on it. You'll have to do some
fiddling to get it right: you have to remove the field from
the doc, transform it, then put it back. None of this
is hard, but it'll require a bit of programming. Fortunately,
not too much.
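
The per-document transform described above comes down to a date-string conversion. The script that ScriptUpdateProcessor runs inside Solr is typically JavaScript. The sketch below is plain Python that only illustrates the logic, with a dict standing in for the Solr document. It assumes the CSV dates look like `09/22/2014 01:16:00 PM` and simply stamps them `Z` (UTC) without any time-zone conversion. The field name `created` is hypothetical.

```python
from datetime import datetime

# Input format assumed from the question, e.g. "09/22/2014 01:16:00 PM".
# %p matches AM/PM in the default C/English locale.
CSV_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# Solr's date format, e.g. "2014-09-22T13:16:00Z" (the Z is appended as-is).
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def to_solr_date(value):
    """Convert one MM/DD/YYYY HH:MM:SS AM/PM string to Solr's date form."""
    return datetime.strptime(value.strip(), CSV_FORMAT).strftime(SOLR_FORMAT)


def transform_doc(doc, date_fields):
    """Remove each date field, convert it, and put it back.

    `doc` is a plain dict standing in for the Solr document (illustration only).
    Empty or missing date fields are dropped rather than put back.
    """
    for name in date_fields:
        raw = doc.pop(name, None)          # remove the field from the doc
        if raw:                            # skip missing or empty values
            doc[name] = to_solr_date(raw)  # put the transformed value back
    return doc
```

For example, `transform_doc({"id": "1", "created": "09/22/2014 01:16:00 PM"}, ["created"])` returns `{"id": "1", "created": "2014-09-22T13:16:00Z"}`.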

Best,
Erick

On Mon, Sep 22, 2014 at 1:16 PM, Manohar Kanuri <s...@kanuri.org> wrote:
> Hello,
>
> I am a non-techie who decided to download and install Solr 5.0 to parse data
> for my community activism. I got it installed and running, updated the example
> schema, and loaded a bunch of CSV data into the installation. Then I went back
> to the two field types I had deferred, dates and location data, starting with
> the dates.
>
> The CSV data file for January-August 2014 is about 650 MB, with about 1.25
> million records/rows. I split it into 5 pieces and changed MM/DD/YYYY
> HH:MM:SS AM/PM to the YYYY-MM-DDTHH:MM:SSZ format required by Solr, using
> TextWrangler. That is what I know, and it's a step up from the Mac Numbers
> spreadsheet, which does the conversion very easily but would require breaking
> the file into pieces smaller than 25-30 MB. Random fields can get updated
> months after the record was created, so I have to find an easier way than
> breaking the CSV file into smaller bits and reformatting them manually. Each
> record/row has 4 date fields, so there are potentially up to 5 million fields
> to be reformatted in 8 months' worth of data.
>
> I did a Google search (I didn't see a Solr search page) of the mailing list
> archives and the internet, but it seems my question is either too simple
> and/or it's staring me in the face and I'm just missing it: is there a
> simple way to reformat the dates to Solr style in a 650 MB-1 GB CSV file? Or,
> ideally, can the dates and times be reformatted automatically as the Solr
> index gets updated with the latest data (I recall reading this was not
> possible)? Is there a widget/gadget/gizmo/script that would do this?
>
> thanks,
> manohar
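
For the offline route asked about in the message, reformatting the whole CSV before indexing, a short streaming script avoids splitting the file by hand. This is a minimal sketch in Python, assuming the same MM/DD/YYYY HH:MM:SS AM/PM input, a header row, and that the times can be labelled UTC as-is. The script name `reformat_dates.py` and the column names in the usage comment are hypothetical.

```python
import csv
import sys
from datetime import datetime

# Input format assumed from the question; %p matches AM/PM in the C/English locale.
CSV_FORMAT = "%m/%d/%Y %I:%M:%S %p"
# Solr's date format; the trailing Z is appended without time-zone conversion.
SOLR_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def reformat_csv(src, dst, date_columns):
    """Stream rows from src to dst, rewriting the named date columns.

    Rows are processed one at a time, so a 650 MB-1 GB file never has to be
    held in memory or split into pieces.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        for col in date_columns:
            value = row.get(col)
            if value:  # leave blank dates blank
                parsed = datetime.strptime(value.strip(), CSV_FORMAT)
                row[col] = parsed.strftime(SOLR_FORMAT)
        writer.writerow(row)


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage (hypothetical names):
    #   python reformat_dates.py in.csv out.csv "Created Date" "Closed Date"
    in_path, out_path, *cols = sys.argv[1:]
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reformat_csv(src, dst, cols)
```

Because it reads and writes one row at a time, memory use stays flat regardless of file size, and the same script can be rerun on each new export before loading it into Solr.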
