First thing: it looks like you're only sending one document at a
time, perhaps with child objects. This is not optimal at all. I
usually batch my docs up in groups of 1,000, and there is anecdotal
evidence that there may (depending on the docs) be some gains above
that number. Gotta balance the batch size against how big the docs
are, of course.
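Here's a minimal sketch of that batching, written in plain Java so it stands on its own. The Batcher class and its "sink" are hypothetical. In real code the sink would be something like a call to solr.add(batch) on your SolrJ client, with Strings replaced by SolrInputDocuments. Note that SolrJ's add throws checked exceptions (SolrServerException, IOException), which a Consumer can't propagate, so you'd have to wrap them or use your own functional interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them to a sink in fixed-size batches.
// The sink stands in for something like batch -> solr.add(batch) in SolrJ.
class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers one item and sends the whole batch once it is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is left over; call once after the last add().
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        List<T> toSend = buffer;
        buffer = new ArrayList<>();
        sink.accept(toSend);
    }

    public static void main(String[] args) {
        List<Integer> sizes = new ArrayList<>();
        Batcher<String> b = new Batcher<>(1000, batch -> sizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            b.add("doc-" + i);
        }
        b.flush();
        System.out.println(sizes); // prints [1000, 1000, 500]
    }
}
```

With 2,500 docs and a batch size of 1,000 that's three round trips to the server instead of 2,500.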

Assuming that you really are calling this method for one doc (and its
children) at a time, the far bigger problem, beyond calling solr.add
for each parent/children group, is that you then call solr.commit()
every time. This is an anti-pattern, since every commit is expensive.
Generally, let the autoCommit setting in solrconfig.xml handle the
intermediate commits while the indexing program is running, and only
issue a commit at the very end of the job, if at all.
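Roughly, the job then has this shape. IndexClient is a made-up stand-in for the add/commit calls on the solr object in your code (Strings stand in for SolrInputDocuments, and the checked exceptions SolrJ throws are left out to keep the sketch self-contained). The point is that nothing commits inside the loop: autoCommit is assumed to cover that, and the single commit at the end is optional.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SolrJ client used in this thread (solr.add / solr.commit).
interface IndexClient {
    void add(List<String> docs);
    void commit();
}

class IndexJob {
    // Sends docs in batches and never commits inside the loop;
    // intermediate commits are left to autoCommit in solrconfig.xml.
    static void run(IndexClient client, List<String> docs, int batchSize, boolean commitAtEnd) {
        List<String> batch = new ArrayList<>();
        for (String doc : docs) {
            batch.add(doc);
            if (batch.size() >= batchSize) {
                client.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch);
        }
        if (commitAtEnd) {
            client.commit(); // at most one explicit commit for the whole job
        }
    }

    public static void main(String[] args) {
        int[] calls = new int[2]; // calls[0] = add() calls, calls[1] = commit() calls
        IndexClient counting = new IndexClient() {
            public void add(List<String> docs) { calls[0]++; }
            public void commit() { calls[1]++; }
        };
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("doc-" + i);
        }
        run(counting, docs, 1000, true);
        System.out.println(calls[0] + " adds, " + calls[1] + " commit"); // prints 3 adds, 1 commit
    }
}
```

Compare that with the code below, which does one add and one commit per parent document.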

Best,
Erick

On Sun, Jul 19, 2015 at 12:08 PM, Vineeth Dasaraju
<vineeth.ii...@gmail.com> wrote:
> Hi,
>
> I am trying to index JSON objects (which contain nested JSON objects and
> arrays) into Solr.
>
> My JSON Object looks like the following (This is fake data that I am using
> for this example):
>
> {
>     "RawEventMessage": "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aliquam dolor orci, placerat ac pretium a, tincidunt consectetur mauris. Etiam sollicitudin sapien id odio tempus, non sodales odio iaculis. Donec fringilla diam at placerat interdum. Proin vitae arcu non augue facilisis auctor id non neque. Integer non nibh sit amet justo facilisis semper a vel ligula. Pellentesque commodo vulputate consequat. ",
>     "EventUid": "1279706565",
>     "TimeOfEvent": "2015-05-01-08-07-13",
>     "TimeOfEventUTC": "2015-05-01-01-07-13",
>     "EventCollector": "kafka",
>     "EventMessageType": "kafka-@column",
>     "User": {
>         "User": "Lorem ipsum",
>         "UserGroup": "Manager",
>         "Location": "consectetur adipiscing",
>         "Department": "Legal"
>     },
>     "EventDescription": {
>         "EventApplicationName": "",
>         "Query": "SELECT * FROM MOVIES",
>         "Information": [
>             {
>                 "domainName": "English",
>                 "columns": [
>                     {
>                         "movieName": "Casablanca",
>                         "duration": "154"
>                     },
>                     {
>                         "movieName": "Die Hard",
>                         "duration": "127"
>                     }
>                 ]
>             },
>             {
>                 "domainName": "Hindi",
>                 "columns": [
>                     {
>                         "movieName": "DDLJ",
>                         "duration": "176"
>                     }
>                 ]
>             }
>         ]
>     }
> }
>
>
>
> My function for indexing the object is as follows:
>
> public static void indexJSON(JSONObject jsonOBJ) throws ParseException, IOException, SolrServerException {
>     Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
>
>     SolrInputDocument mainEvent = new SolrInputDocument();
>     mainEvent.addField("id", generateID());
>     mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
>     mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
>     mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
>     mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
>     mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
>     mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));
>
>     Object obj = parser.parse(jsonOBJ.get("User").toString());
>     JSONObject userObj = (JSONObject) obj;
>
>     SolrInputDocument childUserEvent = new SolrInputDocument();
>     childUserEvent.addField("id", generateID());
>     childUserEvent.addField("User", userObj.get("User"));
>
>     obj = parser.parse(jsonOBJ.get("EventDescription").toString());
>     JSONObject eventdescriptionObj = (JSONObject) obj;
>
>     SolrInputDocument childEventDescEvent = new SolrInputDocument();
>     childEventDescEvent.addField("id", generateID());
>     childEventDescEvent.addField("EventApplicationName", eventdescriptionObj.get("EventApplicationName"));
>     childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));
>
>     obj= JSONValue.parse(eventdescriptionObj.get("Information").toString());
>     JSONArray informationArray = (JSONArray) obj;
>
>     for(int i = 0; i<informationArray.size(); i++){
>         JSONObject domain = (JSONObject) informationArray.get(i);
>
>         SolrInputDocument domainDoc = new SolrInputDocument();
>         domainDoc.addField("id", generateID());
>         domainDoc.addField("domainName", domain.get("domainName"));
>
>         String s = domain.get("columns").toString();
>         obj= JSONValue.parse(s);
>         JSONArray ColumnsArray = (JSONArray) obj;
>
>         SolrInputDocument columnsDoc = new SolrInputDocument();
>         columnsDoc.addField("id", generateID());
>
>         for(int j = 0; j<ColumnsArray.size(); j++){
>             JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
>             SolrInputDocument columnDoc = new SolrInputDocument();
>             columnDoc.addField("id", generateID());
>             columnDoc.addField("movieName", ColumnsObj.get("movieName"));
>             columnsDoc.addChildDocument(columnDoc);
>         }
>         domainDoc.addChildDocument(columnsDoc);
>         childEventDescEvent.addChildDocument(domainDoc);
>     }
>
>     mainEvent.addChildDocument(childEventDescEvent);
>     mainEvent.addChildDocument(childUserEvent);
>     batch.add(mainEvent);
>     solr.add(batch);
>     solr.commit();
> }
>
> When I try to index them using the above code, I am able to index only 12
> objects per second. Is there a faster way to do the indexing? I believe I
> am using the json-fast parser, which is one of the fastest JSON parsers.
>
> Your help will be very valuable to me.
>
> Thanks,
> Vineeth
