I also commit too many times, I guess: we have 1000 folders, and each loop
executes a load and a commit, so 1000 loops means 1000 commits. I think it
would help if I only committed once after all 1000 loops complete.
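For what it's worth, a minimal sketch of that one-shot commit (assuming the
commit.txt used below holds a plain <commit/> element, which the script implies
but does not show):

```shell
# Hypothetical one-shot commit: write the commit payload once, then POST it
# a single time after all 1000 loops, instead of once per loop.
printf '<commit/>\n' > /tmp/commit.txt

# Single request after the whole load finishes (endpoint as in the script below):
# curl http://localhost:7001/solr/update --data-binary @/tmp/commit.txt \
#      -H 'Content-type:text/plain; charset=utf-8'
```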

Any inputs?

Thanks

Francis


-----Original Message-----
From: Francis Yakin [mailto:fya...@liquid.com]
Sent: Thursday, July 09, 2009 11:13 PM
To: 'solr-user@lucene.apache.org'; 'noble.p...@gmail.com'
Subject: RE: Using curl comparing with using WebService::Solr

Yes, the xml files are in complete add format.

This is my code:

#!/usr/bin/perl

use strict;
use warnings;

   if (@ARGV < 1) {
        print "Usage: perl prod.pl <dir>\n\n";
        exit(1);
   }

###### -- CHANGE accordingly
   my $timeout    = 300;
   my $topdir     = "/opt/Test/xml-file/";
   #my $topdir    = "/opt/Test/";
   my $dir        = $topdir . $ARGV[0];
   my $commit_dir = "/opt/commit";
#####

   my $curl = "/usr/bin/curl";
   print "Loading xml files in $dir in progress\n";
   opendir(my $bin, $dir) or die "Can't open $dir: $!";

   my $commitCmd = "(cd $commit_dir; $curl http://localhost:7001/solr/update"
                 . " --data-binary \@commit.txt -H 'Content-type:text/plain; charset=utf-8')";

   while (defined(my $file = readdir $bin)) {
       next if $file =~ /^\./;
       my $insertCmd = "(cd $dir; $curl http://localhost:7001/solr/update"
                     . " --data-binary \@$file -H 'Content-type:text/plain; charset=utf-8')";
       system($insertCmd);    # one curl process (and HTTP connection) per file
   }

   system($commitCmd);    # commit once, after the whole directory is posted
   closedir($bin);

Thanks

Francis

-----Original Message-----
From: noble.p...@gmail.com [mailto:noble.p...@gmail.com] On Behalf Of Noble Paul
Sent: Thursday, July 09, 2009 10:40 PM
To: solr-user@lucene.apache.org
Subject: Re: Using curl comparing with using WebService::Solr

are these xml files in the solr add xml format?

When you post using curl, I guess it opens as many http connections as
there are files. If you can write a small program to post all these
files in one request, you should be able to get better performance.

the following can be the pseudo-code

open connection
write "<root>"
for each file
  write filecontent
write "</root>"
close connection
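
As a rough, runnable shell sketch of that pseudo-code (the directory, file
names, and the <root> wrapper element are all stand-ins; Solr may actually
want the concatenated documents re-wrapped in a single <add> element rather
than <root>):

```shell
# Demo stand-in for one of the 1000 folders: two small Solr add documents.
DIR="$(mktemp -d)"
printf '<add><doc><field name="id">1</field></doc></add>\n' > "$DIR/0001.xml"
printf '<add><doc><field name="id">2</field></doc></add>\n' > "$DIR/0002.xml"

# Concatenate everything into one payload, per the pseudo-code above.
BATCH="/tmp/solr_batch_demo.xml"
printf '<root>\n' > "$BATCH"        # write "<root>"
for f in "$DIR"/*.xml; do           # for each file
    cat "$f" >> "$BATCH"            #   write filecontent
done
printf '</root>\n' >> "$BATCH"      # write "</root>"

# One request for the whole batch instead of one per file:
# curl http://localhost:7001/solr/update --data-binary @"$BATCH" \
#      -H 'Content-type:text/xml; charset=utf-8'
```

With ~2.6 million files total you would still batch per folder (or per few
thousand documents) rather than build one giant payload in memory or on disk.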




On Fri, Jul 10, 2009 at 10:23 AM, Francis Yakin<fya...@liquid.com> wrote:
>
> I have about 1000 folders, each containing 2581 xml files, for a total of 
> ~2.6 million xml files.
>
> I developed a perl script that executes this command for each file:
>
>  curl http://localhost:7001/solr/update --data-binary "@0039000.xml" -H 
> 'Content-type:text/plain; charset=utf-8'
>
> It took me about 4 1/2 hours to load and commit.
>
> What are the advantages of using curl to post/add/update the xml files to 
> Solr compared with using the WebService::Solr module?
>
> Is using WebService::Solr faster?
>
> The XML files are local on the Solr master box, so I am posting them 
> locally (not over a WAN or LAN).
>
> Any input will be much appreciated.
>
> Thanks
>
> Francis
>
>
>



--
-----------------------------------------------------
Noble Paul | Principal Engineer| AOL | http://aol.com
