Peter Chubb wrote:


Rodolfo> How do you read .docx documents without OOo? Or do you install
Rodolfo> the OOo libraries and convert the document to another format?

-

Alternatively, if you already have a GNOME desktop, AbiWord works most
of the time, but its user interface is modelled on that of Word(TM) (and is
therefore horrible to me).


If you just want the text of the document, unzip the file (a .docx is a zip archive) and convert word/document.xml to text.

Below is some (old) code which does this.


In PHP:

   function importDOCX($file_info)
   {
       // Source path; escapeshellarg() below quotes it safely for the shell,
       // including spaces, parentheses and other metacharacters.
       $src = DOC_ROOT . 'documents/' . str_replace(' ', '_', $file_info['file_group'])
            . '/' . $file_info['sub_folder_path'] . '/' . $file_info['filename'];

       // Unpack into a per-call scratch directory so concurrent imports cannot collide.
       $tmp = sys_get_temp_dir() . '/docx_' . uniqid('', true);
       mkdir($tmp);
       shell_exec('unzip ' . escapeshellarg($src) . ' -d ' . escapeshellarg($tmp));
       $file = @file_get_contents($tmp . '/word/document.xml');
       shell_exec('rm -fR ' . escapeshellarg($tmp));
       if ($file === false)
           return '';

       // Put every WordprocessingML element on its own line, then drop the markup.
       $file = str_replace('<w:', "\n<w:", $file);
       $file = strip_tags($file);
       $file = html_entity_decode($file, ENT_QUOTES | ENT_XML1, 'UTF-8');

       // Keep non-empty lines and collapse each run of blank lines into one.
       $out = '';
       $c = 0;
       $arr = explode("\n", $file);
       foreach($arr as $line)
       {
           if(trim($line) != '')
           {
               $out .= $line . "\n";
               $c = 1;
           } else {
               $c++;
               if($c == 2)
                   $out .= "\n";
           }
       }

       return $out;
   }
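
The same idea can probably be done without shelling out at all. What follows is only a sketch, not the legacy code: it assumes PHP's zip and dom extensions are installed, and the names docxXmlToText(), docxToText() and W_NS are made up for illustration. It reads word/document.xml straight out of the archive and emits one line per paragraph (w:p), joining the text runs (w:t) inside it.

```php
<?php
// Sketch only: assumes the zip and dom extensions are loaded.
// docxXmlToText(), docxToText() and W_NS are illustrative names.

// Namespace used by the w: prefix in word/document.xml.
const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Convert the contents of word/document.xml to text, one line per paragraph.
function docxXmlToText(string $xml): string
{
    if ($xml === '') {
        return '';
    }
    $dom = new DOMDocument();
    if (!@$dom->loadXML($xml)) {
        return ''; // not well-formed XML
    }
    $out = '';
    foreach ($dom->getElementsByTagNameNS(W_NS, 'p') as $para) {
        $line = '';
        foreach ($para->getElementsByTagNameNS(W_NS, 't') as $run) {
            $line .= $run->textContent; // entities are decoded by the parser
        }
        if (trim($line) !== '') {
            $out .= $line . "\n"; // empty paragraphs are skipped
        }
    }
    return $out;
}

// Read word/document.xml directly from the .docx, with no temp directory.
function docxToText(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        return ''; // missing file or not a zip archive
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? '' : docxXmlToText($xml);
}
```

Usage would be something like $text = docxToText('/path/to/file.docx'). Tabs, line breaks and paragraphs nested inside text boxes are not handled specially here, so for anything beyond search indexing it would need more work.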


For the older .doc format you can also use antiword from the command line:


   function importDOC($file_info)
   {
       // Source path; escapeshellarg() quotes it safely for the shell,
       // including spaces, parentheses and other metacharacters.
       $src = DOC_ROOT . 'documents/' . str_replace(' ', '_', $file_info['file_group'])
            . '/' . $file_info['sub_folder_path'] . '/' . $file_info['filename'];
       return shell_exec('/usr/bin/antiword ' . escapeshellarg($src));
   }


Both of these convert documents to text for searching purposes and come from some legacy code. They do meet the criterion of a minimal install.

Cheers

P


--


Piers Rowan
Managing Director
Recruit Online

piers.ro...@recruitonline.com.au
Mob: 0418 770 331

www.recruitonline.com.au <http://www.recruitonline.com.au>

Recruitment, Advertising, Document Management, CRM, Online Storage
and Search hosted services.

--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
