[Rails] How to Parse Microsoft Word Document

2011-03-16 Thread Hafiz Badrie Lubis
Hi people, I just joined the group and want to ask about a problem I have. I'm still learning Ruby on Rails, and I now have a task to parse a Microsoft Word document and store its content in a database. Do you have any suggestions on how to do it? FYI, I'm developing in a Unix environment. So, I don't

Re: [Rails] How to Parse Microsoft Word Document

2011-03-16 Thread Walter Lee Davis
On Mar 16, 2011, at 2:51 PM, Hafiz Badrie Lubis wrote:
> Hi people, I just joined the group and want to ask about a problem I have. I'm still learning Ruby on Rails, and I now have a task to parse a Microsoft Word document and store its content in a database. Do you have any suggestions on how to do it?

Re: [Rails] How to Parse Microsoft Word Document

2011-03-16 Thread Scott Ribe
On Mar 16, 2011, at 12:51 PM, Hafiz Badrie Lubis wrote:
> But all I found is that I need to use JRuby combined with Apache POI, or else I need to use win32ole.
You can run POI as a separate process and then grab its output.
--
Scott Ribe
scott_r...@elevated-dev.com
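A minimal sketch of the "separate process" idea from plain Ruby (no JRuby needed), assuming some command-line extractor that prints the document text to standard output. The helper name `extract_text` and the example command are hypothetical; the thread does not say which POI entry point to invoke.

```ruby
require "open3"

# Runs an external extractor on a file and returns whatever it prints to
# standard output. `command` is an array of program and arguments, e.g.
# ["java", "-cp", "poi.jar", "SomeExtractorMain"] (hypothetical, not from
# the thread). The file path is appended as the last argument.
def extract_text(command, path)
  stdout, stderr, status = Open3.capture3(*command, path)
  unless status.success?
    raise "extractor failed (#{status.exitstatus}): #{stderr.strip}"
  end
  stdout
end
```

Passing the command as an array avoids shell quoting problems with file names that contain spaces. The returned string could then be saved through whatever model the application uses.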

Re: [Rails] How to Parse Microsoft Word Document

2011-03-16 Thread Vladimir Rybas
1. Convert .doc to .pdf with PyODConverter http://www.artofsolving.com/opensource/pyodconverter
2. Convert .pdf to .tiff with ImageMagick
3. Process .tiff through Tesseract OCR and get .txt
On Wed, Mar 16, 2011 at 9:51 PM, Hafiz Badrie Lubis hafiz.b.lu...@gmail.com wrote:
> Hi People, I just
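The three steps above could be driven from Ruby roughly as follows. This is only a sketch under several assumptions: PyODConverter's `DocumentConverter.py` is in the current directory and can reach a running OpenOffice instance, ImageMagick's `convert` and the `tesseract` binary are on the PATH, and the function names and the 300 dpi density are illustrative choices, not something the thread specifies.

```ruby
# Builds the three commands for the .doc -> .pdf -> .tiff -> .txt route.
# Intermediate files are placed in work_dir and named after the input file.
def ocr_pipeline_commands(doc_path, work_dir)
  base = File.join(work_dir, File.basename(doc_path, ".*"))
  pdf  = "#{base}.pdf"
  tiff = "#{base}.tiff"
  [
    ["python", "DocumentConverter.py", doc_path, pdf], # step 1: PyODConverter
    ["convert", "-density", "300", pdf, tiff],         # step 2: ImageMagick
    ["tesseract", tiff, base]                          # step 3: writes base + ".txt"
  ]
end

# Runs each command in order, stops at the first failure, and returns the
# recognized text read from the .txt file that Tesseract writes.
def run_ocr_pipeline(doc_path, work_dir)
  ocr_pipeline_commands(doc_path, work_dir).each do |cmd|
    system(*cmd) or raise "command failed: #{cmd.join(' ')}"
  end
  File.read(File.join(work_dir, File.basename(doc_path, ".*")) + ".txt")
end
```

Keeping command construction separate from execution makes the pipeline easy to inspect without having any of the three tools installed.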

Re: [Rails] How to Parse Microsoft Word Document

2011-03-16 Thread Scott Ribe
On Mar 16, 2011, at 5:10 PM, Vladimir Rybas wrote:
> 1. Convert .doc to .pdf with PyODConverter http://www.artofsolving.com/opensource/pyodconverter
> 2. Convert .pdf to .tiff with ImageMagick
> 3. Process .tiff through Tesseract OCR and get .txt
Wow, talk about a long slow way to potentially