Dear Mr. Tilman Hausherr,

Please kindly accept my deep apology, and thank you very much for your quick and excellent answer.

So far I have only analyzed the Stack Overflow link, but I will check all the links you suggested.

My major is not software-related but biochemistry, and I'm finalizing the development of my application these days. I have to take care of everything from A to Z, a million matters, so I've been really hectic. Please kindly understand.

I haven't fully checked all of your links yet, but it doesn't make sense to me that I would need so many DLL files just to extract text from a PDF. (That said, I'm really satisfied with the quality of PDFBox.)

I hope you can also develop a 'nitro turbo' button as a library (.dll).
Again, my deepest appreciation to you.

All the best!

Truthfully yours,

Mr. Su-Sang, Lee (Kay Lee)
+82-10-3180-7976
[email protected]

> Subject: Re: Hello, I have a question in extracting Texts from PDF file.
> To: [email protected]
> From: [email protected]
> Date: Wed, 18 May 2016 09:11:08 +0200
>
> On 18.05.2016 at 04:21, Kay_Lee wrote:
> > Hello,
> >
> > I live in South Korea, in Far-East Asia, and I'm using Apache PDFBox to
> > extract text from PDF files.
> > Name: Su-Sang, Lee (English name: Kay Lee)
> > Cell Phone: +82-10-3180-7976
> > Residence: Seoul, South Korea, Asia
> > E-mail: [email protected] (or [email protected])
> >
> > My software development environment is:
> >
> > Windows 10, Visual Studio 2015, C#, PDFBox version 1.1.1 (build of the
> > Apache PDFBox library for .NET binaries, available as a NuGet package).
> >
> > I can extract text (in our Korean language) from PDF files, with many
> > thanks to the Apache Foundation.
> >
> > However, what concerns me most is that PDFBox takes a little longer to
> > extract text than iTextSharp and other competitors.
> >
> > All I need is to extract Korean text from PDF files, nothing more.
> >
> > I tried to research this on the internet (Google and Stack Overflow), but
> > found no specific solution and only a limited number of cases.
> >
> > 1) How can I extract text faster?
>
> You can't. Unless you have a "turbo" or "nitro" button on the computer.
>
> Make sure you are opening the files as files and not as streams. But I see
> below, you already do that, i.e. your code is good.
>
> > 2) And do I need the whole library, with more than 30 MB of files, if I
> > only need to extract text?
>
> Of PDFBox itself, you need pdfbox and fontbox and logging. If files are
> encrypted, then also bouncy castle. You won't need xmp and the image
> libraries.
> See also here
> https://pdfbox.apache.org/1.8/dependencies.html
>
> > If I only need some specific DLL files among all the PDFBox DLL files,
> > could you please kindly let me know which ones?
> >
> > 3) Is it still OK to use PDFBox 1.1.1? There seem to be recent versions
> > like 1.8.12 and 2.0.1.
>
> Indeed. However there is no official .net release, i.e. none of the
> "very active developers" is currently using that one (an older release
> is here: http://pdfbox.lehmi.de/ ). And I doubt they will be faster.
> However they'll extract better.
>
> There is a guide from 2012 to create the dlls:
> https://web.archive.org/web/20120204060917/http://pdfbox.apache.org/userguide/dot_net.html
> but I don't know if it works.
>
> See also this: http://www.squarepdf.net/pdfbox-in-net
> https://stackoverflow.com/questions/8441991/how-to-build-pdfbox-for-net
>
> >
> > I don't belong to any company or organization; I'm just a private person
> > developing software to be distributed and used for free for 5 years,
> > for the public benefit. As my major is not software-related but
> > biochemistry, please kindly understand and explain in as much detail
> > as you can.
>
> If you're non profit and willing to distribute the source code, you can
> use iText, see here: http://itextpdf.com/AGPL
>
> >
> > My simple code to extract text from a PDF file is:
> >
> > internal static string ExtractTextFromPdf(string path)
> > {
> >     PDDocument doc = null;
> >     try
> >     {
> >         // Load directly from the file path rather than from a stream.
> >         doc = PDDocument.load(path);
> >         PDFTextStripper stripper = new PDFTextStripper();
> >         stripper.setSuppressDuplicateOverlappingText(false);
> >         return stripper.getText(doc);
> >     }
> >     finally
> >     {
> >         // Always close the document, even if extraction fails.
> >         if (doc != null)
> >         {
> >             doc.close();
> >         }
> >     }
> > }
>
> Yes that code is fine.
>
> Tilman
>
> >
> > Hoping for your kind and excellent support.
> >
> > Thank you so much!
> >
> > Mr. Su-Sang, Lee (Kay Lee)
> > +82-10-3180-7976
> > [email protected]

