For PDF-related development, one valuable resource is the PDF Reference 1.7:
https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/pdfreference1.7old.pdf

I don’t have any hands-on experience with this, but I investigated it many
years ago while trying to automate digital publication, so maybe it will help.
I’m sure Perl can be applied, but as others suggested, existing tools are the
recommended place to start. With an open source toolkit and the reference
above, it is possible to understand more clearly how the tool was built to
support the PDF document format.
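
As a rough sketch of how Perl could sit on top of an existing tool: assume the
text of each PDF has already been extracted by some tool (pdftotext from
poppler-utils is one candidate, but that is my assumption and I have not tried
it on these files). Plain Perl can then report which file mentions which name.
The file names, the sample text, and the find_names helper below are all
hypothetical and only illustrate the idea.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a hash ref mapping each name to the list of files whose text
# contains it. $text_for maps a file name to its already-extracted text.
sub find_names {
    my ($names, $text_for) = @_;
    my %found = map { $_ => [] } @$names;
    for my $file (sort keys %$text_for) {
        # Collapse whitespace so a name split across a line break still matches.
        (my $text = $text_for->{$file}) =~ s/\s+/ /g;
        for my $name (@$names) {
            # \Q...\E treats the name literally; /i ignores case.
            push @{ $found{$name} }, $file if $text =~ /\Q$name\E/i;
        }
    }
    return \%found;
}

# Hypothetical sample data standing in for text extracted from two PDFs.
my %text_for = (
    'gaceta_a.pdf' => "Notice for Maria\nLopez, Managua",
    'gaceta_b.pdf' => "Notice for Juan Perez, Leon",
);
my @names = ('Maria Lopez', 'Juan Perez', 'Ana Ruiz');

my $found = find_names(\@names, \%text_for);
for my $name (@names) {
    my @files = @{ $found->{$name} };
    print "$name: ", (@files ? join(', ', @files) : 'not found'), "\n";
}
```

A name that appears in none of the files is reported as "not found". This only
does literal, case-insensitive matching, so it will miss names that the
extraction step splits or reorders.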


> On Jul 29, 2022, at 3:27 PM, William Torrez Corea <willitc9...@gmail.com> 
> wrote:
> 
> 
> 
>> On Fri, Jul 29, 2022 at 11:38 AM hw <h...@adminart.net> wrote:
>> On Sat, 2022-07-23 at 13:03 -0600, William Torrez Corea wrote:
>> > My goal: I want to create something similar to a phone directory. The
>> > page hosts a large number of documents in PDF format, and I want to
>> > combine them so that I can filter by first name, last name, and
>> > location. If I do this manually, I have to download and open each
>> > document and search for the person's name by hand.
>> > 
>> > The documents uploaded to this page differ in year and date, and they
>> > contain different information.
>> > 
>> 
>> Perhaps you can automate the downloading and then use tools that merge
>> PDF files, like pdfunite, to turn them into a single PDF.  There's also
>> pdf2txt that can extract text from a PDF --- of course, that would only
>> work if there were a way to detect which information is what.
>> 
>> Since we do not have all the PDF files which apparently are all
>> different, we can't tell how it might be possible to detect which
>> information is what.
>> 
>> I wouldn't even bother with this because PDF is awful to get
>> information from automatically.  Whoever makes these PDF files needs to
>> provide the information in such a way that it is usable.  Since you
>> need to download all the files anyway to search for a name, you're
>> better off merging them into a single file and searching that in your
>> favourite PDF viewer.
>> 
>> 
>> > 
>> > On Wed, Jul 20, 2022 at 7:04 PM Mike <te...@mflan.com> wrote:
>> > 
>> > > 
>> > > I'm going to be traveling, so will not be able to help much
>> > > in the next 2 days.
>> > > 
>> > > That is a PDF file you supplied.  Is it fair to say you want to
>> > > be able to search for all the names listed in a text file and be
>> > > able to print out which file contains which name?  And in some
>> > > cases the name will not be in any of the files?  Is that the goal?
>> > > 
>> > > Define your goal and we will help you.
>> > > 
>> > > 
>> > > The file below is a bit old, but maybe it works for your
>> > > PDF files.  I have not tested it on your url.  I gather
>> > > you don't have HTML tables, so maybe it is not for your case.
>> > > 
>> > > 
>> > > Mike
>> > > 
>> > > 
>> > > #!/usr/bin/perl -w
>> > > #
>> > > #
>> > > # This program writes the text of the webpage named in $page
>> > > # to $outfile.  So basically it converts HTML to text.
>> > > # It works reasonably well with HTML tables.
>> > > #
>> > > #
>> > > 
>> > > use strict;
>> > > use warnings;
>> > > use LWP::UserAgent;
>> > > use HTML::FormatText::WithLinks::AndTables;
>> > > 
>> > > my $page = 'http://www.mflan.com/crime.htm';
>> > > 
>> > > my $outfile = 'output.txt';
>> > > 
>> > > chdir '/home/mike/Documents/copy' or die "Can't chdir: $!";
>> > > 
>> > > # Append to the output file, using a lexical filehandle.
>> > > open my $out, '>>', $outfile or die "Can't open '$outfile': $!";
>> > > 
>> > > my $sl = LWP::UserAgent->new;
>> > > 
>> > > # Enter a proxy here if need be (and set it for SOAP too).
>> > > $sl->proxy('http', '');
>> > > 
>> > > my $response = $sl->get($page);
>> > > die "Can't fetch $page: ", $response->status_line, "\n"
>> > >     unless $response->is_success;
>> > > 
>> > > # decoded_content returns only the body; as_string would also
>> > > # include the status line and headers.
>> > > my $html = $response->decoded_content;
>> > > 
>> > > print "Got it into \$html.\n";
>> > > 
>> > > my $text = HTML::FormatText::WithLinks::AndTables->convert($html);
>> > > 
>> > > print {$out} $text;
>> > > 
>> > > print "\nAll done.\n";
>> > > 
>> > > close $out or die "Can't close '$outfile': $!";
>> > > 
>> > > 
>> > > __END__
>> > > 
>> > > 
>> > > 
>> > > 
>> > > On 7/20/22 10:13, William Torrez Corea wrote:
>> > > > The url of the page:
>> > > > 
>> > > > https://www.pgr.gob.ni/PDF/2021/GACETA/GACETA_17_08_2021.pdf
>> > > > 
>> > > > On 7/20/22, William Torrez Corea <willitc9...@gmail.com> wrote:
>> > > > > There is a page that publishes information about people, but to
>> > > > > find a name you must search for it manually. So, I want to
>> > > > > automate this process with Perl.
>> > > > > --
>> > > > > 
>> > > > > With kindest regards, William.
>> > > > > 
>> > > > > ⢀⣴⠾⠻⢶⣦⠀
>> > > > > ⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
>> > > > > ⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
>> > > > > ⠈⠳⣄⠀⠀⠀⠀
>> > > > > 
>> > > > 
>> > > 
>> > > 
>> > 
>> 
>> 
>> -- 
>> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
>> For additional commands, e-mail: beginners-h...@perl.org
>> http://learn.perl.org/
>> 
>> 
> 
> I want to create this by means of code; I don't want to use any tool.
> -- 
> With kindest regards, William.
> 
> ⢀⣴⠾⠻⢶⣦⠀ 
> ⣾⠁⢠⠒⠀⣿⡁ Debian - The universal operating system
> ⢿⡄⠘⠷⠚⠋⠀ https://www.debian.org
> ⠈⠳⣄⠀⠀⠀⠀ 
