Use 'pdftohtml -xml' to convert the PDF into an XML file, then apply
regular expressions to each line of the XML file to extract the data.
[pdftohtml]
https://www.sourceforge.net/projects/pdftohtml/
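A minimal sketch of that approach in Python, in case it helps. It assumes that 'pdftohtml -xml' writes one <text top="..." left="..." ...> element per line with the attributes in that order, and that the output file is the given base name plus '.xml'; check both against the output of your own pdftohtml version. The function names pdf_to_xml and extract_fragments are made up for this example.

```python
import html
import re
import subprocess

# One <text ...>content</text> element per line (assumed layout of the XML).
TEXT_LINE = re.compile(
    r'<text top="(?P<top>\d+)" left="(?P<left>\d+)"[^>]*>(?P<body>.*?)</text>'
)
# Inline markup inside a text element, such as <b> or <i>.
TAG = re.compile(r"<[^>]+>")


def pdf_to_xml(pdf_path, out_base):
    """Run pdftohtml in XML mode; assumed to produce out_base + '.xml'."""
    subprocess.run(["pdftohtml", "-xml", pdf_path, out_base], check=True)
    return out_base + ".xml"


def extract_fragments(xml_lines):
    """Return a (top, left, text) tuple for every line holding a <text> element."""
    fragments = []
    for line in xml_lines:
        match = TEXT_LINE.search(line)
        if not match:
            continue  # page, fontspec and other lines are skipped
        text = html.unescape(TAG.sub("", match.group("body"))).strip()
        fragments.append((int(match.group("top")), int(match.group("left")), text))
    return fragments
```

With a real file you would call pdf_to_xml first and pass the opened XML file to extract_fragments. The regular expressions that pick out the actual data then run on the text part of each tuple.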
On Fri, 9 Mar 2018 00:00:39 -0800
Shazia Nusrat wrote:
> Hi,
>
> I am trying to work around with PDF's
PDF processing is very difficult, because the entire standard is a dumpster
fire. For example, it has no concept of structure such as headings,
paragraphs or sentences: every character is stored as nothing more than a
glyph, a location coordinate, a font size and a font type.
In order to process the docum
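
To illustrate the point about coordinates: because the text only exists as positioned fragments, even "a line of text" has to be rebuilt by comparing positions. The following is a rough sketch, not a robust solution. It assumes the fragments are already available as (top, left, text) tuples; group_into_lines and the tolerance of 2 units are made up for this example.

```python
def group_into_lines(fragments, tolerance=2):
    """Rebuild text lines from (top, left, text) fragments.

    Fragments whose top coordinate lies within `tolerance` of the first
    fragment of a line are treated as belonging to that line.
    """
    lines = []  # each entry: (top of first fragment, [(left, text), ...])
    for top, left, text in sorted(fragments):
        if lines and abs(top - lines[-1][0]) <= tolerance:
            lines[-1][1].append((left, text))
        else:
            lines.append((top, [(left, text)]))
    # Within each line, order the fragments from left to right.
    return [" ".join(text for _, text in sorted(items)) for _, items in lines]
```

Headings and paragraphs would need similar guesswork based on font size and vertical gaps, which is why this gets painful quickly.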
Good luck.
Best case scenario in my opinion is using the utility pdf2text and regex,
and this will be painful.
On Fri, Mar 9, 2018, 3:01 AM Shazia Nusrat wrote:
> Hi,
>
> I am trying to work around with PDF's where user uploads PDF in image or
> filefield and then way to extract it for Django
Hi,
I am trying to work with PDFs: a user uploads a PDF through an ImageField or
FileField, and I need a way to extract its contents in Django and finally
update a DB table based on it. Following are the models:
class StudentFee(models.Model):
    class_name = models.CharField(choices=CLASSES, max_length=200)