Re: How to populate DB from PDF extracted data

2018-03-09 Thread Jaap van Wingerde
Use 'pdftohtml -xml' to convert the PDF into an XML file, then apply regular expressions to each line of the XML file to extract the data. [pdftohtml] https://www.sourceforge.net/projects/pdftohtml/ On Fri, 9 Mar 2018 00:00:39 -0800 Shazia Nusrat wrote: > Hi, > > I am trying to work around with PDF's
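
A rough sketch of the approach suggested above, in case it helps: it parses XML of the shape that 'pdftohtml -xml' writes (a pdf2xml root with page and text elements) and runs one regular expression per text line. The sample document, the field labels (Student, Class, Fee) and the helper name extract_fields are all made up for illustration; a real PDF will need its own patterns.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written sample in the shape produced by `pdftohtml -xml in.pdf out`.
# The labels and values below are invented for this example.
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="100" left="50" width="200" height="12" font="0">Student: Alice Example</text>
<text top="120" left="50" width="200" height="12" font="0">Class: <b>5A</b></text>
<text top="140" left="50" width="200" height="12" font="0">Fee: 1500.00</text>
</page>
</pdf2xml>
"""

# Hypothetical pattern: a known label, a colon, then the value.
FIELD_RE = re.compile(r"^(?P<key>Student|Class|Fee):\s*(?P<value>.+)$")


def extract_fields(xml_text):
    """Return a dict of label -> value found in the <text> lines."""
    root = ET.fromstring(xml_text)
    fields = {}
    for element in root.iter("text"):
        # itertext() also collects text nested in <b> or <i> children.
        line = "".join(element.itertext()).strip()
        match = FIELD_RE.match(line)
        if match:
            fields[match.group("key").lower()] = match.group("value").strip()
    return fields


fields = extract_fields(SAMPLE_XML)
print(fields)
```

The resulting dict could then be passed to whatever code creates or updates the database row.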

Re: How to populate DB from PDF extracted data

2018-03-09 Thread Jason
PDF processing is very difficult, because the entire standard is a dumpster fire. For example, it has no concept of structure like headings, paragraphs or sentences because each and every character is just a character, location coordinate, font size and font type. In order to process the docum
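
To illustrate the point about characters being stored only with coordinates and font data, here is a minimal sketch of rebuilding text lines from positioned glyphs. The glyph tuples, the tolerance value and the function name group_into_lines are assumptions for the example, not output from any particular PDF library, and it assumes the default PDF coordinate system where y grows upward.

```python
# Hypothetical glyph records: (character, x, y, font_size).
glyphs = [
    ("F", 50.0, 700.0, 12), ("e", 57.0, 700.0, 12), ("e", 63.0, 700.0, 12),
    ("1", 50.0, 685.2, 12), ("5", 56.0, 684.9, 12), ("0", 62.0, 685.0, 12),
]


def group_into_lines(glyphs, y_tolerance=2.0):
    """Cluster glyphs with nearly equal y into lines, then order each by x."""
    lines = []  # list of (reference_y, [glyph, ...])
    # Highest y first, so lines come out top of page to bottom.
    for glyph in sorted(glyphs, key=lambda g: -g[2]):
        if lines and abs(lines[-1][0] - glyph[2]) <= y_tolerance:
            lines[-1][1].append(glyph)
        else:
            lines.append((glyph[2], [glyph]))
    return [
        "".join(g[0] for g in sorted(members, key=lambda g: g[1]))
        for _, members in lines
    ]


lines = group_into_lines(glyphs)
print(lines)
```

Even this toy version needs a tolerance because glyphs on one visual line rarely share an exact y value, which hints at why headings, paragraphs and tables are so hard to recover.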

Re: How to populate DB from PDF extracted data

2018-03-09 Thread m1chael
Good luck. The best-case scenario, in my opinion, is using the utility pdf2text and regex, and this will be painful. On Fri, Mar 9, 2018, 3:01 AM Shazia Nusrat wrote: > Hi, > > I am trying to work around with PDF's where user uploads PDF in image or > filefield and then way to extract it for Django

How to populate DB from PDF extracted data

2018-03-09 Thread Shazia Nusrat
Hi, I am working with PDFs: a user uploads a PDF to an ImageField or FileField, and I need a way to extract its data in Django and then update a DB table based on it. The models are as follows: class StudentFee(models.Model): class_name = models.CharField(choices=CLASSES, max_length=200)