First, you have to understand what a PDF is. PDFs are designed to mimic a printed page, and they are designed only as an output format, not an input format. A PDF is basically a map containing the exact locations of characters (individual letters, punctuation, etc.) or images. In most cases, a PDF does not even store information about where one word ends and another begins, much less things like soft line breaks vs. hard breaks for paragraph endings.
(A few newer PDFs do store some information about this stuff, but that's a newer technology, and you'd be lucky to find PDFs like that. Even if you did, your PDF reader might not know about it.)
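To make the soft vs. hard line break problem concrete, here is a rough Python sketch of the kind of guessing a tool has to do once it has pulled out the visual lines. It's only an illustration under my own assumptions: the function name `reflow`, the "ends with punctuation and is noticeably short" rule, and the 0.75 ratio are all made up here. It also uses character counts as a stand-in for real line widths.

```python
# Heuristic sketch (hypothetical, not any real tool's algorithm): the PDF only
# gives us visual lines, so we guess which line breaks were "soft" (wrapped by
# the layout) and which were "hard" (the author really ended a paragraph).

SENTENCE_END = (".", "!", "?", ":")


def reflow(lines, short_ratio=0.75):
    """Join visually wrapped lines into paragraphs.

    A break is treated as hard when the line ends with sentence punctuation
    AND is shorter than short_ratio times the longest line, or when a blank
    line follows. Every other break is treated as soft and becomes a space.
    Character counts stand in for real measured widths (an assumption).
    """
    stripped = [line.strip() for line in lines]
    full_width = max((len(s) for s in stripped), default=0)
    paragraphs = []
    current = []
    for text in stripped:
        if not text:
            # A blank line always ends the current paragraph.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(text)
        is_short = len(text) < short_ratio * full_width
        if text.endswith(SENTENCE_END) and is_short:
            # Short line ending a sentence: guess it's a paragraph end.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs
```

Note that a paragraph whose last line happens to run nearly full width gets glued onto the next one. That's exactly the sort of mistake you should expect from any tool doing this.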
Anyway, it's up to your software to implement some kind of "artificial intelligence" to infer, from the locations of individual characters, what is a word, what is a paragraph, and so on. Some software is going to do this better than others, and it's also going to depend on how the PDF was made. In any case, you should never expect perfect results. Having the output PDF is not the same as having the input document. Far better to try to obtain that if you can.
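Here's a toy version of that guessing for a single page, in Python. It assumes you already have each glyph's character, position and width; the `Glyph` class, the `glyphs_to_lines` function and both thresholds are hypothetical, not any real library's API. Glyphs are bucketed into lines by baseline, sorted left to right, and a space is inserted wherever the horizontal gap is bigger than a guessed threshold.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Hypothetical record for one positioned character on a page."""
    char: str
    x: float      # left edge, in points
    y: float      # baseline, in points (PDF y grows upward)
    width: float  # advance width, in points


def glyphs_to_lines(glyphs, space_gap=1.5, line_tol=2.0):
    """Guess words and lines from bare glyph positions.

    space_gap: a horizontal gap (points) larger than this is read as a space.
    line_tol: consecutive glyphs whose baselines differ by more than this
    start a new line. Both values are illustrative guesses, not from a spec.
    """
    # Top of the page first (largest y), then left to right.
    ordered = sorted(glyphs, key=lambda g: (-g.y, g.x))
    lines = []
    current = []
    for g in ordered:
        if current and abs(current[-1].y - g.y) > line_tol:
            lines.append(current)
            current = []
        current.append(g)
    if current:
        lines.append(current)

    text_lines = []
    for line in lines:
        line.sort(key=lambda g: g.x)
        out = [line[0].char]
        for prev, g in zip(line, line[1:]):
            # Nothing in the file says "space here"; we infer it from the gap.
            gap = g.x - (prev.x + prev.width)
            if gap > space_gap:
                out.append(" ")
            out.append(g.char)
        text_lines.append("".join(out))
    return text_lines
```

Set `space_gap` too high and two words on a line run together; set it too low and loosely spaced letters get split apart. Real extractors use smarter, font-aware rules, but they are still guessing.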
The standard solution to this kind of problem is to use Adobe Acrobat Professional (the expensive one, not the free Reader) to convert the PDF to HTML. Even that is not going to give perfect results.
There is free software that can be used to extract text from PDFs with some of the formatting intact, but again, don't expect perfect results. See, e.g., calibre (which can convert to RTF format), pdftohtml/pdfreflow, or the AbiWord word processor (with all import/export plugins enabled). There's also a PDF import plugin for OpenOffice.
But again, don't expect perfection from any of these. You're working against the grain here. PDF simply is not meant as an editable input format.