Hi all,
Here is my problem: I have extracted the plain text from a series of doc(x) documents, and I take each document's title from the "dc:title" field of its metadata. However, I'm not sure this is the right way to obtain a document's title. In many cases, the title inside a document is the text with the largest font size and bold style, and I want to use that to extract the actual title. The trouble is that I have no idea how to read the formatted content and detect font size and bold style. Please let me know if I am missing something. Thank you very much!
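For what it's worth, here is a minimal sketch of the "largest bold text" heuristic using only the Python standard library. It assumes .docx files, which are ZIP packages of XML; legacy binary .doc files would need converting to .docx first. In `word/document.xml`, bold is the `<w:b/>` run property and font size is `<w:sz w:val="..."/>` in half-points (so `40` means 20 pt). The function names are my own, not from any library. One caveat: the sketch only looks at *direct* run formatting, so a title whose size or bold comes from a paragraph style such as "Title" or "Heading 1" will be missed. Resolving that would mean also reading `word/styles.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used inside a .docx package
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
NS_CORE = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def metadata_title(docx):
    """Return dc:title from docProps/core.xml, or None if absent or empty."""
    with zipfile.ZipFile(docx) as z:
        if "docProps/core.xml" not in z.namelist():
            return None
        root = ET.fromstring(z.read("docProps/core.xml"))
    el = root.find("dc:title", NS_CORE)
    if el is None or not el.text or not el.text.strip():
        return None
    return el.text.strip()


def _is_on(el):
    """A toggle property like <w:b/> is on unless w:val is 0/false/off."""
    if el is None:
        return False
    return el.get(W + "val", "true").lower() not in ("0", "false", "off")


def largest_bold_paragraph(docx):
    """Hypothetical heuristic: text of the first paragraph whose bold run
    has the largest explicit font size. Only direct run formatting
    (w:rPr/w:b and w:rPr/w:sz) is inspected; style inheritance is ignored."""
    with zipfile.ZipFile(docx) as z:
        root = ET.fromstring(z.read("word/document.xml"))
    best_size, best_text = 0.0, None
    for p in root.iter(W + "p"):
        text = "".join(t.text or "" for t in p.iter(W + "t")).strip()
        if not text:
            continue
        for r in p.iter(W + "r"):
            rpr = r.find(W + "rPr")
            if rpr is None or not _is_on(rpr.find(W + "b")):
                continue
            sz = rpr.find(W + "sz")
            val = sz.get(W + "val") if sz is not None else None
            size = int(val) / 2 if val and val.isdigit() else 0.0  # half-points -> pt
            if size > best_size:
                best_size, best_text = size, text
    return best_text


def guess_title(docx):
    """Prefer the formatting heuristic, fall back to the metadata title."""
    return largest_bold_paragraph(docx) or metadata_title(docx)
```

Usage would be something like `guess_title("report.docx")`. Falling back to `dc:title` keeps your current approach as a safety net for documents with no explicitly sized bold text. If installing a library is an option, python-docx exposes the same information through `Document.paragraphs`, `Paragraph.runs`, `Run.bold`, `Run.font.size` and `Document.core_properties.title`. Note that `Run.bold` and `Run.font.size` are `None` when the value is inherited from a style, which is the same limitation as above.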