MB, having the text would be way more useful than the PDF pages! Thanks for
recommending pdftotext and the -layout option.
I have some questions -- could you help me break this process down into
smaller steps?
I looked up pdfjam's split command online -- I think it may be a little
time-consuming (my PDFs are a few thousand pages long):
http://0x2a.at/blog/2011/02/pdf_manipulation_on_the_cli/
http://tex.stackexchange.com/questions/79623/quickly-extracting-individual-pages-from-a-document
I looked at PDF Shuffler (the GUI one) and that can only split files
one-by-one. Are there other options?
Once I split the files into single pages, I'll need the shell loop 'for
file in pages/*'. I don't understand what this step will do. Could you
please explain this step too?
About this step: 'if pdftotext "$file" - | grep -i regexps' -- does this copy
all the PDF text to one text file? And then search (grep) the text file?
Does this command take text from many single PDFs? Or only after the "hit"
pages are joined up into one document?
What does it mean to "append the file to a shell variable"? What is the
goal of this step? Could you please explain how I can do this step too?
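
To make my questions concrete, here is my best guess at how the pieces fit
together, written out as a sketch. The directory name "pages", the variable
name "hits", the function names, and the pattern are placeholders I made up;
only pdftotext, its -layout option, and grep -i come from your message. I may
well have misread the steps, so please correct anything that is wrong.

```shell
#!/bin/sh
# Sketch only: "pages", "hits", "page_text", "find_hits" and the pattern
# are hypothetical names, not taken from the original advice.

pattern='regexp'   # placeholder for the real search expression
hits=''            # shell variable that collects the matching file names

# Print the text of one single-page PDF on standard output.
# The "-" asks pdftotext to write to stdout rather than to a .txt file.
page_text() {
    pdftotext -layout "$1" -
}

# Visit every PDF in the directory "$1", one file per pass of the loop.
find_hits() {
    hits=''
    for file in "$1"/*.pdf; do
        [ -e "$file" ] || continue      # nothing matched the glob: skip
        # grep -q prints nothing; it only reports whether this page matched
        if page_text "$file" | grep -i -q -e "$pattern"; then
            hits="$hits $file"          # append this file name to the variable
        fi
    done
}

find_hits pages
echo "Pages with a hit:$hits"
```

If I read this correctly, no combined text file is ever written: each page's
text goes straight to grep, one page at a time, and "hits" ends up holding
only the names of the pages that matched, ready to be joined afterwards. I
also assume a space-separated list like this would break on file names that
contain spaces.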