"for file in pages/*" is a for loop. The shell expands the glob pages/* to every
entry in the pages directory, then runs the body of the loop once per entry,
setting the variable file to that path (for example pages/report.pdf) each time.
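Here is a small sketch showing what the loop variable holds on each pass. It assumes a throwaway directory with two empty placeholder files in place of your real pages/ directory.

```shell
#!/usr/bin/env bash
# Sketch: what $file contains on each iteration of the loop.
# Assumption: a temporary directory stands in for your real pages/ directory.
cd "$(mktemp -d)" || exit 1
mkdir pages
touch pages/one.pdf pages/two.pdf   # empty placeholder files

count=0
for file in pages/*; do
    # $file is the path produced by the glob, e.g. pages/one.pdf
    echo "processing $file"
    count=$((count + 1))
done
```

This prints "processing pages/one.pdf" and then "processing pages/two.pdf". If pages/ is empty, bash leaves the pattern unexpanded, so the body would run once with file set to the literal string pages/*; 'shopt -s nullglob' makes an unmatched glob expand to nothing instead.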
'if pdftotext "$file" - | grep -i regexps': the 'pdftotext "$file" -' part
writes the text of the PDF to standard output (the - tells pdftotext to send the
text to stdout instead of to a file). That output is piped to grep: with a pipe,
the standard output of the first command becomes the standard input of the
second. So grep searches the PDF's text for your regexp, and the "if" checks
grep's exit status, which is 0 (true) when at least one line matched.
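A minimal sketch of how "if" reacts to a pipeline. Since pdftotext may not be installed everywhere, printf stands in for 'pdftotext "$file" -' here; the text and the pattern 'report' are made-up examples.

```shell
#!/usr/bin/env bash
# Sketch: `if` tests the exit status of the whole pipeline, which in bash is
# the status of its last command (grep) unless `set -o pipefail` is on.
# Assumption: printf stands in for `pdftotext "$file" -`.
result=""
if printf 'Quarterly REPORT\nsecond line\n' | grep -i 'report'; then
    result="match"      # grep exited 0: at least one line matched
else
    result="no match"   # grep exited non-zero: no match (or an error)
fi
echo "$result"
```

grep also prints the matching line ("Quarterly REPORT") before "match" is echoed. If you only care about the yes/no answer, 'grep -qi' suppresses that output.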
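You can append the filename to a variable using something like
'foo="$foo $file"'. Note that there is no $ on the left-hand side: in the shell
you assign with foo=..., not $foo=..., and there must be no spaces around the =.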
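Putting the pieces together, here is a sketch that collects the names of all files whose text matches. To keep it runnable without PDFs, it assumes plain .txt files and uses cat in place of 'pdftotext "$file" -'; the file contents and the pattern 'invoice' are invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: collect the names of files whose text matches a pattern.
# Assumption: .txt files and `cat` stand in for PDFs and `pdftotext "$file" -`.
dir=$(mktemp -d)
printf 'Invoice total\n' > "$dir/a.txt"
printf 'nothing here\n'  > "$dir/b.txt"
printf 'INVOICE copy\n'  > "$dir/c.txt"

matches=""                        # starts out empty
for file in "$dir"/*; do
    if cat "$file" | grep -qi 'invoice'; then
        # No $ on the left-hand side of an assignment, no spaces around =
        matches="$matches $file"
    fi
done
echo "Matched:$matches"
```

This prints the paths of a.txt and c.txt, each preceded by a space. Joining names with spaces becomes ambiguous if a filename itself contains a space; in bash, an array (matches+=("$file")) avoids that.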