[Touch-packages] [Bug 1937874] [NEW] one --accept-regex expression negates another

Bill Yikes Fri, 23 Jul 2021 12:50:46 -0700

Public bug reported:

This command should theoretically fetch all PDFs on a page:


$ wget -v -d -r --level 1 --adjust-extension --no-clobber --no-directories \
       --accept-regex 'administrative-orders/.*/administrative-order-matter-' \
       --accept-regex 'administrative-orders.*.pdf' \
       --accept-regex 'administrative-orders.page[^&]*$' \
       --directory-prefix=/tmp \
       'https://www.ncua.gov/regulation-supervision/enforcement-actions/administrative-orders?page=56'

But it fails to grab any of them, giving the output:

---
Deciding whether to enqueue "https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf".
https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf is excluded/not-included through regex.
Decided NOT to load it.
---

That's bogus.  The workaround is to remove this option:

--accept-regex 'administrative-orders.page[^&]*$'

But that should not be necessary.  Adding an --accept-* clause should
never invalidate another --accept-* clause, and it should never shrink
the set of fetched files.
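
One possible explanation, offered here as an assumption rather than a
confirmed diagnosis, is that wget stores --accept-regex as a single value,
so the last occurrence on the command line replaces the earlier ones
instead of being OR-ed with them. The Python sketch below replays the three
patterns from the command against the rejected PDF URL under both readings.
It assumes unanchored search semantics, Python's re module stands in for
wget's regex engine, and the helper names are hypothetical.

```python
import re

# Patterns copied verbatim from the three --accept-regex options in the report.
PATTERNS = [
    r"administrative-orders/.*/administrative-order-matter-",
    r"administrative-orders.*.pdf",
    r"administrative-orders.page[^&]*$",
]

# The URL that the debug log shows being rejected.
PDF_URL = "https://www.ncua.gov/files/administrative-orders/AO14-0241-R4.pdf"


def accepted_last_wins(url, patterns):
    """Assumed behaviour: only the last pattern supplied is consulted."""
    return re.search(patterns[-1], url) is not None


def accepted_union(url, patterns):
    """Behaviour the report expects: any single matching pattern accepts the URL."""
    combined = "|".join(f"({p})" for p in patterns)
    return re.search(combined, url) is not None


if __name__ == "__main__":
    print(accepted_last_wins(PDF_URL, PATTERNS))  # False
    print(accepted_union(PDF_URL, PATTERNS))      # True
```

If the last-wins assumption holds, folding the three expressions into one
alternation and passing it as a single --accept-regex may give the union the
report expects. That has not been verified against wget here.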

** Affects: wget (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to wget in Ubuntu.
https://bugs.launchpad.net/bugs/1937874
