** Description changed:

  This yields no output:
  
  curl -s 'https://www.veridiancu.org' | sed -ne '/<form/,/<\/form/p' | urlscan -n
  
  Without the sed filter, urlscan works, but it then dumps every URL in
  the whole document.  It seems urlscan was designed to work only on
  complete documents, so perhaps this is not a "bug" but rather a
  feature request.
  
  The usual workaround would be urlview, but urlview only works
  interactively.  Perhaps the fix here is for urlscan to grow a
  --fuzzyhtml option and use the guts of urlview to do the processing.
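
For what it's worth, the failure mode can be reproduced without network
access.  A minimal sketch using a stand-in page (the markup and URLs
below are made up for illustration; the real pipeline fetches
https://www.veridiancu.org with curl):

```shell
# Stand-in for the curl output.
printf '%s\n' \
  '<html><body>' \
  '<form action="/search">' \
  '<a href="https://example.com/help">help</a>' \
  '</form>' \
  '</body></html>' |
sed -ne '/<form/,/<\/form/p'
# sed emits a bare <form>...</form> fragment with no surrounding
# <html>/<body> wrapper, which is apparently what urlscan cannot digest.
```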
+ 
+ (edit)
+ 
+ This workaround works for urlscan:
+ 
+ curl -s 'https://www.veridiancu.org' | python -c 'from bs4 import BeautifulSoup; import sys; print(BeautifulSoup(sys.stdin.read(), "html.parser").form)' | urlscan -n
+ 
+ which might give a clue about what the problem is.
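
A guess at why that workaround helps: BeautifulSoup parses the
fragment and re-serializes it as balanced markup, closing any
dangling tags.  A sketch with a made-up unbalanced fragment
("html.parser" is passed explicitly, which also avoids bs4's
no-parser-specified warning):

```shell
# Made-up unbalanced fragment standing in for the sed output.
printf '<form action="/search"><a href="https://example.com/help">help</a>' |
python3 -c 'import sys
from bs4 import BeautifulSoup
# Re-parse and re-serialize: html.parser closes the dangling tags,
# yielding a well-formed <form>...</form> element.
print(BeautifulSoup(sys.stdin.read(), "html.parser").form)'
```

If that is the issue, urlscan is likely tripping over unbalanced HTML
rather than over fragments per se.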

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1930437

Title:
  urlscan does not work on HTML fragments

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/urlscan/+bug/1930437/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs