Back in May I backed up all 26478 questions (from all language sections) 
with wget; after removing JS (embedded and standalone files) they take up 
~890MiB of disk space.
For a few dozen questions with multiple pages of answers, only the first page 
(with the top-rated answers) is saved.
I removed all JS with xsltproc; further clean-up of superfluous elements can 
be done with it as well.
Everything is available on https://askfedora.brtk.098.pl/ but since I only 
saved questions, for now you need to go directly to their addresses, e.g. from 
https://askfedora.brtk.098.pl/sitemap.xml (robots are asked not to index it).
Links to related questions point to the same server; other links (AskBot 
interface links) point to https://askbot.fedoraproject.org/
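
For anyone who wants to repeat the JS removal without xsltproc, here is a 
rough Python sketch of the same idea. It is not the stylesheet I actually 
used; it is a stand-in built on the standard library's html.parser, and the 
names ScriptStripper and strip_scripts are made up for this example. It only 
drops <script> elements; inline event-handler attributes (onclick etc.) are 
left alone.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML as parsed, skipping <script> elements and their content."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skipping = False  # True while inside a <script> element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.skipping = True
        elif not self.skipping:
            # Raw text of the start tag, attributes untouched.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>.
        if tag != "script" and not self.skipping:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.skipping = False
        elif not self.skipping:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skipping:
            self.parts.append(data)

    def handle_entityref(self, name):
        if not self.skipping:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skipping:
            self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skipping:
            self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_scripts(html_text):
    """Return html_text with all <script> elements removed."""
    parser = ScriptStripper()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

Calling strip_scripts() on the text of each saved page and writing the result 
back would be the equivalent of the xsltproc pass; output should be spot-checked 
on a few pages, since badly broken markup may be re-emitted differently.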

All that was to back up the most important data with minimal effort. To retain 
most usability, either a proxy or the current Ask Fedora server would have to 
rewrite the domain name and redirect all requests starting with language codes 
(e.g. https://ask.fedoraproject.org/en/question/* 
https://ask.fedoraproject.org/es/question/*) to the addresses where those HTML 
files are hosted. That would open askbot questions from this backup without 
interfering with Discourse or breaking old links and search engine results 
(well, most of them).
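
To make the redirect rule concrete, here is a small Python sketch of the 
mapping a proxy would have to apply. It assumes the backup host keeps exactly 
the same path as the old URL (only the domain changes) and that language codes 
look like "en", "es" or "pt-br"; both are assumptions for illustration, and 
redirect_target is a hypothetical helper, not part of any existing setup.

```python
import re

# Backup host from this post; the path layout is assumed to match the old site.
BACKUP_BASE = "https://askfedora.brtk.098.pl"

# Old askbot question URLs: /<language code>/question/<anything>
OLD_URL = re.compile(
    r"^https?://ask\.fedoraproject\.org/"
    r"(?P<lang>[a-z]{2}(?:[-_][A-Za-z]{2})?)/question/(?P<rest>.*)$"
)


def redirect_target(url):
    """Return the backup URL for an old askbot question URL, else None."""
    match = OLD_URL.match(url)
    if match is None:
        # Not an askbot question URL, e.g. a Discourse page: do not redirect.
        return None
    return f"{BACKUP_BASE}/{match['lang']}/question/{match['rest']}"
```

Anything that does not match (for example Discourse paths) returns None and 
would be passed through untouched, which is the point of keying the rule on 
the language-code prefix.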

Without working redirects, our next best (ok, acceptable, not best) plan is 
probably to add a landing page, further clean up those files (xsltproc seems 
powerful, capable of removing or replacing selected elements) and make sure 
they are indexed by all search engines on the target domain (probably 
askbot.fedoraproject.org, after read-only askbot expires).

If we were to continue with what I've already got, I'd probably also save user 
profile pages, while they are still available. How long until AskBot is shut 
down?
_______________________________________________
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
