Saturday, December 29, 2012

How to Archive a Website

How do you download a website?  How do you archive one?  Quick answer: use HTTrack, which is free and available for download from its official website.

If you would like a more in-depth read of how I found my solution, read on:

I was wondering today if it is possible to archive a website for future, offline viewing.  For example, I worked as an internal auditor for a local school district and would like to preserve the internal audit page that I had set up while working in that role.

As always, I did a Google search to see what I could find.  The first result was from one of my favorite websites.  The article is somewhat dated and only covers the basic "Save page as..." functionality found in most web browsers, such as Chrome.

If that is all you need, the link to the particular page is here.

This will give you the page and its image files, but not the files it links to, such as documents and PDFs.  What if you want to download the entire site for offline reading or archiving?  The comments section of the above article was useful, and one comment suggested using HTTrack Website Copier.

The description from the website:

 HTTrack is a free (GPL, libre/free software) and easy-to-use offline browser utility.

It allows you to download a World Wide Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the "mirrored" website in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. HTTrack is fully configurable, and has an integrated help system.
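To get a feel for what "building recursively all directories" and arranging the "relative link-structure" might involve, here is a rough Python sketch of my own.  It is not HTTrack's actual code: the function names and the local folder layout (host name, then path, with index.html for directory URLs) are assumptions I made up for illustration.  It only collects same-site links from a page and works out relative paths between local copies.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath


class LinkCollector(HTMLParser):
    """Collect href/src attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def same_site_links(page_url, html):
    """Return absolute URLs found in html that live on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = []
    for link in collector.links:
        absolute = urljoin(page_url, link)  # resolve relative links
        if urlparse(absolute).netloc == host and absolute not in found:
            found.append(absolute)
    return found


def local_path(url):
    """Map a URL to a path in the mirror folder (hypothetical layout)."""
    parsed = urlparse(url)
    path = parsed.path or "/"
    if path.endswith("/"):
        path += "index.html"  # directory URLs get an index file
    return posixpath.join(parsed.netloc, path.lstrip("/"))


def relative_link(from_url, to_url):
    """Relative path from one mirrored page to another mirrored file."""
    start = posixpath.dirname(local_path(from_url))
    return posixpath.relpath(local_path(to_url), start)
```

A real mirroring tool would repeat the link collection for every page it finds, download each file, and rewrite the links in the saved HTML so they point at the local copies.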

The program is free, which is always a big plus.  Let's see if it works - and it does!  Works like a charm.  See the image below of the download in action.  When finished, HTTrack gives you the option of browsing your downloaded site.
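I used the graphical version, but HTTrack also ships a command-line program, so the same download could probably be scripted.  Below is a minimal sketch, assuming the httrack executable is installed and on your PATH.  The helper names are my own, the URL and folder in the usage note are placeholders, and the only option I rely on is -O, which sets the output folder.

```python
import shutil
import subprocess


def build_httrack_command(url, output_dir):
    """Return the argument list for a basic HTTrack mirror run."""
    # -O tells httrack where to store the mirrored site
    return ["httrack", url, "-O", output_dir]


def mirror(url, output_dir):
    """Run httrack and return its exit code (assumes httrack is installed)."""
    if shutil.which("httrack") is None:
        raise RuntimeError("httrack was not found on PATH")
    result = subprocess.run(build_httrack_command(url, output_dir), check=False)
    return result.returncode
```

For example, mirror("http://example.com/", "my_archive") would try to save that site into a my_archive folder.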

