Netarchive is the national Danish web archive. Since July 2005, Netarchive has been collecting, preserving and making available the Danish part of the Internet. Netarchive covers all websites under the top-level domain .dk, as well as web content published in Danish, written by or about Danes, or addressed to a Danish audience.
Netarchive is operated as a collaboration between the State and University Library and The Royal Library in Copenhagen. We use a web archiving tool developed in-house to manage the web crawls, and the web crawler Heritrix to carry them out.
Netarchive has three collection strategies:
- Snapshot harvesting
A snapshot of all websites in the Danish part of the Internet is taken four times a year.
- Selective harvesting of 80-100 websites
Websites that are updated frequently (such as news sites) are harvested more often to fill the gaps between snapshot harvests.
- Event harvesting
Special collections are made in connection with national events. Sometimes event harvesting is planned in advance (for local or parliamentary elections, for instance), but it may also be done after unforeseen events (such as the financial crisis, swine flu or the Mohammed cartoon crisis).
Access to Netarchive
At the moment, Netarchive is accessible only to researchers who have submitted an application. For more information, please visit the archive's website.
Netarchive is working to open up access (to either the whole archive or parts of it) for students and, ultimately, the general public. Because the archive contains sensitive personal data, the entire archive is classified as a sensitive source, which makes it more difficult to provide broad access to the collection.