Cultural Heritage Cluster

INFO: Because the entire underlying platform is being changed, the DeIC National Cultural Heritage Cluster is temporarily closed to pilot projects until August 2017.

DeIC (Danish e-Infrastructure Cooperation) has been charged with spreading High-Performance Computing (HPC) to new research areas, such as the humanities and the social sciences. In response, DeIC and the Royal Danish Library have agreed to establish the DeIC National Cultural Heritage Cluster, Royal Danish Library.

The cultural heritage cluster applies state-of-the-art technologies within data science, and for the first time ever facilitates quantitative research projects on the digital Danish cultural heritage – e.g. radio and TV programmes, websites and historical newspapers.

In recent years, the Royal Danish Library has participated in national and international research and research infrastructure projects based on Danish digital cultural heritage. Through these projects, the library has built up knowledge and competence in what it takes to offer services such as data mining, i.e. the search for structures and patterns in large data sets.

The agreement between DeIC and the Royal Danish Library has a total financial framework of DKK 7.2 million over three years.

Collections available to research projects

The Royal Danish Library is responsible for collecting and preserving large parts of the Danish cultural heritage, including the digital cultural heritage. This digital cultural heritage is divided into numerous collections, each with its own properties, formats and possibilities. Examples of collections that are now made available to researchers include radio/TV, the Netarchive and the Danish Newspaper Collection.

The radio/TV collection contains more than 1 million hours of TV broadcasts and more than 1.5 million hours of radio programmes broadcast on Danish channels from the 1980s until today. The collection's data are made accessible as audio and video files. The collection also contains large amounts of metadata, such as programme titles, broadcast times and subtitles, depending on the epoch from which the material originates. Read more at mediestream.dk.

The Netarchive contains more than 800 TB of data, corresponding to more than 25 billion objects gathered from the Danish part of the Internet from 2005 until today. This archive also contains both data and metadata, and both are made available to research projects. You can read more at netarkivet.dk.

The digital newspaper collection contains 29 million newspaper pages from the 1700s until today. Once the digitisation project is complete, there will be 32 million pages in the collection. All of these pages are stored as image files along with a large amount of metadata and optical character recognition (OCR) data.

In addition to these large collections, the Royal Danish Library also has smaller special collections.

All in all, more than 4 PB, corresponding to approx. 4,000,000 gigabytes, are made available to new and existing research projects.
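The conversion behind this figure can be checked in a few lines of Python. This is only a sketch that assumes decimal (SI) prefixes, where 1 PB is 1,000,000 GB; the function name is hypothetical and chosen for illustration.

```python
# Assumption: decimal (SI) prefixes, i.e. 1 GB = 10**9 bytes and 1 PB = 10**15 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_PB = 10**15


def petabytes_to_gigabytes(petabytes):
    """Convert a whole number of petabytes to gigabytes (decimal prefixes)."""
    return petabytes * BYTES_PER_PB // BYTES_PER_GB


print(petabytes_to_gigabytes(4))  # 4000000
```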

Platform

The Cultural Heritage Cluster is to support new areas, particularly within digital humanities. It was therefore decided to design a system that would make it easy to conduct well-established analyses without having to compromise on advanced and bespoke methods.

The Cultural Heritage Cluster is making Hortonworks Data Platform available to research projects. This platform is developed within the framework of the Open Data Platform (ODPi), and web-based tools are installed on top of it to ease access to the cluster.

The Open Data Platform is a new initiative from the largest Hadoop distributors, and it features many of the current Hadoop technologies. You can read about ODPi at odpi.org, and from this site, it is possible to download a virtual and fully functional ODPi server, which can be run on an ordinary desktop PC so that the techniques can be tested in a small setup.

The web-based access tools will include Jupyter Notebooks and RStudio.
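To give a feel for the kind of analysis that could be tried out in a notebook, here is a minimal sketch of a word count written in the map/reduce style associated with Hadoop. It runs in plain Python in memory and does not use the cluster or any Hadoop API. The sample texts and the function names are invented for illustration and are not taken from the collections.

```python
from collections import Counter
from functools import reduce
import re

# Hypothetical stand-ins for OCR text from two newspaper pages (not real collection data).
sample_pages = [
    "Kongen besøgte Aarhus i går",
    "Aarhus fik besøg af kongen",
]


def map_page(text):
    """Map step: count the lowercased word tokens on a single page."""
    return Counter(re.findall(r"\w+", text.lower()))


def reduce_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


# Apply the map step to every page, then fold the partial counts together.
word_counts = reduce(reduce_counts, map(map_page, sample_pages), Counter())
print(word_counts.most_common(2))
```

On a real cluster the map step would run in parallel across many machines and the partial counts would be merged afterwards; the structure of the computation stays the same.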

Pilot projects

Over the coming months, three pilot projects will utilise the system's new facilities. The Royal Danish Library, in collaboration with the DeIC eScience center of competence, will make facilities available to the researchers working on these projects and train them in the use of the system, free of charge. In 2017 and 2018, DeIC and the Royal Danish Library will offer further, fully financed pilot projects through open project invitations.

In the course of 2018, it will also be possible to buy computing time and consultancy assistance under a transparent price model, which will be developed in connection with the first pilot projects.

The three planned pilot projects are:

Probing a Nation's Web Domain, run by Professor Niels Brügger from Aarhus University and Senior Researcher Ditte Laursen from the Royal Danish Library. The project will analyse the Danish part of the Internet as it has developed from 2005 until today. Their data source will primarily be metadata from the Netarchive.

Digital Footprints Research Group, run by Anja Bechmann, Aarhus University. This project will analyse data from social media. The data source will be both the project's own data and data from the Netarchive.

A project run by Sabine Kirchmeier-Andersen from the Danish Language Council's research institute. This project will analyse the development of Danes' language usage on social media, and the data source will be the Netarchive.

Further information

Future project invitations will be distributed through national channels for all relevant fields. If you are interested in being notified directly, please contact us. Contact information is listed below.

Contact

TONY BRIAN ALBERS

tba@kb.dk
8946 2100

ASGER ASKOV BLEKINGE

abr@kb.dk
8946 2100

KATRINE HOFMANN GASSER

khg@kb.dk
8946 2301
