Data sharing solutions for publicly funded research

The Data-Sharing Archive at shared-data.com (for providers) and shared-data.org (for free data use) is a new service being developed by Textensor Limited for cataloguing, storing, sharing and disseminating publicly funded research data. It is designed to provide an efficient and cost-effective solution to the need to make such data easily accessible.

The currently proposed features are presented below. The service is still at the definition stage, so if you are interested in data sharing solutions, please get in touch.

Motivation

Making publicly funded research data readily available is widely agreed to be desirable in many cases, and is even mandated by some funding agencies. However, for many types of data there are currently very few options available to the investigator who wishes to share data.

There are already numerous solutions used for commercial data handling that would be functionally adequate, but in general they are prohibitively expensive for research data.

The focus here, therefore, is on economising on those features that are not important in a research context, and maximising those that are. In particular, the diversity of research data to be shared calls for flexible, user-friendly tools to facilitate the creation and presentation of metadata for every data set that is archived. Metadata standards are also required, so that other data sharing services can harvest and aggregate the metadata and resources can be easily located.

However, economies can be made on the data storage itself. Maintaining high availability, as in a conventional data centre where any item of data can be accessed in a fraction of a second, is not necessary for data sharing. It is probably also not worth the environmental cost: for example, even without backup systems, keeping 250 GB of data on a spinning disk consumes about 10 W, which over a year (8,760 hours) amounts to roughly 88 kWh; a spun-down disk still takes about 40 kWh per year. This is no doubt justifiable in some cases, but in other cases such data may be required only a few times a year. It is important, therefore, that long-term storage hardware is capable of reducing its power requirements to a minimum, and that the storage software can manage demand and relocate data according to its frequency of access.
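The energy arithmetic above can be checked with a short calculation. The following is a minimal sketch in Python, not part of the proposed service: the function names, the tier labels and the access threshold of 12 requests per year are all hypothetical, chosen only to illustrate how storage software might decide where to keep a data set based on how often it is requested.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_energy_kwh(power_watts: float) -> float:
    """Energy in kWh used over one year by a device drawing constant power."""
    return power_watts * HOURS_PER_YEAR / 1000.0


def average_power_watts(energy_kwh_per_year: float) -> float:
    """Average power in watts implied by a yearly energy figure in kWh."""
    return energy_kwh_per_year * 1000.0 / HOURS_PER_YEAR


def choose_tier(accesses_per_year: int, threshold: int = 12) -> str:
    """Hypothetical placement rule: keep frequently requested data online.

    The threshold is an illustrative assumption, not a figure from the
    service specification.
    """
    return "online" if accesses_per_year >= threshold else "powered-down"


if __name__ == "__main__":
    # A disk drawing 10 W continuously, as in the example in the text.
    print(f"Spinning disk: {annual_energy_kwh(10.0):.1f} kWh/year")
    # Average draw implied by the 40 kWh/year quoted for a spun-down disk.
    print(f"Spun-down disk: {average_power_watts(40.0):.2f} W on average")
    # A data set requested three times a year would be moved off the online tier.
    print(f"3 accesses/year -> {choose_tier(3)}")
```

Under these assumptions a constant 10 W draw comes to 87.6 kWh per year, consistent with the annual figure quoted in the text, and the 40 kWh per year quoted for a spun-down disk corresponds to an average draw of a little under 5 W.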

Service specification

The proposed service comprises five main components:

The period for which the data are kept immediately available online after the end of the research project can be specified by the investigator, but is typically ten years. At any time, the full data sets and metadata can be extracted and re-hosted by third parties such as institutional, national or subject-specific archives. Indeed, Textensor will encourage and facilitate this process by publishing all metadata formats and access statistics. In this respect, the service helps streamline the use of emerging permanent archives.