The information age has made it trivial for anyone to create and then share vast amounts of digital data. This includes unstructured collections made of data such as images, video, and audio to collections of born digital content made up of data such as documents and spreadsheets. While the creation and sharing of content has been made easy, its inverse, the ability to search and use the contents of digital data, has been made exponentially more difficult. In the physical analogue librarians have used the process of curation to standardize the format by which information is stored and diligently index holdings with metadata to allow both current and future generations to find information. Digitally this does not happen as that curation overhead is an unwelcomed bottleneck to the creation of more data. Though popular services such as modern search engines give the illusion that this is being done, this is largely over the portion of digital data that is text based and/or containing text metadata. Unstructured collections and contents trapped behind difficult to read file formats, however, make up a significant part of our collective digital data assets and are largely not accessible.
Science today not only uses but relies on software and digital content. It is well known that science is not only responsible for a significant amount of our digital data holdings but that also much of this is un-curated data, what the scientific community currently refer to as "long-tail" data. As such contemporary science, which relies on digital data and software, software which evolves and disappears quickly as underlying technology changes, is entering a realm where scientific results are no-longer easily reproducible and as such in essence no longer a science as science hinges on the fact that a documented procedure will result in the same result each time.