Preserv2 - File Format Registry (beta)

Service is Currently Under Maintenance To Support Quads and Provenance


The Preserv2 file format registry is designed to be a semantically enhanced registry of information that aids the process of digital preservation. Its data is pooled from many other sources, including The National Archives (UK) PRONOM registry and DBpedia (a semantically enriched form of Wikipedia).

The Preserv2 registry provides open access to all the data it contains, as well as services including a SPARQL endpoint and RESTful HTTP services. Data is currently available in XML and RDF formats; no HTML interface is currently proposed beyond pages explaining the available services.

Current number of facts in datastore:

File Format Identifiers

Although the data in this system is gathered from many services, the IDs used for file formats match those of The National Archives (UK) PRONOM service. More documentation on these identifiers can be found on the PRONOM website.

SPARQL Endpoint

SPARQL Endpoint - Provides a full query interface to the underlying data.

The SPARQL interface is backed by a triple store running the 3store software (v3.0.17). For a guide to the syntax of the SPARQL query language, please refer to the W3C Schools guide.
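As a minimal sketch, the Python below builds (but does not send) a query request following the standard W3C SPARQL Protocol convention: an HTTP GET with the query URL-encoded in a query parameter, plus an Accept header asking for SPARQL XML results. The endpoint address is a placeholder, since this page does not preserve the real URL, and whether this 3store installation honours the Accept header is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address (assumption): substitute the registry's real SPARQL endpoint.
ENDPOINT = "http://registry.example.org/sparql/"


def build_sparql_request(query: str, endpoint: str = ENDPOINT) -> Request:
    """Build, but do not send, a SPARQL Protocol GET request for `query`."""
    # urlencode escapes spaces, '?', braces, etc. so the query survives in a URL.
    url = endpoint + "?" + urlencode({"query": query})
    # Request the standard SPARQL XML results format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


if __name__ == "__main__":
    req = build_sparql_request("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")
    print(req.full_url)
    # To send it: urllib.request.urlopen(req).read() returns the raw result XML.
```

Passing the returned object to urllib.request.urlopen performs the actual call; keeping construction separate makes the encoded URL easy to inspect first.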

RESTful Data Browser

Unlike the Linked Data Browser (see the last section), this service does not expose backward links; however, it does allow browsing of any data in HTML, XML and RDF formats.

The RESTful services use XPath-style URIs, so you can descend below the identifier URLs. The following steps give a quick demonstration:

  1. Fetch all data on a specific file format:
     Browse to /pronom/fmt/11 for the RDF version, /pronom/fmt/11.xml for the XML version, or /pronom/fmt/11.html for the linked HTML version.

  2. Request only the FileFormatIdentifiers:
     Browse to /pronom/fmt/11/FileFormatIdentifier. Again, add the .xml or .html extension for those versions.
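The URI scheme in these steps is regular enough to generate. The sketch below is a hypothetical Python helper (registry_url is not part of the service) that joins a PRONOM ID and optional property names into a path and appends the extension for the requested representation. The host name is a placeholder, because the page gives only paths.

```python
# Placeholder host (assumption): the page only shows paths such as /pronom/fmt/11.
BASE = "http://registry.example.org"

# Suffix for each representation: no extension returns RDF, per the steps above.
EXTENSIONS = {"rdf": "", "xml": ".xml", "html": ".html"}


def registry_url(puid: str, *properties: str, fmt: str = "rdf") -> str:
    """Build an XPath-style registry URI, e.g. /pronom/fmt/11/FileFormatIdentifier.

    puid       -- a PRONOM identifier such as "fmt/11"
    properties -- property names to descend into, outermost first
    fmt        -- "rdf", "xml" or "html"
    """
    if fmt not in EXTENSIONS:
        raise ValueError(f"unsupported format: {fmt!r}")
    segments = ["pronom", puid.strip("/"), *properties]
    return BASE + "/" + "/".join(segments) + EXTENSIONS[fmt]


if __name__ == "__main__":
    print(registry_url("fmt/11"))                                     # step 1, RDF
    print(registry_url("fmt/11", fmt="html"))                         # step 1, HTML
    print(registry_url("fmt/11", "FileFormatIdentifier", fmt="xml"))  # step 2, XML
```

For example, registry_url("fmt/11", "FileFormatIdentifier", fmt="xml") yields the XML version of step 2.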

Risk Analysis Service

This service pools selected information (according to a profile) and exposes it to the requesting user. Currently the system holds data for only one format supertype (PDF), and this data is accessed with the same PRONOM IDs as used above.

The following are all valid ways to access the risk analysis profile for PDF 1.4. Again, HTML and RDF formats are available, but XML is not yet (pending a schema).

Risk Summary Service

This service takes the data from the risk analysis service and applies further logic and processing to it in order to calculate a risk score. It mostly relies on a set of default risk levels that have been loaded into the registry; however, in places extra processing is done to combine data when analysing risk.

  • Risk Analysis for PDF (v1.4)
  • Note: I would recommend sticking to the PDF identifiers, as not much data has been loaded for other formats as yet.
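The summary step described above (default risk levels, plus extra processing that combines data) can be illustrated with a small Python sketch. Everything in it is hypothetical: the property names, the 0 to 3 levels, the additive score and the combined rule are invented for illustration and are not the registry's actual defaults or logic.

```python
from typing import Dict, Mapping

# Hypothetical default risk levels (0 = low, 3 = high), keyed by property and value.
# The registry's real defaults and property names are not given on this page.
DEFAULT_RISK: Dict[str, Dict[str, int]] = {
    "Openness": {"open": 0, "proprietary": 2},
    "SoftwareSupport": {"many": 0, "few": 1, "none": 3},
    "Compression": {"lossless": 0, "lossy": 2},
}


def risk_score(profile: Mapping[str, str]) -> int:
    """Sum default risk levels for a profile, then apply one combined rule."""
    score = 0
    for prop, value in profile.items():
        levels = DEFAULT_RISK.get(prop)
        if levels is None:
            continue  # no default risk level for this property: ignore it
        score += levels.get(value, 0)  # unknown values count as no extra risk
    # Hypothetical combined rule: a proprietary format with no supporting
    # software is riskier than the two factors alone suggest.
    if profile.get("Openness") == "proprietary" and profile.get("SoftwareSupport") == "none":
        score += 1
    return score


if __name__ == "__main__":
    print(risk_score({"Openness": "open", "SoftwareSupport": "few"}))
```

An additive score keeps each default level visible in the total; a real profile might weight properties differently.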

Migration Pathways

An easy way to find out how to get from one format to another. This service returns a list of software packages that can open your input format and save to a desired output format.

This service is now online. Because it is a dynamic query service, it currently works differently from the rest of the services. To be honest, I'm still arguing with myself about how it should work, and the outcome might also affect how the risk analysis service works. The way it is set up is the most dynamic, so that it can handle any number of file format IDs as they are added.

It works by taking a "from" format URI and a "to" format URI, along with the type of results you want (HTML or RDF), and returns a list of software packages and the processes each can perform to move from your input format to the requested output format. Currently up to two translations are allowed between the input and output formats. Example:

Much of the data needed to make this service work is already in the registry; try combining the RESTful browser with the SPARQL endpoint to find results for yourself.

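Example Software SPARQL Query

    prefix pronom: <>
    prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    # Names of software (?x) linked to a given format by some predicate ?y
    select distinct ?name ?y where {
      ?x ?y <> .                      # ?x relates to the format resource via ?y
      ?y rdf:type <> .                # keep only predicates of the given type
      ?x pronom:SoftwareName ?name    # look up the software's name
    }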
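The search behind the Migration Pathways service is not described in detail, but the idea can be sketched as a bounded breadth-first search in Python. The capability triples (software, format it opens, format it saves to) and the format names are invented sample data; in the registry they would come from queries like the one above. Reading "up to two translations" as at most two conversion steps is an assumption, so the limit is a parameter.

```python
from collections import deque
from typing import List, Tuple

# (software, format it can open, format it can save to); invented sample data.
Capability = Tuple[str, str, str]
CAPABILITIES: List[Capability] = [
    ("EditorA", "format/X", "format/Y"),
    ("ConverterB", "format/Y", "format/Z"),
    ("ToolC", "format/X", "format/Z"),
]


def migration_paths(source: str, target: str,
                    caps: List[Capability] = CAPABILITIES,
                    max_steps: int = 2) -> List[List[Capability]]:
    """Breadth-first search for chains of conversions from source to target.

    Each path is a list of (software, from, to) steps. Paths are never longer
    than max_steps, and no format is revisited within a single path.
    """
    results: List[List[Capability]] = []
    queue = deque([(source, [])])  # (current format, steps taken so far)
    while queue:
        fmt, steps = queue.popleft()
        if fmt == target and steps:
            results.append(steps)
            continue
        if len(steps) == max_steps:
            continue  # translation budget used up
        visited = {source} | {step[2] for step in steps}
        for cap in caps:
            _software, reads, writes = cap
            if reads == fmt and writes not in visited:
                queue.append((writes, steps + [cap]))
    return results


if __name__ == "__main__":
    for path in migration_paths("format/X", "format/Z"):
        print(" -> ".join(f"{sw} ({a} to {b})" for sw, a, b in path))
```

Because the search is breadth-first, direct conversions are returned before two-step chains.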

Linked Data Browser

Linked Data Browser - Keyword search and RDF browsing interface

This simple interface allows browsing of the data on a resource and predicate basis. Once a resource is located, you can get data about it and browse the links it shares with other data. A full worked example is listed here.
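What distinguishes this browser from the RESTful one is that it shows backward links as well as forward ones. Assuming it is backed by the same triple store (the page does not say), the two lookups for a resource can be sketched as SPARQL queries generated in Python; link_queries is a hypothetical helper, and the resource IRI in the usage line is a placeholder.

```python
from typing import Dict


def link_queries(resource_iri: str) -> Dict[str, str]:
    """Return SPARQL SELECT queries for a resource's forward and backward links."""
    # Reject characters SPARQL does not allow inside an <...> IRI reference,
    # including space and control characters (anything at or below U+0020).
    if any(ch in '<>"{}|^`\\' or ch <= " " for ch in resource_iri):
        raise ValueError(f"not usable as a SPARQL IRI: {resource_iri!r}")
    iri = f"<{resource_iri}>"
    return {
        # Triples with the resource as subject: its properties and outgoing links.
        "forward": f"SELECT ?p ?o WHERE {{ {iri} ?p ?o }}",
        # Triples with the resource as object: other data that links to it.
        "backward": f"SELECT ?s ?p WHERE {{ ?s ?p {iri} }}",
    }


if __name__ == "__main__":
    for direction, query in link_queries("http://registry.example.org/pronom/fmt/11").items():
        print(direction, query)
```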