Tuesday, 9 June 2015

The second meeting of the UK Archivematica group

Unaffected by the threatened rail strike and predicted thundery showers, the 2nd UK Archivematica meeting went ahead at the Tate Britain in London last week.

This second meeting saw an increase in attendees with 22 individuals representing 15 different organisations in the UK and beyond (a representative from the Zuse Institute in Berlin joined us). It is great to see a growing awareness and engagement with Archivematica in the UK. The meeting provided another valuable chance to catch up, compare notes and talk about some of the substantial progress that has been made since our last meeting in January.

Updates were given by Marco Klindt from the Zuse Institute on their plans for an infrastructure for digital preservation as a service, and by Chris Grygeil from the University of Leeds on their latest thinking on a workflow for digital archiving. Marco plans to use Archivematica as an ingest tool before pushing the data to Fedora for access. The Zuse Institute have been working with Artefactual Systems on sponsoring some useful AIP re-ingest functionality, which will allow Archivematica to re-process AIPs at a later date. Chris updated us on ongoing work at Leeds to define their Archivematica workflows. Here BitCurator is being used before ingest into Archivematica, and there is ongoing discussion about how exactly the two tools fit together in the workflow. BitCurator can highlight sensitive information and perform redactions, but do you want to do this to original files before ingesting with Archivematica?

Matthew Addis from Arkivum gave a really interesting presentation on some work he has been doing to test how Archivematica handles scientific datasets, specifically genomics data. He described this data as large in size, unidentified by Archivematica and with no normalisation pathways. This struck a chord with me, as I have spent much of the past few weeks looking at the types of data that researchers produce and finding a long tail of specialist or scientific data formats that are of a similar nature. His testing of the capabilities of Archivematica has produced some useful results, with success at processing a 500GB file in 5 hours.
[Image: Donuts at J.Co by Masked Card on Flickr, CC BY-NC-ND 2.0]

Next I gave an update on our Jisc Research Data Spring project “Filling the digital preservation gap”. Apart from going on for too long and keeping people from their coffee and doughnuts, I gave an introduction to our project, focusing on the nature of research data and our findings about file formats in use by researchers at the University of York. See previous blog posts (first, second) for more information on where we are with this work.

I talked about how file identification is key to digital preservation, as demonstrated by the NDSA Levels of Digital Preservation, where having an inventory of the file formats in your archive comes in quite early at level 2. If you don’t know what you’ve got, it is very difficult to make decisions about how you can manage and preserve that content for the long term. This is the case whether we are talking about a migration or an emulation based strategy for digital preservation.
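
As a rough illustration of what a file format inventory involves, here is a minimal Python sketch. It is not how Archivematica identifies files: it simply counts files by extension, which is a crude stand-in for proper signature-based identification, and the function name `extension_inventory` is made up for this example.

```python
from collections import Counter
from pathlib import Path


def extension_inventory(root):
    """Count the files under `root` by lowercased file extension.

    Assumption: the extension is only a rough proxy for the real format.
    Files without an extension are grouped under "(no extension)".
    """
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # List extensions found under the current directory, most common first.
    for extension, count in extension_inventory(".").most_common():
        print(f"{count:6d}  {extension}")
```

Even a simple count like this tends to expose the long tail of specialist formats quickly, and those are the files most likely to come back unidentified from a real identification tool.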

I went on to discuss briefly the 3 basic workflows for using Archivematica and asked for feedback on these:
  1. Archivematica is the start of the process. Archivematica produces both the Archival Information Package (AIP) and the Dissemination Information Package (DIP) and the DIP is sent to the repository
  2. The repository is the start of the process and the Submission Information Package (SIP) goes from there into Archivematica. There are potential variations in the workflow here depending on whether you want Archivematica or the repository to produce the DIP
  3. Archivematica is utilised as a tool that is called separately from the repository as part of the workflow

Are there any others that I've missed?

I also talked through some of the ideas we have had for enhancements to Archivematica. We are hoping that a subsequent phase of this project will enable us to sponsor some development work which will make Archivematica better or more suitable for inclusion within a wider infrastructure for managing research data. I highlighted the development ideas on our current short list and asked attendees to select whether the ideas were 'very useful', 'quite useful' or 'not at all useful' for their own proposed implementations.

It is really helpful for us to get feedback from other Archivematica users so that we can ensure that what we are proposing will be more widely useful (and that we haven't missed an alternative solution or workaround). Over the next week the project team will be reviewing the development ideas and the feedback received at the UK Archivematica meeting and speaking to Artefactual Systems (who support Archivematica) about our ideas.

The day finished with an introduction to Binder ("an open source digital repository management application designed to meet the needs and complex digital preservation requirements of cultural heritage institutions"). We watched this video as an introduction to the system and then had a conference call with Ben Fino-Radin from the Museum of Modern Art who was able to answer our questions. Binder looks to be an impressive tool for helping to manage digital assets. Building on the basics of Archivematica (which essentially packages things up for preservation), Binder provides an attractive front end enabling curators to more effectively manage and better understand their digital collections.

The next meeting of the UK Archivematica group is planned to be held in Leeds in October/November 2015. It was agreed that we would schedule a longer session in order to allow for more informal discussion and networking alongside the scheduled presentations and progress reports. I'm confident that the group will have lots more Archivematica activity to report on by the time we next meet.