Digital Archiving at the University of York: April 2015

Tuesday 28 April 2015

IT's personal: some thoughts from the journey home

Today I went to London to attend a Digital Preservation Coalition event on Personal Digital Archives. I like going to London and I like going to these sorts of workshops. I also like the time for reflection sat in the quiet carriage of a Virgin train on the way home. There is something about being cut off from the internet and away from the everyday distractions of the office which helps focus the mind.

Today was interesting because, I expect like many of the attendees, I was there with two hats on – able to benefit from the day both as a digital archivist and as an individual with my own personal digital archive to maintain.

What follows is not so much a summing up of the day, but just a quick mention of some of the thoughts I’m taking away. There were some interesting presentations that I haven’t mentioned (apologies).

Gabriella Redwine from the Beinecke Library at Yale University gave a great introduction to the topic of personal digital archives, defining them as the things created by or about an individual: a rather formal term for the digital stuff we all create over the course of our lives. We all have them. They are fragile, regularly neglected and at risk of loss. People tend to manage them when faced with a crisis (e.g. a computer virus), a problem (e.g. running out of storage space) or a life-changing event (e.g. moving house or job). We as digital archivists need to be able to advise individuals on how to manage their own digital archives in the hope that the material will survive long enough to be deposited within an archive in the future if appropriate.

Amber Cushing from University College Dublin gave a really interesting talk on how people assign value to their digital files. Both she and Gabriella made the point (which I had only been partially aware of) that people tend to place less value on digital than physical things, and that the born-digital is seen as less important than something you can more easily see or hold. I realise I appear to be guilty of this myself. Every year I take hundreds of digital photographs which I store on my computer. These are of high value to me. They provide a record of my life and my family and I want to keep them so that I and subsequent generations can look back on them. Despite the high value I place on them I don’t back them up as often as I should and have even been known to lose some (see previous confession).

At the end of each year I create a photo book for that year: a printed, glossy, hardback album of selected photos, with a title page and captions (documentation and metadata!). I love to receive the finished photo book through the post and place even more value on this physical object than I did on the original photos. This is clear from the fact that I hover around the kids as they look at it, checking that they don’t have grubby hands and worrying that they might inadvertently rip a page whilst turning it.

Do I have the same level of worry when they access my digital originals? No, I happily let them click through them on the computer, never checking whether they have accidentally edited or deleted one or moved an image out of its context from one folder to another. These are eventualities which are probably just as likely as damage to the physical book, but harder to spot and thus rectify*.

Is this slightly skewed notion of value a result of the extra time and effort I have put into arranging the photographs into a physical book, the expense of having had to pay for it to be printed, or simply down to the fact that it is shiny and I can hold it?

Anyway, this is a slight tangent. It was really interesting to hear about Amber’s research on possession and self extension in relation to personal digital archives and how we as individuals may or may not assign value to the digital stuff that we create.

I was also really pleased to hear James Baxter from the British Library talk about the practical way they had set up workflows for dealing with personal digital archives that have been put in their care. Shutting himself and colleagues in a room for three days with some media and some tools in order to brainstorm workflows and make progress with trying to access, identify and preserve some of this born-digital material seemed like a great approach and there were some useful lessons learned from the process. I liked the ‘learning by doing’ approach that he advocated. I tend to agree that the best way to find out if something is going to work is to roll up your sleeves and have a go.

Another repeated message of the day was about language and how we can communicate and bring people along with us. Mike Ashenfelder from the Library of Congress mentioned that though libraries may run personal digital archiving courses for the public, it is hard to compete with other courses and learning opportunities with more appealing names. Amber mentioned that when interviewing people for her research, she avoided use of the term 'archiving', instead asking them about how they ‘maintained’ their digital files.

Having over the last week taught two sessions at the University of York on ‘Research Data Management’ I can relate to this problem. Getting people to come along and engage with a topic that has quite a dry title can certainly be a challenge. Perhaps as Mike suggested “looking after your digital stuff” would make it clearer what we were talking about and its immediate relevance to all of us!

My train journey is nearly over so I’ll leave it there, having over the course of this journey created yet another thing to add to my own digital legacy.

I’m looking forward to reading the new DPC technology watch report on the subject of personal digital archiving in the near future.

* yes, I know I could do this with checksums but I do not create checksums for my personal digital files...my life is busy!
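For anyone less busy than me, the checksum idea in the footnote above might look something like the following. This is only a minimal sketch in Python, not something I actually run: the function names (`file_checksum`, `build_manifest`, `compare_manifests`) are made up for illustration, and the choice of SHA-256 is an assumption rather than a recommendation.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(old, new):
    """Report files edited, missing or added between two manifests."""
    edited = sorted(n for n in old if n in new and old[n] != new[n])
    missing = sorted(n for n in old if n not in new)
    added = sorted(n for n in new if n not in old)
    return {"edited": edited, "missing": missing, "added": added}
```

The idea would be to build a manifest of the photo folder once, keep it somewhere safe, and compare it against a fresh manifest later. A file moved from one folder to another would show up as one "missing" entry and one "added" entry with the same checksum.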

Jenny Mitcham, Digital Archivist

Friday 17 April 2015

Jisc Archivematica project update ...because digital preservation won’t just go away

Last month I was excited to discover that Jisc had agreed to fund a joint project between the Universities of York and Hull as part of their Research Data Spring initiative. The aim of this project (as mentioned in my previous blog) is to investigate the potential of Archivematica for Research Data Management. There is a brief summary including my pitch here. We have had a number of other higher education institutions in the UK express an interest in this project and it is fabulous to see that there are others who recognise that the tools that digital archivists use could have much to offer those who are charged with managing research data. Of course we hope this project will also be of interest to a more diverse and international audience and we would like to benefit from the experience and knowledge that already exists within the wider digital preservation community.

We are three weeks into this project now and here is the first of a series of updates on progress.

One of the initial tasks for teams at both York and Hull was to ensure we had a current version of Archivematica installed. Over the next few weeks there will be a fair amount of testing going on to give us a greater understanding of Archivematica's strengths and weaknesses particularly with regard to how it may handle research data.

A pod on the lake - a great venue for our kick off meeting
(though in reality the weather wasn't this nice)

We also got together for a kick off meeting in one of the pods on the lake on York's Heslington East campus. We defined our work packages and established responsibilities and deadlines and now have a clear idea of what we are focusing on in this initial 3 month phase that Jisc have agreed to fund.

Much of the research we will be carrying out and reporting on at the end of this phase of the project will be based around the following questions:

Why? Why are we bothering to 'preserve' research data? What are the drivers here and what are the risks if we don't?
What? What are the characteristics of research data and how might it differ from other born digital data that memory institutions are establishing digital archives to manage and preserve? What types of files are our researchers producing and how would Archivematica handle these? What does Archivematica offer us and what benefits does it bring?
How? How would we incorporate Archivematica into a wider technical infrastructure for research data management and what workflows would we put in place? Where would it sit and what other systems would it need to talk to?
Who? Who else is using Archivematica (or other digital preservation systems) to do similar things and what can we learn from them?

Working in the pod - nice to have ducks for neighbours

I've started off by giving some thought to the What?

A key part of this project is to look at what a digital preservation system such as Archivematica can offer us as part of an RDM infrastructure. In order to answer this we need to understand a bit about the nature of the research data that we will be asking it to handle. We have been collating existing sources of information about the types of software and data that researchers at the University of York are using, in order to get a clearer picture of what research data is likely to look like. Following on from this we can then start to look at how Archivematica would handle this data.

A couple of years ago at York we carried out some work looking at current data management practice by researchers. We interviewed the chair of the research committee for each academic department to get an overview of data management within the department and also put out an online questionnaire to capture a wider and more detailed set of information from individual researchers across the University. This has given us an overview of the types of data that researchers collect or create. This data doesn't go down to the level of specific file types and versions but does talk about broad categories of data (for example which departments are working with databases, audio, video, images etc).

A subsequent survey carried out at York looked more specifically at software packages used by researchers and is a gold mine of information for our project, giving us a list of software packages that we can investigate in more detail. The 'top 20' software packages highlighted by this survey largely consist of software I have never used and whose outputs I have never tried to preserve - packages such as MATLAB, SPSS, NVivo, Gaussian and ChemDraw. We are investigating how existing digital preservation tools would handle these types of data and looking initially at whether their native file formats appear in PRONOM. We are talking to the helpful team at the National Archives about creating new file signatures for formats that are not currently represented. Knowing what you've got is one of the key challenges in digital preservation and if the file identification tools out there can automatically recognise a wider range of formats then this is clearly going to be a step in the right direction, not just for this project but for the digital preservation community as a whole.
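To give a flavour of what a file signature is, here is a very simplified sketch in Python of matching the first few bytes of a file against a list of known byte sequences. It is an illustration under stated assumptions, not how PRONOM-based tools are implemented: the `SIGNATURES` table is hand-written for this example and is not taken from PRONOM, and the function names are hypothetical. Real signatures can be considerably more involved (offsets, wildcards, end-of-file sequences).

```python
# Illustrative, hand-written table: NOT data from the PRONOM registry.
# Each entry pairs a format label with the bytes expected at the file start.
SIGNATURES = [
    ("PNG image", b"\x89PNG\r\n\x1a\n"),
    ("PDF document", b"%PDF-"),
    ("ZIP container", b"PK\x03\x04"),
    ("MATLAB Level 5 MAT-file", b"MATLAB 5.0 MAT-file"),
]


def identify(data):
    """Return the label of the first signature matching the start of data."""
    for label, magic in SIGNATURES:
        if data.startswith(magic):
            return label
    return None  # no signature known: the case we want to reduce


def identify_file(path):
    """Identify a file on disk by reading only its first few bytes."""
    with open(path, "rb") as handle:
        return identify(handle.read(32))
```

A file that comes back as `None` here is the equivalent of a format that is not yet represented in the registry, which is exactly the gap that new signatures would fill.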

Work will continue - watch this space for updates ...and in the meantime, we'd love to hear from you if you are using Archivematica (or another digital preservation system) for research data so we can find out about your workflows.

Jenny Mitcham, Digital Archivist
