
Monday, 13 February 2017

What have we got in our digital archive?

Do other digital archivists find that their work rarely involves getting hands-on with digital archives? When you have to think about establishing your infrastructure, writing policies and plans and attending meetings, it leaves little time for activities at the coal face. This makes it all the more satisfying when we do actually get the opportunity to work with our digital holdings.

In the past I've called for more open sharing of profiles of digital archive collections, but I am aware that I had not yet done this for the contents of our born digital collections here at the Borthwick Institute for Archives. So here I try to fill that gap.

I ran DROID (v 6.1.5, signature file v 88, container signature 20160927) over the deposited files in our digital archive and have spent a couple of days crunching the results. Note that this just covers the original files as they have been given to us. It does not include administrative files that I have added, or dissemination or preservation versions of files that have subsequently been created.

I was keen to see:
  • How many files could be automatically identified by DROID
  • What the current distribution of file formats looks like
  • Which collections contain the most unidentified files
...and also use these results to:
  • Inform future preservation planning and priorities
  • Feed further information to the PRONOM team at The National Archives
  • Get us to Level 2 of the NDSA Levels of Digital Preservation which asks for "an inventory of file formats in use" and which until now I haven't been collating!

Digital data has been deposited with us since before I started at the Borthwick in 2012 and continues to be deposited with us today. We do not have huge quantities of digital archives here as yet (about 100GB) and digital deposits are still the exception rather than the norm. We will be looking to chase digital archives more proactively once we have Archivematica in place and appropriate workflows established.

Last modified dates (as recorded by DROID) appear to range from 1984 to 2017 with a peak at 2008. This distribution is illustrated below. Note, however, that this data is not always to be trusted (that could be another whole blog post in itself...). One thing that it is fair to say, though, is that the archive stretches back right to the early days of personal computers and up to the present day.

Last modified dates on files in the Borthwick digital archive
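For anyone wanting to produce a similar distribution from their own holdings, here is a minimal Python sketch using only the standard library. It assumes a DROID CSV export with TYPE and LAST_MODIFIED columns, and that LAST_MODIFIED holds an ISO-style timestamp; both are assumptions worth checking against your own export. The sample rows are made up purely for illustration.

```python
import csv
import io
from collections import Counter


def last_modified_years(rows):
    """Count non-folder rows per year of LAST_MODIFIED.

    Assumes DROID wrote LAST_MODIFIED as an ISO-style timestamp such as
    "2008-03-12T10:15:30"; rows with a blank or unexpected value are skipped.
    """
    years = Counter()
    for row in rows:
        if row.get("TYPE") == "Folder":
            continue  # folders are not deposited files
        stamp = (row.get("LAST_MODIFIED") or "").strip()
        if len(stamp) >= 4 and stamp[:4].isdigit():
            years[int(stamp[:4])] += 1
    return years


# Hypothetical sample standing in for a real DROID CSV export.
SAMPLE = """TYPE,LAST_MODIFIED
File,1984-06-01T09:00:00
File,2008-03-12T10:15:30
File,2008-11-02T16:40:00
Folder,2010-01-01T00:00:00
File,
"""

years = last_modified_years(csv.DictReader(io.StringIO(SAMPLE)))
for year in sorted(years):
    print(year, "#" * years[year])  # crude text histogram
```

As noted above, the dates themselves are not always trustworthy, so a chart like this is a rough guide rather than a precise history.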

Here are some of the findings of this profiling exercise:

Summary statistics

  • DROID reported that 10005 individual files were present
  • 9431 (94%) of the files were given a file format identification by DROID. This is a really good result ...or at least it seems so in comparison to my previous data profiling efforts, which have focused on research data. This result is also comparable with those found within other digital archives, for example 90% at Bentley Historical Library, 96% at Norfolk Record Office and 98% at Hull University Archives
  • 9326 (99%) of those files that were identified were given just one possible identification. 1 file was given 2 different identifications (an xlsx file) and 104 files (with a .DOC extension) were given 8 identifications. In all these cases of multiple identifications, identification was done by file extension rather than signature - which perhaps explains the uncertainty
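The headline figures above can be derived from a DROID CSV export along the following lines. This is a sketch rather than the exact process used here: it assumes the export has TYPE and FORMAT_COUNT columns and treats any row with a FORMAT_COUNT above zero as identified. The sample data is hypothetical.

```python
import csv
import io
from collections import Counter


def summarise(rows):
    """Return (file count, identified count, Counter of identifications per file).

    Folder rows are skipped; a row counts as identified when FORMAT_COUNT > 0.
    """
    files = [r for r in rows if r.get("TYPE") != "Folder"]
    identified = [r for r in files if int(r.get("FORMAT_COUNT") or 0) > 0]
    per_file = Counter(int(r["FORMAT_COUNT"]) for r in identified)
    return len(files), len(identified), per_file


# Hypothetical sample standing in for a DROID CSV export.
SAMPLE = """TYPE,FORMAT_COUNT,PUID
File,1,fmt/40
File,1,fmt/43
File,8,fmt/40
File,0,
Folder,0,
"""

total, identified, per_file = summarise(csv.DictReader(io.StringIO(SAMPLE)))
print(f"{total} files, {identified} identified ({100 * identified / total:.0f}%)")
print("identifications per file:", dict(per_file))
```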

Files that were identified

  • Of the 9431 files that were identified:
    • 6441 (68%) were identified by signature (which suggests a fairly accurate identification: if a file is identified by signature it means that DROID has looked inside the file and seen something that it recognises. Last year I was inducted into the magic ways this happens - see My First File Format Signature!)
    • 2546 (27%) were identified by container (which again suggests a high level of accuracy). The vast majority of these were Microsoft Office files 
    • 444 (5%) were identified by extension alone (which implies a less accurate identification)
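As a quick sanity check, the percentage for each identification method can be recomputed from the raw counts. The METHOD values (Signature, Container, Extension) are assumed to match what DROID writes in its CSV export, and the helper names are hypothetical.

```python
from collections import Counter


def method_counts(rows):
    """Tally DROID's METHOD column for identified rows only."""
    return Counter(r["METHOD"] for r in rows if int(r.get("FORMAT_COUNT") or 0) > 0)


def as_percentages(counts):
    """Whole-number percentage share of each identification method."""
    total = sum(counts.values())
    return {method: round(100 * n / total) for method, n in counts.items()}


# Counts reported in this post for the 9431 identified files.
reported = Counter({"Signature": 6441, "Container": 2546, "Extension": 444})
print(as_percentages(reported))  # {'Signature': 68, 'Container': 27, 'Extension': 5}
```

The rounded figures match the 68%, 27% and 5% given in the list.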


  • Only 86 (1%) of the identified files had a file extension mismatch - this means that the file extension was not what you would expect given the identification by signature. There are all sorts of different examples here including:
    • files with a tmp or dot extension which are identified as Microsoft Word
    • files with a doc extension which are identified as Rich Text Format
    • files with an hmt extension identified as JPEG files
    • and as in my previous research data example, a bunch of Extensible Markup Language files which had extensions other than XML
So perhaps these are things I'll look into in a bit more detail if I have time in the future.
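Mismatches like these can be pulled out of a DROID CSV export and grouped by extension and format, which makes patterns easier to spot. A minimal sketch, assuming EXT, EXTENSION_MISMATCH and FORMAT_NAME columns with mismatches flagged as the text "true"; the sample rows are invented.

```python
import csv
import io
from collections import Counter


def extension_mismatches(rows):
    """Group rows flagged EXTENSION_MISMATCH by (extension, format name)."""
    pairs = Counter()
    for r in rows:
        if (r.get("EXTENSION_MISMATCH") or "").strip().lower() == "true":
            pairs[((r.get("EXT") or "").lower(), r.get("FORMAT_NAME") or "")] += 1
    return pairs


# Invented sample rows for illustration.
SAMPLE = """EXT,EXTENSION_MISMATCH,FORMAT_NAME
tmp,true,Microsoft Word Document
doc,true,Rich Text Format
hmt,true,JPEG File Interchange Format
doc,false,Microsoft Word Document
"""

for (ext, fmt), n in extension_mismatches(csv.DictReader(io.StringIO(SAMPLE))).most_common():
    print(f".{ext} identified as {fmt}: {n}")
```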

  • 90 different file formats were identified within this collection of data

  • Of the identified files, 1764 (19%) were identified as Microsoft Word Document 97-2003. This was followed very closely by JPEG File Interchange Format version 1.01 with 1675 (18%) occurrences. The top 10 identified formats are illustrated below:

  • This top 10 is in many ways comparable to other similar profiles that have been published recently from Bentley Historical Library, Hull University Archives and Norfolk Record Office, with high occurrences of Microsoft Word, PDF and JPEG images. In contrast, HTML files and GIF image files are not so common in this profile; these only just make it into the top 50.

  • Also notable in our top ten are the Sibelius files which haven't appeared in other recently published profiles. Sibelius is musical notation software and these files appear frequently in one of our archives.
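A top 10 like this is essentially a frequency count over format name and version. This sketch assumes FORMAT_NAME, FORMAT_VERSION and FORMAT_COUNT columns in the DROID CSV export. It only counts the first identification for files with several, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def top_formats(rows, n=10):
    """Most common (FORMAT_NAME, FORMAT_VERSION) pairs among identified rows.

    Only the first identification is counted for files with several.
    """
    counts = Counter(
        (r["FORMAT_NAME"], r.get("FORMAT_VERSION") or "")
        for r in rows
        if int(r.get("FORMAT_COUNT") or 0) > 0
    )
    return counts.most_common(n)


# Invented sample rows for illustration.
SAMPLE = """FORMAT_COUNT,FORMAT_NAME,FORMAT_VERSION
1,Microsoft Word Document,97-2003
1,Microsoft Word Document,97-2003
1,Microsoft Word Document,97-2003
1,JPEG File Interchange Format,1.01
1,JPEG File Interchange Format,1.01
1,Portable Document Format,1.4
0,,
"""

for (name, version), count in top_formats(csv.DictReader(io.StringIO(SAMPLE)), n=2):
    print(count, name, version)
```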


Files that weren't identified

  • Of the 574 files that weren't identified by DROID, 125 different file extensions were represented. For most of these there was just a single example of each.

  • 160 (28%) of the unidentified files had no file extension at all. Perhaps not surprisingly, it is the earlier files in our born digital collection (files from the mid-1980s) that are most likely to fall into this category. These were created at a time when operating systems seemed to be a little less rigorous about enforcing the use of file extensions! Approximately 80 of these files are believed to be WordStar 4.0 (PUID: x-fmt/260), which DROID would only be able to recognise by file extension. Of course, if no extension is included, DROID has little chance of being able to identify them!
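The figures for unidentified files (how many there are, how many lack an extension, how many distinct extensions appear and how many of those occur only once) can be gathered in one pass. A minimal sketch, assuming the DROID CSV export leaves EXT blank when a file has no extension and records FORMAT_COUNT as 0 for unidentified files; the sample data is hypothetical.

```python
import csv
import io
from collections import Counter


def unidentified_profile(rows):
    """Summarise the extensions of files DROID could not identify.

    Returns (unidentified count, count with no extension,
             number of distinct extensions, extensions seen only once).
    """
    unidentified = [
        r for r in rows
        if r.get("TYPE") != "Folder" and int(r.get("FORMAT_COUNT") or 0) == 0
    ]
    exts = Counter((r.get("EXT") or "").lower() for r in unidentified)
    no_extension = exts.pop("", 0)  # blank EXT means no extension at all
    singletons = sum(1 for n in exts.values() if n == 1)
    return len(unidentified), no_extension, len(exts), singletons


# Hypothetical sample standing in for a DROID CSV export.
SAMPLE = """TYPE,FORMAT_COUNT,EXT
File,0,
File,0,
File,0,dat
File,0,dat
File,0,inp
File,1,doc
Folder,0,
"""

print(unidentified_profile(csv.DictReader(io.StringIO(SAMPLE))))
```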

  • The most common file extensions of those files that weren't identified are visible in the graph below. I need to do some more investigation into these but most come from 2 of our archives that relate to electronic music composition:


I'm really pleased to see that the vast majority of the files that we hold can be identified using current tools. This is a much better result than for our research data. Obviously there is still room for improvement so I hope to find some time to do further investigations and provide information to help extend PRONOM.

Other follow on work involves looking at system files that have been highlighted in this exercise. See for example the AppleDouble Resource Fork files that appear in the top ten identified formats. Also appearing quite high up (at number 12) were Thumbs.db files but perhaps that is the topic of another blog post. In the meantime I'd be really interested to hear from anyone who thinks that system files such as these should be retained.
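If system files do turn out to be unwanted, a first pass at spotting them could be as simple as checking file names. This is a hypothetical, name-based heuristic only (Thumbs.db, .DS_Store, and the "._" prefix that AppleDouble files usually carry). It is not how DROID identifies files, and anything it flags would still need checking before deciding what to keep.

```python
# Hypothetical name-based rules; DROID works from file content and PRONOM
# signatures, so treat this only as a quick first pass over a file listing.
SYSTEM_FILE_NAMES = {"thumbs.db", ".ds_store"}


def looks_like_system_file(path):
    """True if the file name matches a common OS-generated system file."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1]  # cope with Windows paths
    return name.lower() in SYSTEM_FILE_NAMES or name.startswith("._")


for p in ["deposit/images/Thumbs.db", "deposit/._letter.doc", "deposit/letter.doc"]:
    print(p, looks_like_system_file(p))
```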


Friday, 10 February 2017

Harvesting EAD from AtoM: a collaborative approach

In a previous blog post, 'AtoM harvesting (part 1) - it works!', I described how archival descriptions within AtoM are being harvested as Dublin Core for inclusion within our University Library Catalogue.* I also hinted that this wouldn’t be the last you would hear from me on AtoM harvesting and that plans were afoot to enable much richer metadata in EAD 2002 XML (Encoded Archival Description) format to be harvested via OAI-PMH.

I’m pleased to be able to report that this work is now underway.

The University of York along with five other organisations in the UK have clubbed together to sponsor Artefactual Systems to carry out the necessary development work to make EAD harvesting possible. This work is scheduled for release in AtoM version 2.4 (due out in the Spring).

The work is being jointly sponsored by:



We are also receiving much needed support in this project from The Archives Hub who are providing advice on the AtoM EAD and will be helping us test the EAD harvesting when it is ready. While the sponsoring institutions are all producers of AtoM EAD, The Archives Hub is a consumer of that EAD. We are keen to ensure that the archival descriptions that we enter into AtoM can move smoothly to The Archives Hub (and potentially to other data aggregators in the future), allowing the richness of our collections to be signposted as widely as possible.

Adding this harvesting functionality to AtoM will enable The Archives Hub to gather data direct from us on a regular schedule or as and when updates occur, ensuring that:


  • Our data within the Archives Hub doesn’t stagnate
  • We manage our own master copy of the data and only need to edit this in one place
  • A minimum of human interaction is needed to incorporate our data into the Hub
  • It is easier for researchers to find information about the archives that we hold without having to search all of our individual catalogues
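Under OAI-PMH, a harvester such as The Archives Hub's would typically issue ListRecords requests, using the from argument to pick up only records changed since its last visit and following resumption tokens to page through large result sets. Below is a sketch of building those request URLs, assuming a hypothetical endpoint at example.org. The metadataPrefix used for EAD is deliberately left to the caller, since the real value should be taken from the repository's ListMetadataFormats response.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, from_date=None, resumption_token=None):
    """Build an OAI-PMH 2.0 ListRecords request URL.

    resumptionToken is an exclusive argument in OAI-PMH: when paging,
    only verb and resumptionToken are sent.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    elif metadata_prefix:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective harvest: records changed since this date
    else:
        raise ValueError("need either metadata_prefix or resumption_token")
    return f"{base_url}?{urlencode(params)}"


BASE = "https://example.org/oai"  # hypothetical endpoint
print(list_records_url(BASE, "oai_dc", from_date="2017-02-01"))
print(list_records_url(BASE, resumption_token="abc123"))
```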


So, what are we doing at the moment?


  • Developers at Artefactual Systems are beavering away working on the initial development and getting the test site ready for us to play with.
  • The sponsoring institutions have been getting samples of their own AtoM data ready for loading up into the test deployment. It is always better when testing something to have some of your own data to mess around with.
  • The Borthwick have been having discussions with The Archives Hub for some time about AtoM EAD (from version 2.2) but we’ve picked up these discussions again and other institutions have joined in by supplying their own EAD samples. This allows staff at the Hub to see how EAD has changed in version 2.3 of AtoM (it hasn’t very much) and also to see how consistent the EAD from AtoM is from different institutions. We have been having some pretty detailed discussions about how we can make the EAD better, cleaner, fuller - either by data entry at the institutions, automated data cleaning at The Hub prior to display online or by further developments in AtoM.


What we are doing at the moment is good and a huge step in the right direction, but perhaps not perfect. As we work together on this project we are coming across areas where future work would be beneficial in order to improve the quality of the EAD that AtoM produces or to expand the scope of what can be harvested from AtoM. I hope to report on this in more detail at the end of the project, but in the meantime, do get in touch if you are interested in finding out more.







* It is great to see that this is working well and our Library Catalogue is now appearing in the referrer reports for the Borthwick Catalogue on Google Analytics. People are clearly following these new signposts to our archives!

Tuesday, 24 January 2017

Creating an annual accessions report using AtoM

So, it is that time of year where we need to complete our annual report on accessions for the National Archives. Along with lots of other archives across the UK we send The National Archives summary information about all the accessions we have received over the course of the previous year. This information is collated and provided online on the Accessions to Repositories website for all to see.

The creation of this report has always been a bit time-consuming for our archivists, involving a lot of manual steps and some re-typing, but since we started using AtoM as our Archival Management System the process has become much more straightforward.

As I've reported in a previous blog post, AtoM does not do all that we want to do in the way of reporting via its front end.

However, AtoM has an underlying MySQL database and there is nothing to stop you bypassing the interface, looking at the data behind the scenes and pulling out all the information you need.

One of the things we got set up fairly early in our AtoM implementation project was a free MySQL client called Squirrel. Using Squirrel or another similar tool, you can view the database that stores all your AtoM data, browse the data and run queries to pull out the information you need. It is also possible to update the data using these SQL clients (very handy if you need to make any global changes to your data). All you need initially is a basic knowledge of SQL and you can start pulling some interesting reports from AtoM.

The downside of playing with the AtoM database is of course that it isn't nearly as user friendly as the front end.

It is always a bit of an adventure navigating the database structure and trying to work out how the tables are linked. Even with the help of an Entity Relationship Diagram from Artefactual, creating more complex queries is... well... complex!

AtoM's database tables - there are a lot of them!


However, on a positive note, the AtoM user forum is always a good place to ask stupid questions and Artefactual staff are happy to dive in and offer advice on how to formulate queries. I'm also lucky to have help from more technical colleagues here in Information Services (who were able to help me get Squirrel set up and talking to the right database and can troubleshoot my queries) so what follows is very much a joint effort.

So for those AtoM users in the UK who are wrestling with their annual accessions report, here is a query that will pull out the information you need:

-- Accessions received in 2016, with their date ranges (AtoM MySQL database)
SELECT accession.identifier,
       accession.date,
       accession_i18n.title,
       accession_i18n.scope_and_content,
       accession_i18n.received_extent_units,
       accession_i18n.location_information,
       -- show just the year when only a year was entered (stored as YYYY-00-00)
       CASE WHEN CAST(event.start_date AS CHAR) LIKE '%-00-00'
            THEN LEFT(CAST(event.start_date AS CHAR), 4)
            ELSE CAST(event.start_date AS CHAR)
       END AS start_date,
       CASE WHEN CAST(event.end_date AS CHAR) LIKE '%-00-00'
            THEN LEFT(CAST(event.end_date AS CHAR), 4)
            ELSE CAST(event.end_date AS CHAR)
       END AS end_date,
       event_i18n.date  -- display date as entered
FROM accession
  -- LEFT JOINs keep accessions that have no date event recorded
  LEFT JOIN event ON event.object_id = accession.id
  LEFT JOIN event_i18n ON event.id = event_i18n.id
  JOIN accession_i18n ON accession.id = accession_i18n.id
WHERE accession.date LIKE '2016%'  -- change the year as required
ORDER BY accession.identifier;

A couple of points to make here:

  • In a previous version of the query, we included some other tables so we could also capture information about the creator of the archive. The addition of the relation, actor and actor_i18n tables made the query much more complicated and for some reason it didn't work this year. I have not attempted to troubleshoot this in any great depth for the time being as it turns out we are no longer recording creator information in our accessions records. Adding a creator record to an accessions entry creates an authority record for the creator that is automatically made public within the AtoM interface and this ends up looking a bit messy (as we rarely have time at this point in the process to work this into a full authority record that is worthy of publication). Thus as we leave this field blank in our accession record there is no benefit in trying to extract this bit of the database.
  • In an earlier version of this query there was something strange going on with the dates that were being pulled out of the event table. This seemed to be a quirk that was specific to Squirrel. A clever colleague solved this by casting the date to char format and including a case statement that will list just the year when only a year has been entered, and the full date when fuller information is available. This is useful because in our accession records we enter dates to different levels of precision.
So, once I've exported the results of this query, put them in an Excel spreadsheet and sent them to one of our archivists, all that remains for her to do is to check through the data, do a bit of tidying up, ensure the column headings match what is required by The National Archives and the spreadsheet is ready to go!
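If your SQL client mangles dates the way Squirrel did, or you would rather keep the SQL simple, the same year-only tidying can be done after export. A minimal sketch, assuming the query results arrive as Python dictionaries keyed by the column aliases above, with dates as text. The helper names and sample row are hypothetical.

```python
import csv
import io


def display_date(value):
    """Mirror of the query's CASE expression: 'YYYY-00-00' becomes 'YYYY'."""
    if value is None:
        return None  # SQL NULL stays empty
    text = str(value)
    return text[:4] if text.endswith("-00-00") else text


def to_csv(rows, fieldnames):
    """Write result rows (dicts) to CSV text, tidying start_date and end_date."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in rows:
        tidy = dict(row)
        for col in ("start_date", "end_date"):
            if col in tidy:
                tidy[col] = display_date(tidy[col])
        writer.writerow(tidy)
    return out.getvalue()


# Hypothetical row, shaped like the query output.
rows = [{"identifier": "2016/001", "start_date": "1950-00-00", "end_date": "1962-05-14"}]
csv_text = to_csv(rows, ["identifier", "start_date", "end_date"])
print(csv_text)
```

Column headings would still need renaming to whatever The National Archives requires before the file is sent.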

Wednesday, 4 January 2017

Hello 2017

Looking back


2016 was a busy year.

I can tell that from just looking at my untidy desk...I was going to include a photo at this point but that would be too embarrassing.

The highlights of 2016 for me were getting our AtoM catalogue released and available to the world in April, completing Filling the Digital Preservation Gap (and seeing the project move from the early 'thinking' phases to actual implementation) and of course having our work on this project shortlisted in the Research and Innovation category of the Digital Preservation Awards.

...but other things happened too. Blogging really is a great way of keeping track of what I've been working on and of course what people are most interested to read about.

The top 5 most viewed posts from 2016 on this blog have been as follows:

  • Research Data - what does it *really* look like? - A post describing my (not entirely successful) efforts to automatically identify the file formats of research data deposited with Research Data York using DROID. This post spawned other similar posts profiling data using DROID and the cumulative value of all of these profiles is gradually increasing over time. I'm still keen to follow this up with a comparison using the born digital data that we hold at the Borthwick Institute so hopefully that is something for 2017.
  • A is for AtoM - An A-Z (actually I only got to 'Y'!) of implementing AtoM at the Borthwick. This post covers some of the problems and issues we have had to address and decisions we have made as we have gone through the process of getting our new archival management system up and running.
  • Modelling Research Data with PCDM - A guest post by Julie Allinson on some thinking carried out as part of the implementation work for the Filling the Digital Preservation Gap project. The post describes some preliminary work to define a data model for datasets using the Portland Common Data Model.
  • Why AtoM? - A look back at why we selected AtoM for our archival management system and how it meets our requirements. This post was in response to a question I was frequently asked and hopefully is useful to others who are going through a similar selection process.
  • From Old York to New York: PASIG 2016 - Quite a long summary of the highlights of the PASIG conference that I attended in New York in October 2016. There was some fantastic content at this event and my post really just scrapes the surface of this!


Looking forward


So what is on the horizon for 2017?

Here are some of the things I'm going to be working on - expect blog posts on some or all of these things as the year progresses.

AtoM

I blogged about AtoM a fair bit last year as we prepared our new catalogue for release in the wild! I expect I'll be talking less about AtoM this year as it becomes business as usual at the Borthwick, but don't expect me to be completely silent on this topic.

A group of AtoM users in the UK is sponsoring some development work within AtoM to enable EAD to be harvested via OAI-PMH. This is a very exciting new collaboration and will see us being able to expose our catalogue entries to the wider world, enabling them to be harvested by aggregators such as the Archives Hub. I'm very much looking forward to seeing this take shape.

This year I'm also keen to explore the Locations functionality of AtoM to see whether it is fit for our purposes.

Archivematica

Work with Archivematica is of course continuing. 

Now that Filling the Digital Preservation Gap has finished, at York we are working on moving our proof of concept into production. We are also continuing our work with Jisc on the Research Data Shared Service. York is a pilot institution for this project so we will be improving and refining our processes and workflows for the management and preservation of research data through this collaboration.

Another priority for the year is to make progress with the preservation of the born digital data that is held by the Borthwick Institute for Archives. Over the year we will be planning a different set of Archivematica workflows specifically for the archives. I'm really excited about seeing this take shape.

We are also thrilled to be hosting the first European ArchivematiCamp here in York in the Spring. This will be a great opportunity to get current and potential Archivematica users across the UK and the rest of Europe together to share experiences and find out more about the system. There will no doubt be announcements about this over the next couple of months once the details are finalised so watch this space.

Ingest processes

Last year a new ingest PC arrived on my desk. I haven't yet had much chance to play with this but the plan is to get this set up for digital ingest work.

I'm keen to get BitCurator installed and to refine our current digital ingest procedures. After some useful chats about BitCurator with colleagues in the UK and the US over 2016 I'm very much looking forward to getting stuck into this.




...but really the first challenge of 2017 is to tidy my desk!

Wednesday, 7 December 2016

Digital Preservation Awards 2016 - celebrating collaboration and innovation

Last week members of the Filling the Digital Preservation Gap project team were lucky enough to experience the excitement and drama of the biennial Digital Preservation Awards!

The Awards ceremony was held at the Wellcome Collection in London on the evening of the 30th November. As always it was a glittering affair, complete with dramatic bagpipe music (I believe it coincided with St Andrew's Day!) and numerous references to Strictly Come Dancing from the judges and hosts!

This year our project had been shortlisted for the Software Sustainability Institute award for Research and Innovation. It was fantastic to be a finalist considering the number of nominations from across the world in this category and we certainly felt we had some strong competition from the other shortlisted projects.

One of the key strengths in our own project has been the collaboration between the Universities of York and Hull. Additionally, collaboration with Artefactual Systems, The National Archives and the wider digital preservation community has also been hugely beneficial.

Interestingly, collaboration was a key feature of all the finalists in this category, perhaps demonstrating just how important this is in order to make effective progress in this area.

The 4C project "Collaboration to Clarify the Costs of Curation" was a European project which looked at costs and benefits relating to digital preservation activities within its partner organisations and beyond. Project outputs in use across the sector include the Curation Costs Exchange.

The winner in our category however was the Dutch National Coalition for Digital Preservation (NCDD) with Constructing a Network of Nationwide Facilities Together. Again there was a strong focus on collaboration - this time cross-domain collaboration within the Netherlands. Under the motto "Joining forces for our digital memory", the project has been constructing a framework for a national shared infrastructure for digital preservation. This collaboration aimed to ensure that each institution does not have to reinvent the wheel as they establish their own digital preservation facilities. Clearly an ambitious project, and perhaps one we can learn from in the UK Higher Education sector as we work with Jisc on their Shared Service for Research Data.

Some of the project team from York and Hull at the awards reception

The awards ceremony itself came at the end of day one of the PERICLES conference where there was an excellent keynote speech from Kara Van Malssen from AV Preserve (her slides are available on SlideShare - I'd love to know how she creates such beautiful slides!).

In the context of the awards ceremony I was pondering one of the messages of Kara's talk that discussed our culture of encouraging and rewarding constant innovation and the challenges that this brings - especially for those of us who are 'maintainers'.

Maintainers maintain systems, services and the status quo - some of us maintain digital objects for the longer term and ensure we can continue to provide access to them. She argued that there are few rewards for maintainers and the incentives generally go to those who are innovating. If those around us are always chasing the next shiny new thing, how can the digital preservation community keep pace?

I would argue however that in the world of digital preservation itself, rewards for innovation are not always forthcoming. It can be risky for an institution to be an innovator in this area rather than doing what we have always done (which may actually bring risks of a different kind!) and this can stifle progress or lead to inaction.

This is why for me, the Digital Preservation Awards are so important. Being recognised as a finalist for the Research and Innovation award sends a message that what we have achieved is worthwhile and demonstrates that doing something different is A Good Thing.

For that I am very grateful. :-)

Monday, 21 November 2016

Every little bit helps: File format identification at Lancaster University

This is a guest post from Rachel MacGregor, Digital Archivist at Lancaster University. Her work on identifying research data follows on from the work of Filling the Digital Preservation Gap and provides an interesting comparison with the statistics reported in a previous blog post and our final project report.

Here at Lancaster University I have been very inspired by the work at York on file format identification and we thought it was high time I did my own analysis of the one hundred or so datasets held here. The aim is to aid understanding of the nature of research data as well as to inform our approaches to preservation. Our results are comparable to York's in that the data is characterised as research data (as yet we don't have any born digital archives or digitised image files). I used DROID (version 6.2.1) as the tool for file identification; there are others and it would be interesting to compare results at some stage with results from using other software such as FITS (File Information Tool Set), Apache Tika etc.

The exercise was carried out using the following signature files: DROID_SignatureFile_V88 and container-signature-file-20160927.  The maximum number of bytes DROID was set to scan at the start and end of each file was 65536 (which is the default setting when you install DROID).
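To make the idea of a scan window concrete: DROID looks for byte sequences within a limited number of bytes at the start and end of each file. The toy sketch below checks three well-known magic numbers within head and tail windows of max_bytes. It is a heavy simplification (real PRONOM signatures support offsets, wildcards and container signatures), and the signature table is illustrative, not taken from PRONOM.

```python
# Illustrative table: (format name, bytes expected at offset 0,
# bytes expected somewhere in the tail window, or None).
SIGNATURES = [
    ("GZIP", b"\x1f\x8b", None),
    ("JPEG", b"\xff\xd8\xff", b"\xff\xd9"),
    ("PDF", b"%PDF-", b"%%EOF"),
]


def identify(data: bytes, max_bytes: int = 65536):
    """Return a format name if a toy signature matches within the scan windows."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    head = data[:max_bytes]   # bytes examined at the start of the file
    tail = data[-max_bytes:]  # bytes examined at the end of the file
    for name, bof, eof in SIGNATURES:
        if head.startswith(bof) and (eof is None or eof in tail):
            return name
    return None


print(identify(b"\x1f\x8b\x08\x00rest-of-file"))
print(identify(b"%PDF-1.4" + b"%%EOF" + b" " * 100, max_bytes=50))
```

The second call shows the effect of a small window: the end-of-file marker is present but falls outside the last 50 bytes, so nothing is identified.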

Summary of the statistics:

There were a total of 24,705 files (so a substantially larger sample than in the comparable study at York)

Of these: 
  • 11008 (44.5%) were identified by DROID and 13697 (55.5%) not.
  • 99.3% were given one file identification and 76 files had multiple identifications.  
    • 59 files had two possible identifications
    • 13 had 3 identifications
    • 4 had 4 possible identifications.  
  • 50 of these files were asc files identified (by extension) as either 8-bit or 7-bit ASCII text files.  The remaining 26 were identified by container as various types of Microsoft files. 

Files that were identified

Of the 11008 identified files:
  • 89.34% were identified by signature: this is the overwhelming majority, far more than in Jen's survey
  • 9.2% were identified by extension, a much smaller proportion than at York
  • 1.46% identified by container

However there was one large dataset containing over 7,000 gzip files, all identified by signature which did skew the results rather.  With those files removed, the percentages identified by different methods were as follows:

  • 68% (2505) by signature
  • 27.5% (1013) by extension
  • 4.5% (161) by container
This was still different from York's results but not so dramatically.
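Recalculating the method breakdown with one dataset set aside, as above, amounts to filtering rows before counting. A sketch, assuming FILE_PATH, METHOD and FORMAT_COUNT columns in the DROID CSV export and that the skewing dataset can be recognised by a path prefix (the paths here are invented).

```python
from collections import Counter


def method_percentages(rows, exclude_prefix=None):
    """Percentage of identified rows per METHOD, optionally skipping a path prefix."""
    counts = Counter()
    for r in rows:
        if int(r.get("FORMAT_COUNT") or 0) == 0:
            continue  # unidentified
        if exclude_prefix and r.get("FILE_PATH", "").startswith(exclude_prefix):
            continue  # the dataset being set aside
        counts[r["METHOD"]] += 1
    total = sum(counts.values())
    return {m: round(100 * n / total, 1) for m, n in counts.items()} if total else {}


# Invented rows: one "dataset" of four gzip files plus a mixed one.
rows = [{"FILE_PATH": f"/data/gz/{i}.gz", "METHOD": "Signature", "FORMAT_COUNT": "1"} for i in range(4)]
rows += [
    {"FILE_PATH": "/data/a/1.xml", "METHOD": "Signature", "FORMAT_COUNT": "1"},
    {"FILE_PATH": "/data/a/2.xml", "METHOD": "Signature", "FORMAT_COUNT": "1"},
    {"FILE_PATH": "/data/a/3.asc", "METHOD": "Extension", "FORMAT_COUNT": "2"},
    {"FILE_PATH": "/data/a/4.docx", "METHOD": "Container", "FORMAT_COUNT": "1"},
    {"FILE_PATH": "/data/a/5.bin", "METHOD": "", "FORMAT_COUNT": "0"},
]
print(method_percentages(rows))
print(method_percentages(rows, exclude_prefix="/data/gz/"))
```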

Only 38 were identified as having a file extension mismatch (0.3%), but closer inspection may reveal more. Of these, most were Microsoft files with multiple IDs (see above), but there was also a set of lsm files identified as TIFFs. This is not a format I'm familiar with; it seems as if lsm is a form of TIFF file, but how do I know whether this is a "correct" identification or not?

59 different file formats were identified, the most frequently occurring being the GZIP format (as mentioned above) with 7331 instances. The next most popular was, unsurprisingly, XML (similar to results at York) with 1456 files spread across the datasets. The top 11 were:

Top formats identified by DROID for Lancaster University's research data


Files that weren't identified

There were 13697 files not identified by DROID of which 4947 (36%) had file extensions.  This means there was a substantial proportion of files with no file extension (64%). This is much higher than the result at York which was 26%. As at York there were 107 different extensions in the unidentified files of which the top ten were:

Top counts of unidentified file extensions


Top extensions of unidentified files


This top ten is quite different to York's results, though in both institutions dat files topped the list by some margin! We also found 20 inp and 32 out files, which also occur in York's analysis.

Like Jen at York I will be looking for a format to analyse further to create a signature - this will be a big step for me but will help my understanding of the work I am trying to do as well as contribute towards our overall understanding of file format types.

Every little bit helps.

Tuesday, 15 November 2016

AtoM harvesting (part 1) - it works!

When we first started using Access to Memory (AtoM) to create the Borthwick Catalogue we were keen to enable our data to be harvested via OAI-PMH (more about this feature of AtoM is available in the documentation). Indeed the ability to do this was one of our requirements when we were looking to select a new Archival Management System (read about our system requirements here).

Look! Archives now available in Library Catalogue search
So it is with great pleasure that I can announce that we are now exposing some of our data from AtoM through our University Library catalogue YorSearch. Dublin Core metadata is automatically harvested nightly from our production AtoM instance - so we don't need to worry about manual updates or old versions of our data hanging around.

Our hope is that doing this will allow users of the Library Catalogue (primarily staff and students at the University of York) to happen upon relevant information about the archives that we hold here at the Borthwick whilst they are carrying out searches for other information resources.

We believe that enabling serendipitous discovery in this way will benefit those users of the Library Catalogue who may have no idea of the extent and breadth of our holdings and who may not know that we hold archives of relevance to their research interests. Increasing the visibility of the archives within the University of York is a useful way of signposting our holdings and we think this should bring benefits both to us and our potential user base.

A fair bit of thought (and a certain amount of tweaking within YorSearch) went into getting this set up. From the archives perspective, the main decision was around exactly what should be harvested. It was agreed that only top level records from the Borthwick Catalogue should be made available in this way. If we had enabled the harvesting of all levels of records, there was a risk that search results would have been swamped by hundreds of lower level records from those archives that have been fully catalogued. This would have made the search results difficult to understand, particularly given the fact that these results could not have been displayed in a hierarchical way so the relationships between the different levels would be unclear. We would still encourage users to go direct to the Borthwick Catalogue itself to search and browse lower levels of description.

It should also be noted that only a subset of the metadata within the Borthwick Catalogue will be available through the Library Catalogue. The metadata we create within AtoM is compliant with ISAD(G): General International Standard Archival Description which contains 26 different data elements. In order to facilitate harvesting using OAI-PMH, data within AtoM is mapped to simple Dublin Core and this information is available for search and retrieval via YorSearch. As you can see from the screen shot below, Dublin Core does allow a useful level of information to be harvested, but it is not as detailed as the original record.

An example of one of our archival descriptions converted to Dublin Core within YorSearch
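To see which Dublin Core elements survive the mapping from ISAD(G), one option is to pull the dc: elements out of a harvested oai_dc record. A minimal sketch using Python's standard XML parser. The record below is invented, and real records will carry whichever elements the repository chooses to map.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace


def dc_fields(record_xml):
    """Map each dc:* element name to the list of its text values."""
    root = ET.fromstring(record_xml)
    fields = {}
    for el in root.iter():
        if el.tag.startswith(DC):
            fields.setdefault(el.tag[len(DC):], []).append((el.text or "").strip())
    return fields


# Invented oai_dc record for illustration.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Papers of a hypothetical family</dc:title>
  <dc:date>1850-1920</dc:date>
  <dc:description>Letters and accounts.</dc:description>
  <dc:subject>Estates</dc:subject>
  <dc:subject>Correspondence</dc:subject>
</oai_dc:dc>"""

for name, values in dc_fields(SAMPLE).items():
    print(name, values)
```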

Further work was necessary to change the default behaviour within Primo (the software that YorSearch runs on) which displayed results from the Borthwick Catalogue with the label Electronic resource. This is what it calls anything that is harvested as Dublin Core. We didn't think this would be helpful to users because even though the finding aid itself (within AtoM) is indeed an electronic resource, the actual archive that it refers to isn't. We were keen that users didn't come to us expecting everything to be digitised! Fortunately it was possible to change this label to Borthwick Finding Aid, a term that we think will be more helpful to users.
Searches within our library catalogue (YorSearch) now surface Borthwick finding aids, harvested from AtoM.
These are clearly labelled as Borthwick Finding Aids.


Click through to a Borthwick Finding Aid and you can see the full archival description in AtoM in an iFrame

Now this development has gone live we will be able to monitor the impact. It will be interesting to see whether traffic to the Borthwick Catalogue increases and whether a greater number of University of York staff and students engage with the archives as a result.

However, note that I called this blog post AtoM harvesting (part 1).

Of course that means we would like to do more.

Specifically we would like to move beyond just harvesting our top level records as Dublin Core and enable harvesting of all of our archival descriptions in full in Encoded Archival Description (EAD) - an XML standard that is closely modelled on ISAD(G).  This is currently not possible within AtoM but we are hoping to change this in the future.

Part 2 of this blog post will follow once we get further along with this aim...