Friday, January 20, 2012

Open Archives & Digital Libraries

The Open Archives Initiative (OAI) develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. OAI has its roots in the open access and institutional repository movements, and its cornerstone is the Protocol for Metadata Harvesting (OAI-PMH), which allows data providers (repositories) to expose their content in a structured format. A client can then issue OAI-PMH service requests over HTTP to harvest that metadata.
    openarchives.gr is a great federated search engine harvesting 57 Greek digital libraries and institutional repositories (as of January 2012). It currently provides access to almost half a million(!) documents (mainly undergraduate theses and Master's/PhD dissertations), and its index is updated daily. It began operating back in 2006, designed and implemented by Vangelis Banos, but since May 2011 it has been hosted, managed and co-developed by the National Documentation Centre (EKT). What makes this amazing search tool even more remarkable is that it is built entirely on open source/free software.
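As a rough illustration of the OAI-PMH harvesting described above, here is a minimal Python sketch of a client that builds a `ListRecords` request and reads Dublin Core titles from the response. The `verb`, `metadataPrefix`, `from` and `resumptionToken` arguments come from the OAI-PMH 2.0 specification; the function names and the `http://example.org/oai` endpoint are made up for this example, and this is not the code the search engine actually runs.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_list_records_url(base_url, metadata_prefix="oai_dc",
                           from_date=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token:
        # resumptionToken is an exclusive argument: send it alone with the verb.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # selective (incremental) harvesting
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        titles = [el.text for el in rec.iter(f"{{{DC_NS}}}title")]
        records.append({"identifier": identifier, "titles": titles})
    token = (root.findtext(f".//{{{OAI_NS}}}resumptionToken") or "").strip()
    return records, (token or None)


def harvest(base_url, from_date=None):
    """Yield every record, following resumption tokens page by page."""
    token = None
    while True:
        url = build_list_records_url(base_url, from_date=from_date,
                                     resumption_token=token)
        with urlopen(url) as response:
            records, token = parse_list_records(response.read())
        yield from records
        if token is None:
            break


if __name__ == "__main__":
    # Hypothetical endpoint; replace with a real repository's OAI-PMH base URL.
    print(build_list_records_url("http://example.org/oai", from_date="2012-01-01"))
```

Passing a `from` date on each run is one plausible way to keep a daily update cheap, since the repository then only returns records changed since the last harvest.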
    A tricky point that needs some clarification is that when a user searches, the query is not submitted in real time to the target sources. Instead, it is run locally on the server, where full copies of the repositories/libraries are stored (and refreshed at regular intervals).
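To make the "harvest first, search locally" idea concrete, here is a toy Python sketch of a local inverted index that is filled by the harvester and queried without contacting any remote source. The post does not say how the real index is implemented, so the `LocalIndex` class, its methods and the record fields are all assumptions made purely for illustration.

```python
import re
from collections import defaultdict


class LocalIndex:
    """Toy inverted index over harvested records (illustrative only)."""

    FIELDS = ("title", "creator", "description")

    def __init__(self):
        self._postings = defaultdict(set)  # token -> set of record ids
        self._records = {}                 # record id -> record dict

    @staticmethod
    def _tokens(text):
        # \w is Unicode-aware in Python 3, so Greek text is tokenized too.
        return re.findall(r"\w+", text.lower())

    def remove(self, record_id):
        """Drop a record, e.g. before re-indexing an updated copy."""
        if self._records.pop(record_id, None) is not None:
            for ids in self._postings.values():
                ids.discard(record_id)

    def add(self, record_id, record):
        """Index a harvested record, replacing any earlier version."""
        self.remove(record_id)
        self._records[record_id] = record
        for field in self.FIELDS:
            for token in self._tokens(record.get(field, "")):
                self._postings[token].add(record_id)

    def search(self, query):
        """Return ids of records containing every query term."""
        tokens = self._tokens(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)


if __name__ == "__main__":
    index = LocalIndex()
    index.add("oai:example.org:1", {"title": "Digital libraries in Greece"})
    index.add("oai:example.org:2", {"title": "Metadata harvesting"})
    print(index.search("digital greece"))
```

The trade-off is the usual one: queries are fast and do not depend on the remote sites being up, but results are only as fresh as the last harvest.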
    The majority of the sources searched by openarchives.gr are OAI-PMH compliant repositories (running software such as DSpace or EPrints), so their data are periodically retrieved via their OAI-PMH endpoints. However, it is worth mentioning that digital libraries without OAI-PMH support have also been included in its database. This was made possible by scraping their websites with DEiXTo and transforming their metadata into Dublin Core. As a result, more than 16,000 records from 6 significant online digital libraries (such as the Lyceum Club of Greek Women and the Music Library of Greece “Lilian Voudouri”) were inserted into openarchives.gr with the use of DEiXTo wrappers and custom Perl code.
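The step of turning scraped metadata into Dublin Core can be sketched roughly as follows. The original work used DEiXTo wrappers and custom Perl code; this Python version is only an illustration, and the input field names (`title`, `author`, `year`, `url`) and the `FIELD_MAP` mapping are hypothetical, since the post does not describe the scraped record layout. The two namespace URIs are the standard `oai_dc` and Dublin Core element namespaces.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC_NS)
ET.register_namespace("dc", DC_NS)

# Hypothetical mapping from scraped field names to Dublin Core elements.
FIELD_MAP = {
    "title": "title",
    "author": "creator",
    "year": "date",
    "url": "identifier",
}


def to_dublin_core(scraped):
    """Serialize one scraped record (a dict) as an oai_dc XML string."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for field, dc_name in FIELD_MAP.items():
        value = scraped.get(field)
        # Dublin Core elements are repeatable, so accept a single value or a list.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if item:  # skip missing or empty fields
                element = ET.SubElement(root, f"{{{DC_NS}}}{dc_name}")
                element.text = str(item).strip()
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = {
        "title": " A scanned score ",
        "author": ["First Composer", "Second Composer"],
        "url": "http://example.org/item/42",  # placeholder URL
    }
    print(to_dublin_core(record))
```

Once records are in this shape, they can be loaded into the same index as the records harvested over OAI-PMH.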
    Finally, digital collections have flourished over the last few years and enjoy growing popularity. However, most of them do NOT expose their contents via OAI-PMH or another appropriate metadata format. In fact, many of them (especially legacy systems) do NOT even offer an API or an SRW/U interface. Consequently, we believe there is much room for DEiXTo to help cultural and educational organizations (e.g., museums, archives, libraries and multimedia collections) export, present and distribute their digitized items and rich content to the outside world in an efficient and structured way, by scraping and repurposing their data.


  1. This looks like a really great project to get more data out, and I'm very happy that you contacted me about using Omeka as a piece of the puzzle. Since Omeka has an OAI-PMH plugin and CSV import, hopefully it'll fit into a good system of tools to push more data out to the world. Good luck!

    Patrick Murray-John

    1. Thank you, Patrick, for your comment and your kind words. Indeed, Omeka is a remarkable web publishing platform and its CSV import plugin could turn out to be a very useful tool. Combined with DEiXTo, it could potentially help digital collections that lack OAI-PMH support to export and distribute their content in OAI-PMH format (with all the advantages this brings).