Showing posts with label Digital Libraries. Show all posts
Showing posts with label Digital Libraries. Show all posts

Sunday, February 19, 2012

Linked Data & DEiXTo

As explained in a previous postDEiXTo can scrape the content of digital libraries, archives and multimedia collections lacking an API and enable their metadata transformation (through post-processing and custom Perl code) to Dublin Core and subsequently in OAI-PMH or another suitable form, e.g. Europeana Semantic Elements (ESE).
    Meanwhile, the Web has become a dynamic collaboration platform that allows everyone to meet, read and more importantly write. Thus, it steadily approaches the vision of Tim Berners-Lee (the inventor of the World Wide Web): the Linked Data Web, a place where related data are linked and information is represented in a more structured and easily machine-processable way.
    Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. Its key technologies are URIs (a generic method to identify resources on the Internet), the Hypertext Transfer Protocol (HTTP) and RDF (a data model and a general method for conceptual description of things in the real world). It is an exciting topic of interest and it's expected to make great progress in the next few years. A video that does a nice job of explaining what Linked Open Data is all about can be found here: http://vimeo.com/36752317
    Over the last decade, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) has become the de facto standard for metadata exchange in digital libraries and it's playing an increasingly important role. However, it has two major drawbacks: it does not make its resources accessible via dereferencable URIs and it provides only restricted means of selective access to metadata. Therefore, there is a strong need for efficient tools that would allow metadata repositories to expose their content according to the Linked Data guidelines. This would make digitized items and media objects accessible via HTTP URIs and query able via the SPARQL protocol.
    Dr Haslhofer has performed significant research and work towards this direction. He has developed (among others) the OAI2LOD Server based on the D2R Server implementation and wrote the ESE2EDM converter, a collection of ruby scripts that can convert given XML-based ESE source files into the RDF-based Europeana Data Model (EDM). These remarkable tools could turn out very useful for making large volumes of information Linked-Data ready, with all the advantages this brings.
    Linked Open Data can change the computer world as we know it. So, there is a lot of potential in combining DEiXTo with Linked Data technologies. Their blend could eventually produce an innovative and useful outcome. Many already believe that Linked Data is the next big thing. Time will tell. Meanwhile, DEiXTo could definitely help you generate structured data in a variety of formats from unstructured HTML pages, either your ultimate goal is Linked Data or not.

Friday, January 20, 2012

Open Archives & Digital Libraries


The Open Archives Initiative (OAI) develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content. OAI has its roots in the open access and institutional repository movements and its cornerstone is the Protocol for Metadata Harvesting (OAI-PMH) which allows data providers/ repositories to expose their content in a structured format. A client then can make OAI-PMH service requests to harvest that metadata through HTTP.
    openarchives.gr is a great federated search engine harvesting 57 Greek digital libraries and institutional repositories (as of January 2012). It currently provides access to almost half a million(!) documents (mainly undergraduate theses and Master/ PhD dissertations) and its index gets updated on a daily basis. It began its operation back in 2006 after being designed and implemented by Vangelis Banos but since May 2011 it is being hosted, managed and co-developed by the National Documentation Centre (EKT). What makes this amazing searching tool even more remarkable is the fact that it is entirely built on open source/ free software.
    A tricky point that needs some clarification is that when a user searches openarchives.gr, the search is not submitted in real time to the target sources. Instead, it is performed locally on the openarchives.gr server where full copies of the repositories/ libraries are stored (and updated at regular time intervals).
    The majority of the sources searched by openarchives.gr are OAI-PMH compliant repositories (such as DSpace or EPrints). Therefore, their data are periodically retrieved via their OAI-PMH endpoint. However, it is worth mentioning that non OAI-PMH digital libraries have also been included in its database. This was made possible through scraping their websites with DEiXTo and transforming their metadata into Dublin Core. So, more than 16.000 records from 6 significant online digital libraries (such as the Lyceum Club of Greek Women and the Music Library of Greece “Lilian Voudouri”) were inserted in openarchives.gr with the use of DEiXTo wrappers and custom Perl code.
    Finally, it is known that digital collections have flourished over the last few years and enjoy growing popularity. However, most of them do NOT provide their contents in OAI-PMH or another appropriate metadata format. Actually, many of them (especially legacy systems) do NOT even offer an API or an SRW/U interface. Consequently, we believe that there is much room for DEiXTo to help cultural and educational organizations (e.g., museums, archives, libraries and multimedia collections) to export, present and distribute their digitized items and rich content to the outside world, in an efficient and structured way, through scraping and repurposing their data.