Visualizing Clarity document categories in a pie chart

The "Cl@rity" program of the Hellenic Republic offers a wealth of data about the decisions and expenditure of all Greek ministries and their organizations. It has been operating for more than two years now and is a great source of public data waiting for all of us to explore. However, it has faced a lot of technical problems over the last year because of the large number of documents uploaded daily and the heavy data management cost. Unfortunately, its frontend and search functionality are not working most of the time. Thankfully, a private initiative, UltraCl@rity, has come up in the meantime to offer a great alternative for searching the digitally signed public PDF documents and their metadata, filling the gap left by the Greek government.

As you probably already know, we focus on web scraping and on making use of the extracted information. One of the best ways to exploit the data you have gathered with DEiXTo (or another web data extraction tool) is to present it in a comprehensive chart. Hence, we thought it would be interesting to collect the subject categories of the documents published on Cl@rity by a big educational institution like the Athens University of Economics and Business (AUEB) and create a handy pie chart.

The page http://et.diavgeia.gov.gr/f/aueb/list.php?l=themes provides a convenient categorization of AUEB's decisions. With a simple pattern (extraction rule) created with GUI DEiXTo, we captured the categories and the number of documents in each. It was then straightforward to programmatically transform the output data (as of 16 April 2013) into an interactive Google pie chart of the most popular categories using the amazing Google Chart Tools. So, here it is:
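For readers who would like to try the transformation step themselves, here is a minimal sketch in Python. It is not the script we actually used. It assumes the extracted records were saved as tab-delimited text with one "category<TAB>count" pair per line, and the function names (parse_records, top_categories, build_chart_html) are hypothetical. The sample values are placeholders, not the real AUEB figures. The generated page uses the current Google Charts loader (loader.js).

```python
import json

# HTML page skeleton; __DATA__ and __TITLE__ are filled in with JSON literals.
PAGE_TEMPLATE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<script src="https://www.gstatic.com/charts/loader.js"></script>
<script>
google.charts.load('current', {packages: ['corechart']});
google.charts.setOnLoadCallback(function () {
  var data = google.visualization.arrayToDataTable(__DATA__);
  var chart = new google.visualization.PieChart(document.getElementById('chart'));
  chart.draw(data, {title: __TITLE__});
});
</script>
</head>
<body><div id="chart" style="width: 700px; height: 450px;"></div></body>
</html>
"""


def parse_records(text):
    """Parse 'category<TAB>count' lines (assumed format) into (name, count) pairs."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if "\t" not in line:
            continue  # skip blank or malformed lines
        name, _, count = line.rpartition("\t")
        rows.append((name.strip(), int(count)))
    return rows


def top_categories(rows, limit):
    """Return the `limit` categories with the most documents, largest first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


def build_chart_html(rows, title):
    """Render an HTML page that draws the rows as a Google pie chart."""
    table = [["Category", "Documents"]] + [[name, count] for name, count in rows]
    # json.dumps escapes non-ASCII names (e.g. Greek) as \\uXXXX, valid in JavaScript.
    page = PAGE_TEMPLATE.replace("__DATA__", json.dumps(table))
    return page.replace("__TITLE__", json.dumps(title))


if __name__ == "__main__":
    # Placeholder input, NOT the real AUEB data.
    sample = "Category A\t120\nCategory B\t45\nCategory C\t80\n"
    html = build_chart_html(top_categories(parse_records(sample), 2), "Top categories")
    with open("chart.html", "w", encoding="utf-8") as handle:
        handle.write(html)
```

Opening the resulting chart.html in a browser should draw the pie chart. Keeping only the top few categories keeps the slices readable when the category list is long.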

By the way, publicspending.gr and greekspending.com are truly remarkable research efforts that visualize public expenditure data from the Cl@rity project in user-friendly diagrams and charts. Of course, the DEiXTo-based scenario described above is just a simple scraping example. What we would like to point out is that this kind of data transformation could have some innovative practical applications and facilitate useful web-based services. In conclusion, Cl@rity (or "Διαύγεια", as it is known in Greek) can be a goldmine: it can spark new innovations and allow citizens, and developers in particular, to dig into open data in a creative fashion and in favor of the transparency of public life.

