Wednesday, January 4, 2012

DEiXTo powers Michelin Maps and Guides!

One of DEiXTo's biggest success stories is that, a few months ago, the Maps and Guides UK division of Michelin used it to build a France gazetteer web application. If you are going on holiday to France, you will probably need hotel and restaurant guides, maps, atlases and tourist guides relevant to where you are staying or the places you will visit. The free online Michelin database can help you find out which ones are right for you.
    DEiXTo's contribution to this useful service was scraping geo-location data, as well as other metadata fields, from Wikipedia for 36,000+ French communes. In France the smallest administrative division is the commune, and Wikipedia happened to have all of the relevant information freely available!
    The starting target page contained a list of 95 (or so) departments, each of which contains a large number of communes. Each department's detail page in turn lists all of its communes and their corresponding hyperlinks/URLs. A sample department page looks like this. Finally, one level below, we have the actual pages of interest with all the details needed about each commune. You can see a sample commune Wikipedia page by clicking here and a screenshot of it in the picture below. This "scenario" also serves as a good example of collaborative wrappers, where the output of one wrapper (a txt file with URLs) gets passed as input to a second one.
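    To make the collaborative-wrapper idea concrete, here is a minimal Python sketch of the same two-stage hand-off. It is not DEiXTo's own code or API: the names `LinkCollector`, `first_wrapper` and `second_wrapper`, the sample HTML and the `example.org` base URL are all assumptions made up for illustration.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def first_wrapper(department_html, base_url, out_path):
    """Stage 1 (hypothetical): write one absolute commune URL per line."""
    collector = LinkCollector()
    collector.feed(department_html)
    urls = [urljoin(base_url, href) for href in collector.hrefs]
    Path(out_path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return urls


def second_wrapper(in_path, extract):
    """Stage 2 (hypothetical): read the URL list and apply `extract` to each."""
    lines = Path(in_path).read_text(encoding="utf-8").splitlines()
    return [extract(url) for url in lines if url.strip()]


# Made-up department page; a real run would fetch the page instead.
SAMPLE_HTML = (
    '<ul><li><a href="/wiki/Commune_A">Commune A</a></li>'
    '<li><a href="/wiki/Commune_B">Commune B</a></li></ul>'
)

with tempfile.TemporaryDirectory() as tmp:
    url_file = Path(tmp) / "communes.txt"
    first_wrapper(SAMPLE_HTML, "https://example.org/", url_file)
    # The extraction step is a stand-in; it only echoes the URL back.
    results = second_wrapper(url_file, lambda url: {"url": url})

print(results)
```

    The text file is the only thing the two stages share, so each one can be rerun or swapped out on its own.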
    It should be noted, though, that there were slight variations in the layout and structure of the target pages. The algorithm DEiXTo uses is quite efficient and robust, however, and can usually deal with such cases. More specifically, the scraper that was deployed extracted the following metadata from each commune page: region, department, arrondissement, canton and, importantly, the latitude and longitude.
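    As a rough illustration of what one extracted record might look like, the sketch below collects those six fields into a small Python structure and tolerates missing or unexpected labels. The `CommuneRecord` class, the `build_record` helper and the (label, value) input shape are assumptions for this example, not the format DEiXTo actually outputs.

```python
from dataclasses import dataclass
from typing import Optional

TEXT_FIELDS = ("region", "department", "arrondissement", "canton")
NUMERIC_FIELDS = ("latitude", "longitude")


@dataclass
class CommuneRecord:
    """One commune's metadata; any field may be missing on a given page."""
    region: Optional[str] = None
    department: Optional[str] = None
    arrondissement: Optional[str] = None
    canton: Optional[str] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def build_record(pairs):
    """Build a record from (label, value) pairs, skipping unknown labels."""
    record = CommuneRecord()
    for label, value in pairs:
        key = label.strip().lower()
        value = value.strip()
        if key in NUMERIC_FIELDS:
            try:
                setattr(record, key, float(value))
            except ValueError:
                pass  # leave the field as None if it is not a plain number
        elif key in TEXT_FIELDS:
            setattr(record, key, value)
    return record


# Placeholder values only; these are not real commune data.
record = build_record([
    ("Region", "Example Region"),
    (" Latitude ", "48.5"),
    ("Mayor", "ignored"),
    ("Longitude", "n/a"),
])
print(record)
```

    Leaving a field as `None` instead of failing is one simple way to cope with pages whose layout differs slightly from the rest.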
    The precision and recall that DEiXTo achieved on these commune pages were amazing (very close to 100%), and as a result the database was enriched with the large volume of information captured. We are really happy that Michelin was able to successfully utilize DEiXTo and create a free and useful online service. So, if you plan a trip to France, you know where to find an informative online map/guide! :)
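    For readers unfamiliar with the two measures, here is a small Python sketch of how precision and recall could be computed for a scraping run. The function name and the toy sets are illustrative assumptions, not the actual evaluation data from this project.

```python
def precision_recall(extracted, expected):
    """Return (precision, recall) for two sets of extracted/expected items.

    precision = correct items / all extracted items
    recall    = correct items / all expected items
    """
    extracted, expected = set(extracted), set(expected)
    correct = len(extracted & expected)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Toy example: 3 of the 4 extracted items are correct, 1 expected item is missed.
precision, recall = precision_recall({"a", "b", "c", "d"}, {"a", "b", "c", "e"})
print(precision, recall)
```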
