Thursday, December 22, 2011

Can DEiXTo power mobile apps? Yes, it can!

Web content scraped with DEiXTo can be presented in a wide variety of formats. However, the most common choice is probably XML, since it facilitates heavy post-processing and further transformations that shape the data to your needs. A potentially interesting scenario would be to output the bits of interest from a target website into an XML file and then transform it to HTML through XSLT (Extensible Stylesheet Language Transformations). This could be very practical for creating, in real time, a customized, "shortened" version of a target web page specifically for mobile devices (e.g. Android and iOS devices).
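To make the XML-to-HTML step more concrete, here is a minimal sketch in Python. It is not the XSLT stylesheet we used: Python's standard library has no XSLT processor, so the sketch performs an equivalent transformation by hand with ElementTree. The element names (`records`, `record`, `title`, `link`), the sample data and the function name are assumptions made up for illustration, not DEiXTo's actual output format.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical scraper output; the element names are illustrative only.
SAMPLE_XML = """<records>
  <record><title>First headline</title><link>http://example.com/1</link></record>
  <record><title>Second &amp; last</title><link>http://example.com/2</link></record>
</records>"""


def records_to_mobile_html(xml_text: str) -> str:
    """Turn <record> elements into a compact HTML list for small screens."""
    root = ET.fromstring(xml_text)
    items = []
    for record in root.iter("record"):
        # Escape text taken from the scraped page before embedding it in HTML.
        title = escape(record.findtext("title", default="").strip())
        link = escape(record.findtext("link", default="").strip(), quote=True)
        items.append(f'<li><a href="{link}">{title}</a></li>')
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta name="viewport" '
        'content="width=device-width, initial-scale=1"></head>\n'
        "<body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
    )


if __name__ == "__main__":
    print(records_to_mobile_html(SAMPLE_XML))
```

A real XSLT stylesheet would express the same mapping declaratively, with one template per record, which is easier to restyle without touching code.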
    As you all know, smartphones and tablets have changed the computing world over the last few years. So we thought it would be challenging, and hopefully useful, to build a web service capable of repurposing specified pages on the fly (through a DEiXTo-based agent), keeping only the important or interesting content and returning it in a mobile-friendly form that fits small screens by harnessing XML, XSLT and CSS. We did not fully implement the service, but we built a simple prototype to try out the idea, and the results were quite encouraging!

    For our demo we used greektech-news, a technology news blog covering a plethora of interesting and fun topics around the IT industry. We supposed that we wanted to scrape the articles on its home page, so we built a quick test scraper able to extract all the records found and generate an XML document with the captured data. With an elegant XSLT stylesheet and some CSS we achieved a clean, usable and easy-to-navigate structure suitable for a smartphone screen (illustrated in the picture above). You can see live how the output XML file (containing 15 sample headlines) looks on an online iPhone simulator at the following address:
    The concept of the proposed web service is the following: suppose that you are an app developer or a website owner/administrator and that you need to display content inside your app or the mobile version of your site, either from a website under your control (meaning that you have access to its backend) or from another, "external" site (respecting copyright and access restrictions). Often, though, it is not easy to retrieve data from the target website, or you simply don't know how to do it. Therefore, a service that listens for requests for certain pages/URIs and returns their important data in a suitable form could be very useful. For example, ideally an HTTP request like http://deixto.com/webservice.pl?uri="http://example.com/.." would result in a good-looking XML chunk (formatted with XSLT and CSS) containing the data scraped from the original page (specified with the uri parameter).
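As a rough illustration of the request side of such a service, the sketch below shows how the `uri` parameter could be read and sanity-checked before any scraping happens. It is written in Python purely for illustration and is not the prototype's code; the function name, the set of allowed schemes and the example target paths are assumptions.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: the service only repurposes ordinary web pages.
ALLOWED_SCHEMES = {"http", "https"}


def extract_target_uri(request_url: str) -> str:
    """Return the page a client asked to be repurposed, or raise ValueError."""
    query = urlsplit(request_url).query
    values = parse_qs(query).get("uri", [])
    if len(values) != 1:
        raise ValueError("expected exactly one 'uri' parameter")
    # Tolerate the quoted form shown in the example request above.
    target = values[0].strip().strip('"')
    parts = urlsplit(target)
    if parts.scheme not in ALLOWED_SCHEMES or not parts.netloc:
        raise ValueError(f"unsupported target URI: {target!r}")
    return target


if __name__ == "__main__":
    # Hypothetical request; the target path is made up for the example.
    print(extract_target_uri(
        'http://deixto.com/webservice.pl?uri="http://example.com/news"'
    ))
```

Clients would be wise to percent-encode the target address, since an unencoded `&` inside it would otherwise be read as the start of a second parameter of the service itself.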
    Finally, we would like to point out that DEiXToBot contains best-of-breed Perl technology and allows extensive customization. Thus, it facilitates tailor-made solutions that make the captured data fully fit your project's needs. In this direction, deploying XSLT and XML-related technologies in general can really boost the utility and value of scraping, and of DEiXTo in particular!

2 comments:

  1. Your application is very interesting. I suggest that you expand output format support and include JSON http://www.json.org/ and YAML http://www.yaml.org/.
    This way, deixto output could be used very easily and efficiently from other web apps and even directly from javascript apps.
