DRUPALCON 2012, HTML5 microdata
HTML5, the new version of the language used to build web pages, is being developed by the World Wide Web Consortium (W3C), and it will enable machines to better understand the data contained in web pages. HTML5 is still in development, but the adoption of a new vocabulary for describing content seems settled: some terms may be added or removed, but the great majority of it is complete.
The main problem with the way machines understand data is contextual information. Machines are very good at quickly processing huge amounts of information, but if that information is not clearly described, all of it is treated in the same way. An example is the current structure of a blog page, say this post on process.arts.
Looking at this page, a person will identify who the creator is, where the title is, which tags describe the page, whether there are related items on the same website, and perhaps links to external pages. All this information is made obvious by the visual aspect of the page: we can tell the title from the body text because of how each one looks. This is contextual information.
A machine, instead, will read the page and extract the data based solely on its type: characters for text, links, external links and some styling. Clearly a machine can only understand what humans have programmed it to understand, and we are now realizing that we need machines to know more about a page, such as the process.arts page in our example.
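To make the point above concrete, here is a minimal sketch of what a program sees when it reads an unannotated blog page. The markup, the author name and the URLs are all invented for illustration (they are not taken from process.arts), and the parser is Python's standard html.parser rather than anything a real search engine would use.

```python
from html.parser import HTMLParser

# Hypothetical blog markup with no semantic annotation at all.
PLAIN_HTML = """
<div>
  <h1>Drawing with charcoal</h1>
  <a href="/users/jane">Jane Doe</a>
  <a href="/tags/drawing">drawing</a>
  <p>Some text about charcoal.</p>
</div>
"""


class LinkCollector(HTMLParser):
    """Collects an (href, text) pair for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the link currently being read, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


print(collect_links(PLAIN_HTML))
# [('/users/jane', 'Jane Doe'), ('/tags/drawing', 'drawing')]
```

Both results come back as plain links. Nothing in the data says that the first one names the author and the second one is a tag; a person only knows that from where the links sit on the page.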
In order to make contextual information available to machines, HTML5 allows each piece of information in a page to be labelled with a term from the new vocabulary, so that to a machine a link containing the name of the article's author means something different from a tag describing the text. Machines will therefore be able to read and exchange contextual information between pages as well, making searches more effective. Your web browser could know whether the author of the text you're reading has written something similar on the same topic elsewhere on the web, and might suggest it for further reading.
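As a rough sketch of the labelling described above, the snippet below annotates a made-up blog post with the HTML5 microdata attributes itemscope, itemtype and itemprop, then reads the labels back with Python's standard html.parser. The post does not say which vocabulary is used, so the schema.org BlogPosting type and its headline, author, keywords and articleBody properties are an assumption here, as are the author name and URLs. The extractor is deliberately simplified: it only handles labelled elements that contain plain text, not the full microdata rules.

```python
from html.parser import HTMLParser

# Hypothetical blog markup annotated with microdata attributes.
# The schema.org vocabulary is an assumption made for this example.
ANNOTATED_HTML = """
<article itemscope itemtype="https://schema.org/BlogPosting">
  <h1 itemprop="headline">Drawing with charcoal</h1>
  <a href="/users/jane"><span itemprop="author">Jane Doe</span></a>
  <a href="/tags/drawing"><span itemprop="keywords">drawing</span></a>
  <p itemprop="articleBody">Some text about charcoal.</p>
</article>
"""


class ItempropCollector(HTMLParser):
    """Collects (property name, text) pairs for elements carrying itemprop.

    Simplification: assumes labelled elements hold only text and are not
    nested inside one another.
    """

    def __init__(self):
        super().__init__()
        self.properties = []
        self._name = None   # itemprop value of the element being read
        self._tag = None    # tag name of that element, to match its end tag
        self._text = []

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name is not None and self._name is None:
            self._name, self._tag, self._text = name, tag, []

    def handle_data(self, data):
        if self._name is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._name is not None and tag == self._tag:
            self.properties.append((self._name, "".join(self._text).strip()))
            self._name = None
            self._tag = None


def collect_properties(html):
    parser = ItempropCollector()
    parser.feed(html)
    return parser.properties


for name, value in collect_properties(ANNOTATED_HTML):
    print(name, "=", value)
```

With the labels in place, the author's name and the tag no longer look alike to the program: one arrives marked as author, the other as keywords, which is the kind of contextual information a search engine or browser could then act on.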