Analysing the Entire Wikipedia History with Database Supported Haskell

George Giorgidze • Torsten Grust • Iassen Halatchliyski • Michael Kummer

Proceedings of the 15th International Symposium on Practical Aspects of Declarative Languages (PADL 2013), Rome, Italy. Springer, January 2013.

In this paper we report on our experience of using Database Supported Haskell (DSH) for analysing the entire Wikipedia history. DSH is a novel high-level database query facility that allows for the formulation and efficient execution of queries on nested and ordered collections of data. DSH grew out of a research project on the integration of database querying capabilities into high-level, general-purpose programming languages. It is an emerging trend that querying facilities embedded in general-purpose programming languages are gradually replacing lower-level database languages such as SQL as the preferred facilities for querying large-scale database-resident data. We relate this new approach to the current practice, which integrates database queries into analysts’ workflows in a rather ad hoc fashion. This paper should interest early technology adopters looking for new database query languages, as well as practitioners working on large-scale data analysis.