SERIOUS GAMES FOR BUILDING DATA CAPACITY*

Open data can support the creation of new services, facilitate research, and provide insights into everyday issues affecting citizens. Although public administrations are making efforts to create sustainable and inclusive open data systems, there is limited capacity to identify suitable datasets, clean, release, and reuse them. Serious games offer a possible solution for data capacity building and have already been used to train civil servants and citizens on the topic of open data. This research presents a review of serious games and discusses their potential for data capacity building. The games selected in the review are classified and described according to their different learning outcomes, formats, and type of media. Most serious games found in this review can be categorized as teaching games and are designed to raise data awareness, which is only a limited aspect of building data capacity. We found a lack of design games, research games, and policy games. Given their success for ideation in other fields, design games offer a particular opportunity to build data capacity by generating new ideas about how to reuse open datasets.


INTRODUCTION
Open data is any data that is freely accessible and reusable by anyone for any purpose [1]. Open data can be reused to create or improve services, and to identify local issues and community needs more easily [2]. While public sector organizations play a significant role in releasing datasets to the public, the private sector may also open datasets to the public [3]. In this research, we will refer to the general concept of open data to include datasets released by both the public and the private sector.
The opening and reuse of datasets involves different actors and services, such as data providers, publishing organizations, infomediaries, tools for data storage and analysis, and researchers looking for data [3]. Opening data thus creates a network of complex interdependencies and interactions, an "ecosystem" [3].
Within the open data ecosystem, non-expert users (such as citizens and public administrators) play an important role because they are aware of the issues and needs of their communities, which can be addressed using open data [4]. Expert users, such as civic hackers and developers, in turn have the skills required to implement practical solutions using open data [4]. Mulder, Jaskiewicz, and Morelli [5] explored recent paradigm shifts that have the potential to seed change within societal systems and looked specifically at how open data can become a new type of "commons" that can support digital citizenship. In the current work, we explore the use of serious games for building data capacity in problem-driven societies. Alongside the delivery of open data-driven solutions, open data can only become a new commons if a larger community and culture of working with data is created around it.
Serious games offer an important tool to bring together both expert and non-expert users and transfer the required knowledge and skills needed to work with open data. Serious games differentiate themselves from entertainment games in that their main purpose is not to amuse, but to educate [6] and they have been in use for over a decade to facilitate learning and ideation [7]. Some serious games adapt game mechanics from commercial video games to achieve educational objectives. For example, "Socrates Jones: Pro Philosopher" [8] takes inspiration from "Ace Attorney", a popular legal drama game which uses visual novel mechanics. The developers of Socrates Jones used Ace Attorney's mechanics but created dialogues and game content to teach philosophical thinking. In the public sector, serious games have been used in different scenarios, such as to ideate service delivery principles [9] and to train railway traffic controllers [10], among others.
In the remainder of this paper, we review serious games for open data and elaborate upon their potential contribution to building data capacity. We define building data capacity as the process that empowers citizens and civil servants to understand and reuse open data, thereby creating the needed practical and analytical skills.
This research will answer the following research questions: 1. Which games, or types of games, have the potential to build data capacity? 2. What kind of data capacity can these serious games build?
The review starts from the list of games on the topic of open data compiled by Kleiman [11]. Entries are filtered according to four criteria, selecting interventions that: (1) are sufficiently documented, (2) fit the definition of a "game", (3) fit the definition of a "serious game", and (4) have an educational purpose that is related to building data capacity. We analyze the selected games using the classification by Grogan and Meijer [12], assigning them a type based on the kind of knowledge transferred or created by the game and its beneficiary.

CONCEPTUAL FRAMEWORK
To analyze the serious games selected in the review, we use the classification by Grogan and Meijer [12]. Starting from the type of knowledge that the game deals with and its beneficiary (see Table 1), Grogan and Meijer [12] identify four broad categories of games. Policy games are based on real-world scenarios so that the participant can experiment with different solutions and gather knowledge about the scenario represented in the game. Teaching games are based on a fictional setting, with the knowledge transferred by the game being generalizable and not based on a specific scenario. Design games "provide a participatory environment" [12, p. 545] and can be used to ideate new artifacts and create new knowledge. Finally, research games are used to observe participants in an experimental setting and test hypotheses.

[Table 1: Classification of games by type of knowledge and beneficiary, after Grogan and Meijer [12]. Surviving cell labels: Organizational learning; Policy intervention; Interactive visualization; Collaborative design.]

The paper is structured as follows: first, we describe the methodology used to compile a list of games for building data capacity. We then present our results, giving a brief description of each game and a summary of their main characteristics and learning outcomes. We then discuss how serious games contribute to data capacity building and which specific aspects of this process they aim to tackle, followed by a summary of our conclusions.

METHODOLOGY
The list of gamified interventions related to data compiled by Kleiman [11] was used as a starting point to map games for data capacity. The list was screened using the following filters:
1) The intervention should have sufficient documentation to allow for the intervention and its educational content (if present) to be analyzed and categorized. This can include game manuals, scientific publications, or an actual playable copy of the game available online.
2) The intervention must be a game, meaning it must be an "attempt to achieve a specific state of affairs (prelusory goal)" while being limited by certain rules, which are accepted by the player(s) because they enable the game play [13, p. 41] as cited by [14].
3) The intervention must fit the definition of "serious game" by Abt [6] as cited in Djaouti et al. [15], meaning it should have an "explicit and carefully thought-out educational purpose" and the primary reason to play should not be entertainment.
4) The intervention's educational purpose must be related to the goal of "building data capacity", meaning it must be aimed at providing skills such as general knowledge about open data, data reuse, or operational and technical knowledge about how to use and visualize datasets [16].
The literature review on data-related gamified interventions by Kleiman [11] included a total of 23 entries. From these, two interventions were excluded as they did not meet the definition of a "game" (filter 2). One intervention was excluded as it was not sufficiently documented (filter 1).
Two interventions were excluded as they are not serious games, but rather entertainment games (filter 3). Ten interventions were excluded because, while they use open datasets to generate playable content, the educational purpose of the intervention is not directly related to building data capacity (filter 4). For example, Bar Chart Ball [17] generates bar charts from various datasets, such as the percentage of people who feel they can influence decisions in different cities in the UK. A ball is dropped on top of the bar chart and starts sliding around under the force of gravity. The aim of the game is "to control this ball, and make it go where they want" [17, p. 1]. While this is an example of a data-related game and an interesting reuse of open datasets, its main educational outcome seems to be the memorization of the shapes of different bar charts, which is not directly related to building data capacity. For similar reasons, we filtered out the other games described by Gustafsson Friberger et al. [18], which reuse datasets to procedurally generate content but are not related to building data capacity.
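The screening procedure can be sketched as a chain of four predicates applied in order, recording the first filter that an entry fails. This is a minimal illustration and not the tooling used in the review: the `Intervention` record, its boolean fields, and the sample entries are assumptions introduced here. In particular, the flag values given for Bar Chart Ball on filters 1 to 3 are assumed, and "Undocumented Example" is a hypothetical entry.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Intervention:
    """Hypothetical record; each flag is a reviewer's judgment on one filter."""
    name: str
    documented: bool            # filter 1
    is_game: bool               # filter 2
    is_serious_game: bool       # filter 3
    builds_data_capacity: bool  # filter 4


# Filters in the order listed in the methodology.
FILTERS: List[Tuple[str, Callable[[Intervention], bool]]] = [
    ("filter 1: sufficient documentation", lambda i: i.documented),
    ("filter 2: fits the definition of a game", lambda i: i.is_game),
    ("filter 3: fits the definition of a serious game", lambda i: i.is_serious_game),
    ("filter 4: purpose related to data capacity", lambda i: i.builds_data_capacity),
]


def screen(entries: List[Intervention]) -> Tuple[List[str], Dict[str, str]]:
    """Return the selected names and, for each excluded name, the first failed filter."""
    selected: List[str] = []
    excluded: Dict[str, str] = {}
    for entry in entries:
        failed = next((label for label, passes in FILTERS if not passes(entry)), None)
        if failed is None:
            selected.append(entry.name)
        else:
            excluded[entry.name] = failed
    return selected, excluded


# Illustrative entries only; flag values are assumptions, not data from the review.
SAMPLE = [
    Intervention("Winning Data", True, True, True, True),
    Intervention("Bar Chart Ball", True, True, True, False),
    Intervention("Undocumented Example", False, True, True, True),
]

if __name__ == "__main__":
    selected, excluded = screen(SAMPLE)
    print("selected:", selected)
    for name, reason in excluded.items():
        print(f"excluded: {name} ({reason})")
```

Recording only the first failed filter keeps one exclusion reason per entry, which matches the way exclusions are reported per filter in the text.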
To describe and categorize the serious games for data capacity building, we used variables similar to the ones suggested by Katsaliaki and Mustafee [19]. The variables to be captured were selected based on their relevance to the scope of this research and to give a sufficient overview of each game's general characteristics. In a similar fashion to Katsaliaki and Mustafee [19], the data was collected by researching available materials about the game (cards, manuals, etc.), related publications, playing the games, or reading their descriptions on the respective websites. For each game, the general gameplay and rules are described, along with details about the game's platform, genre, learning objective, and learning purpose.
In addition to this classification, we describe each game and its expected contribution to data capacity building below.

CASE DESCRIPTIONS
Below, we introduce the twelve selected games, along with a short description of their rules and gameplay. The main characteristics of each game are summarized in Table 3.

Agenda 2030
Agenda 2030 is a discussion game for 6 to 31 players. A set of 50 cards, associated with 5 departments, represents the reports, maps, and documents needed to monitor the Sustainable Development Goals (SDGs) within a local governmental context (Municipality of Teresina, in Brazil). One participant plays as the database for the teams, and the others are distributed across the 5 departments of the local government. Each team has a negotiator who trades data with other teams. By trading cards, players need to find the specific datasets to complete their SDG indicators. Completing indicators gives teams another type of card, with random events, making the game more fun. The game ends when the full indicator checklist is completed.

Data Belt
Data Belt is a four-player online video game which shares some aspects with Winning Data (described later in this list), such as the four different player roles and the basic dynamic of answering citizens' demands for public services, generating datasets, and deciding whether or not to open them. The game was tested in a pre-experimental setting and "participants were more inclined to believe that some public sector data can be shared" [20, p. 162]. The game can be useful when played together by civil servants with different levels of experience in open data decision-making, as it can facilitate knowledge sharing among the players.

Data Dealer
Data Dealer is a single-player online game about privacy issues related to data brokers and the resale of personal information [21]. The user steps into the shoes of a corrupt data broker, trying to make as much profit as possible from shady deals with tycoons and corporations. The player owns a database connected to certain data sources (like dating sites and online personality tests). Money can be invested to upgrade these data sources, thereby capturing more data which can then be resold to corporations with dubious aims. Data Dealer is a management game, where the player needs to carefully balance resources to maximize profit. This game could be an important tool to understand the role of data brokers and how they manage to harvest (legally and illegally) data from different sources.

Digital Identity game (Data gedreven werken game)
The Digital Identity game is a board game where players need to reach the center of the board with resources remaining. Specific spots marked with a discussion logo reduce the players' available resources, representing the loss of pieces of their digital identity. In some cases, disagreements between players need to be voted upon. The search engine DuckDuckGo is used to resolve doubts about operating services. In the terms of Zuboff's Surveillance Capitalism, players who lose all their resources (a metaphor for giving away all their personal data) are only the carcasses that remain when the data is plundered [22].

Datak
Datak [23] is a single-player online game based on a journalistic investigation into the problematic aspects of big data [24]. In Datak, the player takes the role of a newly hired assistant to the mayor of DataVille. Part of the job is to make decisions that can affect the player and the citizens, for example by deciding what kind of precautions to take when archiving voters' information or when a security breach occurs. The aim of the game is to raise awareness about the implications of data collection and privacy violations. Datak could be useful in introducing a non-expert audience to the most common ways in which data privacy rules are violated and the basic terminology to describe these violations.

Datascape
Datascape is a board game in which the players are given research questions that can be answered using data [25]. The players are also given a stylized map, on which they need to indicate where to source the data from. Each section of the map offers certain data types such as light, weather, wind, water level, etc. Datascape can play a role in introducing a non-expert audience to data collection and the different sources of datasets.

Dataspel
Dataspel is a board game in which a team leader is responsible for coordinating the team in making discoveries based on data. Each member of a team has a certain role, being either a content expert or a data expert. Each game round consists of three phases, from distributing the work to analyzing the available datasets. Specific problems and politically sensitive topics can influence the analysis and publications. Scores are based on the number of points each team leader achieves by the end of the game for analyzing and publishing datasets.

Datopolis
Datopolis is a board game which can be played by two to five players [26].

Jogo de Governo Aberto
The Open Government Game is a card game involving 4 to 6 players, each of them receiving a specific set of cards to be used in the gameplay. Each set contains actions related to Transparency, Participation, and Accountability. These are considered the main pillars of an open government, which the players must collaborate to achieve. The game has been adapted for remote play in Tabletopia [27], though it is still only available in Portuguese.

Open Data Card Game
The Open Data Card Game is an in-person game for multiple groups of three people [28], designed for ideation during workshops and hackdays. The game is aimed at getting participants excited about the possible uses and combinations of open datasets and at generating new ideas. This game could be an effective way of facilitating brainstorming during hackathons, when participants need to think of ways to reuse datasets.

Run that Town
Run that Town is a single-player mobile game which uses real data from Australia's 2011 census [29]. The player can enter their postcode to customize the experience with data from their neighborhood. The player steps into the shoes of a local politician, making decisions about what kind of public works to initiate and where to spend money.

Winning Data
Winning Data is a four-player in-person role-playing game [30] about open data. In Winning Data, players take the roles of civil servant, colleague, citizen, and boss and need to collaborate to answer citizens' demands for public services. Just like in a real-life public office, this activity leads to the creation of datasets, which the team can either completely open to the public, partially share (removing some personal information), or completely close. In an experimental setting, after playing the game, civil servants had a "better understanding of the positive outcomes of data opening" [31, p. 18], thus showing potential for building data capacity among public sector employees. Similarly to Data Belt, this game can facilitate knowledge sharing about the risks and benefits of opening a given dataset, especially when a mix of more and less experienced decision-makers are playing.
The following two tables provide a summary of the selected case descriptions. Table 2 summarizes the cases (serious games) reviewed, their developer, availability (either in-person gameplay or digital), type of game (board game, role-playing game, etc.) and recommended number of players. Table 3 more specifically identifies each of the games' stated learning outcomes, their classification according to the categories identified by Grogan and Meijer [12], and how they each might contribute to building data capacity. As no specific classification system for serious games and data capacity exists, we broadly labeled each game as contributing to either debate, data awareness or ideation. Further research could investigate how to apply existing frameworks on data literacy, such as the ODI data skills framework [32], to serious games.

DISCUSSION
We defined building data capacity as the process of empowering citizens and civil servants to reuse open data so that they can gain new insights about the world around them and create better services. With our two research questions, we investigated (1) which games, or types of games, have the potential to build data capacity and (2) what kind of capacity they can build. As shown in the case descriptions, serious games can play a significant role in building data capacity by raising data awareness, facilitating debate around open data, and supporting ideation for data reuse. However, the review and analysis of existing games for building data capacity shows that most games focus on only a limited aspect of this process, which is raising data awareness. In fact, most games only fit the teaching category identified by Grogan and Meijer [12], meaning that they focus on transferring generalizable knowledge to the players or between the players.
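The distribution of game types reported in this discussion can be made concrete with a short tally. The sketch below is illustrative only. The review names the Open Data Card Game as the only design game and Run that Town as the only policy game, and reports no research games. Labeling the ten remaining games as teaching games is an assumption inferred from those statements, not a transcription of the per-game classification in Table 3.

```python
from collections import Counter
from enum import Enum
from typing import Dict, List


class GameType(Enum):
    """The four categories of Grogan and Meijer [12]."""
    TEACHING = "teaching"
    POLICY = "policy"
    DESIGN = "design"
    RESEARCH = "research"


# The twelve games from the case descriptions.
GAMES = [
    "Agenda 2030", "Data Belt", "Data Dealer", "Digital Identity game",
    "Datak", "Datascape", "Dataspel", "Datopolis",
    "Jogo de Governo Aberto", "Open Data Card Game",
    "Run that Town", "Winning Data",
]

# Types that the discussion states explicitly.
STATED_TYPES = {
    "Open Data Card Game": GameType.DESIGN,
    "Run that Town": GameType.POLICY,
}


def game_type(name: str) -> GameType:
    # ASSUMPTION: any game not named explicitly is treated as a teaching game.
    return STATED_TYPES.get(name, GameType.TEACHING)


def tally(games: List[str]) -> Dict[GameType, int]:
    """Count games per category, including categories with no games."""
    counts = Counter(game_type(name) for name in games)
    return {t: counts[t] for t in GameType}


if __name__ == "__main__":
    for category, count in tally(GAMES).items():
        print(f"{category.value}: {count}")
```

Reporting categories with a zero count makes the gaps visible alongside the populated categories.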
Only one example of a design game was found through the literature review, the "Open Data Card Game". When using the game in a workshop, the facilitator can create card decks customized for the group that is about to play and insert datasets that the players might already be familiar with. The group can then use the custom cards to brainstorm ideas for how to reuse these datasets, thereby generating new knowledge. The presence of only one design game suggests an interesting gap in games that can be used for ideation in the field of open data. Design games have been used to successfully facilitate idea generation in other fields. Brandt and Messeter [33] described several design games used for idea generation and found that games facilitate this process by creating artificial restrictions, which stimulate creativity. Agogué et al. [34] created a serious game for the employees of a company specialized in treatments for malnutrition. Each participant had to play a persona described by the game, for example "rural school director" or "deputy mayor of Jakarta East". Participants had to come up with new ideas that could create value for this persona. Game rules instructed players to divide into groups and change their composition at regular intervals. Finally, players could participate in a "marketplace of ideas" and work on the most promising proposals. Agogué et al. [34] found that serious games "play an effective role in supporting the management of heterogeneous and divergent knowledge during ideation" [34, p. 423]. There is a need to explore the potential of serious games to play a similar role in ideation with data.
"Run that Town" is the only example of a policy game, which uses contextual knowledge to generate real-world scenarios. The game achieves this by looking at census data for the player's postcode, thus reflecting the real conditions of the neighborhood. The lack of policy games that make use of contextual knowledge is also an interesting gap. The review did not find any examples of research games, which are used to test or generate hypotheses or to assess other artifacts.

CONCLUSIONS
Our work presented a review of existing games that can contribute to building data capacity.
To conduct this review, we played several serious games and analyzed their content and game materials. We then categorized each game according to the type of knowledge it transfers and to which beneficiary. We also looked at the kind of capacity building that each game contributes to. The main finding that emerged through our review is that most games tend to build data capacity by raising data awareness. We found a lack of design games that can be used to generate new ideas about the reuse of open data. While this type of game has been successful in other fields, we only found one such example in the context of open data. Future research should explore the opportunities offered by different types of games, either by developing entirely new games or adapting existing ones from different fields.

ACKNOWLEDGEMENT
This work has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 955569.