Algorithms and AI: which questions do you ask?

Insight starts with asking questions. Asking questions about the source, analysis and outcome helps you better understand data models, algorithms and artificial intelligence (AI). With more insight, you make better choices. Which theme would you like to explore?

Society

Algorithms and AI are often used in our society. Think of the timetable of public transport or the selection of your favorite music. Click below for more examples.

Taxi app

Benefit fraud

Hate speech

Health

Algorithms and AI are often used around our health and care. Think of fitness apps on your smartwatch or research into better diagnoses. Click below for more examples.

ICU alarm

X-ray

Smartwatch

Energy

Algorithms and AI are often used in the transition from fossil fuels to sustainable energy. Think of the smart meter in your home or the control of a wind turbine. Click below for more examples.

Smart meter

Nuclear power

Natural gas free

Climate

Algorithms and mathematical models are used to gain insight into our climate. Think of the formation of clouds or predicting how much the sea level will rise. Click below for more examples.

Below sea level

Weather forecast

Volcanic eruption

Society

Taxi app

Sometimes it seems strange to taxi driver Mike. His working day is not planned out by his employer, but by an algorithm that sends him from customer to customer as efficiently as possible. What criteria does the algorithm in the taxi app actually use to plan its journeys? And what could he do if he wants to change that?

Question 1

Source: Where do the data come from?

Taxi apps collect thousands of data points every day. Both drivers and customers share their location and movement data via the app. The system knows where each driver is, and also how far it is from the driver’s last customer to a potential new one. All of this is used to match supply and demand as quickly as possible.

Question 2

Analysis: What happens to the data?

The taxi app uses an algorithm that moves drivers from one ride to the next as quickly as possible. It calculates the expected duration of the current trip and already passes on the expected arrival time for the next one. In selecting rides, the algorithm makes choices: it aims to leave the driver enough time to reach the next customer, but not so much that the driver sits idle. The assumption here is that both customer and driver want to wait as little as possible.
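The selection step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any real taxi app: the Ride fields, the pick_next_ride function and the min_buffer setting are hypothetical names, and all times are invented.

```python
from dataclasses import dataclass


@dataclass
class Ride:
    ride_id: str
    pickup_in: float       # minutes from now until the customer expects pickup
    travel_minutes: float  # drive from the current drop-off point to that customer


def pick_next_ride(rides, minutes_until_free, min_buffer=2.0):
    """Return the reachable ride that leaves the driver the least idle time.

    min_buffer is an adjustable (hypothetical) setting: the minimum slack,
    in minutes, that must remain between rides.
    """
    best, best_slack = None, None
    for ride in rides:
        # Slack = time left over after finishing the current trip and driving over.
        slack = ride.pickup_in - (minutes_until_free + ride.travel_minutes)
        if slack < min_buffer:
            continue  # cannot be reached in time, or leaves too little margin
        if best_slack is None or slack < best_slack:
            best, best_slack = ride, slack
    return best


rides = [Ride("A", 12, 6), Ride("B", 20, 5), Ride("C", 30, 4)]
tight = pick_next_ride(rides, minutes_until_free=8)                    # ride B
relaxed = pick_next_ride(rides, minutes_until_free=8, min_buffer=10)   # ride C
```

Raising min_buffer makes the same algorithm choose a later ride, which is one way a setting could build in more rest between trips.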

Question 3

Outcome: How are source and analysis used?

The algorithm is set up to optimize for scheduling as many rides as possible. This way, drivers don’t have to spend time finding new rides themselves. In theory they can earn as much money as possible and work continuously and for a long time. In practice, drivers can experience a great deal of pressure, because they are constantly on call. The algorithm could also be set up so that drivers take more breaks. This would give them better working conditions.

Insight

What insight does this provide?

Taxi apps make impressive use of data to plan efficient taxi transport. The algorithm can be set to allow drivers to make as many trips as possible. It can also be adjusted to ensure sufficient breaks and to prevent overworking.

Society

Benefit fraud

Pieter works for the tax authorities. He must decide who is entitled to childcare benefits and who is not. He is supported by a self-learning algorithm. What questions can Pieter ask to check whether the algorithm’s choices are correct?

Question 1

Source: Where do the data come from?

The self-learning algorithm was trained on data from 30,000 sample files. Some of these were files handled by officials and confirmed to be ‘right’ or ‘wrong’. Others came from a ‘blacklist’ of potential fraudsters, for which it was partly unknown whether they were ‘right’ or ‘wrong’. Because these files were not a representative sample, training on them introduced a selection bias into the model’s risk estimate. When would the outcome of the risk assessment have been more reliable? If the algorithm had been trained on representative sample files that had each been checked to be ‘right’ or ‘wrong’.
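A small calculation shows how mixing in unverified files can distort what a model learns. The sketch below uses invented counts (not the real 30,000 files) and assumes, for illustration only, that every blacklist file is labeled ‘wrong’ without being checked.

```python
# Hypothetical training files; all counts are invented for illustration.
verified = (
    [{"source": "verified", "label": "wrong"}] * 5
    + [{"source": "verified", "label": "right"}] * 95
)
# Assumption: blacklist files are treated as 'wrong' without verification.
blacklist = [{"source": "blacklist", "label": "wrong"}] * 40


def share_wrong(files):
    """Fraction of files labeled 'wrong' in a training set."""
    return sum(f["label"] == "wrong" for f in files) / len(files)


rate_verified = share_wrong(verified)            # 5 of 100 files
rate_mixed = share_wrong(verified + blacklist)   # 45 of 140 files
```

In this toy example the share of ‘wrong’ files rises from 5% to about 32% once the blacklist is added, even though nothing new was actually verified.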

Question 2

Analysis: What happens to the data?

The algorithm used criteria based on a list of indicators chosen by employees of the department. Unfortunately, this allows human biases to be built into the algorithm. For example, the characteristics ‘Dutch/non-Dutch’ and ‘income’ were seen as indicators of fraud. To counteract this kind of bias, selection criteria must be verifiable and public. This does not automatically make them perfect or objective, but it provides the opportunity for adjustment and improvement.

Question 3

Outcome: How are source and analysis used?

In the Dutch ‘benefits affair’ there was no mechanism to check whether the fraud risk assessment was actually correct. As a result, thousands of people were harmed for years without the system being adapted. When self-learning algorithms are used, the outcome should always be checked with random samples, so that the method can be corrected when necessary.
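A random check of outcomes could look roughly like this. It is a minimal sketch, assuming the flagged files have identifiers and that a human review later confirms which of the sampled files really involved fraud; the function names are hypothetical.

```python
import random


def audit_sample(flagged_ids, sample_size, seed=0):
    """Draw a random sample of flagged files for manual review."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    k = min(sample_size, len(flagged_ids))
    return rng.sample(flagged_ids, k)


def share_confirmed(audited_ids, confirmed_fraud_ids):
    """Fraction of audited files where review confirmed the algorithm's flag."""
    if not audited_ids:
        return None
    hits = sum(1 for file_id in audited_ids if file_id in confirmed_fraud_ids)
    return hits / len(audited_ids)


flagged = list(range(100))           # hypothetical file numbers
to_review = audit_sample(flagged, 10)
```

If the confirmed share in such a sample turns out to be low, that is a signal that the risk assessment needs to be corrected.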

Insight

What insight does this provide?

Algorithms are a powerful tool to analyze large amounts of data. However, the results must be verifiable. By asking the right questions, mistakes can be avoided.

Society

Hate speech

Andor follows various social media and notices that comments are sometimes filtered based on the language used. He is glad that attention is paid to rules of conduct online, but wonders how this works. AI works well for language recognition, but how does Facebook decide which comments are labeled hate speech and which aren’t?

Question 1

Source: Where do the data come from?

Everything you post, share or write on social media is stored as data. Recently, researchers from Oxford and the Alan Turing Institute created a database of 4,000 examples. The database, called HateCheck, is available to everyone worldwide. It was built by combining 29 different tests with manual labeling of whether each example counts as hate speech or not.

Question 2

Analysis: What happens to the data?

The researchers interviewed employees of 16 different non-profit organizations working on hate speech online. Based on these interviews, 18 different types of written English hate speech were defined, including derogatory and threatening language. In addition, 11 scenarios were defined that can mislead AI, for instance vulgar words in innocent statements, or offensive language used in protest messages, so-called ‘counter-speech’.
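The idea of testing a detector against named scenarios can be sketched as follows. This is not the HateCheck code or data: the test sentences, the placeholder token "<slur>", the blocklist and the deliberately naive keyword filter are all invented for illustration.

```python
# Hypothetical blocklist; "<slur>" is a harmless placeholder token.
BLOCKLIST = {"<slur>", "damn"}


def naive_filter(text):
    """Flag a text as hateful if it contains any blocklisted word."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return any(w in BLOCKLIST for w in words)


# (scenario name, example text, should it be flagged?)
TEST_CASES = [
    ("derogatory", "You are a <slur>", True),
    ("counter-speech", "Calling someone a <slur> is never okay.", False),
    ("innocent profanity", "That was a damn good movie!", False),
    ("neutral", "Have a nice day", False),
]


def run_functional_tests(classifier, cases):
    """Report per scenario whether the classifier gave the expected answer."""
    return {name: classifier(text) == expected for name, text, expected in cases}


results = run_functional_tests(naive_filter, TEST_CASES)
```

The keyword filter passes the "derogatory" and "neutral" scenarios but fails "counter-speech" and "innocent profanity", which is exactly the kind of weakness such scenario tests are meant to expose.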

Question 3

Outcome: How are source and analysis used?

The criteria were tested on three commercial services that use content moderation and collect responses from their readership. One of the conclusions was that hate speech detection by AI still has many challenges. If you moderate too little, you will not solve the problem of online discrimination and intimidation. Moderating too much could censor the kind of language marginalized groups use to empower and defend themselves.

Insight

What insight does this provide?

It is difficult for AI to identify nuances in language and recognize the context and intonation of words. The dividing line between censorship of hate speech and protection of oppressed groups is very narrow.

Climate

Below sea level

Mounir is looking to buy a house. His dream home is located in a region below sea level. With a 30-year mortgage in mind, Mounir wonders whether it is wise to buy this house. If sea levels continue to rise, won’t the entire region be under water in a few years?

Question 1

Source: Where do the data come from?

Data for climate models are a combination of physical ‘certainties’ and variables that depend on ‘uncertainties’. Certainties are, for example, how molecules react to temperatures (water always freezes at 0℃). An uncertainty is, for example, how much CO2 will be emitted in 50 years. These data are combined in a predictive model. This creates a picture of the effect of the ‘uncertainties’ on the climate and on sea level rise.

Question 2

Analysis: What happens to the data?

What will happen tomorrow is unknown, but that the sun will rise is certain. Certain laws of physics will still apply in 100 years. A predictive model combines ‘certainties’ with factors that are known to be uncertain. The more factors, the greater the number of possible outcomes. By having models calculate all these different scenarios, researchers gain more insight into the most likely scenarios. In this way, the predictions become more and more reliable.
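How combining uncertain factors leads to a range of outcomes can be illustrated with a toy calculation. The formula and every number below are made up and carry no physical meaning; real climate models are vastly more complex.

```python
from itertools import product


def toy_outcome(base, emission_level, sensitivity):
    """Toy formula: a fixed part plus an uncertain part (arbitrary units)."""
    return base + emission_level * sensitivity


def run_scenarios(base, emission_levels, sensitivities):
    """Evaluate the toy model for every combination of the uncertain factors."""
    return [
        toy_outcome(base, e, s)
        for e, s in product(emission_levels, sensitivities)
    ]


# Hypothetical values: 3 emission levels x 2 sensitivities = 6 scenarios.
outcomes = run_scenarios(base=10, emission_levels=[1, 2, 3], sensitivities=[5, 10])
low, high = min(outcomes), max(outcomes)  # 15 and 40
```

Adding a third uncertain factor would multiply the number of scenarios again, which is why such models report a range rather than a single number.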

Question 3

Outcome: How are source and analysis used?

Predictive models show a range of possible outcomes for the climate. For example, there is great certainty that the sea level will rise, but to what extent exactly is more uncertain. After all, this partly depends on factors that will come into play in the future and which are currently unknown. In the Netherlands we now take into account a sea level rise of 30 to 80 cm by the year 2100 in the most optimistic scenario, and up to 1.2 m in the least favorable case. In both cases, the rise greatly increases the risk of flooding.

Insight

What insight does this provide?

Predictive models give a range of possible outcomes. It is certain that the sea level and the risk of flooding will rise, but by exactly how much is uncertain. Mounir’s future house is affected by this risk.

Climate

Weather forecast

Ella works at the KNMI where she makes the most accurate weather forecasts possible. This works very well, but is never 100% accurate. She often gets questions from people. Tomorrow’s weather is already uncertain. So how can we know that the climate is changing and that global temperature is rising?

Question 1

Source: Where do the data come from?

Weather data and climate data have many similarities. For example, both use temperature, humidity, ocean and air currents. The big difference is the time scale. Climate models use data that show what the weather was like centuries ago, such as tree rings and carbon dating. Weather models use data from short-term events. For example, they focus on clouds and temperature differences between layers of air. For climate, these individual events are not relevant. Climate concerns the average effect of cloud cover in the longer term.

Question 2

Analysis: What happens to the data?

Both weather models and climate models are complex. They take into account many different influences. The daily course of the sun has a major influence on the weather. A winter day may have fewer hours of sunshine than a summer day. So is it colder? Possibly, but not always. A weather model must be as accurate as possible for a very specific place and time. For the climate, local, daily or seasonal changes count less. Climate models calculate on a different scale and thus do not indicate what the weather will look like in a particular place in 100 years. They look at the effect of this weather on global conditions, such as the range of temperature and sea level.

Question 3

Outcome: How are source and analysis used?

A weather model contains parameters that consist of certainties and uncertainties. For example, air currents and wind directions can be predicted fairly well, but the formation of clouds is much more unpredictable. Therefore, tomorrow’s weather forecast in your city is not always accurate. However, this weather has very little influence on the climate in 100 years. For this, models use more stable factors and deal with other types of uncertainties. For example, tomorrow’s weather may be more difficult to predict than a global temperature rise of 1.5℃.
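Why a long-term trend can be easier to pin down than a single day can be shown with a toy simulation. Everything here is an assumption for illustration: the starting temperature, the size of the daily fluctuations and the slow warming are invented, and the simulation is not a weather or climate model.

```python
import random
from statistics import mean, pstdev


def simulate_daily_temps(days, warming_per_day, noise_sd, seed=0):
    """Toy daily temperatures: a slow trend plus large random daily swings."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [10.0 + warming_per_day * d + rng.gauss(0, noise_sd) for d in range(days)]


# Hypothetical setup: 20 years of days, 2 degrees of warming in total,
# and day-to-day swings with a standard deviation of 5 degrees.
temps = simulate_daily_temps(days=7300, warming_per_day=2.0 / 7300, noise_sd=5.0)

first_decade = mean(temps[:3650])
second_decade = mean(temps[3650:])
trend = second_decade - first_decade   # averaging cancels most of the daily noise
spread_one_month = pstdev(temps[:30])  # how much single days jump around
```

In this toy series the swings between individual days are much larger than the difference between the two decade averages, yet that difference still shows up clearly once thousands of days are averaged.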

Insight

What insight does this provide?

Predictive models that calculate the weather use different types of data than models that calculate the climate. For both models, an approach is chosen that suits the question and where the reliability of the outcome is as high as possible.

Climate

Volcanic eruption

Climate scientist Asha is working on mathematical models that combine many factors to predict the climate. Volcanic eruptions are not factored into the models that she uses. Yet, Asha knows that volcanoes contribute to climate change by releasing CO2, ash and other substances.

Question 1

Source: Where do the data come from?

Climate models use different types of data, for example about the composition of the atmosphere and the temperature of ocean and land. In addition, a lot of data is collected after volcanic eruptions, such as how much ash is released, how it spreads and which other substances end up in the air and on land.

Question 2

Analysis: What happens to the data?

Climate models can predict well how a volcanic eruption changes the atmosphere and affects the climate, for example based on the size of the eruption and the composition of the ash. After all, the models already take these kinds of factors into account. However, the models do not have the data to predict when and where a volcanic eruption will take place.

Question 3

Outcome: How are source and analysis used?

Climate models can mimic the consequences of a volcanic eruption, but cannot predict when eruptions will occur in the future. Including them as predictions would create a great deal of uncertainty in the model. Therefore, Asha chooses not to include them in predictive models. She adds these well-defined events when they occur or after they have happened. In this way, a scenario is updated based on current events, and the model itself becomes more powerful.

Insight

What insight does this provide?

It is very important for Asha to understand the effect of volcanic eruptions. This allows her to incorporate them into her predictive models. The outcome of her models becomes more reliable when their criteria are chosen carefully.

Health

ICU alarm

A ‘Checklist app’ is used in the intensive care unit (ICU) of the hospital where Joyce works. The app supports doctors and nurses. The algorithm in the app indicates and keeps track of how much and which medication a patient receives. When a patient needs extra pain medication, for example, the algorithm advises on the type and the dose, based on personalized data. The app gives Joyce clarity, but she is also uncertain how reliable the algorithm is.

Question 1

Source: Where do the data come from?

The algorithm uses data to make choices in the checklist. In this case, there are two types of data: the patient’s data (such as weight, age, underlying conditions) and the data used to create the algorithm (such as scientific research, available medication, hospital protocols). For both types, choices have been made about which data is and is not included, for example whether the patient’s socio-economic status, such as income or origin, is relevant, and how far back the data goes into the patient’s medical history. Furthermore, the data from scientific studies must be sufficiently up to date.

Question 2

Analysis: What happens to the data?

An algorithm is a series of calculations, where each next step depends on (adjustable) settings. These settings are made by humans and are based on assumptions and choices. For example, it must be decided together with doctors which liver function is appropriate for which dose of pain medication. By setting these steps as completely as possible, an algorithm is created that takes many different medical aspects into account. An algorithm with the right settings and relevant data supports the doctors and nurses in the ICU.
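What "a series of calculations with adjustable settings" means can be sketched with a toy rule. The settings, the liver score scale and the resulting amounts below are entirely invented and are not clinical guidance; the actual Checklist app is not described in enough detail to reproduce.

```python
# Hypothetical settings chosen by humans. Values are invented, NOT medical advice.
SETTINGS = {
    "units_per_kg": 1.0,          # base amount per kilogram of body weight
    "liver_score_cutoff": 50,     # below this score the amount is reduced
    "reduced_dose_factor": 0.5,   # how strongly it is reduced
}


def advise_dose(weight_kg, liver_score, settings=SETTINGS):
    """Toy advice: each step depends on a setting that people can adjust."""
    dose = weight_kg * settings["units_per_kg"]
    if liver_score < settings["liver_score_cutoff"]:
        dose *= settings["reduced_dose_factor"]
    return dose


normal = advise_dose(weight_kg=80, liver_score=70)    # 80.0
reduced = advise_dose(weight_kg=80, liver_score=40)   # 40.0

# Changing one setting changes the advice for the same patient.
new_settings = {**SETTINGS, "liver_score_cutoff": 30}
after_change = advise_dose(weight_kg=80, liver_score=40, settings=new_settings)  # 80.0
```

The same patient data leads to different advice once a cutoff is moved, which is why the assumptions behind each setting need to be known and reviewed.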

Question 3

Outcome: How are source and analysis used?

The Checklist app helps Joyce and her colleagues make decisions about ICU patients. The algorithm provides personalized information based on choices and settings made in advance. This reduces the chance of human error. It is important to gain insight into the data that has or has not been used, and the assumptions that have been made. Adjusting settings in response to new insights is crucial when using this app responsibly.

Insight

What insight does this provide?

An algorithm is human-made and offers support in making choices. It remains important to regularly check how reliable the settings are and whether there are factors that need to be adjusted. This is where humans and computers complement each other.

Health

X-ray

Research physician Jin is working on a self-learning algorithm that recognizes bone fractures on X-rays. The algorithm works extremely well and picks out 90% of the photos with fractures. For some hospital departments, that percentage is even higher. Before Jin shouts the success from the rooftops, he wonders how the algorithm learned this. What exactly does it recognize?

Question 1

Source: Where do the data come from?

When training the algorithm, Jin used as many X-rays as possible from different departments of the hospital. An algorithm does not ‘see’ a photo as a whole as we do, but as a collection of pixels of different intensities. The data for the algorithm therefore does not consist of a stack of photos, but is instead a long series of numbers from which the algorithm must learn to extract the relevant information.
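The step from a photo to "a long series of numbers" can be made concrete with a tiny example. The 3 by 3 grid of intensities below is invented; a real X-ray contains millions of pixels.

```python
# A hypothetical 3x3 grayscale image: 0 is black, 255 is white.
image = [
    [0, 50, 0],
    [0, 255, 0],
    [0, 50, 0],
]


def flatten(image):
    """Turn rows of pixels into one long list of numbers, row by row."""
    return [pixel for row in image for pixel in row]


pixels = flatten(image)  # [0, 50, 0, 0, 255, 0, 0, 50, 0]
```

To a person the grid looks like a bright vertical line; the algorithm only receives the nine numbers and has to discover any structure in them itself.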

Question 2

Analysis: What happens to the data?

With ‘black box’ AI, an algorithm teaches itself which information in the series of numbers is relevant. The algorithm is trained with a variety of photos that are marked as ‘bone fracture’ or ‘no bone fracture’. This allows the algorithm to recognize patterns in the two categories. After the training phase comes the testing phase. With a series of new photos, the algorithm is asked to choose the category itself. It is unknown to Jin which pattern the algorithm learned to recognize. If Jin wants to know, he can choose so-called ‘explainable’ AI. The learning process is then divided into steps and the algorithm shows which patterns it recognizes at each step.

Question 3

Outcome: How are source and analysis used?

Jin’s ‘black box’ algorithm turns out to be very good at recognizing photos with bone fractures. However, Jin discovered through explainable AI that the recognized pattern has little to do with the medical data in the photo. Bone fractures are most common in the emergency room and make up a relatively large proportion of the photos taken there. Moreover, these photos are often taken in a slightly different way than in the other departments. The algorithm recognizes this pattern: a different background, a different position, and so on. So the algorithm needs to learn to ignore the department-specific pattern. Only then is it really useful in identifying bone fractures.
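A toy calculation shows how a department shortcut can look like good performance. The records and counts are invented, and the "classifier" is deliberately reduced to the shortcut itself; it is not Jin's actual model.

```python
# Hypothetical records: (department, has_fracture). Counts are invented.
records = (
    [("ER", True)] * 9 + [("ER", False)] * 1
    + [("ward", True)] * 1 + [("ward", False)] * 9
)


def shortcut_classifier(department):
    """Ignores the image entirely: predicts 'fracture' for every ER photo."""
    return department == "ER"


def accuracy(records):
    correct = sum(shortcut_classifier(dept) == label for dept, label in records)
    return correct / len(records)


def fractures_found(records, department):
    """Return (fractures detected, fractures present) for one department."""
    fractures = [dept for dept, label in records if label and dept == department]
    found = sum(shortcut_classifier(dept) for dept in fractures)
    return found, len(fractures)


overall = accuracy(records)                      # 0.9
ward_result = fractures_found(records, "ward")   # (0, 1)
```

The shortcut scores 90% overall while finding none of the fractures outside the emergency room, so a high overall score alone does not show that the model looks at bones.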

Insight

What insight does this provide?

To estimate the added value of a machine learning algorithm, it is useful to ask about how it has been trained. Explainable AI can provide additional insight into the training process, making the outcome of the algorithm more reliable.

Health

Smartwatch

A healthy habit: Jamai goes for a run a few times a week and uses an app on his smartwatch to track his heart rate and the length of his run. He also gets a discount on his health insurance thanks to the app. Jamai is happy with the discount, but wonders what the insurer will do with all his data.

Question 1

Source: Where do the data come from?

Data from smartwatch apps can be collected in different ways.

• By observation: the data of all users who have given permission is stored. This data is therefore passively obtained and can be analyzed afterwards.
• By selection through a designed experiment. The researchers select the type of participants in advance based on their research question and actively approach them to participate.

In the first way, a lot of data can be collected with little effort. The second way takes more time and effort.

Question 2

Analysis: What happens to the data?

When a health insurer analyzes data collected through observation, it must take into account that these data do not fully reflect society. After all, the data come from a specific group, namely people who, for various reasons, use a smartwatch app. The active way of collecting data through selection takes more time and effort, but gives a better chance of an objective answer to the research question, because people who would not otherwise use a smartwatch app are also included.
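The difference between a self-selected group and the whole population can be shown with a toy count. The population below is invented and only illustrates the arithmetic.

```python
# Hypothetical population: (uses_app, is_healthy). The counts are invented.
population = (
    [(True, True)] * 16 + [(True, False)] * 4
    + [(False, True)] * 40 + [(False, False)] * 40
)


def healthy_share(people):
    """Fraction of people in the group who are healthy."""
    return sum(1 for _, healthy in people if healthy) / len(people)


app_users = [person for person in population if person[0]]

share_app_users = healthy_share(app_users)   # 16 of 20  -> 0.8
share_everyone = healthy_share(population)   # 56 of 100 -> 0.56
```

Looking only at app users suggests that 80% of people are healthy, while the figure for the whole toy population is 56%. The gap says nothing yet about whether the app causes better health; the app users may simply have been healthier to begin with.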

Question 3

Outcome: How are source and analysis used?

The health insurer concludes that people who use a smartwatch app are healthier and therefore may receive a discount on their health insurance. This may be true. However, based on observation data alone it is not possible to know whether the use of smartwatch apps makes people healthier. To establish a cause-effect relationship, data from a representative group of people is needed.

Insight

What insight does this provide?

The data collected by apps can contain valuable information and provide clues about people’s health. Health insurers with good intentions need to keep asking themselves: Does his smartwatch actually make Jamai healthier?

Energy

Smart meter

Ravi lives with his children in a house with a smart energy meter. The meter keeps track of how much power their electrical devices use and when. Ravi can see that through an app. He sometimes wonders: who else has access to that data and what is it used for?

Question 1

Source: Where do the data come from?

Smart energy meters collect large amounts of data, all day, every day. The energy meter measures the level of electricity and gas consumption at any time of the day. In the app you can see exactly how much energy you have used and when: this is your usage data. Data about your type of home, your address, and personal data are also collected, to make sure that the data you see in the app is yours.

Question 2

Analysis: What happens to the data?

The app provides insight into your energy consumption. It only shows data without applying analysis to it. However, the digital collection of data does make it possible to conduct large-scale analyses and to recognize patterns. Analyzing the consumption data of all energy meters provides insight into how certain neighborhoods, or perhaps even individual people, have arranged their lives. What time do you get up and what time do you go to sleep? Your energy consumption provides insight.

Question 3

Outcome: How are source and analysis used?

It is useful to look at your own consumption, but it is not self-evident that this data should be accessible to others. This type of privacy-sensitive data must be handled with care, for example through strict rules for collecting and storing data, so that personal data is separated from consumption data. When energy consumption data are combined and pooled, they can no longer be linked to your own home. In this way, privacy-sensitive data is used responsibly.
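Pooling consumption data so that it no longer points to one home could be sketched like this. The readings are invented, and the min_households threshold is an added assumption (a common precaution, not something stated in the text): groups with too few homes are left out because a total for a single home would still reveal that home's usage.

```python
from collections import defaultdict

# Hypothetical readings; household identifiers are used only for counting.
readings = [
    {"household_id": "h1", "neighborhood": "north", "kwh": 9.5},
    {"household_id": "h2", "neighborhood": "north", "kwh": 10.5},
    {"household_id": "h3", "neighborhood": "south", "kwh": 7.0},
]


def aggregate_by_neighborhood(readings, min_households=2):
    """Sum usage per neighborhood and drop groups that are too small."""
    totals = defaultdict(float)
    households = defaultdict(set)
    for r in readings:
        totals[r["neighborhood"]] += r["kwh"]
        households[r["neighborhood"]].add(r["household_id"])
    # Only neighborhood names and totals leave this function, no identifiers.
    return {
        name: total
        for name, total in totals.items()
        if len(households[name]) >= min_households
    }


pooled = aggregate_by_neighborhood(readings)  # {"north": 20.0}
```

The "south" group is dropped in this toy example because it contains only one household.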

Insight

What insight does this provide?

Ravi benefits from seeing his usage data. To protect privacy, it must be clear who can view which data. Insight into whether and how personal data is linked to consumption data is also important.

Energy

Nuclear power

Politician Wim looks into energy transition. How can the Netherlands best generate electricity without using fossil fuels? He wonders why nuclear power is not a viable option according to the predictive models of TNO, the Dutch organization that conducts independent, applied scientific research into, among other things, the energy transition.

Question 1

Source: Where do the data come from?

TNO uses predictive models with cost optimization. This makes it possible to assess the costs and technical characteristics of a wide variety of energy transition scenarios. The algorithms in the model use data about the costs and technical characteristics of many different technologies. Nuclear energy is one of the technologies included in the model.

Question 2

Analysis: What happens to the data?

The cost optimization model is used to see how we can achieve a goal such as climate neutrality at the lowest possible cost. In all scenarios of this model, electricity from solar and wind plays a major role. A scenario supplemented with nuclear power appears to be less optimal. The job of the algorithm is to select for affordability. For that reason, the model does not opt for the more expensive options with nuclear energy.
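Selecting purely on cost can be illustrated with a toy "cheapest first" calculation. This is not TNO's model, which is far more detailed; the function name, costs, capacities and demand below are invented and have no relation to real prices.

```python
def cheapest_mix(demand, options):
    """Fill demand with the cheapest options first, up to each one's capacity.

    options: list of (name, cost_per_unit, max_capacity) tuples.
    """
    mix = {}
    remaining = demand
    for name, _cost, capacity in sorted(options, key=lambda option: option[1]):
        if remaining <= 0:
            break
        used = min(capacity, remaining)
        mix[name] = used
        remaining -= used
    if remaining > 0:
        raise ValueError("demand cannot be met with the given options")
    return mix


# Hypothetical numbers in arbitrary units: (name, cost per unit, capacity).
options = [("nuclear", 9, 100), ("wind", 4, 60), ("solar", 5, 50)]

mix_low_demand = cheapest_mix(100, options)   # {"wind": 60, "solar": 40}
mix_high_demand = cheapest_mix(130, options)  # {"wind": 60, "solar": 50, "nuclear": 20}
```

In the first case the more expensive option is never chosen, simply because cheaper capacity already covers demand. Change the costs, the capacities or the goal, and the outcome changes with them.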

Question 3

Outcome: How are source and analysis used?

Giving priority to cost efficiency, this model looks at how the Netherlands can best generate electricity in a climate-neutral future. Does this mean that nuclear energy is excluded for ideological reasons? No, not at all. It is up to politicians like Wim to decide what to do with the outcome of the TNO models. The financial aspect is only one of the considerations in the decision process of the energy transition.

Insight

What insight does this provide?

More knowledge of the selection criteria of a predictive model provides more insight. In this case the main criterion was optimizing costs. This gives Wim insight into how to use the outcome of the model. It allows him to make a balanced decision.

Energy

Natural gas free

Brechtje is a municipal official in the Sustainability department. She’s working on a plan to get neighborhoods off natural gas. Becoming natural gas-free involves a number of challenges, for example the level of insulation, affordability and alternative heat sources. These challenges are different for every neighborhood. Models help Brechtje, but what information do they use and how?

Question 1

Source: Where do the data come from?

A model that calculates how best to create a natural gas-free neighborhood involves a lot of data, such as the type of homes, the level of insulation, the space for solar panels and the possibilities for geothermal energy or heat networks. Also relevant is whether the homes are owner-occupied or rental properties. In addition, Brechtje must choose exactly which data to use, for example the year of construction or the energy label. Information about the year of construction of a house is reliable, but does not say much about the energy label. Energy labels may provide meaningful information about a home, but they are often outdated.

Question 2

Analysis: What happens to the data?

Making choices and making assumptions are an important part of building a model to find the most efficient way to become natural gas-free. For example, choosing to look at the year of construction means assuming that modern homes are better insulated than old ones. If you opt for feeding the model with data on energy labels, the assumption is that this information provides a sufficient picture of the current situation. It is important to carefully consider and record these assumptions.

Question 3

Outcome: How are source and analysis used?

Which assumption Brechtje chooses affects the outcome of the model. When using energy labels, for example, she already knows that the data can be outdated. The current situation has improved in many cases. How much insulation is still needed can therefore be overestimated. As a result, natural gas alternatives may seem less feasible than they are. The same problem occurs, but to a greater extent, when choosing the year of construction.
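The effect of outdated labels on a model's outcome can be shown with a toy calculation. The mapping from label to "insulation still needed" and the three homes are invented; real models use far richer data.

```python
# Hypothetical effort (arbitrary units) still needed per energy label.
INSULATION_NEEDED = {"A": 0, "B": 1, "C": 2, "D": 3}

# Invented homes: the registered label may lag behind the current situation.
homes = [
    {"registered_label": "D", "current_label": "B"},  # insulated since registration
    {"registered_label": "C", "current_label": "C"},
    {"registered_label": "D", "current_label": "C"},
]


def total_needed(homes, label_field):
    """Sum the insulation effort implied by the chosen label field."""
    return sum(INSULATION_NEEDED[home[label_field]] for home in homes)


estimated = total_needed(homes, "registered_label")  # 3 + 2 + 3 = 8
actual = total_needed(homes, "current_label")        # 1 + 2 + 2 = 5
overestimate = estimated - actual                    # 3
```

With the registered labels the model assumes more work is needed than is really the case, which can make a natural gas-free alternative look less feasible than it is.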

Insight

What insight does this provide?

A mathematical model can help Brechtje on her way, but it is only one of the necessary studies to arrive at a solution.
