What are AI and algorithms?
The use of algorithms and artificial intelligence (AI) is ever increasing. Have you heard of the terms below? They are very common, which makes it important to understand what they mean.
Today, large data sets are collected for analysis in most scientific fields. Scientists use algorithms to extract meaningful relationships and patterns from data. These analyses can help to make decisions in various applications within our society. By data we mean digital information. You may notice that data is sometimes used as a singular and sometimes as a plural. What do you think fits best?
We often speak of ‘big data’. This type of data is large in volume, varies in content and type, and can change quickly. An example from the healthcare sector is data on age, gender, height, weight, average weekly alcohol consumption, smoking habits, chronic conditions, medical treatments, test results and X-rays. All these data can be stored in different ways. Think of sound clips, videos, written reports, images, graphs and diagrams.
Data can contain information about many different variables. These variables have their own characteristics that are relevant to answering a question. Characteristics can be numbers, such as age, weight, height, temperature or income. We then speak of numerical data. Other characteristics fall into categories, such as eye or hair color, ethnicity, field of work, or hobbies. We then speak of categorical data. Algorithms can use both numerical and categorical data.
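The difference between numerical and categorical data can be made concrete with a small sketch. The record, its field names and the helper `split_by_type` below are made up for illustration; real data sets usually need more careful type handling.

```python
# Hypothetical record; field names and values are invented for illustration.
record = {
    "age": 54,
    "weight_kg": 81.5,
    "eye_color": "brown",
    "field_of_work": "teaching",
}

def split_by_type(rec):
    """Separate numerical values (int or float) from categorical ones (everything else)."""
    numerical, categorical = {}, {}
    for name, value in rec.items():
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            numerical[name] = value
        else:
            categorical[name] = value
    return numerical, categorical

numerical, categorical = split_by_type(record)
print("numerical:", numerical)
print("categorical:", categorical)
```

Here age and weight end up in the numerical group, while eye color and field of work end up in the categorical group.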
Worldwide, researchers, publishers and funders of research have agreed that scientific data should be as widely available as possible. In this way, data is used optimally and researchers can also check each other’s experiments. Data is stored according to the so-called FAIR principles: Findable, Accessible, Interoperable and Reusable. Privacy, sensitivity and intellectual property rights are also taken into account. Read more about this on the website of the National Platform Open Science (NPOS).
An algorithm is a series of mathematical instructions used to find patterns or make calculations. You can compare an algorithm to a recipe for a cake: you have different ingredients that you have to mix in a certain way to get the right result. Algorithms are used in AI to find relationships between different data sets.
AI stands for artificial intelligence. AI is a collection of algorithms in a system that uses data and rules to make judgments or predictions. In practice, the term AI is often used for self-learning computer programs. These are programs that use algorithms. Based on results and through training, the programs can adapt to get as close as possible to the best result. The term self-learning means that the algorithms can carry out the instructions they have received from humans, but cannot add anything to those instructions themselves.
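What "adapting through training" can look like is sketched below. This is a minimal illustration, not how any particular AI system works: the data are made up, and the function `train`, its step count and its learning rate are assumptions chosen for the example. The program repeatedly compares its result with the desired result and nudges a single number (the weight) to reduce the error.

```python
# Made-up training data: each target is exactly twice its input.
inputs = [1.0, 2.0, 3.0, 4.0]
targets = [2.0, 4.0, 6.0, 8.0]

def train(inputs, targets, steps=200, learning_rate=0.01):
    """Adjust a single weight so that weight * input approaches the target."""
    weight = 0.0  # starting guess
    n = len(inputs)
    for _ in range(steps):
        # Average gradient of the squared error with respect to the weight.
        gradient = sum(2 * (weight * x - y) * x for x, y in zip(inputs, targets)) / n
        # The update rule is fixed by the programmer; only the weight changes.
        weight -= learning_rate * gradient
    return weight

learned_weight = train(inputs, targets)
print(round(learned_weight, 3))
```

After training, the weight ends up very close to 2. The program has adapted to the data, but only by following the update rule a human wrote for it.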
At the moment, only so-called ‘narrow’ AI systems are in use: systems built for one specific task. Such AI imitates intelligence, but is not yet able to think for itself. So-called ‘general’ AI, which we often encounter in movies, is really still science fiction.
A model is a form of AI in which different data and algorithms are used to arrive at a complex outcome. Classification models predict which categories the data belong to, for example by analyzing emails and predicting whether they should be labeled “spam” or “not spam.” Regression models make numerical predictions, for example by estimating how many people will die from the flu based on how the virus has spread in recent months.
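The two kinds of model can be illustrated with a short sketch. Everything here is an assumption for the sake of the example: the keyword list, the threshold and the monthly counts are made up, `classify_email` is a hand-written rule rather than a trained model, and `fit_line` is an ordinary least-squares straight-line fit.

```python
def classify_email(text, spam_words=("prize", "winner", "free"), threshold=2):
    """Classification: return a category label based on a simple keyword count."""
    hits = sum(word in text.lower() for word in spam_words)
    return "spam" if hits >= threshold else "not spam"

def fit_line(xs, ys):
    """Regression: least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Classification gives a category.
print(classify_email("You are a WINNER, claim your free prize"))
print(classify_email("Meeting moved to noon"))

# Regression gives a number: extrapolate made-up monthly counts to month 5.
months = [1, 2, 3, 4]
counts = [10, 20, 30, 40]
slope, intercept = fit_line(months, counts)
predicted_month_5 = slope * 5 + intercept
print(predicted_month_5)
```

The classifier returns a label, while the regression returns a numerical estimate for a month that has not been observed yet.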
The outcome of mathematical models and AI depends on the quality of the data. Conditions for good-quality data are:
- The data are representative and the sample is large enough
- The data collection is systematic and standardized
- The data are solid and reliable: do you measure what you think you measure?
We will get into more detail below.
The amount of data available is almost endless. How do you choose what you need for an algorithm from all that data? Asking the right question helps. For example, if you want to know how traffic jams arise during the holidays, you want your traffic data to be representative. Obviously, you would choose data about highways and not about traffic lights. But are you going to look at all highways, or do you make a certain selection? How that selection is made affects how representative the results of your algorithm will be.
Bias literally means prejudice. In practice, you get bias when assumptions are made. Making an algorithm always requires assumptions, and which assumptions you make is a choice. It is virtually impossible for choices to always be objective. However, choices can be made verifiable and transparent. This makes bias visible so that the algorithm can be adjusted.
AI is generalizable when the conclusions of the set of algorithms are true and applicable to the group of people who meet the algorithms’ conditions. With AI that is not generalizable, the conclusion may only help some groups, but not everyone. This need not be a problem, as long as it is clear for which groups the conclusion is and is not valid.
The data must be collected in a standard way, so that the data can be compared properly. For example, compare temperature measurements from satellites with each other, rather than comparing satellite data with measurements from a thermometer.
By reliability we mean how consistently AI produces the result we seek, without producing results we do not seek. In practice, the AI must therefore be checked and tested, and adjusted when necessary. Technically, reliability can also mean that AI produces the same result every time it is given the same input.
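The technical sense of reliability, getting the same result every time, can be sketched as follows. The measurements and the function `noisy_estimate` are made up for this example. Many algorithms contain a random step, and fixing the so-called seed of the random number generator is one common way to make such a step repeatable.

```python
import random

def noisy_estimate(values, seed=None):
    """Estimate the mean from a randomly chosen half of the values."""
    rng = random.Random(seed)  # a fixed seed makes the random choice repeatable
    sample = rng.sample(values, k=len(values) // 2)
    return sum(sample) / len(sample)

# Made-up measurements.
measurements = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

first = noisy_estimate(measurements, seed=42)
second = noisy_estimate(measurements, seed=42)
print(first == second)  # same seed, same input: identical result
```

Without a fixed seed, two runs may select different values and therefore return slightly different estimates.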
Sometimes it is impossible or unrealistic to make certain measurements, and researchers can instead opt for a proxy. Literally this means an ‘approximation’ or ‘representation’. A proxy can be used if it has been scientifically shown to be consistent with direct measurements, so that you can trust that proxy measurements match what you want to measure. An example is the growth rings in a tree trunk. These say a lot about the conditions in which the tree grew. Thick growth rings correspond to years of high growth and thin rings correspond to years of low growth. Trees grow faster in warm and wet conditions. In this way, growth rings can be used to approximate what the temperature was in the past, going back many decades. Another example of a proxy is BMI (body mass index). This is calculated from body weight and height (weight in kilograms divided by the square of height in meters) and therefore does not directly measure how much fat someone has. Across groups of people, however, the ratio corresponds well with body fat: in general, the higher the BMI, the more fat. Therefore, BMI is widely used as a proxy for obesity.
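As a worked example of this proxy, BMI is body weight in kilograms divided by the square of height in meters. The function name `bmi` and the example values are chosen for illustration only.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2

# Example: a person of 70 kg and 1.75 m.
example_bmi = bmi(70, 1.75)
print(round(example_bmi, 1))
```

The calculation uses only weight and height, which is why BMI is a proxy for body fat and not a direct measurement of it.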