In the natural sciences, new findings usually arise from a well-founded assumption (hypothesis) that leads to a prediction. To test the prediction, a carefully designed experiment is devised; its course is observed and measured, and the results are then analysed and generalised into a theory by means of systematic conclusions. Such measurements are typically costly and time-consuming and should therefore be designed to test the hypothesis as effectively as possible. The measured data then confirm or refute the theory. Ultimately, such purposefully generated data lead to new insights.
We live in an age in which information is stored digitally on a large scale and is available almost everywhere. The amount of passively generated data is growing exponentially: by the end of 2016 alone, nine trillion gigabytes of data had been generated. The aim of the interdisciplinary field of data science is to gain knowledge from these data using statistics, machine learning and data mining. Unlike in the natural sciences, a data scientist rarely starts with a hypothesis. Instead, the data must first be "mined" in order to discover new relationships by identifying patterns. With previous approaches, this is a very demanding task. Our research group led by Prof. Dr. Alfred Ultsch has been working on the discovery of new knowledge from data since the 1990s. We specialise in deriving methods from nature and using them for data analysis, which is why we call this approach "data bionics". Many examples, such as the DNA of organisms, show that nature processes data highly efficiently.
One way to identify patterns is to search for groups of similar data points. In three dimensions, such patterns can be imagined as clouds of points. Each data cloud has its own shape ("structure") and lies at some distance from the other clouds. Within a cloud and between clouds, every point has a distance to every other point, and the number of points in a given region ("density") can be counted. Points are similar to each other if the distance between them is small. If the distance between two clouds becomes large enough, or the density between them small enough, the clouds can be visibly separated from each other: they belong to two different groups, so-called "clusters".
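To make distance and density concrete, here is a minimal sketch using only the Python standard library. The two small point clouds, the helper names (`mean_within`, `mean_between`, `density`) and the factor 5 in the separation rule are all hypothetical choices for illustration, not part of the Data Bionic Swarm.

```python
import math
from itertools import combinations

def mean_within(cloud):
    """Average distance between all pairs of points inside one cloud."""
    pairs = list(combinations(cloud, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def mean_between(cloud_a, cloud_b):
    """Average distance from every point of one cloud to every point of the other."""
    return sum(math.dist(p, q) for p in cloud_a for q in cloud_b) / (len(cloud_a) * len(cloud_b))

def density(points, centre, radius):
    """Number of points within `radius` of `centre` (a simple local density count)."""
    return sum(1 for p in points if math.dist(p, centre) <= radius)

# Two small, hypothetical clouds in three dimensions
cloud_a = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
cloud_b = [(10, 10, 10), (11, 10, 10), (10, 11, 10), (10, 10, 11)]

within = max(mean_within(cloud_a), mean_within(cloud_b))
between = mean_between(cloud_a, cloud_b)
# Hypothetical rule of thumb: the clouds count as separated if they are much
# farther apart than the typical distance inside either cloud.
separated = between > 5 * within
print(round(within, 3), round(between, 1), separated)  # 1.207 17.3 True
```

Here the clouds are obvious; the difficulty described in the text starts when clouds are elongated, touch each other, or differ mainly in density.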
Conventional methods can usually recognise only one shape of data cloud. Procedures that find spherical clouds cannot detect elongated ones. Procedures that find clouds lying far apart from each other fail on data in which the clouds must be separated by density. This is because such methods usually optimise a particular criterion; for example, they seek the smallest possible distances within spherical clouds and try to partition the data accordingly. A major problem is knowing in advance which structures occur in the data, i.e. whether a cloud represents one cluster or should be divided into two.
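As one example of such a criterion, consider the k-means objective: the sum of squared distances from each point to the centre of its cluster. The text does not name a specific method, so this choice is an assumption. The sketch below scores two partitions of two long, parallel lines of points and shows that the criterion prefers cutting both lines in half over separating them.

```python
def within_cluster_sse(points, labels):
    """Sum of squared distances of each point to its cluster centroid (the k-means objective)."""
    sse = 0.0
    for label in set(labels):
        members = [p for p, l in zip(points, labels) if l == label]
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        sse += sum(sum((pi - ci) ** 2 for pi, ci in zip(p, centroid)) for p in members)
    return sse

# Two elongated, parallel clouds: a lower line (y = 0) and an upper line (y = 1)
points = [(x, 0) for x in range(10)] + [(x, 1) for x in range(10)]

true_labels = [0] * 10 + [1] * 10                    # one cluster per line
cut_labels = [0 if x < 5 else 1 for x, _ in points]  # left half vs right half

print(within_cluster_sse(points, true_labels))  # 165.0
print(within_cluster_sse(points, cut_labels))   # 45.0
```

Because the wrong, "spherical" split scores better, a method that minimises this criterion will tend to cut elongated clouds apart rather than find them.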
The Data Bionic Swarm (DBS) solves this problem. It discovers various data structures by using the self-organisation of its swarm agents. Self-organisation is the spontaneous formation of patterns in a system without any individual agent or module being responsible for it or guiding it. A snowflake is a good example: a set of molecules can form different patterns without any single molecule being responsible for them. Just as self-organisation arises from the interactions between the molecules in a snowflake, it also arises from the behaviour of the ants in a colony, which communicate by scent.
Self-organisation can lead to emergence: the spontaneous and unpredictable appearance of a new, uniform property of a system. An example is a school of fish that forms a sphere. In contrast to the snowflake, whose pattern can be attributed to binding forces, the spherical formation of the school cannot be traced back to the behaviour of the individual fish. This illustrates an important difference between self-organisation and emergence: in principle, the pattern of a snowflake could be derived from the physical and chemical properties of its individual molecules, whereas the formation of the sphere cannot be predicted from the behaviour of a single fish.
Schools of fish and other swarms, such as herds of cattle or flocks of birds, exhibit swarm intelligence. The term "intelligence" has not yet been clearly defined by the scientific community, but in relation to swarms it can be described as the collective behaviour of agents such as fish or ants. In the Data Bionic Swarm, each agent carries one data point around on a two-dimensional surface and communicates with the other agents. This communication is modelled on the sense of smell of ants. The scent is defined by a mathematical formula; put simply, it indicates how similar two agents are. As soon as all agents have decided to rest, the final location of each data point on the surface is known.
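The following toy swarm illustrates the idea of agents that carry data points, smell each other and eventually rest. It is a sketch under assumptions and not the DBS algorithm: the grid, the scent formula (similarity weights in [-1, 1] with linear decay over grid distance), the greedy single-step moves and all names are hypothetical.

```python
import math

STEPS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # moves to the four neighbouring cells

def similarity_matrix(data):
    """Hypothetical scent weights in [-1, 1]: +1 for identical points, -1 for the most distant pair."""
    n = len(data)
    d = [[math.dist(data[i], data[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d) or 1.0
    return [[1.0 - 2.0 * d[i][j] / d_max for j in range(n)] for i in range(n)]

def scent(i, cell, positions, sim, radius):
    """Scent agent i perceives at `cell`: similarity-weighted sum over the other agents,
    decaying linearly to zero at grid distance `radius`."""
    total = 0.0
    for j, pos in enumerate(positions):
        if j != i:
            total += sim[i][j] * max(0.0, 1.0 - math.dist(cell, pos) / radius)
    return total

def best_move(i, positions, sim, grid_size, radius):
    """Free neighbouring cell with the highest scent for agent i, or its current cell
    if no single step improves the scent."""
    occupied = set(positions)
    x, y = positions[i]
    best_cell = positions[i]
    best_scent = scent(i, best_cell, positions, sim, radius)
    for dx, dy in STEPS:
        cell = (x + dx, y + dy)
        inside = 0 <= cell[0] < grid_size and 0 <= cell[1] < grid_size
        if inside and cell not in occupied:
            s = scent(i, cell, positions, sim, radius)
            if s > best_scent + 1e-12:  # small tolerance avoids endless moves on ties
                best_cell, best_scent = cell, s
    return best_cell

def run_swarm(data, grid_size=10, radius=4.0, max_rounds=10_000):
    """Let agents (one per data point) step greedily until every agent rests."""
    n = len(data)
    sim = similarity_matrix(data)
    positions = [(k % grid_size, k // grid_size) for k in range(n)]  # distinct start cells
    for _ in range(max_rounds):
        moved = False
        for i in range(n):
            target = best_move(i, positions, sim, grid_size, radius)
            if target != positions[i]:
                positions[i] = target
                moved = True
        if not moved:  # no agent can improve on its own: all agents rest
            return positions, True
    return positions, False

data = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
positions, converged = run_swarm(data)
print(converged, positions)
```

Because the similarities and the decay are symmetric, every move that raises one agent's scent also raises the total scent summed over all pairs of agents, so the loop must end. The resting layout is one in which no agent can improve by a single step on its own.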
As a first result of the Data Bionic Swarm, point clouds form that could be drawn on a sheet of paper ("projection"). The common literature usually ignores the problem that the distances between point clouds in two dimensions do not necessarily correspond to the real distances between the clouds in three or more dimensions. My supervisor solved this problem for a particular case, and I generalised the solution for the Data Bionic Swarm. To stay with the image of a sheet of paper on which point clouds are drawn: a virtual 3D landscape in the form of a topographic map is created from this surface, much as folding or crumpling a sheet of paper produces ridges, and the colours of the map depend on the altitude. The greater the distance between two clouds in three or more dimensions, the higher the fold between them in the sheet. Once all folds have formed, colours are assigned to the heights. If two points belong to the same data cloud, a depression forms between them, so that both points slide into a blue lake. If the two points belong to different data clouds, a brown and, for large distances, finally a white mountain forms between the two clouds, depending on their distance in three or more dimensions.
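The landscape idea can be sketched in a few lines: give each projected point a height equal to the mean high-dimensional distance to its nearest neighbours on the sheet, then colour the heights. This simplified per-point height, loosely in the spirit of U-matrix visualisations, the choice `k=2` and the colour thresholds are assumptions for illustration; it is not the author's generalised method.

```python
import math

def heights(high_dim, projected, k=2):
    """Height of each projected point: mean high-dimensional distance to its k nearest 2D neighbours."""
    result = []
    for i, p in enumerate(projected):
        others = [j for j in range(len(projected)) if j != i]
        neighbours = sorted(others, key=lambda j: math.dist(p, projected[j]))[:k]
        result.append(sum(math.dist(high_dim[i], high_dim[j]) for j in neighbours) / k)
    return result

def colour(height, lake=2.0, snow=10.0):
    """Map a height to a map colour; the thresholds are hypothetical."""
    if height < lake:
        return "blue"
    if height < snow:
        return "brown"
    return "white"

# Two clouds that are far apart in three dimensions ...
high = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (10, 10, 10), (10, 10, 11), (10, 11, 10)]
# ... but projected onto a line with no visible gap between them
proj = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]

h = heights(high, proj)
print([colour(x) for x in h])  # ['blue', 'blue', 'brown', 'brown', 'blue', 'blue']
```

Although the projection shows no gap, the two border points get high values and form a "mountain" that reveals the real separation of the clouds.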
Of course, such an overall concept does not arise in isolation. Both the communication between the agents and the basic concept of visualising structures build on much previous work of the Databionics working group. Another problem with conventional methods is their many parameters and tuning options, which require deep expertise to handle. In the Data Bionic Swarm, such settings are determined from the data by applying a concept from game theory.
Game theory deals with situations in which several parties (agents, also called players) have to make decisions. It is used in economics, political science, psychology and even poker. Suppose the people in a community have to decide whether to water their gardens during a drought. Each person acts in their own interest (a non-cooperative game) when watering only their own garden. However, the public water supply is not sufficient for everyone to water their garden. Everyone could water their garden partially, everyone could refrain from watering, or some people could water their gardens fully, assuming that the others do without. Such a situation can be modelled with game theory, and a solution that is optimal for all parties involved can be found once boundary conditions are set, e.g. a fine for wasting water during the drought. Often such a solution is described by a Nash equilibrium: a combination of decisions in which no agent can do better by changing only its own decision. Finding the equilibrium means finding the best decision for each agent according to certain criteria, taking into account every possible decision of every agent and all their combinations.
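The garden example can be turned into a tiny game and solved by brute force. In this sketch, which is an assumption and not taken from the text, three residents each choose 0 (no watering), 1 (partial) or 2 (full) units of water, the benefit equals the water used, and if total demand exceeds the supply everyone who watered pays a fine. A profile is a pure Nash equilibrium if no resident gains by changing only their own choice.

```python
from itertools import product

ACTIONS = (0, 1, 2)  # hypothetical units of water: none, partial, full

def payoff(profile, i, supply, fine):
    """Benefit of resident i: water used, minus a fine if demand exceeds supply and i watered."""
    a = profile[i]
    penalty = fine if (sum(profile) > supply and a > 0) else 0
    return a - penalty

def pure_nash_equilibria(n_players, supply, fine):
    """All action profiles in which no player can improve by deviating alone."""
    equilibria = []
    for profile in product(ACTIONS, repeat=n_players):
        stable = True
        for i in range(n_players):
            current = payoff(profile, i, supply, fine)
            for alt in ACTIONS:
                deviated = profile[:i] + (alt,) + profile[i + 1:]
                if payoff(deviated, i, supply, fine) > current:
                    stable = False
                    break
            if not stable:
                break
        if stable:
            equilibria.append(profile)
    return equilibria

print(pure_nash_equilibria(3, supply=3, fine=0))  # [(2, 2, 2)]
print(pure_nash_equilibria(3, supply=3, fine=3))
```

Without a fine, everyone waters fully and demand exceeds the supply. With the fine, the equilibria are exactly the profiles that use the whole supply, such as everyone watering partially or one resident watering fully while another does without.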
There is only a limited number of principles underlying self-organisation, emergence, swarm intelligence and the Nash equilibrium. I succeeded in linking them together and programming the overall concept as the behaviour of an artificial swarm. The combination of game theory, swarm intelligence and emergence in an artificial swarm is unique in the scientific community. As a result, the Data Bionic Swarm can find very different structures automatically, precisely because it does not optimise a fixed criterion but relies on emergence instead. These structures differ in distance and density as well as in the shape of the data clouds. Using artificial data sets with predefined structures, I was able to show a significant improvement over conventional methods. Conventional procedures have usually been introduced for specially selected problems and are therefore limited to certain structures.
Book Errata
Typos and other errors in the book will be listed here.
The book is available here: