Mitigating uncertainty: detecting and controlling unknown unknowns

Image generated by Midjourney from the prompt: “/imagine an illustration of something we don’t know that we don’t know”

In any situation, there are unknowns we can spot and others we don’t even suspect. The latter are the “unknown unknowns”, and with AI they are potentially very numerous. To identify these risks and control AI more effectively, Professor Bernard Sinclair-Desgagné has developed a method for understanding them.

By Kevin Erkeletyan

Take a map of the world drawn by 17th-century European geographers. On this map, the shape and size of Western Europe reflect reality fairly well. Thanks to numerous verifiable accounts, European cartographers knew that they knew the landscape of the continent: it was “known to be known”. Conversely, representations of India, Siberia, China, Japan, California and Mexico were much vaguer. Given the novelty and relative scarcity of reliable information, people of the time could only make educated guesses, often combining several possible landscapes. What remained unknown about these regions could nonetheless be described: “the unknowns were known”. Lastly, a few regions were left blank: the Arctic, North-West America and Antarctica. European scientists did not venture to outline these areas. These regions belonged to the realm of “unknown unknowns”.

“Unknown unknowns” are elements whose existence we do not even suspect. Various accounts from every sphere of human activity – science and technology, the arts, corporate strategy and public policy – reveal that considerable gains could have been achieved, or considerable losses avoided, if “unknown unknowns” had been anticipated in time. However, it is still difficult to get an idea of these elements. Conventional risk analysis is of little help because it focuses on “known unknowns”: situations in which all potential outcomes (even “black swans”, i.e. unforeseen events) can at least be identified.

THINKING “OUTSIDE THE BOX”…

In a recent publication, I put forward a method for understanding “unknown unknowns”. This approach is based on formal concept analysis (FCA), a branch of mathematical order theory increasingly used in data mining. It seems to be widely applicable, as it can integrate various types of data: quantitative and qualitative, objective and subjective, financial and non-financial. And lastly, it is easily accessible, as it comes down to using tables and spreadsheets.

I started from the principle that if the description of everything that could happen follows a structure similar to the one used to communicate current knowledge, then tangible clues to “unknown unknowns” ought to be found in the available data. These clues can be uncovered by thinking “outside the box” and then returning “inside the box” in a methodical way.

In FCA, a “context”, in its simplest but most common form, is a set of objects, a set of attributes and a relationship indicating which objects have which attributes. The main way of organising data is then to use “concepts”. A concept consists of a list of objects, O, and a list of attributes, A, where the objects in O are precisely those that share all the attributes in A, and the attributes of A are precisely those shared by all the objects in O.
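To make the definition concrete, here is a minimal sketch in Python. The objects, attributes and function names are hypothetical, chosen only for illustration; they do not come from the publication. The sketch stores a context as a small table and lists its concepts by brute force, which is only practical for very small tables.

```python
from itertools import combinations

# Hypothetical toy context: each object is mapped to the attributes it has.
CONTEXT = {
    "sparrow": {"flies", "has_feathers"},
    "penguin": {"has_feathers", "swims"},
    "bat": {"flies"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set(ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return shared


def common_objects(attributes):
    """Objects that have every attribute in `attributes`."""
    return {obj for obj, attrs in CONTEXT.items() if set(attributes) <= attrs}


def is_concept(objects, attributes):
    """A pair (O, A) is a concept when each list exactly determines the other."""
    return (common_attributes(objects) == set(attributes)
            and common_objects(attributes) == set(objects))


def all_concepts():
    """Brute-force enumeration: close every subset of objects."""
    found = set()
    names = sorted(CONTEXT)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            attrs = common_attributes(subset)
            objs = common_objects(attrs)
            found.add((frozenset(objs), frozenset(attrs)))
    return found


for objs, attrs in sorted(all_concepts(), key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(objs), sorted(attrs))
```

In this toy table, the sparrow and the bat together with the single attribute "flies" form a concept, whereas the sparrow alone with "flies" does not, because the sparrow also has feathers.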

Let’s assume that current knowledge can be represented as a context; we’ll call it the existing context. Eventually, as the known unknowns and unknown unknowns are revealed, this context will expand into a larger context; let’s call it the end context.

In order to identify the clues that current data might reveal about “unknown unknowns”, we need to introduce the notion of preconcept. In a given context, a preconcept is a partially defined concept: every object on its list has every attribute on its list, but the objects may share more attributes than those currently listed, or the attributes may be shared by additional objects.
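As a rough illustration, assuming the usual FCA reading of a preconcept (every listed object has every listed attribute, without requiring the lists to be complete), the check reduces to a single test. The table and the function names below are hypothetical.

```python
# Hypothetical toy context: each object is mapped to the attributes it has.
CONTEXT = {
    "sparrow": {"flies", "has_feathers"},
    "penguin": {"has_feathers", "swims"},
    "bat": {"flies"},
}
ALL_ATTRIBUTES = set().union(*CONTEXT.values())


def is_preconcept(objects, attributes):
    """Assumed definition: every listed object has every listed attribute."""
    return all(set(attributes) <= CONTEXT[obj] for obj in objects)


def is_concept(objects, attributes):
    """Stricter test: both lists must also be complete."""
    shared_attrs = {a for a in ALL_ATTRIBUTES
                    if all(a in CONTEXT[o] for o in objects)}
    shared_objs = {o for o in CONTEXT if set(attributes) <= CONTEXT[o]}
    return shared_attrs == set(attributes) and shared_objs == set(objects)


candidate = ({"sparrow"}, {"flies"})
print(is_preconcept(*candidate))  # True: the sparrow does fly
print(is_concept(*candidate))     # False: the bat also flies, and the sparrow has feathers
```

The gap between the two tests is what makes a preconcept a clue: it marks a pair of lists that could still grow.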

…THEN “INSIDE THE BOX”

Now, starting from the end context, we eliminate all the relationships already specified in the existing context. The result is a third context, which contains only currently unknown relationships. These may involve existing objects and attributes as well as objects and attributes we don’t know we don’t know. Let’s call this the discovery context. A first observation is that no preconcept in the existing context can be a preconcept in the discovery context. This encourages us to look beyond the existing context, in other words to think outside the box.

How do we go about it? Let’s consider the complementary existing context (CEC), i.e. the context in which all known relationships between existing objects and attributes are reversed: an attribute is linked to an object in the CEC precisely when it is not linked to that object in the existing context. The CEC is precisely the area outside the box that we need to focus on. A fundamental proposition is that, if the discovery context contains preconcepts made up of existing objects and attributes, then these preconcepts must also be preconcepts in the CEC. Therefore, an initial approach to looking for clues about concepts we don’t know we don’t know in the discovery context is to methodically examine the preconcepts in the CEC.
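The two constructions above can be sketched with plain set operations. Everything in the example is hypothetical: in practice the end context is not available, and it is written out here only so that the two propositions can be checked on a toy case. The sketch also assumes that preconcepts have non-empty lists of objects and attributes.

```python
from itertools import product

# Existing context: known objects, known attributes, known relationships.
OBJECTS = {"sparrow", "penguin", "bat"}
ATTRIBUTES = {"flies", "has_feathers", "swims"}
EXISTING = {
    ("sparrow", "flies"),
    ("sparrow", "has_feathers"),
    ("penguin", "has_feathers"),
}

# Hypothetical end context, with a new object ("dolphin") and a new
# attribute ("echolocates") that nobody suspected at the outset.
END = EXISTING | {
    ("penguin", "swims"),
    ("bat", "flies"),
    ("bat", "echolocates"),
    ("dolphin", "swims"),
    ("dolphin", "echolocates"),
}

# Discovery context: the end context minus what is already known.
discovery = END - EXISTING

# Complementary existing context (CEC): every pair of existing object and
# existing attribute that is NOT linked in the existing context.
cec = set(product(OBJECTS, ATTRIBUTES)) - EXISTING


def is_preconcept(objects, attributes, relation):
    """Assumed definition: every listed object has every listed attribute."""
    return all((o, a) in relation for o in objects for a in attributes)


# A preconcept of the existing context is not one of the discovery context.
print(is_preconcept({"sparrow"}, {"flies"}, EXISTING))   # True
print(is_preconcept({"sparrow"}, {"flies"}, discovery))  # False

# A discovery preconcept built from existing objects and attributes
# is also a preconcept of the CEC.
print(is_preconcept({"penguin"}, {"swims"}, discovery))  # True
print(is_preconcept({"penguin"}, {"swims"}, cec))        # True
```

Unlike the discovery context, the CEC can be computed from current data alone, which is why it is a practical place to start looking.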

Once this search is complete, so that the preconcepts of the CEC hold no more relationships between existing objects and attributes left to discover (those we knew we didn’t know), we must think “inside the box” again. This box is now the initial existing context, updated by adding any new relationships found between existing objects and attributes. Another important result is that projections of concepts we don’t know we don’t know from the end context into the updated existing context must be preconcepts in it. Additional clues about the end context can now be obtained by examining the preconcepts in the updated existing context.
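A small sketch of this second step follows, under the same caveats: the data is hypothetical, the end context is spelled out only for checking, and “projection” is assumed to mean keeping only the existing objects and attributes of a concept.

```python
from itertools import product

OBJECTS = {"sparrow", "penguin", "bat"}
ATTRIBUTES = {"flies", "has_feathers", "swims"}
EXISTING = {
    ("sparrow", "flies"),
    ("sparrow", "has_feathers"),
    ("penguin", "has_feathers"),
}
# Hypothetical end context with an unsuspected object and attribute.
END = EXISTING | {
    ("penguin", "swims"),
    ("bat", "flies"),
    ("bat", "echolocates"),
    ("dolphin", "swims"),
    ("dolphin", "echolocates"),
}
END_OBJECTS = OBJECTS | {"dolphin"}
END_ATTRIBUTES = ATTRIBUTES | {"echolocates"}

# Updated existing context: the search through the CEC is assumed to have
# found every relationship between existing objects and attributes.
updated = EXISTING | (END & set(product(OBJECTS, ATTRIBUTES)))


def is_preconcept(objects, attributes, relation):
    """Assumed definition: every listed object has every listed attribute."""
    return all((o, a) in relation for o in objects for a in attributes)


def is_concept(objects, attributes, relation, all_objects, all_attributes):
    """Both lists must be complete with respect to `relation`."""
    shared_attrs = {a for a in all_attributes
                    if all((o, a) in relation for o in objects)}
    shared_objs = {o for o in all_objects
                   if all((o, a) in relation for a in attributes)}
    return shared_attrs == set(attributes) and shared_objs == set(objects)


# A concept of the end context that involves an unsuspected object.
end_objs, end_attrs = {"penguin", "dolphin"}, {"swims"}
print(is_concept(end_objs, end_attrs, END, END_OBJECTS, END_ATTRIBUTES))  # True

# Its projection keeps only existing objects and attributes.
proj_objs, proj_attrs = end_objs & OBJECTS, end_attrs & ATTRIBUTES
print(is_preconcept(proj_objs, proj_attrs, updated))                   # True
print(is_concept(proj_objs, proj_attrs, updated, OBJECTS, ATTRIBUTES))  # False
```

Here the projection is a preconcept of the updated context without being a concept of it. In this toy case, that incompleteness is the trace left by the object we did not know we did not know.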

This process will not reveal which “unknown unknown” objects, attributes or relationships will come to light. The important thing is that, whatever the final outcome, the structure of the shared context offers ways of reducing uncertainty by systematically exploring the available data.