IJCAI Session Notes: Verifying Agent Based Autonomous Systems

For the past couple of days I’ve been at IJCAI - the International Joint Conference on Artificial Intelligence. It’s been a three-day whirlwind of workshops and tutorials. I took a lot of notes, and as with ECAL last year I figured organizing them into blog posts would be a good way to review some of the takeaways. Note that the intended audience here is future-me, and I could very well have made mistakes or misinterpretations in my own notes. Also apologies about any typos.

One of my favorite sessions was Verifying Agent Based Autonomous Systems, by Louise Dennis and Michael Fisher, so I will start with this. I should note that there were many great, hands-on examples during this tutorial which the notes really do not do justice. We saw real life examples of Gwendolen agents, BDI syntax overviews, examples of standard model checking and program model checking, etc.

My brain kept coming back to drawing parallels between the results of model checking and the potential results of certain kinds of testing. I had a brief discussion with the speakers about this checking during the break, which was really interesting.

Verifying Agent Based Autonomous Systems

Tutorial by Louise Dennis, Lecturer and Senior Research Fellow in the Autonomy and Verification Laboratory at the University of Liverpool, and Michael Fisher, Professor of Computer Science and Director of the multi-disciplinary Centre for Autonomous Systems Technology at the University of Liverpool.

We start off by defining Autonomy as the ability of a system to make its own decision and to act on its own and do both without direct human interaction.

Degrees of autonomy

However, there are variations and levels in decision making:

Systems have varying degrees of autonomy. For example, in the aerospace industry the levels below are often used:

Full autonomy is particularly important when:

Concerns with full autonomy:

Autonomous System Architectures

Many architectures are available: sense-plan-act; subsumption; hybrid (what we will focus on in this tutorial); three-layer architectures.

What is a rational agent?

A basic ‘agent’ concept is not enough; we want a rational agent to not only make autonomous decisions, but explain why it made those decisions. Rational agents must have explicit reasons for making the choices they do and should be able to explain if necessary. They are often programmed and analyzed by describing their goals and knowledge/beliefs, and how these change over time.

Hybrid Agent Architectures

The requirement for self-awareness and reasoned decisions and explanations has led on to hybrid agent architectures. Combinging a Rational Agent for high level activities and Traditional feedback control systems for low level activities.

For example: as an agent we do not care about how to walk, only that we want to walk somewhere. How to move is captured in lower level control systems. The rational agent is conscious mind, a discrete component. The feedback control is a bunch of systems interacting with the environment.

A rational agent has access to numerous components including control systems, sensors, planners, learning systems, etc, and may be able to swap between them.

Verifiable Autonomy:

We should be certain of what the autonomous system intends to do and how it chooses to do it. A rational agent must have explicit reasons for making the choices it does, and should be able to explain them if needed.

We don’t care how good a robot’s gripper is, we only care about what it plans to do with it/its intentions.

Example, from pilot to rational agent: autopilot can fly an aircraft to keep on particular path, keeping the plane level/steady under environmental conditions, planning routes around obstacles, etc. But the human pilot makes high level decisions like where to go, when to change route, what to do in an emergency, etc. The human pilot is replaced with a rational agent which now makes the decisions that the human pilot used to make.

In remote or dangerous envs it is impractical or costly to have direct human control so autonomy is increasingly being built into the controlling software; eg, autonomous choices are an important element in many aerospce applications, such as cooperative formation flying satellites or unmanned air vehicles.

Systems that decide for themselves what to do and do it are closer than we think. Robotic assistants are being designed to help the elderly or incapacitated; we need to be able to verify that we can trust these robots' decisions.

Where do requirements come from?

You want a robot to “be safe”, but what does “safe” mean? “Don’t crush people”? “Do what person says if they’re awake”? It can be difficult to come up with proper requirements. Requirements can be based on safety, preferences, ethics, regulations, security, etc. Formal requirements are typically modal, temporal, probabilistic logics. We need formal requirements to be able to do formal verification, not just textual requirements.

So where do we go now?

How do we program a rational agent?

Beliefs-Desires-Intentions (BDI) is a class of languages we use to program rational agents, derived from logic programming.

Logic programming comes from ideas of automated reasoning. If you have some statement and some other statement which both imply something, then you can combine them together to make a new implication. This is converted in prolog into a program which tries to deduce these implications.

Although Prolog is doing logical deduction, it also tries to explore all ways that you can deduce something to see if one of them will work. You state what the problem is and the system underneath solves the problem. You can execute these prolog/logical programming programs to ask questions and find answers.

There are many BDI programming languages; general concepts:

Gwendolen reasoning cycle

Gwendolen is a BDI language. Its reasoning cycle looks as follows:

The order in which you write down the plan is the order in which it is applied.

In Gwendolen there is a distinction between achievement goals and performance goals. Achievement goals hang around until you do them; with performance goals you just perform the goal and don’t bother checking if anything’s been achieved at the end.

Adding wait instruction to Gwendolen became important since in autonomous systems actions are not instant.

Logics of belief in BDI languages

One interesting thing about BDI languages is that people keep trying to explore more interesting logics of belief and then fall back on Prolog - a belief is either true or not true, there is no uncertainty or undefined state. More interesting logics of belief turn out to be much more difficult to reason about.

BDI langauges came about from dissatisfaction with planning 20-30 years ago; they felt planning was too slow. Planning has come a long way so there’s quite a lot of interest in not having a set of plans supplied by the programmer, and instead just having a set of goals which you send to an external planner and which sends back a custom plan. External classical AI planners are a direction these BDI languages are moving in.

Verification Model Checking

Verification of a system is the act of establishing that the system conforms to a set of specified requirements.

Testing vs Formal Verification

Testing selects a subset of potential executions and checks if they match your requirements. However, a subset may not be enough and only covers a small amount of variations.

Formal verification assesses all runs and looks at all possibilities. For each possibility we check if the requirements we have are met.

[Question/Note: if important we can make a testable system that goes through all possible executions as well. The confidence achieved through formal verification just sounds like something we could get to through good, exhaustive testing, no? Note: not “good is exhaustive”, but “good and exhaustive”. If in a system it is critical to go through all possible executions, I would think tests can accommodate.]

Formal verification techniques:

Proof: where the behaviour of a system is described by the logical formula, S, and verification involves proving that S implies R. Typically this requires automated deductive methods to cope with combinations of logics.

Standard model checking

We want to check that our system satisfies some requirement R. All executions of our system must be a subset of all executions that satisfy R. If not, the system does not satisfy R. We have finite automata which capture all executions of our system; one finite automaton represents all executions of our system, the other represents all executions not satisfying R. The system must not satisfy an unsafe thing or the negation of a requirement. The automaton representing possible executions must not intersect the one representing bad executions.

So, model checking works by taking a system and taking the negation of R-satisfying runs, and checking that the system does not intersect the negation of R-satisfying runs.

The above is how standard model checking works.

Standard model checking can be very big and slow. More refined model checkers don’t take the product of the two automatons, they take the two automata separately and build the product on the fly. You keep a model of the system and a model of “bad” paths separate, and synchronously explore the steps. You go through each execution edge and compare it to the bad step.

Program model checking

In program model checking, we do not build a model of the system, we just keep the program. Then when we do the exploration we just do symbolic execution - run the code. [A question was asked about what “symbolic execution” means in this context. It just sounds like testing-esque execution - execution of the program in a non-live environment.] The result is slower but smaller than model checking as we do not build two automatons or the product of those automatons.

We still have a “bad” automaton. We step through the symbolic execution in the program code and every step we take we try to match with a step in the bad automaton. To symbolically execute program code you need some way to extract all possible executions of the program, a VM that can backtrack. So we need:

Agent Java PathFinder

Java PathFinder is a model checker for Java programs. Every time you take a step, a listener process decides if the step is allowed or not. If it is allowed, backtrack and take another step.

The speakers developed a plugin for JPF called Agent Java PathFinder (AJPF). AJPF is a toolkit for building BDI interpreters.

Verification in autonomous systems

What we are verifying is what the system is choosing to do given its beliefs, not what effect the autonomous system has on the world (eg whether kills someone or crashes)

We verify the rational agent, not the feedback control part. You can’t really do formal verification on the control part, that is down to testing.

Once we have an autonomous system based on rational agents and logical requirements, typically we do lots of testing of the feedback control components, formal verification of the rational agent, and simulation of the whole system.

Eg: in a robotic assistant you typically want to verify “robot will always try to wake the human when it believes there is a fire”. The sensor might be wrong, but if the sensor is not wrong and alerts the rational agent that there’s a fire, it will always try to wake the human. We can verify that the robot always intends/tries to do the right thing when there is a fire. We can’t guarantee that it will succeed or that it will correctly know that there is a fire.

Machine Learning for autonomous agents

A question came up about machine learning for autonomous agents, and verifying them. The speakers expressed that you can’t trust machine learning or any adaptive system when it comes to autonomous agents. In autonomous systems the rational agent must never rely completely on machine learning or an adaptive algorithm unless your machine learning system can generate something in a way that looks like rules. You would need some sort of Asimov-like hard constraint on the learning. Machine learning is a fascinating area, but with massive challenges.

Verifying machine ethics

One of the examples shown earlier was an autonomous agent for flying an aircraft. The agent was created to follow the same “rules of the air” that a human pilot would. However, when the creators approached aviation experts about this kind of systems, the first question was “So what about when you don’t want to follow the rules of the air”?

The aviation officials “had an underlying assumption that the human pilot will always act in whatever way is necessary to preserve life”. The human pilot may not blindly follow the rules of the air. So how do you decide when it is right to break the rules? Potential answer: you get plans from an external planning system which has fewer constraints. An ethical planner.

Ethical policies

Have a set of ethical concerns which we rank. For example, killing is worse than stealing is worse than lying.

A plan p1 is worse than another plan p2 if:

Issues with program model checking:


Program model checking is significantly slower than standard model checking applied to models of the program execution, because program model checking actually executes the program. “Random” environments provide large state space. Thus, verifications in AJPF take minutes and hours rather than seconds with some other standard model checking tools. Improvements are ongoing, but it will never be as fast as standard model checking.


Choosing the best logic to use to describe our requirements can be difficult. Have we asked the right questions? Are some properties too difficult to describe? Some specifications require complex combinations - we do not yet have full verification systems for these.

Conflicting requirements

In social robotics, trust is key. Disobeying orders will erode trust. Privacy violations will erode trust. Reconciling tension between conflicting requirements can be difficult. What if a person orders a robot to do something that the robot thinks will be detrimental to the person, or illegal? How do you write down those requirements? Whose responsibility is it to decide? This is not a technical problem, but a social or regulatory problem in working out what the requirements are.


Currently no standards really mention autonomous agents.

Related to standards, regulations are required. Legalities of possible situations are not worked out.

Absractions and assumptions

To carry out formal verification we need to make some assumptions/abstractions about the environment. Whenever these assumptions hold, the verified behaviour will occur. We need to recognize when the environment falls outside of these boundaries. Work is currently under way on this.

© - 2021 · Liza Shulyayeva ·