EDUCATION | 7 Min Read | Updated On October 16, 2024

Ball Balancing Table Maze Solver - Reinforcement Learning

The Ball Balancing Table (BBT) is a great place to start if you want to learn and gain firsthand experience with control theory. The BBT combines high-grade accuracy with open-source accessibility, providing students, engineers, and researchers with an ecosystem for testing and improving control algorithms.

This blog post describes a project in which we combine the Ball Balancing Table with a PID controller and Q-learning to solve a maze. We'll look at the underlying principles of both the hardware and the algorithms, how the maze is encoded into a matrix, and which real-time improvements could push the system beyond its current capabilities. Let's dive in!

Introduction to the Ball Balancing Table (BBT)

The Ball Balancing Table (Figure 1) is a classic experiment in control systems that bridges industrial processes and DIY projects. It consists of a flat surface (the table) on which a ball is placed, and the objective is to control the tilt of the table to guide the ball to specific locations. Students can learn essential control concepts, such as feedback, by experimenting with different types of controllers, from PID to adaptive control. The open-source software integration allows users to modify and test advanced control algorithms, making it a versatile tool for both academic and real-world applications.

Figure 1: Ball Balancing Table

Control System Design

In this project, we employ a PID controller (Proportional-Integral-Derivative), one of the most widely used control mechanisms in automation systems. The PID controller helps maintain the desired trajectory by adjusting the angles of the BBT's platform based on feedback from the ball's position. Here’s how it works:

  • Proportional (P): Reacts to the current error between the target and the actual position of the ball.
  • Integral (I): Accounts for accumulated past errors to reduce steady-state error.
  • Derivative (D): Predicts future error based on the current rate of change.

Together, these terms allow the table to adjust its tilt dynamically, keeping the ball within a defined path, and ultimately solving the maze.
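
The exact gains and loop timing depend on the hardware, but a minimal discrete PID controller in Python, with purely illustrative gains (not the values used on the actual BBT), could look like this:

class PID:
    """Minimal discrete PID controller for one axis of the table."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt               # control-loop period in seconds
        self.integral = 0.0        # accumulated error (I term)
        self.prev_error = 0.0      # previous error (for the D term)

    def update(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # The sum of the three terms becomes the tilt command for this axis.
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# One controller per axis; the gains below are placeholders, not tuned values.
pid_x = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.02)
pid_y = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.02)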

Encoding the Maze

The maze is represented as a matrix of 0s and 1s in Python, where:

  • 0 represents open spaces.
  • 1 represents walls or obstacles.

This matrix forms the environment within which the ball must move. The goal is to guide the ball from the start position to the maze's exit. Here’s an example matrix representation:

maze = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
]

The maze to be encoded is illustrated in the following figure.

Figure 2: Maze of BBT

This image is converted to a matrix using image processing. In the next step, the obtained grid is mapped to the coordinates of the BBT. As the ball moves, the system interprets the matrix and steers the ball through the corresponding open spaces.
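
How the mapping is done depends on the table's calibration, but conceptually it is a linear scaling from grid indices to table coordinates. The sketch below illustrates the idea; the grid shape and table dimensions are assumptions, not the project's actual values:

def maze_to_bbt_coords(row, col, maze_shape=(5, 10), table_size_mm=(250.0, 250.0)):
    """Map a maze cell (row, col) to the centre of that cell in table coordinates (mm)."""
    rows, cols = maze_shape
    width_mm, height_mm = table_size_mm
    cell_w = width_mm / cols                 # width of one maze cell on the table
    cell_h = height_mm / rows                # height of one maze cell on the table
    # Assume the origin sits at the table's top-left corner, matching the matrix layout.
    x = (col + 0.5) * cell_w
    y = (row + 0.5) * cell_h
    return x, y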

Reinforcement Learning and Q-Learning

Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning.

One of the most common RL algorithms for problems such as shortest-path search is Q-learning. Here’s how it works in the context of solving the maze (a minimal environment sketch follows the list):

  • The ball (agent) is placed at the start of the maze.
  • For each step, the agent decides which direction to move (up, down, left, or right).
  • If the move leads to a valid position, it gets a reward; if it hits a wall, it receives a penalty.
  • Over time, the agent learns which actions maximize the cumulative reward, allowing it to find the optimal path through the maze.
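
To make the agent-environment loop concrete, here is a minimal grid-world step function over the maze matrix defined earlier. The reward values and the goal cell are illustrative choices, not the ones used in the project:

ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up, down, left, right
GOAL = (3, 8)  # exit cell, chosen here purely for illustration

def step(maze, state, action):
    """Apply one move to the current (row, col) state and return (new_state, reward, done)."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    new_row, new_col = row + d_row, col + d_col
    # The outer ring of 1s keeps every move from an open cell inside the grid.
    if maze[new_row][new_col] == 1:
        return state, -10.0, False     # hitting a wall is penalised; the ball stays put
    new_state = (new_row, new_col)
    if new_state == GOAL:
        return new_state, 100.0, True  # reaching the exit ends the episode
    return new_state, -1.0, False      # small step cost encourages short paths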

The Q-learning algorithm uses the following equation to update the "quality" (Q-value) of each state-action pair:

Q(state, action) = Q(state, action) + alpha * [ reward + gamma * max(Q(new_state, all_actions)) - Q(state, action) ]

Where:

  • alpha is the learning rate, determining how much new information overrides old information.
  • gamma is the discount factor, which determines the importance of future rewards.
  • reward is the feedback received from the environment after performing an action.

In our project, Q-learning enables the ball to learn and solve the maze. Once the agent (the ball) has learned the optimal path, it starts sending commands to the BBT to follow the learned trajectory.
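
The repository holds the actual implementation; purely as an illustration, a tabular Q-learning loop over the step function sketched above, followed by a greedy rollout that reads off the learned path, could look like this (the hyperparameters are placeholders):

import random
from collections import defaultdict

def train(maze, start, episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Learn a Q-table that maps (state, action) pairs to expected return."""
    q = defaultdict(float)
    for _ in range(episodes):
        state, done = start, False
        while not done:
            # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(list(ACTIONS))
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            new_state, reward, done = step(maze, state, action)
            best_next = max(q[(new_state, a)] for a in ACTIONS)
            # The Q-learning update from the equation above.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = new_state
    return q

def greedy_path(maze, q, start, max_steps=200):
    """Follow the highest-valued action from each state to recover the learned path."""
    path, state, done = [start], start, False
    while not done and len(path) < max_steps:
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(maze, state, action)
        path.append(state)
    return path

q_table = train(maze, start=(1, 1))
path = greedy_path(maze, q_table, start=(1, 1))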

The solution to the maze (Figure 2) is illustrated in the figure below, showcasing the optimal path determined by the Q-learning algorithm. This path represents the sequence of moves that successfully navigate the maze.

In addition, the heat map visualizes the learned Q-values for each state within the maze. The map highlights the desirability of each position based on the cumulative rewards, with warmer colors indicating higher Q-values and cooler colors representing lower values.

Figure 3: Optimal Path Determined by the Algorithm
Figure 4: Heat Map of the Solution
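
A heat map similar to Figure 4 can be reproduced with a few lines of matplotlib, assuming the maze, ACTIONS, and q_table from the sketches above; walls are filled with NaN so they render blank:

import numpy as np
import matplotlib.pyplot as plt

rows, cols = len(maze), len(maze[0])
q_grid = np.full((rows, cols), np.nan)        # NaN cells (walls) are left blank
for row in range(rows):
    for col in range(cols):
        if maze[row][col] == 0:
            q_grid[row, col] = max(q_table[((row, col), a)] for a in ACTIONS)

plt.imshow(q_grid, cmap="hot", interpolation="nearest")
plt.colorbar(label="max Q-value")
plt.title("Learned Q-values per maze cell")
plt.show()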

Working Mechanism of the BBT Maze Solver

Here’s how the maze-solving mechanism works:

  1. Initialize the environment: The maze is encoded into a matrix, and the Q-learning agent is initialized.
  2. Learn the path: The agent iterates through the maze, exploring different paths. Over time, it learns which paths lead to the goal (exiting the maze) and which result in dead ends.
  3. Send instructions to the BBT: Once the path is learned, the coordinates of each step are converted to BBT coordinates using the function maze_to_bbt_coords().
  4. Move the ball: The ball follows the learned path, controlled by the PID algorithm, which adjusts the tilt of the table based on real-time feedback from the ball’s position.

Here’s a simplified pseudocode snippet showing how the BBT receives commands to move to specific points:

#pseudocode starts here
FUNCTION move_bbt_to_position(setpointx, setpointy):
    positionx, positiony = GET current_ball_position()   # Retrieve the current position of the ball
    errorx = setpointx - positionx                        # Calculate the error along the x-axis
    errory = setpointy - positiony                        # Calculate the error along the y-axis
    outputx = APPLY_PID_controller_x(errorx)              # Calculate the control output for x using PID
    outputy = APPLY_PID_controller_y(errory)              # Calculate the control output for y using PID
    SET_servo(outputx, outputy)                           # Send servo commands to adjust the BBT's tilt
    UPDATE_device()                                       # Update the BBT with the new position
#end of the pseudocode
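
Tying the pieces together, the learned path can be replayed on the table one waypoint at a time. The loop below is only a sketch that reuses the helpers from the earlier snippets; distance_to_target() and TOLERANCE_MM are hypothetical placeholders for whatever settling check the real control loop uses:

# Replay the learned maze path on the table, waypoint by waypoint.
for row, col in path:
    target_x, target_y = maze_to_bbt_coords(row, col)
    # Keep commanding the same waypoint until the ball has settled close enough to it.
    while distance_to_target(target_x, target_y) > TOLERANCE_MM:
        move_bbt_to_position(target_x, target_y)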

The application source code is available in this GitHub repository.

The image below shows the Ball Balancing Table (BBT) in conjunction with the maze.

Figure 5: BBT with Maze

In the video below, a trial run of the BBT maze solver with the reinforcement learning algorithm can be seen. Please note that the video is accelerated 10x, as the RL algorithm is not optimized for speed in this example.

Real-World Application Examples

The BBT maze solver can be seen as a scaled-down simulation of complex industrial control systems, offering several potential applications:

  • Robotics: Autonomous navigation systems, like those used in robotic vacuum cleaners, could employ similar algorithms to navigate around obstacles.
  • Game AI: The same principles can be applied in video games where non-player characters (NPCs) need to navigate complex environments.
  • Real-Time Traffic Management: In a future where AI drives vehicles, managing traffic could resemble solving a maze, with controllers needing to adapt dynamically to real-time conditions, much like the adjustments made in the BBT Maze Solver.

Future Improvements and Directions

This project offers several avenues for further development:

  1. Real-Time Maze Recalculation: By adding a camera above the BBT, the system could take snapshots of the maze and dynamically adjust the path if obstacles are moved or removed in real time.
  2. Adaptive Control Algorithms: With more advanced approaches such as autonomous PID tuning, the controllers themselves can continuously adapt their parameters using RL, allowing the system to automatically fine-tune its response to environmental changes and disturbances.
  3. Deep Reinforcement Learning: Transitioning from Q-learning to deep reinforcement learning (using neural networks) could enable the system to solve more complex mazes with greater accuracy and flexibility, such as allowing diagonal moves.

Conclusion

The Ball Balancing Table and Q-learning provide an exciting mix of hardware and software where classic control theory meets cutting-edge machine learning techniques. Through projects like this, we can deepen our understanding of control systems, reinforcement learning, and their potential real-world applications. With continuous improvements, these algorithms can drive self-regulating traffic networks, control autonomous robots, and advance the development of intelligent gaming systems.

By exploring these concepts and implementing them in hands-on projects, we unlock new opportunities for innovation and understanding. Whether it is for a student learning control theory or a researcher experimenting with advanced machine learning algorithms, the BBT offers a fantastic platform to bring these ideas to life.

References

  1. Acrome. Ball Balancing Table.
  2. Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction. MIT Press.
  3. Åström, K. J., & Hägglund, T. (2006). Advanced PID Control. ISA - Instrumentation, Systems, and Automation Society.

Author

İsmail Özgenç
Intern Engineer
