Technology never stands still, and the YOLO models are living proof of it. Here, we’ll talk about YOLOv9. This object detection model has taken a leap forward in computer vision, combining speed with precision. Thanks to innovations such as Programmable Gradient Information (PGI) and the Generalised Efficient Layer Aggregation Network (GELAN), YOLOv9 can analyse images with remarkable accuracy.
Previous versions of YOLO have marked significant milestones in the evolution of object detection. Ultralytics has been constantly evolving, working on new architectures and speed optimisations, as well as introducing pre-trained models to make the system even more versatile. YOLOv8, released in 2023, introduced a unified and scalable model; YOLOv9, released in 2024, pushes object detection further in both precision and speed; and YOLOv10, also from 2024, represents another major leap in performance and efficiency.
In this article, we’ll dive into the fascinating world of YOLOv9. We’ll explore the benefits of this advanced object detection model, examine its architecture, and analyse both its advantages and challenges. Additionally, we’ll take a look at the future of this model in the field of artificial intelligence.
Network structure: How YOLOv9 builds its intelligence
Convolutional layers: In YOLOv9, convolutional layers are the fundamental building blocks for processing images. These layers apply filters that extract essential features such as edges and textures. Imagine each filter as a magnifying glass examining different aspects of the image. The output of these layers is a set of feature maps that represent the detected information within the image.
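To make this concrete, here is a minimal PyTorch sketch of a convolutional layer producing feature maps; the layer sizes are illustrative and not taken from YOLOv9’s actual configuration:

```python
import torch
import torch.nn as nn

# One convolutional layer turning a 3-channel image into 16 feature maps.
# Each of the 16 filters acts as a "magnifying glass" for a different pattern.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

image = torch.randn(1, 3, 640, 640)   # a dummy RGB image, batch size 1
feature_maps = conv(image)
print(feature_maps.shape)             # torch.Size([1, 16, 640, 640]): one map per filter
```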
Pooling: Pooling is a technique used to reduce the dimensionality of feature maps while retaining only the most relevant information. There are different types of pooling, such as max pooling and average pooling. Max pooling, for instance, selects the highest value within a window to represent a specific area of the image. This simplifies the data and reduces computational time without losing crucial information needed for object detection.
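As a quick illustration (sizes are again just examples), max pooling with a 2×2 window halves each spatial dimension while keeping only the strongest activations:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2)   # keep the max of every 2x2 window

feature_maps = torch.randn(1, 16, 640, 640)    # e.g. the output of a convolutional stage
pooled = pool(feature_maps)
print(pooled.shape)                            # torch.Size([1, 16, 320, 320])
```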
Fully Connected Layers: Fully connected layers are found at the end of the model and are responsible for integrating all the information processed by the previous layers. These layers combine the extracted features to make final decisions about the image content. Each neuron in a fully connected layer is linked to all neurons in the previous layer, enabling complete integration of the information.
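The sketch below shows the general idea of a fully connected stage that flattens feature maps and turns them into final scores; it is a generic classifier-style example rather than YOLOv9’s actual head, and the 80 classes are a COCO-style assumption:

```python
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Flatten(),                 # (1, 16, 8, 8) -> (1, 1024)
    nn.Linear(16 * 8 * 8, 256),   # every neuron sees every input feature
    nn.ReLU(),
    nn.Linear(256, 80),           # one score per class
)

features = torch.randn(1, 16, 8, 8)   # heavily downsampled feature maps
print(classifier(features).shape)     # torch.Size([1, 80])
```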
YOLOv9: Improvements over previous versions
This model represents a quantum leap in real-time object detection, redefining the standards of performance, accuracy, and efficiency. Below, we highlight the innovations and optimisations that set it apart from its predecessors.
Fundamentals of YOLOv9
Neck: FPN and PAN: Imagine you are trying to view an object from different levels of a staircase, from the base to the top. This is where the Feature Pyramid Network (FPN) comes into play—it acts like a magic staircase that allows the model to see the object from different heights or, more precisely, at different scales. This means it can detect both large and small objects with the same ease by combining information from different levels of the network.
Now, add the Path Aggregation Network (PAN) to the mix, which works like a context specialist. PAN adds a bottom-up path that carries precise localisation details from the shallower layers back up through the network, improving accuracy by combining fine detail with the context captured in the deeper layers. Together, FPN and PAN help YOLOv9 achieve a clearer and more complete understanding of the scene.
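A deliberately simplified sketch of this two-way fusion is shown below; the channel counts, scales, and the use of addition instead of concatenation are illustrative assumptions, not YOLOv9’s real neck:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFpnPan(nn.Module):
    """Simplified FPN + PAN neck: a top-down pass first, then a bottom-up pass."""

    def __init__(self, channels=64):
        super().__init__()
        self.lat3 = nn.Conv2d(64, channels, 1)       # lateral 1x1 convs
        self.reduce4 = nn.Conv2d(128, channels, 1)
        self.reduce5 = nn.Conv2d(256, channels, 1)
        self.down3 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.down4 = nn.Conv2d(channels, channels, 3, stride=2, padding=1)

    def forward(self, p3, p4, p5):
        # FPN: top-down, semantic context from deep layers flows to shallow levels
        p5 = self.reduce5(p5)
        p4 = self.reduce4(p4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lat3(p3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        # PAN: bottom-up, fine localisation detail flows back to the deep levels
        n4 = p4 + self.down3(p3)
        n5 = p5 + self.down4(n4)
        return p3, n4, n5

neck = TinyFpnPan()
p3 = torch.randn(1, 64, 80, 80)
p4 = torch.randn(1, 128, 40, 40)
p5 = torch.randn(1, 256, 20, 20)
print([o.shape for o in neck(p3, p4, p5)])
```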
Single-stage Head: Unlike some previous models that required multiple stages for detection, YOLOv9 uses a single-stage head. This unique head enables the model to make predictions about bounding boxes and object class probabilities quickly and efficiently. Think of it as a shortcut that saves time while maintaining accuracy without adding unnecessary complexity.
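Here is a hedged sketch of what such a head can look like: a single convolution that predicts, for every cell of a feature map, box offsets, an objectness score, and class scores (the 80 classes and layer sizes are assumptions for illustration, not YOLOv9’s exact head):

```python
import torch
import torch.nn as nn

num_classes = 80
head = nn.Conv2d(64, 4 + 1 + num_classes, kernel_size=1)   # everything in one pass

features = torch.randn(1, 64, 80, 80)                       # one scale of the neck output
pred = head(features)                                       # (1, 85, 80, 80)
boxes, objectness, classes = pred.split([4, 1, num_classes], dim=1)
print(boxes.shape, objectness.shape, classes.shape)
```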
Key components of YOLOv9
Cross Stage Partial Network (CSPNet): YOLOv9 utilises CSPNet to efficiently manage important information in an image. CSPNet splits the feature extraction process into two parallel paths: one dedicated to extracting detailed information and the other capturing more general insights. This enables the model to retain crucial details without overloading the system, allowing for precise detection even in low-quality images or challenging conditions.
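A simplified CSP-style block might look like the following; the channel split, block depth, and activation are illustrative choices rather than YOLOv9’s exact layout:

```python
import torch
import torch.nn as nn

class TinyCSPBlock(nn.Module):
    """Simplified CSP-style block: split the channels into two paths,
    process only one of them in depth, then re-join both."""

    def __init__(self, channels=64):
        super().__init__()
        half = channels // 2
        self.detail_path = nn.Sequential(              # the heavier path
            nn.Conv2d(half, half, 3, padding=1), nn.SiLU(),
            nn.Conv2d(half, half, 3, padding=1), nn.SiLU(),
        )
        self.merge = nn.Conv2d(channels, channels, 1)  # 1x1 transition conv

    def forward(self, x):
        a, b = x.chunk(2, dim=1)        # split the feature maps in two
        a = self.detail_path(a)         # one half is processed in depth
        return self.merge(torch.cat([a, b], dim=1))    # the other passes straight through

print(TinyCSPBlock()(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```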
Fast Computation (Bottleneck): This technique in YOLOv9 enhances image processing speed by reducing computational complexity and using more efficient layers, allowing for faster calculations. Thanks to this, YOLOv9 can detect objects in real time, which is essential for applications requiring instant responses, such as robotics and autonomous vehicles.
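The classic bottleneck pattern behind this idea can be sketched as follows (the sizes and reduction factor are assumptions for illustration):

```python
import torch
import torch.nn as nn

class TinyBottleneck(nn.Module):
    """Simplified bottleneck: a 1x1 conv shrinks the channels, the 3x3 conv
    works on the cheaper representation, and a 1x1 conv expands it back.
    A residual shortcut keeps the original signal."""

    def __init__(self, channels=64, reduction=4):
        super().__init__()
        hidden = channels // reduction
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1), nn.SiLU(),
            nn.Conv2d(hidden, channels, 1),
        )

    def forward(self, x):
        return x + self.block(x)   # residual shortcut

print(TinyBottleneck()(torch.randn(1, 64, 80, 80)).shape)
```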
Programmable Gradient Information (PGI): PGI is an innovative technique that ensures important data is preserved throughout the deeper layers of the neural network. This prevents the loss of crucial information and allows the model to generate more reliable gradients during training, resulting in more accurate and efficient object detection.
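The snippet below is only a conceptual sketch of the auxiliary-supervision idea behind PGI, not the paper’s exact design: during training an auxiliary branch contributes extra gradients to the shared backbone, and at inference only the main branch is kept. The loss, weights, and shapes here are dummy placeholders:

```python
import torch
import torch.nn as nn

backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.SiLU())
main_head = nn.Conv2d(32, 85, 1)        # the branch kept at inference time
aux_head = nn.Conv2d(32, 85, 1)         # training-only auxiliary branch

image = torch.randn(1, 3, 64, 64)
target = torch.randn(1, 85, 64, 64)     # dummy training target

features = backbone(image)
loss_main = nn.functional.mse_loss(main_head(features), target)
loss_aux = nn.functional.mse_loss(aux_head(features), target)
total_loss = loss_main + 0.25 * loss_aux   # auxiliary loss weight is illustrative
total_loss.backward()                      # both heads feed gradients into the backbone
```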
Generalised Efficient Layer Aggregation Network (GELAN): GELAN optimises how extracted features are combined and utilised across the network. It allows for the flexible integration of different computational blocks, improving efficiency without compromising speed. This makes YOLOv9 highly adaptable to various applications and devices, maximising both performance and accuracy.
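A toy GELAN-style aggregation could be sketched like this; the choice of inner blocks and their number is exactly what the “generalised” part leaves open, and the values here are illustrative:

```python
import torch
import torch.nn as nn

class TinyGELAN(nn.Module):
    """Simplified GELAN-style aggregation: chain interchangeable blocks,
    keep every intermediate output, and fuse them with a 1x1 conv."""

    def __init__(self, channels=64, num_blocks=3):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU())
            for _ in range(num_blocks)
        )
        self.fuse = nn.Conv2d(channels * (num_blocks + 1), channels, 1)

    def forward(self, x):
        outputs = [x]
        for block in self.blocks:
            outputs.append(block(outputs[-1]))        # keep every intermediate result
        return self.fuse(torch.cat(outputs, dim=1))   # aggregate them all at once

print(TinyGELAN()(torch.randn(1, 64, 40, 40)).shape)  # torch.Size([1, 64, 40, 40])
```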
Reversible Functions: YOLOv9 employs reversible functions to maintain the integrity of information as it passes through different layers of the model. This ensures that essential data is not lost and that the model can be updated more accurately, thereby improving object detection quality.
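To illustrate the principle, here is a generic additive-coupling reversible block in the RevNet style; YOLOv9’s actual reversible branch is more elaborate, so treat this purely as a demonstration that the inputs can be reconstructed exactly from the outputs:

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Minimal reversible block: the forward pass can be inverted exactly,
    so no information is lost as features move through the layer."""

    def __init__(self, channels=32):
        super().__init__()
        self.f = nn.Conv2d(channels, channels, 3, padding=1)
        self.g = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2

block = ReversibleBlock()
x1, x2 = torch.randn(1, 32, 16, 16), torch.randn(1, 32, 16, 16)
y1, y2 = block(x1, x2)
r1, r2 = block.inverse(y1, y2)
print(torch.allclose(x1, r1, atol=1e-6), torch.allclose(x2, r2, atol=1e-6))  # True True
```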
Object detection models: Who wins? Spoiler alert – YOLO!

When it comes to choosing the best object detection model, it’s like selecting the perfect ally for your team. Here’s a comparison to show you why YOLO models are the top choice in this story, especially against other popular models like Faster R-CNN and RetinaNet.
Inference speed
YOLOv8, YOLOv9, and YOLOv10 models are incredibly fast. They are designed to deliver real-time results, making them perfect for applications where speed is crucial.
On the other hand, Faster R-CNN is highly accurate but takes its time. It’s a meticulous model that analyses every detail, which isn’t always ideal if you need quick responses.
Finally, RetinaNet strikes a balance between speed and accuracy, but it’s still not as fast as YOLO models.
Accuracy
With each new version, YOLO models become more precise, incorporating innovative technologies that enhance accuracy without sacrificing speed. Meanwhile, Faster R-CNN is extremely accurate, especially in optimised settings, but sometimes at the cost of speed. RetinaNet introduced focal loss to improve accuracy on imbalanced datasets, but it still lags behind the latest YOLO versions, such as YOLOv9 and the newest YOLOv10.
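For reference, focal loss down-weights easy examples so that hard, rare ones dominate training; a minimal sketch of the binary form follows (the alpha and gamma values are the commonly cited defaults):

```python
import torch

def focal_loss(probs, targets, alpha=0.25, gamma=2.0):
    """Minimal binary focal loss: (1 - p_t)^gamma suppresses easy examples,
    alpha balances positive and negative classes."""
    p_t = torch.where(targets == 1, probs, 1 - probs)
    alpha_t = torch.where(targets == 1, torch.tensor(alpha), torch.tensor(1 - alpha))
    return (-alpha_t * (1 - p_t) ** gamma * torch.log(p_t.clamp(min=1e-7))).mean()

probs = torch.tensor([0.95, 0.10, 0.60])    # predicted probabilities
targets = torch.tensor([1.0, 0.0, 1.0])     # ground-truth labels
print(focal_loss(probs, targets))
```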
Computational efficiency
YOLO models are undoubtedly the kings of efficiency, using fewer computational resources while delivering outstanding results. This makes them ideal for a wide range of hardware.
Faster R-CNN, on the other hand, requires more processing power due to its complex architecture, which can be a drawback if you don’t have a high-end machine. RetinaNet is more efficient than Faster R-CNN but still not as optimised as the latest YOLO models.
Architecture & Flexibility
- YOLO models: Feature modular and flexible architectures that easily adapt to different tasks and scales.
- Faster R-CNN: More rigid and complex, making it harder to adapt to specific applications.
- RetinaNet: Simpler than Faster R-CNN but still not as flexible as the latest YOLO models.
Hardware requirements
- YOLO models are hardware-friendly, meaning you don’t need the most expensive setup to achieve great results.
- Faster R-CNN requires high-end GPUs to perform well, which can be a limitation.
- RetinaNet is better than Faster R-CNN in terms of hardware requirements, but it may still need a powerful setup for real-time applications.
If you’re looking for a combination of speed, accuracy, and efficiency, YOLO models deliver. While other models have their own strengths, they cannot match the unique blend of advantages offered by the latest YOLO versions, making them the best choice for object detection today.
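If you want to see the speed difference on your own hardware, a rough, hedged way to do it with the Ultralytics package is sketched below; the weight names and the `speed` attribute follow the Ultralytics API at the time of writing and may change:

```python
from ultralytics import YOLO  # pip install ultralytics

# Compare inference time across YOLO generations on the same image.
for weights in ("yolov8n.pt", "yolov9c.pt", "yolov10n.pt"):
    model = YOLO(weights)                                    # downloads weights if missing
    results = model("https://ultralytics.com/images/bus.jpg")
    print(weights, results[0].speed)                         # pre-/inference/post-processing times in ms
```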
Predictions and possibilities of YOLOv9
YOLO models for object detection are making a significant leap forward with their innovative improvements in efficiency and accuracy. The YOLOv9 version introduces groundbreaking techniques such as Programmable Gradient Information (PGI) and the Generalised Efficient Layer Aggregation Network (GELAN), which address bottlenecks in information flow and gradient reliability that affected previous versions.
Looking ahead, it is exciting to consider the possibilities and directions these models may take. They are expected to strive for greater accuracy and generalisation, further refining their architecture to handle a broader range of challenging datasets. Additionally, real-time inference speed remains crucial. In this context, techniques such as quantisation, pruning, and knowledge distillation are likely to be employed to reduce model size while preserving precision, thereby optimising performance on specific hardware.
Furthermore, YOLOv9’s ability to tackle complex environments, such as challenging lighting conditions and occlusions, will be another key area of focus. Research will also aim to integrate YOLOv9 with other computer vision tasks, such as segmentation and object tracking, to create more comprehensive workflows. Additionally, optimisation for mobile and embedded devices will further expand its applicability.
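As a hedged example of the kind of deployment optimisation this implies, the Ultralytics API can export weights to formats better suited to edge hardware; the weight name and export call are assumptions based on the API at the time of writing:

```python
from ultralytics import YOLO  # pip install ultralytics

model = YOLO("yolov9c.pt")    # pre-trained weights, downloaded if missing
model.export(format="onnx")   # writes an ONNX file usable by edge runtimes
# Depending on the target hardware, options such as half precision or
# integer quantisation can be added to shrink the model further.
```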
Reflection

Although YOLOv9 marked a significant breakthrough in object detection, technology and artificial intelligence continue to evolve, bringing us ever closer to more sophisticated models. Ultralytics has been shaping this progress for years, consistently impacting our world with its innovative models. In 2024, within the span of just three months, we saw the arrival of YOLOv9 and the revolutionary YOLOv10, the first object detection model to do away with Non-Maximum Suppression (NMS).
It is exciting to consider how these advancements can transform business operations, creating new opportunities to optimise internal processes. We encourage you to explore and make the most of these models, whether YOLOv8, YOLOv9, or YOLOv10, and to stay tuned for future developments that may surpass even these pioneers.
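As a minimal, hedged starting point, here is how one might load pre-trained weights with the Ultralytics package and inspect a prediction; the weight name and sample image URL follow Ultralytics conventions at the time of writing:

```python
from ultralytics import YOLO  # pip install ultralytics

# Swap the weight file to try YOLOv8, YOLOv9 or YOLOv10.
model = YOLO("yolov9c.pt")
results = model("https://ultralytics.com/images/bus.jpg")

for box in results[0].boxes:
    print(model.names[int(box.cls)], float(box.conf))  # detected class and confidence
```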