io_uring with Rust

Prerequisites

To understand this article, you should be familiar with:

Basic knowledge of operating systems.
How Future works in Rust.
The state machine generation behind async and await.
How spawn and block_on work in Rust asynchronous runtimes.
File descriptors in Unix-like systems.
The workings of select, poll and epoll in Linux.

What is `io_uring`?

In Linux, system calls like read are typically used to interact with files, sockets and other I/O devices. These system calls are synchronous, which means the calling thread is blocked until the operation completes. While this design simplifies programming models, it can lead to inefficiencies, especially when dealing with high-performance or latency-sensitive applications. In such cases, we often need mechanisms that allow the calling thread to continue executing while the I/O operation is in progress. This need gave rise to asynchronous I/O mechanisms.

Initially, mechanisms like select, poll and epoll were introduced to address this issue. These mechanisms notify the user space when a file descriptor becomes ready for I/O, but the actual reading or writing still requires synchronous system calls after receiving the notification. Although they reduce idle wait times, they don’t completely eliminate the overhead associated with synchronous calls.
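
The readiness model can be illustrated with a minimal sketch that uses only the standard library. It does not call epoll itself; as an assumption for illustration, it uses a non-blocking UnixStream pair, and the function name readiness_demo is hypothetical. The point it shows is that "ready" only means a read will not block: the data is still transferred by an ordinary synchronous read call.

```rust
use std::io::{self, ErrorKind, Read, Write};
use std::os::unix::net::UnixStream;

/// Readiness-style I/O on a non-blocking socket pair (illustrative only).
/// Returns (did the first read report WouldBlock, bytes read once data arrived).
pub fn readiness_demo() -> io::Result<(bool, usize)> {
    let (mut tx, mut rx) = UnixStream::pair()?;
    rx.set_nonblocking(true)?;

    let mut buf = [0u8; 16];

    // Nothing has been written yet, so the socket is not ready:
    // read() returns immediately with WouldBlock instead of blocking.
    let would_block = match rx.read(&mut buf) {
        Err(e) if e.kind() == ErrorKind::WouldBlock => true,
        Err(e) => return Err(e),
        Ok(_) => false,
    };

    // Make the socket readable. In a real program, this is the moment
    // at which epoll would report `rx` as ready.
    tx.write_all(b"hello")?;

    // Readiness only promises that read() will not block; the bytes are
    // still copied by a regular, synchronous read system call.
    let n = rx.read(&mut buf)?;
    Ok((would_block, n))
}
```

With a completion-based interface such as io_uring, the second read call disappears: the application hands the buffer to the kernel up front and is told when the data is already in it.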

This is where io_uring stands out. io_uring is a high-performance asynchronous I/O framework that enables system-call-like operations to be executed without blocking the calling thread. By leveraging io_uring, applications can achieve greater efficiency and scalability.

Here are some key features of io_uring:

Reduced User-Kernel Transitions: Unlike traditional system calls, io_uring minimizes user-kernel context switching, which significantly improves performance by reducing overhead.
True Asynchrony: Operations initiated via io_uring do not block the initiating thread. While the operation executes in the background, the thread can continue performing other tasks, which is especially beneficial in multi-threaded or event-driven systems.
Batching Capability: io_uring allows multiple operations to be submitted together in a batch, reducing the cost of frequent submissions and improving throughput.
System-call-like Operations: The io_uring API closely resembles traditional system calls, making it easier to integrate with existing codebases.

How `io_uring` Works

io_uring Workflow

The io_uring framework is built around two primary ring buffers shared between user space and the kernel:

Submission Queue (SQ): This is where user space enqueues operations for the kernel to execute. It holds Submission Queue Entries (SQEs) that describe the requested operations.
Completion Queue (CQ): This is where the kernel posts the results of completed operations. It contains Completion Queue Entries (CQEs) with details of the operation’s outcome.

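These queues follow a single-producer, single-consumer model: for the Submission Queue, user space is the producer and the kernel is the consumer, and for the Completion Queue the roles are reversed. This ensures efficient data transfer between user and kernel spaces.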
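
As a rough illustration of the single-producer, single-consumer idea, here is a toy ring written for this article's purposes. It is a sketch under simplifying assumptions: it is single-threaded, uses no atomics or shared memory, and the Ring type and its methods are hypothetical names, not the kernel's or the io-uring crate's API. What it does share with the real queues is the shape: a power-of-two capacity, a head advanced only by the consumer, a tail advanced only by the producer, and slot indices obtained by masking.

```rust
/// A toy single-producer, single-consumer ring (illustrative only).
pub struct Ring<T> {
    slots: Vec<Option<T>>,
    mask: u32,
    head: u32, // advanced only by the consumer
    tail: u32, // advanced only by the producer
}

impl<T> Ring<T> {
    /// `capacity` must be a power of two so that `index & mask` wraps correctly.
    pub fn new(capacity: u32) -> Self {
        assert!(capacity.is_power_of_two());
        Ring {
            slots: (0..capacity).map(|_| None).collect(),
            mask: capacity - 1,
            head: 0,
            tail: 0,
        }
    }

    /// Number of entries currently in the ring.
    pub fn len(&self) -> u32 {
        self.tail.wrapping_sub(self.head)
    }

    /// Producer side: returns the item back if the ring is full.
    pub fn push(&mut self, item: T) -> Result<(), T> {
        if self.len() == self.slots.len() as u32 {
            return Err(item);
        }
        let idx = (self.tail & self.mask) as usize;
        self.slots[idx] = Some(item);
        self.tail = self.tail.wrapping_add(1);
        Ok(())
    }

    /// Consumer side: returns None if the ring is empty.
    pub fn pop(&mut self) -> Option<T> {
        if self.len() == 0 {
            return None;
        }
        let idx = (self.head & self.mask) as usize;
        let item = self.slots[idx].take();
        self.head = self.head.wrapping_add(1);
        item
    }
}
```

Because each index has exactly one writer, a real implementation can coordinate the two sides with memory barriers instead of locks.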

Here’s step-by-step how io_uring works:

In User Space:
1. The application begins by creating and registering a ring buffer with the kernel. This ring buffer contains both the Submission Queue and Completion Queue.
2. The user constructs a Submission Queue Entry (SQE) that describes the operation to perform. This could be a file read, write, or another supported operation.
3. The constructed SQE is then pushed to the Submission Queue.
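4. The user updates the tail pointer of the Submission Queue to publish the new entry. Unless the ring was set up with kernel-side submission queue polling, the application then calls io_uring_enter to tell the kernel that new entries are available.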
5. The application waits for results in the Completion Queue. This waiting can be active or passive depending on the application’s design.
6. Once the operation completes, the application retrieves the corresponding Completion Queue Entry (CQE) and processes the result.
In Kernel Space:
1. The kernel retrieves the SQE from the Submission Queue and validates it.
2. The kernel executes the operation described by the SQE. This could involve reading data, writing to a file, or another supported action.
3. After completing the operation, the kernel posts a CQE with the result to the Completion Queue, allowing the user space to process the outcome.

By avoiding unnecessary system calls and reducing the need for user-kernel transitions, io_uring achieves remarkable efficiency.
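
The workflow above can be sketched as a small in-process simulation. Everything here is hypothetical: ToyRing, Sqe, Cqe and Op are names invented for this sketch, the "kernel" is just a method call, and the only convention borrowed from the real interface is that a failed operation reports a negative error number in its result.

```rust
use std::collections::VecDeque;

/// Operations understood by the toy kernel (illustrative only).
#[derive(Debug)]
pub enum Op {
    /// Pretend to read up to `len` bytes from a source holding `available` bytes.
    Read { available: usize, len: usize },
    /// An operation the toy kernel rejects.
    Unsupported,
}

#[derive(Debug)]
pub struct Sqe {
    pub op: Op,
    pub user_data: u64,
}

#[derive(Debug, PartialEq)]
pub struct Cqe {
    pub user_data: u64,
    pub result: i32,
}

#[derive(Default)]
pub struct ToyRing {
    sq: VecDeque<Sqe>,
    cq: VecDeque<Cqe>,
}

impl ToyRing {
    /// User space: enqueue a described operation on the Submission Queue.
    pub fn push(&mut self, sqe: Sqe) {
        self.sq.push_back(sqe);
    }

    /// Kernel side: take each SQE, execute it, and post a CQE.
    /// Returns how many entries were processed.
    pub fn kernel_process(&mut self) -> usize {
        let mut processed = 0;
        while let Some(sqe) = self.sq.pop_front() {
            let result = match sqe.op {
                Op::Read { available, len } => available.min(len) as i32,
                Op::Unsupported => -22, // -EINVAL on Linux
            };
            self.cq.push_back(Cqe { user_data: sqe.user_data, result });
            processed += 1;
        }
        processed
    }

    /// User space: retrieve the next completion, if any.
    pub fn pop_completion(&mut self) -> Option<Cqe> {
        self.cq.pop_front()
    }
}
```

A caller would push a few Sqe values, let kernel_process run, and then match each Cqe back to its request through user_data, which the toy kernel copies through unchanged.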

Using `io_uring` with Rust

The io-uring crate provides a Rust-friendly interface for leveraging io_uring. Below is an official example of using it to perform asynchronous file reads:

use io_uring::{opcode, types, IoUring};
use std::os::unix::io::AsRawFd;
use std::{fs, io};

fn main() -> io::Result<()> {
    let mut ring = IoUring::new(8)?;

    let fd = fs::File::open("README.md")?;
    let mut buf = vec![0; 1024];

    let read_e = opcode::Read::new(types::Fd(fd.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
        .build()
        .user_data(0x42);

    // Note that the developer needs to ensure
    // that the entry pushed into submission queue is valid (e.g. fd, buffer).
    unsafe {
        ring.submission()
            .push(&read_e)
            .expect("submission queue is full");
    }

    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("completion queue is empty");

    assert_eq!(cqe.user_data(), 0x42);
    assert!(cqe.result() >= 0, "read error: {}", cqe.result());

    Ok(())
}

Here are the key points to note in this example:

The read_e operation mimics the read system call but is executed asynchronously by the kernel.
The user_data field allows you to tag operations, making it easier to associate CQEs with their corresponding SQEs.
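Pushing to the submission queue is unsafe because the kernel uses the file descriptor and the buffer after push returns: both must remain valid until the corresponding CQE arrives. An invalid file descriptor is reported as an error in the Completion Queue (a negative result), but a buffer that is freed or reallocated while the operation is in flight leads to memory corruption rather than a clean error.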

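This example demonstrates the basic cycle of submitting an operation and reaping its completion with minimal overhead. Because submit_and_wait(1) blocks until at least one completion is available, the program still waits at that call; a performance-critical application would instead submit, continue with other work, and check the Completion Queue later.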
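
The example checks that the result is non-negative because a failed operation reports a negative error number in the CQE. A small helper can turn that raw value into an ordinary io::Result. This is a sketch: the function name cqe_to_result is hypothetical and is not part of the io-uring crate; it takes the plain i32 that cqe.result() returns in the example above.

```rust
use std::io;

/// Convert a raw CQE result into an `io::Result`.
/// Non-negative values are the operation's return value (for a read,
/// the number of bytes read); negative values are `-errno`.
pub fn cqe_to_result(res: i32) -> io::Result<usize> {
    if res >= 0 {
        Ok(res as usize)
    } else {
        Err(io::Error::from_raw_os_error(-res))
    }
}
```

In the example above, the final assert could then be replaced with something like `let n = cqe_to_result(cqe.result())?;`, which propagates a read failure as a normal I/O error instead of panicking.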

`io_uring` in Rust’s Asynchronous Runtime

Using io_uring directly in an application can be cumbersome. Instead, it’s often integrated into a runtime to simplify asynchronous I/O operations.

I am currently developing a runtime based on io_uring in Rust. Here is how it works conceptually:

Polling `Future`s

When polling a Future:

The first poll creates an SQE and pushes it to the Submission Queue. At this point, the operation is initiated but not yet complete.
The poll function returns Poll::Pending, indicating that the result is not yet available.
The executor keeps polling the Future, and each poll returns Poll::Pending until the corresponding CQE is available in the Completion Queue.
Once the operation completes, the Future retrieves the result from the CQE and returns Poll::Ready with the operation’s outcome.
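
The polling sequence above might look roughly like the following. This is a sketch, not the runtime's actual code: Driver is a hypothetical stand-in for the ring (a list of submitted IDs plus a map of completed results), ReadOp and poll_once are invented names, and the waker is a no-op because the executor described here simply polls again.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the ring: what was submitted, what has completed.
#[derive(Default)]
pub struct Driver {
    pub submitted: Vec<u64>,
    pub completed: HashMap<u64, i32>,
}

pub struct ReadOp {
    driver: Rc<RefCell<Driver>>,
    user_data: u64,
    submitted: bool,
}

impl ReadOp {
    pub fn new(driver: Rc<RefCell<Driver>>, user_data: u64) -> Self {
        ReadOp { driver, user_data, submitted: false }
    }
}

impl Future for ReadOp {
    type Output = i32;

    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<i32> {
        let id = self.user_data;
        if !self.submitted {
            // First poll: "push an SQE" and report that nothing is ready yet.
            self.driver.borrow_mut().submitted.push(id);
            self.submitted = true;
            return Poll::Pending;
        }
        // Later polls: look for a completion tagged with our user_data.
        match self.driver.borrow_mut().completed.remove(&id) {
            Some(result) => Poll::Ready(result),
            None => Poll::Pending,
        }
    }
}

/// A waker that does nothing; enough for an executor that re-polls on its own.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Poll a future exactly once, as a busy-polling executor would.
pub fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

A more refined design would store the Waker from the Context at submission time and wake the task when its CQE arrives, so that the executor does not need to re-poll pending futures.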

Tracking Execution State

Here is a simple way to track the execution state of an operation, although it is not best practice:

A shared memory structure is used to store operation results. This memory is accessible by both the executor and the Future.
Each operation is tagged with a unique identifier using the user_data field of the SQE. This allows the CQE to be matched with its corresponding operation.
When a CQE is retrieved, the shared memory is updated with the result, enabling the Future to access the completed operation’s data.
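
One possible shape for that shared structure is a table keyed by user_data. The sketch below is an assumption about how such a table could look, not the runtime's actual data structure; OpTable, OpState and the method names are hypothetical, and in a real runtime the table would sit behind something like Rc<RefCell<...>> so that both the executor and the futures can reach it.

```rust
use std::collections::HashMap;

/// State of one operation in the shared table (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OpState {
    InFlight,
    Done(i32),
}

#[derive(Default)]
pub struct OpTable {
    next_id: u64,
    ops: HashMap<u64, OpState>,
}

impl OpTable {
    /// Reserve a unique ID; the caller stores it in the SQE's user_data.
    pub fn register(&mut self) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.ops.insert(id, OpState::InFlight);
        id
    }

    /// Called by the executor for each CQE it retrieves.
    /// Completions for unknown IDs are ignored.
    pub fn complete(&mut self, user_data: u64, result: i32) {
        if let Some(state) = self.ops.get_mut(&user_data) {
            *state = OpState::Done(result);
        }
    }

    /// Called by a Future when polled: take the result if it has arrived,
    /// removing the entry so the table does not grow without bound.
    pub fn take_result(&mut self, user_data: u64) -> Option<i32> {
        match self.ops.get(&user_data).copied() {
            Some(OpState::Done(result)) => {
                self.ops.remove(&user_data);
                Some(result)
            }
            _ => None,
        }
    }
}
```

A monotonically increasing counter is the simplest way to keep identifiers unique; a slab with reusable indices would avoid the hash lookups at the cost of a little more bookkeeping.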

Submission Strategies

When should SQEs be submitted to the kernel?

Immediate Submission: Submit the SQE as soon as it is pushed to the queue. This reduces latency because the operation is sent to the kernel without delay. However, each submission is a system call, so submitting every SQE individually increases CPU usage.
Deferred Submission: Submit SQEs in batches when the executor is idle or when a threshold is reached. This strategy reduces the cost of frequent submissions, improving overall efficiency. However, it may introduce slight delays in processing.

For high-performance applications, the second strategy is generally preferred as it strikes a balance between latency and resource utilization.
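
The difference between the two strategies can be made concrete by counting submissions. The sketch below only models the bookkeeping: Strategy, Submitter and submit_calls_for are hypothetical names, no real ring is involved, and a "submit call" is just a counter increment standing in for the system call.

```rust
/// When pushed SQEs are handed to the kernel (illustrative only).
#[derive(Clone, Copy)]
pub enum Strategy {
    /// Submit after every push.
    Immediate,
    /// Submit once `threshold` entries are pending, or when the executor is idle.
    Deferred { threshold: usize },
}

pub struct Submitter {
    strategy: Strategy,
    pending: usize,          // SQEs pushed but not yet submitted
    pub submit_calls: usize, // how many times we "entered the kernel"
    pub submitted: usize,    // total SQEs handed over so far
}

impl Submitter {
    pub fn new(strategy: Strategy) -> Self {
        Submitter { strategy, pending: 0, submit_calls: 0, submitted: 0 }
    }

    /// Record one pushed SQE and submit if the strategy says so.
    pub fn push(&mut self) {
        self.pending += 1;
        let flush_now = match self.strategy {
            Strategy::Immediate => true,
            Strategy::Deferred { threshold } => self.pending >= threshold,
        };
        if flush_now {
            self.flush();
        }
    }

    /// Submit everything that is pending; the executor also calls this when idle.
    pub fn flush(&mut self) {
        if self.pending > 0 {
            self.submit_calls += 1;
            self.submitted += self.pending;
            self.pending = 0;
        }
    }
}

/// Push `ops` operations, then go idle once; return the number of submit calls.
pub fn submit_calls_for(strategy: Strategy, ops: usize) -> usize {
    let mut submitter = Submitter::new(strategy);
    for _ in 0..ops {
        submitter.push();
    }
    submitter.flush();
    submitter.submit_calls
}
```

For ten operations, immediate submission makes ten submit calls, while deferred submission with a threshold of four makes three: two full batches plus one idle flush for the remaining two entries.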

Conclusion

io_uring is a game-changer for asynchronous I/O in Linux. By understanding its mechanisms and integrating it with Rust’s async ecosystem, you can build efficient and scalable applications. Whether you are developing a runtime or simply optimizing I/O operations, io_uring provides the tools needed to achieve cutting-edge performance.
