Sharding UUIDv7 (and UUID v3, v4, and v5) values with one function

A rust function to generate variable-length shard keys for UUID values

UUIDv7 (wiki link) is seeing strong and eager adoption as a solution to problems that have long plagued the tech industry, providing a solution for generating collision-free IDs on the backend that could still be sorted chronologically to play nicer with database indexes and other needs.1 As a quick briefer, a UUIDv7 is essentially composed of two parts: a timestamp half and a randomized bytes half, and they’re sorted by the timestamp:

Let’s say you want to shard based off a UUIDv7 key: maybe to assign data across a horizontally-distributed database, or to avoid having a couple of million files in the same directory. You can’t use the timestamp portion to derive a shard key, as not even the microseconds are necessarily uniformly distributed, so you need to shard based off the random portion of the UUID.

Lucky for us, the last eight bytes of the UUID contain random data not just in the case of a UUIDv7 – those bytes also contain random data in the case of UUID v3 (MD5-hashed), UUID v4 (randomized), and UUID v5 (like v3 but using SHA-1), letting us use one approach to hash all four of these UUID types.

With some judicious use of the stabilized subset of const generics, we can generate a shard key N-chars long without any allocation, and provide a wrapper ShardKey type to simplify both creating the shard key and passing it around in a way that lets it decompose to a &str in a way that wouldn’t have been possible if we had returned a [char; N] instead.2

/// Derives a deterministic N-char sharding key from a UUID.
///
/// It extracts entropy starting from the last byte (the most random 
/// part of a UUIDv7) moving backwards, converting nibbles to 
/// hexadecimal characters. It uses only the last 8 bytes (the `rand_b`
/// section) to ensure the timestamp (first 6 bytes) is never touched, 
/// preventing "hot spot" sharding issues based on time.
///
/// # Panics
/// * Panics if the provided UUID is not shardable based off the last 
///   8 bytes (i.e. if the UUID is not one a v3, v4, v5, or v7 UUID).
/// * Panics if the requested shard key length `N` exceeds the 
///   available entropy.
pub fn shard<const N: usize>(uuid: &uuid::Uuid) -> ShardKey<N> {
    // Ensure UUID random bytes assumption
    if !matches!(uuid.get_version_num(), 3 | 4 | 5 | 7) {
        panic!("Provided UUID cannot be sharded with this interface!");
    }
    // Validate shard key length
    if N > 15 {
        panic!("Requested shard key length exceeds available entropy!");
    }

    const HEX: &[u8; 16] = b"0123456789abcdef";
    let bytes = uuid.as_bytes();
    let mut result = [b'\0'; N];

    // Generate N-char shard key
    for i in 0..N {
        // We extract data from the tail (byte 15) backwards to byte 8.
        // Bytes 8-15 in UUIDv7 contain 62 bits of randomness + 2 bits 
        // to indicate variant. We avoid bytes 0-7 (Timestamp + Version) 
        // to ensure uniform distribution.

        // Calculate byte index, consuming two nibbles per byte.
        let inverse_offset = i / 2;
        let byte_index = 15 - inverse_offset;

        let byte = bytes[byte_index];

        // Even indices take the low nibble; odd indices take the high nibble.
        let nibble = if i % 2 == 0 {
            byte & 0x0F
        } else {
            (byte >> 4) & 0x0F
        };

        result[i] = HEX[nibble as usize];
    }

    ShardKey { key: result }
}

#[repr(transparent)]
#[derive(Copy, Clone, PartialOrd, Ord, PartialEq, Eq, Hash)]
pub struct ShardKey<const N: usize> {
    key: [u8; N],
}

impl<const N: usize> ShardKey<N> {
    pub fn new(uuid: &uuid::Uuid) -> Self {
        shard(uuid)
    }

    pub fn as_str(&self) -> &str {
        // Safe because only we have access to this field and we always
        // initialize it fully with ASCII-only characters.
        unsafe { std::str::from_utf8_unchecked(&self.key) }
    }
}

The shard() function can be used standalone, or the ShardKey wrapper type can be created from a reference to a &Uuid and then passed around until the point where you need to retrieve the key with a quick .as_str() call. As promised, the length of the shard key itself is variable and parameterized, by providing different values for N you can generate shard keys from one to fifteen (or sixteen, if you prefer) characters long (past which point the available uniformly-distributed entropy has been exhausted and the function will panic to ensure contract is upheld).


  1. Of course, non-standardized solutions abound and UUIDv7 itself takes a lot of inspiration from predecessors like Ulid and others. 

  2. As rust is famously one of the very few languages that uses different-sized units/types to represent a standalone char versus the smallest type used to compose a String; in rust, char is 4-bytes long and has UTF-32 encoding while a String or &str uses an underlying 1-byte u8 type and are UTF8-encoded. This means you can’t go from a String to a [char] without allocating, nor go from a [N; char] to an &str without looping and allocating, either. 

Leave a Reply

Your email address will not be published. Required fields are marked *