I’m happy to announce that a new version of size, the PrettySize.NET port for rust, has been released and includes a number of highly requested features and improvements.
The last major release of the size crate was 0.1.2, released in December of 2018. It was feature complete with regard to its original purpose: the (automatic) textual formatting of file sizes for human-readable printing/display purposes. It would take a file size, pick the appropriate unit (KB, MB, GB, etc.) to display the final result in, and choose a suitable precision for the floating-point numeric component. It had support for both base-10 (KB, MB, GB, etc.) and base-2 (KiB, MiB, GiB, etc.) units, and the user could choose between them as well as override how the unit was formatted. In short, it did one thing and did it right.
A brief recap of the size crate to date
Some time after its release, there was a request to add support for mathematical operations on strongly-typed Size types (without having to go to an intermediate “raw” bytes value and back). I originally approached it with some gusto, but ended up dismayed by some restrictions of the rust type system that made it difficult to write generic code that could support the full gamut of what a user could reasonably expect to be able to do (more on this later).
As the size crate covered the functionality we needed out of it here at NeoSmart Technologies, and as there were valid workarounds for composing/calculating sizes (via the .bytes() escape hatch), there wasn’t a pressing need to tackle those limitations. Other projects took priority, meaning size hasn’t seen any updates since then. But it never sat well with me that I had left my git working tree in an unclean state and had open issues languishing unresolved, and from time to time I would think of going back and issuing an update… but you know how that goes.
However, in a recent discussion apropos “genuine limitations of the rust type system that frustrate people that otherwise love rust” I raised the issue that I ran into, and that got me hot and bothered enough to finally tackle a new size release with the requested support for mathematical operators and more.
Rust’s problem with commutative mathematical operations
So what’s this about a problem in rust’s type system? Well, with the caveat that its importance is almost certainly overinflated in my eyes, the issue lies with how mathematical operations (impls of core::ops::{Add, Sub, Mul, Div} and others) are written and how that conflicts with rust’s orphan rule, which forbids you from implementing a foreign trait for a foreign type (where “foreign” means defined in a different crate).
After publishing, I realized that this article heavily uses the LHS and RHS abbreviations, but never defines what they stand for! LHS is “left-hand side” and RHS is “right-hand side” and they denote (for a binary mathematical operation) which side of the operator a token is on. For example, in 42 apples + 17 oranges, the LHS is “42 apples,” while the RHS has a magnitude of 17 with a unit of “oranges.”
Let’s take a look at how we would normally implement (in rust) support for a mathematical operation like foo * bar where Foo and Bar are different types, both local to the current crate:
use core::ops::Mul;
struct Foo(i32);
struct Bar(i32);
/// Here, Bar is the RHS type and Foo is the LHS type,
/// i.e. this impl is used for `let prod = foo * bar`
impl Mul<Bar> for Foo {
type Output = i32;
fn mul(self, other: Bar) -> Self::Output {
self.0 * other.0
}
}
/// Here, Foo is the RHS type and Bar is the LHS type,
/// i.e. this impl is used for `let prod = bar * foo`
impl Mul<Foo> for Bar {
type Output = i32;
fn mul(self, other: Foo) -> Self::Output {
self.0 * other.0
}
}
fn main() {
println!("{}", Foo(7) * Bar(6));
println!("{}", Bar(6) * Foo(7));
}
You can try this online in the rust playground.
The above code demonstrates the commutative property of scalar multiplication: foo * bar is the same as bar * foo and returns the same result, both in type (here, i32) and magnitude (42 in this example).
It’s important for a type system to allow (or even require) you to write out separate implementations for each of the two commutative permutations because not all mathematical operations (or even all multiplications) are commutative. For example, while addition is generally commutative, subtraction isn’t (4 - 2 gives a different result from 2 - 4), and multiplication of matrices M and N may not only give different results for M * N as compared to N * M: one of those operations may be valid while the other is an error!
So far, we haven’t run into any issues. But let’s say we have a bunch of different types, all of which (for reasons we won’t get into) can be boiled down to an integer equivalent, and we want to support commutative multiplication for them all. It sounds like a textbook case for the use of generics: define a trait AsInt, have each type implement it however it likes, then implement core::ops::Mul via the AsInt trait:
use core::ops::Mul;
trait AsInt {
fn as_int(&self) -> i32;
}
struct Foo(i32);
struct Bar(i32);
impl AsInt for Foo {
fn as_int(&self) -> i32 { self.0 }
}
impl AsInt for Bar {
fn as_int(&self) -> i32 { self.0 }
}
impl<Lhs: AsInt, Rhs: AsInt> Mul<Rhs> for Lhs {
type Output = i32;
fn mul(self, other: Rhs) -> Self::Output {
self.as_int() * other.as_int()
}
}
You can try this online in the rust playground.
Unfortunately, this doesn’t compile:
error[E0210]: type parameter `Lhs` must be used as the type parameter for some local type (e.g., `MyStruct<Lhs>`)
--> src/main.rs:18:6
|
18 | impl<Lhs: AsInt, Rhs: AsInt> Mul<Rhs> for Lhs {
| ^^^ type parameter `Lhs` must be used as the type parameter for some local type
|
= note: implementing a foreign trait is only possible if at least one of the types for which it is implemented is local
= note: only traits defined in the current crate can be implemented for a type parameter
For more information about this error, try `rustc --explain E0210`.
error: could not compile `playground` due to previous error
The problem is that while all the types and the traits in this implementation are indeed local, the rust compiler doesn’t determine whether we are in violation of the orphan rule by looking at which types implement the (local) trait that bounds our generic parameter; it just checks whether the implementing type itself is local. You can actually use a generic parameter implementing any (foreign or local) trait in your impl, but you can’t implement against that generic type directly: you can only forward it as a generic parameter to a local type.
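To make the “forward it to a local type” part concrete, here is a minimal sketch. The Wrap<T> newtype is a hypothetical helper introduced only for this illustration (it is not part of size): because the implementing type is now the local Wrap<Lhs> rather than the bare parameter Lhs, the same generic impl is accepted.

```rust
use core::ops::Mul;

trait AsInt {
    fn as_int(&self) -> i32;
}

struct Foo(i32);
struct Bar(i32);

impl AsInt for Foo {
    fn as_int(&self) -> i32 { self.0 }
}

impl AsInt for Bar {
    fn as_int(&self) -> i32 { self.0 }
}

/// Hypothetical local wrapper: it exists only so that the impl below
/// targets a local type instead of a bare type parameter.
struct Wrap<T>(T);

impl<Lhs: AsInt, Rhs: AsInt> Mul<Wrap<Rhs>> for Wrap<Lhs> {
    type Output = i32;

    fn mul(self, other: Wrap<Rhs>) -> Self::Output {
        self.0.as_int() * other.0.as_int()
    }
}

fn main() {
    // Both orderings go through the single generic impl.
    println!("{}", Wrap(Foo(7)) * Wrap(Bar(6)));
    println!("{}", Wrap(Bar(6)) * Wrap(Foo(7)));
}
```

The obvious cost is that every call site has to wrap its operands, which is exactly the kind of ergonomic tax we were trying to avoid.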
Sidebar: Quare rust’s orphan rule and its limitations?
The most succinct PLT answer to this is that it’s because “local types” is a closed set (a new local type can never be added without changing your code and its API) while “types implementing local trait” is (or could be) an open set: a downstream user of your crate/library may implement your trait on their type at a later date, and suddenly we could have a conflict. You might be tempted to think “that’s on them,” and I wouldn’t blame you (and might even agree) but the problems don’t stop there – we absolutely need the ability to implement both local and foreign traits, but a crate or library upstream of yours (or even the standard library itself) might implement the same foreign trait against types implementing another foreign trait, and then the conflict would be your problem, in your code.
Of course the rust compiler could be smarter about this and allow a combination of only certain permutations of impl/for local/foreign types/traits to get around these restrictions (e.g. allow implementing anything for a local type, implementing only sealed traits inaccessible to downstream users for foreign types, etc.) and while there are open issues and RFCs for some of these, the road to hell is paved with good intentions and there are a thousand pitfalls.¹ Long story short, the situation is what it is (for now) for $reasons and until that changes, these restrictions on commutative operations for generic types aren’t going anywhere.
Back to the issue at hand
You can kind of work around this by abusing Deref with an output that’s some intermediate type exposing a reference to an i32 (because we actually have an underlying i32 in this case, and are not just calculating one out of the blue each time), but deref coercion will only get you so far.
For the particular case of Size, we just need to implement commutative multiplication of Size * number and number * Size, so it turns out we can actually side-step this entire debate by manually writing out a million or so different impls, one for each primitive numeric type (macros help here!). Then multiply those by four, because you need to write a separate impl for each of Foo * Bar, Foo * &Bar, &Foo * Bar and &Foo * &Bar. Lots of code, but conceptually simple.
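As a rough idea of what “macros help here” can look like, the sketch below stamps out the four owned/borrowed permutations of Foo * number, plus the mirrored number * Foo, for a list of primitive types. The impl_mul! macro name and the i64-backed Foo are assumptions made for this illustration, not the code that ships in size. Only explicitly typed operands are used here.

```rust
use core::ops::Mul;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Foo(i64);

// Hypothetical helper macro: for each listed type, generate the four
// ref/non-ref permutations of `Foo * $t` and the mirrored `$t * Foo`.
macro_rules! impl_mul {
    ($($t:ty),* $(,)?) => {$(
        impl Mul<$t> for Foo {
            type Output = Foo;
            fn mul(self, rhs: $t) -> Foo { Foo(self.0 * rhs as i64) }
        }
        impl Mul<&$t> for Foo {
            type Output = Foo;
            fn mul(self, rhs: &$t) -> Foo { Foo(self.0 * *rhs as i64) }
        }
        impl Mul<$t> for &Foo {
            type Output = Foo;
            fn mul(self, rhs: $t) -> Foo { Foo(self.0 * rhs as i64) }
        }
        impl Mul<&$t> for &Foo {
            type Output = Foo;
            fn mul(self, rhs: &$t) -> Foo { Foo(self.0 * *rhs as i64) }
        }
        impl Mul<Foo> for $t {
            type Output = Foo;
            fn mul(self, rhs: Foo) -> Foo { Foo(self as i64 * rhs.0) }
        }
    )*};
}

impl_mul!(u8, u32, i32, i64);

fn main() {
    let foo = Foo(7);
    let n: u8 = 6;

    println!("{:?}", foo * n);   // Foo * u8
    println!("{:?}", &foo * &n); // &Foo * &u8
    println!("{:?}", n * foo);   // u8 * Foo
}
```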
Except it turns out not to be so simple after all. Here’s an example that demonstrates commutative multiplication of a type (but not a reference to a type) with an i32 value:
use core::ops::Mul;
#[derive(Debug, Copy, Clone)]
struct Foo(i32);
impl Mul<Foo> for i32 {
type Output = Foo;
fn mul(self, other: Foo) -> Self::Output {
Foo(self * other.0)
}
}
impl Mul<i32> for Foo {
type Output = Foo;
fn mul(self, other: i32) -> Self::Output {
Foo(self.0 * other)
}
}
fn main() {
println!("{:?}", Foo(7) * 6);
println!("{:?}", 6 * Foo(7));
}
You can try this online in the rust playground.
It works great. This time we are returning a strongly-typed Foo rather than an i32 scalar value, commutative multiplication works fine, the code compiles, and it prints the expected output.
We originally wanted to make this generic over all primitive numeric types, so that if the user has a num: u8 or a float: f64 lying around, they can just perform the multiplication without getting a type mismatch error like you would with the above if you tried to multiply by some already-typed value that rust can’t coerce/infer to be an i32 (which our impl is specifically for):
fn main() {
println!("{:?}", Foo(7) * 6_u8);
println!("{:?}", 6_f32 * Foo(7));
}
You can try this online in the rust playground.
Which gives the following (expected) type errors:
error[E0308]: mismatched types
--> src/main.rs:23:31
|
23 | println!("{:?}", Foo(7) * 6_u8);
| ^^^^ expected `i32`, found `u8`
|
help: change the type of the numeric literal from `u8` to `i32`
|
23 | println!("{:?}", Foo(7) * 6_i32);
| ~~~
error[E0277]: cannot multiply `f32` by `Foo`
--> src/main.rs:24:28
|
24 | println!("{:?}", 6_f32 * Foo(7));
| ^ no implementation for `f32 * Foo`
|
= help: the trait `Mul<Foo>` is not implemented for `f32`
Some errors have detailed explanations: E0277, E0308.
For more information about an error, try `rustc --explain E0277`.
We said we can’t use generics to implement this support, but we can add a second pair of impl Mul for u8 to get this to work, right?
use core::ops::Mul;
#[derive(Debug, Copy, Clone)]
struct Foo(i32);
impl Mul<Foo> for i32 {
type Output = Foo;
fn mul(self, other: Foo) -> Self::Output {
Foo(self * other.0)
}
}
impl Mul<i32> for Foo {
type Output = Foo;
fn mul(self, other: i32) -> Self::Output {
Foo(self.0 * other)
}
}
impl Mul<Foo> for u8 {
type Output = Foo;
fn mul(self, other: Foo) -> Self::Output {
Foo(self as i32 * other.0)
}
}
impl Mul<u8> for Foo {
type Output = Foo;
fn mul(self, other: u8) -> Self::Output {
Foo(self.0 * other as i32)
}
}
fn main() {
println!("{:?}", Foo(7) * 6_u8);
println!("{:?}", 6_i32 * Foo(7));
}
You can try this online in the rust playground.
Indeed, this adds support for multiplying a Foo by a u8 or the other way around, just as we wanted. We also still have support for multiplying Foo by i32 (and vice-versa). Great! This is what we wanted, right? Ergonomics +100 achievement unlocked!
Unfortunately, no. While we did add support for multiplying by u8 or i32 typed values, we broke something probably much more important: the ability to multiply by an untyped (or at least, not explicitly typed) literal:
use core::ops::Mul;
#[derive(Debug)]
struct Foo(i32);
impl Mul<Foo> for i32 {
type Output = Foo;
fn mul(self, other: Foo) -> Self::Output {
Foo(self * other.0)
}
}
impl Mul<Foo> for u8 {
type Output = Foo;
fn mul(self, other: Foo) -> Self::Output {
Foo(self as i32 * other.0)
}
}
fn main() {
let prod = 7 * Foo(6);
assert_eq!(prod.0, 42);
}
You can try this online in the rust playground.
This breaks in a rather weird way: you’d expect that if there’s any confusion about what a type is, it’s about whether 7 is an i32 or u8 here. Indeed, that’s what’s happening internally, but that’s not what the error surfaced by the rust compiler says:
error[E0282]: type annotations needed
--> src/main.rs:39:9
|
39 | let prod = 7 * Foo(6);
| ^^^^^
|
= note: type must be known at this point
help: consider giving `prod` an explicit type
|
39 | let prod: _ = 7 * Foo(6);
| +++
Weird. We know (or at least, can reasonably surmise) the problem is with the ambiguity in the literal 7 and whether the compiler should invoke the Mul<Foo> for i32 impl or the Mul<Foo> for u8 impl, but the compiler says the problem is actually with the missing return type for the entire operation (which is always Foo because that’s the Mul::Output we have specified for both)! In fact, an older version of the compiler produces a better message:
<snip>
error[E0283]: type annotations needed
--> src/main.rs:39:19
|
39 | let prod1 = 7 * Foo(6);
| ^ cannot infer type for type `{integer}`
|
note: multiple `impl`s satisfying `{integer}: Mul<Foo>` found
--> src/main.rs:22:1
|
22 | impl Mul<Foo> for i32 {
| ^^^^^^^^^^^^^^^^^^^^^
...
30 | impl Mul<Foo> for u8 {
| ^^^^^^^^^^^^^^^^^^^^
However, let’s just take the latest rustc at its word and add the missing Foo type to the let prod = ... expression:
fn main() {
let prod: Foo = 7 * Foo(6);
assert_eq!(prod.0, 42);
}
And everything magically works! But we didn’t actually solve the problem we were dealing with, we just worked around the resulting compiler error, which is something each of our users would have to do any time they relied on type inference to multiply a scalar number by a typed Foo (interesting tidbit: this doesn’t happen the other way around, when multiplying a Foo by a scalar value. I’m not sure why, but I’ve opened bugs for these issues: [1], [2]).
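For completeness, the other thing a user could do at the call site is to pin down the type of the scalar itself rather than the result. The sketch below reuses the two impls from above and only shows operands whose type is spelled out (a literal suffix or a typed binding), so there is nothing left for the compiler to infer. It is still a workaround pushed onto the user, not a fix.

```rust
use core::ops::Mul;

#[derive(Debug, PartialEq)]
struct Foo(i32);

impl Mul<Foo> for i32 {
    type Output = Foo;

    fn mul(self, other: Foo) -> Self::Output {
        Foo(self * other.0)
    }
}

impl Mul<Foo> for u8 {
    type Output = Foo;

    fn mul(self, other: Foo) -> Self::Output {
        Foo(self as i32 * other.0)
    }
}

fn main() {
    // A literal suffix selects the impl explicitly.
    let a = 7_i32 * Foo(6);
    let b = 7_u8 * Foo(6);

    // So does a binding with a declared type.
    let n: u8 = 7;
    let c = n * Foo(6);

    println!("{:?} {:?} {:?}", a, b, c);
}
```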
To recap:
- Rust’s orphan rule prevents us from implementing commutative addition/multiplication for types implementing a trait, which isn’t a complete blocker if you’re the only one that’s ever going to be implementing them because you can use macros or good, old copy-and-paste to work around that limitation and implement the operation manually. Half the operations can be generic over RHS (because impl<Rhs: ...> Mul<Rhs> for SpecificType is perfectly legal) but the other half need to be manually spelled out. If you’re doing just two or three types, it’s fairly manageable but since you need 4 * M * N impls in total (accounting for the ref/non-ref permutations), it can quickly spiral into insanity.
- A (temporary?) bug? quirk? limitation? in the rust compiler stops us from manually implementing commutative operations with the various numeric literals, because even though rust has a (silent) integer inference preference for i32 and a default floating point type of f64, the presence of multiple impls breaks type inference in interesting ways.
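To illustrate the “generic over RHS” half of the first point, here is a small sketch in which a single impl covers Foo * anything-that-implements-AsInt. AsInt is the same illustrative trait used earlier, implemented here for just two primitives; the mirrored number * Foo direction is deliberately absent, since that is the half that has to be spelled out per type.

```rust
use core::ops::Mul;

trait AsInt {
    fn as_int(&self) -> i32;
}

// Implementing a local trait for foreign (primitive) types is allowed.
impl AsInt for i32 {
    fn as_int(&self) -> i32 { *self }
}

impl AsInt for u8 {
    fn as_int(&self) -> i32 { i32::from(*self) }
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct Foo(i32);

// Legal: the implementing type (Foo) is local, only the RHS is generic.
impl<Rhs: AsInt> Mul<Rhs> for Foo {
    type Output = Foo;

    fn mul(self, other: Rhs) -> Self::Output {
        Foo(self.0 * other.as_int())
    }
}

fn main() {
    println!("{:?}", Foo(7) * 6_u8);
    println!("{:?}", Foo(7) * 6_i32);
}
```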
Just published a long article on a weakness in the #rust type system when it comes to commutative operations and how to preserve backwards compatibility even when making breaking changes.https://t.co/Wodf9rfEF2
— Mahmoud Al-Qudsi (@mqudsi) June 22, 2022
A new (and a newer) size crate
This brings us at long last to today’s announcement regarding a new size crate. Faced with the issues above while attempting to implement commutative mathematical operations, size now features support for the following, implemented via a combination of macros/copy-and-paste and generic impls where possible:
- Strongly-typed addition and subtraction of Size values, giving Size results. This was implemented directly, as there’s only one type involved, with copy-and-pasted impls for the ref/non-ref cases.
- Multiplication and division of an LHS Size by an RHS integer or floating point value, yielding a Size instance; implemented via generics, as impl<T: ..> Mul<T> for Size is perfectly acceptable, then copy-and-pasted as needed to handle ref/non-ref permutations.
- Multiplication of an LHS integer or floating point value by an RHS Size value, yielding a Size result. This could only be implemented for one integer type (i64) and one floating point type (f64) to prevent the bizarre breakage when an untyped integer/float value is used (with only one possible {float} type and one possible {integer} type, rustc will try to coerce to the matching type of the two automatically). This had to be implemented manually (via macros) as rust’s orphan rule got in the way.
- Division of an LHS integer by a Size value is not implemented, since it makes no sense (what does 42 / 16 KiB yield?).
That was pretty much it in terms of the features I’d wanted to implement from a few years back before I was stymied by the rust restrictions/limitations we’ve discussed. But the additions and improvements to size didn’t stop there:
- As a result of implementing core::ops::Sub, it became necessary to add support for the concept of negative file sizes (something which can only exist in the abstract and wasn’t previously supported). This necessitated a change in the “output type” used by the library, and now the core primitive type returned/expected by the library (generic overloads excepted) is i64 rather than u64.
- The goal of this crate has changed from “merely” providing formatting for file sizes to encapsulating all operations on sizes in general by providing a strongly-typed size that can expose just the right number of features and functionality while restricting the user from doing things that don’t make sense (such as dividing a scalar integer by a file size, as mentioned above). To that end, it is now possible to directly compare Size types for equality or order (via PartialEq and PartialOrd impls).
- With its newfound ability to do more than just format file sizes for human readable output, it’s possible to imagine using size in completely different contexts. To that end, the size crate may now be compiled as a no_std library² which lets you use the basic Size features such as initializing a Size from different units, comparing Size instances, etc. but disables features that aren’t meant to be used in embedded or other no_std contexts.
- The size crate no longer has any dependencies. It previously featured only a single dependency on num_crates (plus its transitive dependency on autocfg) for abstracting over the different primitive numeric types, but the latest releases now use a sealed local trait and some macros to accomplish the same but without any foreign dependencies. Compilation time has been significantly improved as a result.
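For readers who haven’t run into the pattern before, here is a hedged sketch of what “a sealed local trait and some macros” can look like in general. The names AsIntermediate, as_i64, impl_as_intermediate! and bytes_from are placeholders chosen for this illustration and should not be read as the crate’s actual internals.

```rust
mod sealed {
    // Public trait in a private module: downstream crates can't name it,
    // so they can't implement anything that requires it.
    pub trait Sealed {}
}

/// Hypothetical abstraction over the primitive numeric types.
pub trait AsIntermediate: sealed::Sealed {
    /// Converts to the intermediate i64 with `as` semantics (so floats
    /// are truncated and out-of-range values are not checked).
    fn as_i64(self) -> i64;
}

macro_rules! impl_as_intermediate {
    ($($t:ty),* $(,)?) => {$(
        impl sealed::Sealed for $t {}

        impl AsIntermediate for $t {
            fn as_i64(self) -> i64 { self as i64 }
        }
    )*};
}

impl_as_intermediate!(u8, u16, u32, u64, i8, i16, i32, i64, f32, f64);

/// Any supported primitive can now be accepted through one generic bound.
pub fn bytes_from<T: AsIntermediate>(value: T) -> i64 {
    value.as_i64()
}

fn main() {
    println!("{}", bytes_from(42_u8));
    println!("{}", bytes_from(-3_i32));
    println!("{}", bytes_from(1.9_f64));
}
```

Because the trait is sealed, the set of implementing types stays closed, and no external numeric-abstraction crate is needed.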
The changes above formed the bulk of the size 0.2.0 release. But just as I was about to sit down and write up this article, it struck me that the Size API was not very rusty. The crate (and its basic API) was originally written in 2018 and envisioned somewhat differently from how it turned out. The original idea was to take advantage of rust’s game-changing tagged enums to provide, in addition to pretty printing of file sizes, an interface for converting directly between sizes of different units (almost like a units(1) but in rust).
Rust’s enums seemed perfect for the job, so size 0.1.x and 0.2.x shipped with an API that exposed an enum composed of a strongly-typed unit name and a generic, numeric size, e.g. Size::Bytes(T), Size::Gigabytes(T), Size::Kibibytes(T), etc. But in practice, people aren’t reaching for the size crate to convert between well-defined base-2/base-10 size units, they’re using it to create strongly-typed Size objects to represent an underlying file size and format it for display. Users requested the ability to perform math/logic on Size types, but nobody was asking for an equivalent Size with a base unit of gigabytes.
The majority of the code I found on GitHub and in other nooks and crannies of the internet ended up looking like this:
let size = Size::Bytes(some_value);
...
println!("File size: {}", size);
Which, while being perfectly valid rust code and actually conforming to the rust formatting rules and regulations, just didn’t feel rusty and didn’t match the approach that other crates have pretty much clustered around. You don’t get the feeling that Size<T>::Bytes() is an enum so much as it appears to be an unfortunately mis-capitalized method exposed by the Size type. What’s more, the interface was extremely generic-heavy, but the generics were only skin-deep because all operations coming out of a Size stripped the original T and returned values of the intermediate (u64/i64) numeric type instead. While the Size variants were storing the user-picked T, it wasn’t actually used anywhere except as an input to internal calculations completely masked from the end user. It wasn’t intuitive that operations like Size<u8> + Size<f64> were even possible, let alone that they yielded a Size<i64> (regardless of the initial types), and the internal type conversions and changes to precision (one way or the other) were not intuitively exposed.
Enter size 0.3.x with a new and rustier (if not improved) API that should feel more natural to rustaceans around the globe. Size has been changed from an enum to a struct and now exposes functions to model the behavior previously exposed by the old variants. The biggest API change is that the Size type itself is no longer generic and things like Size::Kilobytes(10) are now expressed as Size::from_kilobytes(10) (or, optionally, Size::from_kb(10) instead). It should be more immediately apparent that a numeric conversion is (or at least may be) taking place given the “from” in the function name and the fact that you are not directly instantiating an instance of Size containing a particular numeric type T that is somehow never seen afterwards.
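As a heavily simplified sketch of the new shape (not the crate’s real definition), a non-generic Size struct with from_* constructors might look like the following. The i64-only parameters, the private bytes field and the 1000-byte kilobyte constant are simplifying assumptions for this illustration; the real constructors accept other numeric types as well.

```rust
/// Simplified stand-in for the 0.3-style type: a plain struct that
/// always stores a byte count, no generic parameter in sight.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Size {
    bytes: i64,
}

impl Size {
    pub const fn from_bytes(value: i64) -> Self {
        Self { bytes: value }
    }

    /// Base-10 kilobytes: 1 KB = 1000 bytes.
    pub const fn from_kilobytes(value: i64) -> Self {
        Self { bytes: value * 1000 }
    }

    /// Short alias for `from_kilobytes`.
    pub const fn from_kb(value: i64) -> Self {
        Self::from_kilobytes(value)
    }

    pub const fn bytes(&self) -> i64 {
        self.bytes
    }
}

fn main() {
    let size = Size::from_kb(10);
    println!("{} bytes", size.bytes());

    // Deriving the comparison traits gives equality and ordering for free.
    println!("{}", size == Size::from_bytes(10_000));
    println!("{}", Size::from_kb(1) < Size::from_kb(2));
}
```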
One other minor change that may be of interest to other crate developers: there are certain spellings or phrasings specific to each community, and it helps the ecosystem considerably for crate authors to make a conscious effort to adhere to them where possible. For example, while size 0.2.x spelled “lower case” as two words, the rust standard library has it as a single word “lowercase” and so enum members like Style::FullLowerCase have been renamed to Style::FullLowercase to match.
Preserving backwards compatibility in rust
It would seem like a massively breaking change to switch the core Size type from an enum to a struct, let alone to rename virtually all the interface members in such a manner. But if you take the unix “source-compatible” approach rather than focusing on strict ABI compatibility, things are actually not that bad, if you’re willing to break some conventions and bend the rust compiler to your will with liberal usage of #[allow(...)] in carefully chosen places.
After the new Size interface was in place, a second impl Size { ... } was added (this time prefixed with #[doc(hidden)] to keep it out of the documentation) that contained a number of “fakes,” in this case, const functions masquerading as enum variants. Here’s an excerpt of what that looks like:
#[doc(hidden)]
impl Size {
#![allow(non_snake_case)]
#[inline]
#[deprecated(since = "0.3", note = "Use Size::from_bytes() instead")]
/// Express a size in bytes.
pub const fn Bytes(t: T) -> Self { Self::from_bytes(t) }
#[inline]
#[deprecated(since = "0.3", note = "Use Size::from_kibibytes() instead")]
/// Express a size in kibibytes. Actual size is 2^10 \* the value.
pub const fn Kibibytes(t: T) -> Self { Self::from_kibibytes(t) }
// ...
}
You may not be able to achieve full compatibility with the old API and there are a lot of cases where this won’t cut it, but fortunately for us, they’re not how most users would approach things. For example, someone creating a Size via Size<u64>::Bytes(num.into()) would find that their code no longer compiles, as Size itself is not generic (rather, it’s the function/mock variant Size::Bytes<T> that is generic over T). But luckily for us, that’s not how most people would write that code and the “natural” way of expressing it (Size::Bytes(num as u64)) continues to compile, happily oblivious to the fact that we’re actually calling a function called Bytes() rather than constructing an enum variant Size<T>::Bytes.
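Here is a small, self-contained sketch of that trick that you can compile on its own. It is a simplified stand-in rather than the crate’s source: the Into<i64> bound is an assumption picked to keep the example short, and the shim is a plain (non-const) function for the same reason.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Size {
    bytes: i64,
}

impl Size {
    /// The new-style constructor (bound chosen for this sketch only).
    pub fn from_bytes<T: Into<i64>>(value: T) -> Self {
        Self { bytes: value.into() }
    }

    pub fn bytes(&self) -> i64 {
        self.bytes
    }
}

// Legacy shims live in their own hidden impl block.
#[doc(hidden)]
impl Size {
    #![allow(non_snake_case)]

    /// Looks like the old `Size::Bytes(..)` enum variant at the call
    /// site, but is really a function forwarding to `from_bytes`.
    #[deprecated(since = "0.3", note = "Use Size::from_bytes() instead")]
    pub fn Bytes<T: Into<i64>>(value: T) -> Self {
        Self::from_bytes(value)
    }
}

/// What an un-migrated downstream call site looks like. Without the
/// allow attribute it would still compile, just with a deprecation warning.
#[allow(deprecated)]
fn legacy_call_site() -> Size {
    Size::Bytes(1024_i32)
}

fn main() {
    println!("{:?}", legacy_call_site());
    println!("{}", legacy_call_site() == Size::from_bytes(1024_i64));
}
```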
For the renamed “plain” enums, a similar approach was used to make it seem like FullLowerCase was still a valid member of the Style enum (used to specify how the unit name is formatted when the size is pretty-printed):
enum Style { .... }
impl Style {
#[doc(hidden)]
#[allow(non_upper_case_globals)]
#[deprecated(since = "0.3", note = "Use Style::FullLowercase instead")]
/// A backwards-compatible alias for [`Style::FullLowercase`]
pub const FullLowerCase: Style = Style::FullLowercase;
}
In this particular case, it would have been possible to keep the old FullLowerCase enum member around and simply hide it from the docs, since Style remained an enum. But that would mean updating all our match sites to handle both the old and the new name, incurring both a maintenance and a (negligible) runtime cost for keeping the backwards-compatible name around. With this approach, and especially with all the old names kept in a separate impl Style block that only contained shims for the deprecated API, there is almost no cost to keeping the code compatible for a few versions or however long we choose to support the legacy API.
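To see why the match sites don’t need to change, here is a compact sketch of the alias in action. The two-variant Style enum and the unit_name helper are made up for this illustration (the real enum has more members); only the FullLowerCase/FullLowercase pair comes from the crate.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Style {
    Abbreviated,
    FullLowercase,
}

impl Style {
    #[doc(hidden)]
    #[allow(non_upper_case_globals)]
    #[deprecated(since = "0.3", note = "Use Style::FullLowercase instead")]
    /// A backwards-compatible alias for [`Style::FullLowercase`]
    pub const FullLowerCase: Style = Style::FullLowercase;
}

/// Hypothetical match site: it only ever needs to know the new names.
fn unit_name(style: Style) -> &'static str {
    match style {
        Style::Abbreviated => "KB",
        Style::FullLowercase => "kilobytes",
    }
}

/// An un-migrated caller still spelling the variant the old way.
#[allow(deprecated)]
fn legacy_style() -> Style {
    Style::FullLowerCase
}

fn main() {
    println!("{}", unit_name(legacy_style()));
    println!("{}", legacy_style() == Style::FullLowercase);
}
```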
Again, this isn’t a magic fix that keeps everything working, but it does handle pretty much all the cases our users were actually using (in this case, calling a function and specifying a Style::Foo variant as a parameter). I highly recommend using GitHub’s (or any other service’s) code search feature to look at how people are using your API before introducing breaking changes or remodeling an API; it really helps to understand how your users approach your crate, which may be quite different from how you originally intended for it to be used.
Using size or contributing
The latest release of the size crate is available on crates.io, and the documentation has been completely overhauled as part of the new 0.2.x and 0.3.x releases. The source code is available on GitHub and is released under the MIT license.
I actually released size 0.4 shortly after publishing this article, mainly to future-proof the API against breaking changes in the future (by breaking it in the here-and-now instead 🤦‍♀️). Unfortunately, I wasn’t able to use any of the methods outlined above to preserve backwards compatibility, and I humbly apologize to everyone affected by this breakage!
You can use size in your rust code today by simply adding a dependency on size to your Cargo.toml and placing use size::Size at the top of your rust code:
use size::Size;
use std::fs::File;

fn main() {
    // File::open returns a Result, so unwrap it before asking for metadata.
    let metadata = File::open("foo.bin").unwrap().metadata().unwrap();
    let file_size = Size::from_bytes(metadata.len());
    println!("{}", file_size); // prints e.g. "13.37 MiB"
}
Sign up and follow for more!
If you found this article interesting, please follow me on twitter and sign up for my rust mailing list to get notifications on new rust articles and make sure you never miss out. You won’t get any other emails, I pinky swear!
For example, a foreign trait is implemented for impls of a local trait, but one of your types implements both the local trait and some other upstream trait and a later upstream release implements the same foreign trait for all impls of the other foreign trait, and suddenly your type has multiple impls for the same trait. ↩
Just compile with default features disabled. ↩