{"id":4958,"date":"2022-10-03T15:11:34","date_gmt":"2022-10-03T20:11:34","guid":{"rendered":"https:\/\/neosmart.net\/blog\/?p=4958"},"modified":"2022-10-08T19:59:55","modified_gmt":"2022-10-09T00:59:55","slug":"implementing-truly-safe-semaphores-in-rust","status":"publish","type":"post","link":"https:\/\/neosmart.net\/blog\/implementing-truly-safe-semaphores-in-rust\/","title":{"rendered":"Implementing truly safe semaphores in rust"},"content":{"rendered":"<p><em class=\"onload\">Discuss this article <a href=\"https:\/\/old.reddit.com\/r\/rust\/comments\/xvm9ul\/implementing_truly_safe_semaphores_in_rust_and\/\" rel=\"follow\">on r\/rust<\/a> or <a href=\"https:\/\/news.ycombinator.com\/item?id=33084442\" rel=\"follow\">on Hacker News<\/a>.<\/em><\/p>\n<p>Low-level or systems programming languages generally strive to provide libraries and interfaces that enable developers, boost productivity, enhance safety, provide resistance to misuse, and more &#8212; all while trying to reduce the runtime cost of such initiatives. Strong type systems turn runtime safety\/sanity checks into compile-time errors, optimizing compilers try to reduce an enforced sequence of api calls into a single instruction, and library developers think up clever hacks to completely erase any trace of an abstraction from the resulting binaries. 
And as anyone that&#8217;s familiar with them can tell you, the rust programming language and its developers\/community have truly embraced this ethos of zero-cost abstractions, perhaps more so than any others.<\/p>\n<p>I&#8217;m not going to go into detail about what the rust language and standard library do to enable zero-cost abstractions or spend a lot of time going over some of the many examples of zero-cost interfaces available to rust programmers, though I&#8217;ll just quickly mention a few of my favorites: iterators and <a href=\"https:\/\/doc.rust-lang.org\/nightly\/std\/iter\/trait.Iterator.html\" rel=\"follow\">all the methods the <code>Iterator<\/code> trait exposes<\/a> have to be at the top of every list given the amount of black magic voodoo the compiler has to do to turn these into their loop-based equivalents, zero-sized types make developing embedded firmware in rust a dream and it&#8217;s really crazy to see how <a href=\"https:\/\/docs.rs\/embedded-hal\/latest\/embedded_hal\/\" rel=\"nofollow\">all the various peripheral abstractions<\/a> can be completely erased, giving you small firmware blobs despite all the safety abstractions, and no list is complete without the newest member of the team, <code>async<\/code>\/<code>await<\/code> and how rust manages to turn an entire web server api into a single state machine and event loop. (And to think this can be used even on embedded without a relatively heavy async framework like tokio and with even zero allocations to boot!)<\/p>\n<p><!--more--><\/p>\n<p>But the tricky thing with abstractions is that the <em>relative<\/em> price you pay scales rather unfairly with the size of the interface you are abstracting over. 
While a byte here and a byte there may mean nothing when we&#8217;re talking framework-scale interfaces, when you are modeling smaller and finer-grained abstractions, every byte and every instruction begin to count.<\/p>\n<p>A couple of weeks ago, we released an update to <a href=\"https:\/\/github.com\/neosmart\/rsevents\" rel=\"nofollow\"><code>rsevents<\/code><\/a>, our crate that contains a rusty cross-platform equivalent to WIN32 events for signaling between threads and writing your own synchronization primitives, and <a href=\"https:\/\/github.com\/neosmart\/rsevents-extra\" rel=\"nofollow\"><code>rsevents-extra<\/code><\/a>, a companion crate that provides a few handy synchronization types built on top of the manual- and auto-reset events from the <code>rsevents<\/code> crate. Aside from the usual awesome helpings of performance improvements, ergonomics enhancements, and more, <strong>this latest version of <code>rsevents-extra<\/code> includes <a href=\"https:\/\/docs.rs\/rsevents-extra\/latest\/rsevents_extra\/struct.Semaphore.html\" rel=\"nofollow\">a <code>Semaphore<\/code> synchronization primitive<\/a><\/strong> &#8211; something that the rust standard library surprisingly lacks&#8230; but not without good reason.<\/p>\n<h2>What makes a semaphore a semaphore?<\/h2>\n<p>Semaphores are well-documented and fairly well-understood underpinnings of any concurrency library or framework and essential Computer Science knowledge. 
So why doesn&#8217;t the rust standard library have a semaphore type?<\/p>\n<p>Unlike the synchronization types that the rust standard library currently provides (such as <code><a href=\"https:\/\/doc.rust-lang.org\/nightly\/std\/sync\/struct.Mutex.html\">Mutex&lt;T&gt;<\/a><\/code> and <a href=\"https:\/\/doc.rust-lang.org\/nightly\/std\/sync\/struct.RwLock.html\" rel=\"follow\"><code>RwLock&lt;T&gt;<\/code><\/a>), a semaphore is somewhat harder to model as it doesn&#8217;t restrict concurrent access to a single object or variable so much as it limits concurrency within a <em>region<\/em> of code.<\/p>\n<p>Of course it can be argued that in traditional programming a semaphore is just a more general case of a mutex, and just like mutexes traditionally protected a region of code from concurrent access<sup id=\"rf1-4958\"><a href=\"#fn1-4958\" title=\"In fact, in some languages\/apis a &ldquo;critical section&rdquo; is another name for a mutex.\" rel=\"footnote\">1<\/a><\/sup> but were converted into synchronization primitives <em>owning<\/em> the data they protect and marshalling access to it, there&#8217;s no reason a rust semaphore couldn&#8217;t do the same. But therein lies the problem: a mutex and a read-write lock can both be understood in terms of <em>readers<\/em> and\u00a0<em>writers<\/em>,<sup id=\"rf2-4958\"><a href=\"#fn2-4958\" title=\"For a mutex, they are one and the same as it mandates exclusive access to a region.\" rel=\"footnote\">2<\/a><\/sup> whereas a semaphore makes no such guarantees. 
And rust is quite fundamentally built on the concept of read ^ write: it\u00a0<em>needs<\/em> to know if a thread\/scope is reading from or writing to a variable or memory location in order to uphold its most basic memory safety guarantee: there can either be multiple &#8220;live&#8221; read-only references to an object or a single write-enabled (<code>&amp;mut<\/code>) reference to the same &#8212; <strong>but a semaphore doesn&#8217;t make that distinction<\/strong>!<\/p>\n<p>While a strictly binary semaphore (max concurrency == 1) can guarantee that there will never be multiple writers accessing a memory region, there&#8217;s not much theoretical benefit to such a binary semaphore over a mutex &#8211; in fact, they&#8217;re interchangeable. What makes a semaphore truly special is that it can be created (or even dynamically modified) with a concurrency limit <em>n<\/em> and then uphold its core precondition, guaranteeing that at any given time there will never be more than <em>n<\/em> threads\/stacks<sup id=\"rf3-4958\"><a href=\"#fn3-4958\" title=\"Semaphores are generally non-reentrant so recursively obtaining a semaphore will count against its limit!\" rel=\"footnote\">3<\/a><\/sup> accessing a semaphore-protected region.<\/p>\n<p>The problem is that with <em>n<\/em> &gt; 1, there&#8217;s no concept of a &#8220;privileged&#8221; owning thread and all threads that have &#8220;obtained&#8221; the semaphore do so equally. Therefore, a <em>rust<\/em> semaphore can only ever provide read-only (<code>&amp;T<\/code>) access to an underlying resource, limiting the usefulness of such a semaphore almost to the point of having no utility. 
As such, the only safe &#8220;owning&#8221; semaphore with read-write access that can exist in the rust world would be \u00a0<code>Semaphore&lt;()&gt;<\/code>,<sup id=\"rf4-4958\"><a href=\"#fn4-4958\" title=\"The empty parentheses\/tuple () is rust-speak for void, for those of you not familiar with the language.\" rel=\"footnote\">4<\/a><\/sup> or one that actually owns <em>no\u00a0<\/em>data and can only be used for its side effect of limiting concurrency while the semaphore is &#8220;owned,&#8221; so to speak.<sup id=\"rf5-4958\"><a href=\"#fn5-4958\" title=\"In rust, there&rsquo;s actually (out of necessity) an entire class of&nbsp; types or values that can be changed even through read-only references, a concept known as interior mutability and exposed via types like AtomicXXX and the various std::cell Cell types &mdash; but those are generally to be used on a fine-grained level and in general you wouldn&rsquo;t be using them to make entire objects writeable via read-only references.\" rel=\"footnote\">5<\/a><\/sup> (Actual mutation of accessed resources within the concurrency-limited region, if needed, would continue to be marshalled via <code>Mutex&lt;T&gt;<\/code> or <code>RwLock&lt;T&gt;<\/code> on a fine-grained level.)<\/p>\n<p>Ok, so this explains why the rust standard library doesn&#8217;t contain a <code>Semaphore&lt;T&gt;<\/code> type to mirror <code>Mutex&lt;T&gt;<\/code> and its friends, but then what&#8217;s so hard about shipping a non-owning <code>std::sync::Semaphore<\/code> instead?<\/p>\n<h2>Designing a safe <code>Semaphore<\/code> for rust<\/h2>\n<p>To answer this, we need to look at what a semaphore API generally looks like in other languages. 
While the names and calling semantics differ, a semaphore is generally found as a type that provides the following, ranging from the most fundamental properties to merely\u00a0<em>de facto<\/em> ones:<\/p>\n<ul>\n<li>It is a type that can be used to limit concurrency to a resource or region of code, up to a dev-defined limit <em>n<\/em>.<\/li>\n<li>It is a type that has a concept of &#8220;currently available concurrency,&#8221; which represents and tracks the remaining number of threads\/stacks that can &#8220;obtain&#8221; the semaphore, thereby reducing its available concurrency and generally giving the calling thread access to the concurrency-limited region,<\/li>\n<li>A semaphore can be created\/declared with an &#8220;initially available concurrency&#8221; and a &#8220;maximum possible concurrency,&#8221; which may differ (indeed, &#8220;initially available concurrency&#8221; is often zero),<\/li>\n<li>Semaphores don&#8217;t generally have a concept of ownership, meaning a thread (or <em>any<\/em> thread) can increment (up to the pre-defined limit) or decrement (down to zero) the available concurrency for a semaphore without having &#8220;obtained&#8221; or &#8220;created&#8221; it. (This is necessary; otherwise it&#8217;d be impossible to initialize a semaphore with a lower initial concurrency limit than its maximum, because no thread could then increase it.)<\/li>\n<\/ul>\n<p>It&#8217;s the last of these points that makes a semaphore so tricky to model in any language that prides itself on safety. While a semaphore that acts strictly as a variable-occupancy mutex (i.e. 
initial concurrency equals the max concurrency and each time it is obtained, it must be subsequently released by the same thread that obtained it) is possible, that&#8217;s not generally a requirement that semaphores impose, and such a requirement would considerably limit the utility that a semaphore could offer.<\/p>\n<p>Let&#8217;s look at some ways we might design such a semaphore in rust, some of which we actually tried while prototyping <code>rsevents_extra::Semaphore<\/code>.<\/p>\n<p>Before anything else, let&#8217;s get the hard part out of the way by introducing you to <a href=\"https:\/\/docs.rs\/rsevents\/latest\/rsevents\/struct.AutoResetEvent.html\" rel=\"nofollow\"><code>rsevents::AutoResetEvent<\/code><\/a>, a one-byte<sup id=\"rf6-4958\"><a href=\"#fn6-4958\" title=\"Actually, the AutoResetEvent implementation only takes all of two bits, but let&rsquo;s just call it a byte to make everything nice and easy.\" rel=\"footnote\">6<\/a><\/sup> synchronization primitive that takes care of putting threads to sleep when the event isn&#8217;t signalled\/available and allowing one-and-only-one waiting thread to either consume the event (if it&#8217;s not already asleep) or to wake up (if it&#8217;s asleep waiting for the event) when the event is signalled (after which the event is atomically reset to a &#8220;not signalled&#8221; state). It doesn&#8217;t even have any spurious waits, making it really nice and easy to work with in a safe fashion. 
All of our <code>Semaphore<\/code> implementations will use this auto-reset event to take care of the synchronization and we&#8217;ll omit the details of when and where to call <a href=\"https:\/\/docs.rs\/rsevents\/latest\/rsevents\/struct.AutoResetEvent.html#method.set\" rel=\"nofollow\"><code>AutoResetEvent::set()<\/code><\/a> and <a href=\"https:\/\/docs.rs\/rsevents\/latest\/rsevents\/struct.AutoResetEvent.html#method.reset\" rel=\"nofollow\"><code>AutoResetEvent::reset()<\/code><\/a> for now.<\/p>\n<p>So here&#8217;s what our initial semaphore skeleton looks like. We know we need an internal <code>count<\/code> of some integral type to keep track of the current concurrency (since we already established that it&#8217;s going to be variable and not just zero or one), and we know that at minimum a semaphore&#8217;s interface needs to provide a way to &#8220;obtain&#8221; the semaphore (decrementing the available concurrency for future callers) and a way to &#8220;release&#8221; the semaphore (at least to be used by a thread that has already obtained a &#8220;concurrency token&#8221; to re-increment the count after it is done and wants to give up its access to the concurrency-restricted region for some other caller to take).<\/p>\n<pre><code class=\"language-rust\">struct Semaphore {\r\n    event: AutoResetEvent,\r\n    count: AtomicU32,\r\n    \/\/ TODO: Fill in the rest\r\n}\r\n\r\nimpl Semaphore {\r\n    \/\/\/ Create a new `Semaphore`\r\n    pub fn new(\/* TODO *\/) -&gt; Semaphore { \/* TODO *\/ }\r\n\r\n    \/\/\/ Obtain the `Semaphore`, gaining access to the protected concurrency\r\n    \/\/\/ region and reducing the semaphore's internal count. If the \r\n    \/\/\/ `Semaphore` is not currently available, blocks sleeping until it\r\n    \/\/\/ becomes available and is successfully obtained.\r\n    pub fn wait(&amp;self) -&gt; ??? { ... 
}\r\n\r\n    \/\/\/ Increment the semaphore's internal count, increasing the available\r\n    \/\/\/ concurrency limit.\r\n    pub fn release(&amp;self, ...) { ... }\r\n}\r\n<\/code><\/pre>\n<p>Our goal now is to fill in the blanks, attempting to make a semaphore that&#8217;s maximally useful and as safe as possible, while limiting its size and runtime checks (the costs of said safety).<\/p>\n<h3>A constant maximum concurrency?<\/h3>\n<p>We&#8217;ve already established that a semaphore needs to offer a tunable &#8220;maximum concurrency&#8221; parameter that decides the maximum number of threads that can have access to the concurrency-limited region at any given time. This number is typically supplied at the time the semaphore is instantiated, and it&#8217;s quite normal for it to be fixed thereafter: while the\u00a0<em>current<\/em> available concurrency may be artificially constrained beyond the number of threads that have actually borrowed\/obtained the semaphore, it&#8217;s OK for the absolute maximum to be unchangeable after a semaphore is created.<\/p>\n<p>We really have only two choices here: we either add a <code>max_count<\/code> integral struct member to our <code>Semaphore<\/code> or we take advantage of <a href=\"https:\/\/practice.rs\/generics-traits\/const-generics.html\" rel=\"follow\">rust&#8217;s const generics<\/a> (another brilliant zero-cost abstraction!) to eliminate this field from our struct altogether&#8230; but at a considerable cost.<\/p>\n<p>Let&#8217;s consider some constraints that might determine the maximum concurrency a semaphore can have:<\/p>\n<ul>\n<li>We&#8217;ve experimentally determined that our backup application works best with up to eight simultaneous threads in the file read part of the pipeline and up to two simultaneous threads in the file write part of the pipeline, to balance throughput against maxing out the disk&#8217;s available IOPS. 
Here we can use two separate semaphores, each with a different hard-coded maximum concurrency.<\/li>\n<li>Almost all modern web browsers limit the maximum number of live TCP connections per network address to a hard-coded limit, typically 16. Internally, the browser has a semaphore for each domain name with an active TCP connection (perhaps in a <code>HashMap&lt;DomainName, Semaphore&gt;<\/code>), and blocks before opening a new connection if the maximum concurrency is exceeded.<\/li>\n<\/ul>\n<p>In these cases, we could get away with changing our <code>Semaphore<\/code> declaration to <code>Semaphore&lt;const MAX: usize&gt;<\/code> and using const generics when initializing a semaphore to specify the hard-coded maximum concurrency limit, e.g. <code>let sem = Semaphore::&lt;16&gt;::new(...)<\/code>. But const generics doesn&#8217;t just lock us into a hard-coded value specified at the time of object initialization, it locks us into a hard-coded value specified\u00a0<em>at the time of writing the source code for the object&#8217;s initialization.<\/em> That means we\u00a0<em>can&#8217;t<\/em> use a const generic parameter in lieu of a <code>max_concurrency<\/code> field in cases like the following, which we unfortunately can&#8217;t just ignore:<\/p>\n<ul>\n<li>We want to limit the number of threads accessing a code section or running some operation to the number (or a ratio of the number) of CPU cores the code is running on (not <em>compiled<\/em> on).<\/li>\n<li>We want to let the user select the max concurrency at runtime, perhaps by parsing a <code>-j THREADS<\/code> argument or letting the user choose from a drop-down menu or numselect widget in a GUI application.<\/li>\n<li>Going back to our backup application example, we want to use different read\/write concurrency limits in the case of an SSD vs in the case of an old-school HDD.<\/li>\n<\/ul>\n<p>This means that we&#8217;re going to have to approximately double the size of our <code>Semaphore<\/code> 
object, mirroring whatever type we&#8217;re using for <code>Semaphore::count<\/code> to add a <code>Semaphore::max_count<\/code> member (they have to be the same size because <code>count<\/code> can vary from zero to <code>max_count<\/code>).<\/p>\n<h3>A functionally complete semaphore<\/h3>\n<p>As we&#8217;ve determined, we unfortunately can&#8217;t use rust&#8217;s const generics to roughly halve the space a <code>Semaphore<\/code> object takes in memory, giving us the following declaration:<\/p>\n<pre><code class=\"language-rust\">struct Semaphore {\r\n    event: AutoResetEvent,\r\n    count: AtomicU32,\r\n    max_count: AtomicU32,\r\n}\r\n<\/code><\/pre>\n<p>While we were forced to add a <code>max_count<\/code> field to store the maximum allowed concurrency for the semaphore, this wasn&#8217;t at all a violation of the &#8220;zero-cost abstraction&#8221; principle: if we want to allow the user to specify something at runtime and match against it later, this is the cost we invariably pay (whether in rust or in assembly).<\/p>\n<p>As bare as our <code>Semaphore<\/code> structure is, this is actually enough to implement a completely functional and &#8211; for the most part &#8211; safe semaphore. Thanks to how <code>AutoResetEvent<\/code> is implemented internally and its own safety guarantees, we can get away with just some extremely careful use of atomics, without a lock or mutex of any sort:<\/p>\n<pre><code class=\"language-rust\">impl Semaphore {\r\n    \/\/\/ Create a new `Semaphore`\r\n    pub fn new(initial_count: u32, max_count: u32) -&gt; Semaphore {\r\n        Semaphore { \r\n            event: AutoResetEvent::new(EventState::Unset),\r\n            count: AtomicU32::new(initial_count),\r\n            max_count: AtomicU32::new(max_count),\r\n        }\r\n    }\r\n\r\n    \/\/\/ Obtain the `Semaphore`, gaining access to the protected concurrency\r\n    \/\/\/ region and reducing the semaphore's internal count. 
If the \r\n    \/\/\/ `Semaphore` is not currently available, blocks sleeping until it\r\n    \/\/\/ becomes available and is successfully obtained.\r\n    pub fn wait(&amp;self) {\r\n        let mut count = self.count.load(Ordering::Relaxed);\r\n\r\n        loop {\r\n            count = if count == 0 {\r\n                \/\/ Semaphore isn't available, sleep until it is.\r\n                self.event.wait();\r\n                \/\/ Refresh the count after waking up\r\n                self.count.load(Ordering::Relaxed)\r\n            } else {\r\n                \/\/ We can't just fetch_sub(1) because it might underflow in a race\r\n                match self.count.compare_exchange(count, count - 1, \r\n                    Ordering::Acquire, Ordering::Relaxed)\r\n                {\r\n                    Ok(_) =&gt; {\r\n                        \/\/ We officially obtained the semaphore.\r\n                        \/\/ If the (now perhaps stale) new `count` value is non-zero, \r\n                        \/\/ it's our job to wake someone up (or let the next caller in).\r\n                        if count - 1 &gt; 0 {\r\n                            self.event.set();\r\n                        }\r\n                        break;\r\n                    },\r\n                    Err(count) =&gt; count, \/\/ Refresh the cached value and try again\r\n                }\r\n            }\r\n        }\r\n\r\n        \/\/ This must hold true at all times!\r\n        assert!(count &lt;= self.max_count.load(Ordering::Relaxed));\r\n    }\r\n\r\n    \/\/\/ Increment the semaphore's internal count, increasing the available\r\n    \/\/\/ concurrency limit. 
\r\n    pub fn release(&amp;self) { \r\n        \/\/ Try to increment the current count, but don't exceed max_count\r\n        let old_count = self.count.fetch_add(1, Ordering::Release);\r\n        if old_count + 1 &gt; self.max_count.load(Ordering::Relaxed) {\r\n            <span style=\"color: #ff0000;\"><strong>panic!(\"Attempt to release past max concurrency!\");<\/strong><\/span>\r\n        }\r\n        \/\/ Check if we need to wake a sleeping waiter\r\n        if old_count == 0 {\r\n            self.event.set();\r\n        }\r\n    }\r\n}\r\n<\/code><\/pre>\n<p>If you&#8217;re familiar with atomic <abbr title=\"compare-and-swap\">CAS<\/abbr>, the annotated source code above should be fairly self-explanatory. In case you&#8217;re not, briefly, <code>AtomicXXX::compare_exchange()<\/code> is how most lock-free data structures work: you first read the old value (via <code>self.count.load()<\/code>) and decide based on it what the new value should be, then use <code>compare_exchange()<\/code> to change <code>$old<\/code> to <code>$new<\/code>,\u00a0<em>if and only if<\/em> the value is still <code>$old<\/code> and hasn&#8217;t been changed by another thread in the meantime (if it <em>has<\/em> changed, we re-read to get the new value and try all over again until we succeed).<\/p>\n<p>With the <code>Semaphore<\/code> skeleton filled out, we now have a functionally complete semaphore with features comparable to those available in other languages. 
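If a standalone illustration of that retry loop helps, here's a minimal sketch of the same "check, then compare-exchange, else refresh and retry" pattern that `wait()` uses, reduced to a bare counter decrement that can never underflow. This is our own illustrative helper built only on `std::sync::atomic`, not part of the `rsevents-extra` API:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Decrement `count` only if it is non-zero, using the CAS retry loop
/// described above. Returns the previous value on success, or `None`
/// if the count was already zero. (Illustrative only.)
fn try_decrement(count: &AtomicU32) -> Option<u32> {
    let mut prev = count.load(Ordering::Relaxed);
    loop {
        if prev == 0 {
            // Nothing to decrement; a blind fetch_sub(1) here could underflow.
            return None;
        }
        match count.compare_exchange(prev, prev - 1, Ordering::Acquire, Ordering::Relaxed) {
            // The value was still `prev`, so our decrement took effect.
            Ok(old) => return Some(old),
            // Another thread changed it first; refresh the cached value and retry.
            Err(observed) => prev = observed,
        }
    }
}

fn main() {
    let count = AtomicU32::new(2);
    assert_eq!(try_decrement(&count), Some(2)); // 2 -> 1
    assert_eq!(try_decrement(&count), Some(1)); // 1 -> 0
    assert_eq!(try_decrement(&count), None);    // already zero, no underflow
}
```

The two failure modes are distinct: returning `None` means the precondition failed (nothing to take), while the `Err` arm is just a lost race that we retry.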
At the moment, <code>Semaphore::wait()<\/code> returns nothing and any thread that&#8217;s obtained the semaphore must ensure that <code>Semaphore::release()<\/code> is called before it returns, but that&#8217;s just an ergonomic issue that we can easily work around by returning a concurrency token that calls <code>Semaphore::release()<\/code> when it&#8217;s dropped instead.<sup id=\"rf7-4958\"><a href=\"#fn7-4958\" title=\"We would still have to keep Semaphore::release() around and make sure it can be publicly called so that a semaphore initialized with { count: m, max_count: n, ... } with&nbsp;m &ge; 0 and&nbsp;n &gt; m can be used.\" rel=\"footnote\">7<\/a><\/sup><\/p>\n<p>But can we make it safer?<\/p>\n<h3>The problem with <code>Semaphore::release()<\/code><\/h3>\n<p>For the remainder of this article, we&#8217;ll be focusing on the ugly part of a semaphore, the part that makes writing a truly safe semaphore so challenging: <code>Semaphore::release()<\/code>.<\/p>\n<p>In rust, the standard library concurrency primitives all return &#8220;scope guards&#8221; that automatically release (or poison) their associated synchronization object at the end of the scope or in case of panic. This actually isn&#8217;t just a question of ergonomics (as we mentioned before), it&#8217;s a core part of their safety. 
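To make the "concurrency token" idea concrete, here's a hypothetical sketch of such a drop guard over a bare atomic counter, using only the standard library. The names `ConcurrencyToken` and `acquire` are ours (not the `rsevents-extra` API), and `acquire` is simplified to assume a permit is available rather than blocking:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// A hypothetical scope guard: holds a borrowed permit and gives it
/// back when dropped, mirroring how a MutexGuard unlocks on drop.
struct ConcurrencyToken<'a> {
    available: &'a AtomicU32,
}

impl Drop for ConcurrencyToken<'_> {
    fn drop(&mut self) {
        // The "release" happens automatically at end of scope or on panic.
        self.available.fetch_add(1, Ordering::Release);
    }
}

/// Simplified acquisition: takes one permit and returns the guard.
/// (A real semaphore would block or CAS-loop here instead.)
fn acquire(available: &AtomicU32) -> ConcurrencyToken<'_> {
    available.fetch_sub(1, Ordering::Acquire);
    ConcurrencyToken { available }
}

fn main() {
    let available = AtomicU32::new(3);
    {
        let _token = acquire(&available);
        assert_eq!(available.load(Ordering::Relaxed), 2); // one permit held
    } // _token dropped here; permit returned
    assert_eq!(available.load(Ordering::Relaxed), 3);
}
```

Note that the guard only automates the *paired* release; as discussed below, it cannot replace a free-standing release operation.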
On Windows, you can create a mutex\/critical section with <code>InitializeCriticalSection()<\/code> and obtain it with <code>EnterCriticalSection()<\/code> but any thread can call <code>LeaveCriticalSection()<\/code>, <a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/api\/synchapi\/nf-synchapi-leavecriticalsection#:~:text=If%20a%20thread%20calls%20LeaveCriticalSection%20when%20it%20does%20not%20have%20ownership%20of%20the%20specified%20critical%20section%20object%2C%20an%20error%20occurs%20that%20may%20cause%20another%20thread%20using%20EnterCriticalSection%20to%20wait%20indefinitely.\" rel=\"follow\">even if it invokes undefined behavior that may cause a deadlock<\/a>. On Linux and its pals, <code>pthread_mutex_init()<\/code> can be used to create a mutex and <code>pthread_mutex_lock()<\/code> can be used to enter the mutex. While <code>pthread_mutex_unlock()<\/code> <a href=\"https:\/\/linux.die.net\/man\/3\/pthread_mutex_unlock#:~:text=The%20pthread_mutex_unlock()%20function%20may,not%20own%20the%20mutex.\" rel=\"follow\">is documented as <em>may<\/em> return an <code>EPERM<\/code> error<\/a> if the current thread doesn&#8217;t own the mutex, the notes clarify that the current owning thread id isn&#8217;t stored and deadlocks are allowed in order to avoid overhead &#8211; meaning it&#8217;s implementation-defined whether or not <code>pthread_mutex_unlock()<\/code> actually protects against unlocking from a different thread. 
In practice, it doesn&#8217;t.<sup id=\"rf8-4958\"><a href=\"#fn8-4958\" title=\"You can see a sample application testing for pthread_mutex_unlock() safety here and try it out for yourself online here.\" rel=\"footnote\">8<\/a><\/sup><\/p>\n<p>Rust, as a partially object-oriented language, sidesteps all this via our good friend <abbr title=\"Resource Acquisition Is Initialization\">RAII<\/abbr> by simply making the equivalent of <code>Mutex&lt;T&gt;::unlock(&amp;self)<\/code> unavailable except via the scope guard (<a href=\"https:\/\/doc.rust-lang.org\/nightly\/std\/sync\/struct.MutexGuard.html\" rel=\"follow\"><code>MutexGuard<\/code><\/a>) returned by <code>Mutex&lt;T&gt;::lock(&amp;self)<\/code> (and the same for <code>RwLock&lt;T&gt;<\/code>). The type system prevents you from calling the equivalent of <code>pthread_mutex_unlock()<\/code> unless you already own a <code>MutexGuard<\/code>, which can only be acquired as a result of a successful call to <code>Mutex::lock()<\/code> &#8211; without needing any runtime code to check whether or not the calling thread is the owning thread, because the type system provides that safety guarantee at zero cost.<\/p>\n<p>Unfortunately, as I mentioned earlier in passing, even if we did make our <code>Semaphore::wait()<\/code> function return some sort of concurrency token\/scope guard, it would at best be an\u00a0<em>ergonomic<\/em> improvement to make sure one doesn&#8217;t forget to call <code>Semaphore::release()<\/code> before exiting the scope, but it wouldn&#8217;t allow us to eliminate a publicly callable <code>Semaphore::release()<\/code> function that any thread could call at any time, at least not without making it impossible to create a semaphore with an initial count that doesn&#8217;t equal the maximum count, and not without making the &#8220;current maximum concurrency&#8221; limit non-adjustable at runtime.<\/p>\n<h3>Do we even need <code>Semaphore::release()<\/code> anyway?<\/h3>\n<p>At this point, you might
be tempted to ask if we really truly absolutely need these options and to question what purpose they serve &#8211; which is a good and very valid question, given the cost we have to pay to support it and especially because in all the cases we mentioned above (those with a fixed <code>max_count<\/code> and those without), you&#8217;d always create a semaphore with <code>initial_count == max_count<\/code>. Here are some hopefully convincing reasons:<\/p>\n<ul>\n<li>Semaphores aren&#8217;t just used to limit concurrency up to a maximum pre-defined limit, they&#8217;re also used to temporarily artificially constrain available concurrency to a value in the range of <code>[0, max_count]<\/code> &#8211; in fact, this is where you might find their greatest utility.<\/li>\n<li>CS principle: Preventing a &#8220;<a href=\"https:\/\/en.wikipedia.org\/wiki\/Thundering_herd_problem\" rel=\"follow\">thundering herd<\/a>&#8221; when a number of threads are blocked waiting on a semaphore to allow them access to a resource or code region. You can have up to\u00a0<em>n<\/em> threads performing some task at the same time, but you don&#8217;t want them to all start synchronously because it&#8217;ll cause unwanted contention on some shared state, e.g.\n<ul>\n<li>Perhaps an atomic that needs to be incremented, and you don&#8217;t want to waste cycles attempting to satisfy <code>compare_exchange()<\/code> calls under contention or you otherwise have a traditional fine-grained lock or even lock-free data structure where uncontended access is ~free but contention wastes CPU cycles busy-waiting in a spinlock or even puts threads to sleep;<\/li>\n<li>Starting operations at ~the same time can run into real-world physical limitations, e.g. 
it takes <em>n<\/em> IO threads to achieve maximum saturation of available <em>bulk<\/em> bandwidth but the initial seeks can&#8217;t satisfy as many IOPS without unduly increasing latency and decreasing response times.<\/li>\n<li>The semaphore controls the number of threads downloading in parallel but you don&#8217;t want to DOS the DNS or HTTP servers by flooding them with simultaneous requests, though you do ultimately want\u00a0<em>n<\/em> threads handling the response, downloading the files, etc. in parallel.<\/li>\n<\/ul>\n<\/li>\n<li>CS principle: Making available concurrency accrue over time, up to a maximum saturation limit (represented by <code>max_count<\/code>), for example:\n<ul>\n<li>A family of AWS EC2 virtual machine instances have what is referred to as &#8220;<a href=\"https:\/\/docs.aws.amazon.com\/AWSEC2\/latest\/UserGuide\/burstable-performance-instances.html\" rel=\"follow\">burstable performance<\/a>&#8221; where for every\u00a0<em>m<\/em> minutes of low CPU usage, you get\u00a0<em>t<\/em> time slices of &#8220;more than what you paid for&#8221; instantaneous CPU performance, up to\u00a0<em>n<\/em> maximum time slices accrued. As a simplified example, you &#8220;rent&#8221; a virtual machine with a lower nominal CPU speed of 2.0 GHz and for every 5 minutes with a p95 per-5-second-time-slice CPU utilization below 50%, you get a &#8220;free&#8221; 5-second-time-slice of 3.0 GHz compute. If you launch a burstable instance and immediately try running Prime95, you&#8217;ll be capped to 2.0 GHz, but if you are running nginx and serving web requests under relatively no load, then after 10 minutes you&#8217;ll have accrued 10 seconds of &#8220;free&#8221; 3.0 GHz performance. 
When you suddenly get a burst of traffic because a link to your site was just shared on an iMessage or WhatsApp group and 200 phones are hitting your server to generate a web preview, you can &#8220;cash in&#8221; those accrued 3.0 GHz time slices to satisfy those requests quickly, after which you&#8217;re constrained to the baseline 2.0 GHz performance until the requests settle down and you begin to accrue 3.0 GHz time slices once again.<\/li>\n<li>While you can implement a rate limiter by giving each connected user\/IP address a maximum number of requests they can make per 5 minutes and resetting that limit every 5 minutes, that still means your server can be DDOS&#8217;d by n clients saturating their limit of 100-requests-per-5-minutes in the first few seconds after establishing a connection. You can instead give each client a semaphore<sup id=\"rf9-4958\"><a href=\"#fn9-4958\" title=\"After all, if we use an AtomicU8 to represent the max\/initial count, they can be as small as three bytes each!\" rel=\"footnote\">9<\/a><\/sup> with an initial available count of, say, 2 and then every three seconds increment the available concurrency by 3; meaning clients would have to be connected for a full 5 minutes before they can hit you with 100 requests in one go (making it more expensive to mount an attack and giving your firewall&#8217;s heuristics a chance to disconnect them).<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>Hopefully the above reasons demonstrate the utility in being able to create a <code>Semaphore<\/code> with an initial count lower than the maximum count, and therefore, the need to have a free-standing <code>Semaphore::release()<\/code> function that, at the very least, the creator of the semaphore can call even without having previously made a corresponding call to <code>Semaphore::wait()<\/code>.<\/p>\n<h3>Can we make <code>Semaphore::release()<\/code> safer?<\/h3>\n<p>There&#8217;s an immediately obvious way to make <code>Semaphore::release()<\/code> safer to 
call, and that&#8217;s to replace it with a <code>Semaphore::try_release()<\/code> method that first checks to ensure that the semaphore&#8217;s core integrity predicate (<code>self.count &lt;= self.max_count<\/code>) is upheld instead of blindly incrementing <code>self.count<\/code> and then panicking if it then exceeds <code>self.max_count<\/code>, returning <code>true<\/code> or <code>false<\/code> instead to denote whether the operation completed successfully or not.<\/p>\n<p>It&#8217;s actually not that hard of a change to make, provided you&#8217;re familiar with <a href=\"https:\/\/en.wikipedia.org\/wiki\/Compare-and-swap\" rel=\"follow\">compare-and-exchange atomics<\/a> (which we used above in safely implementing <code>Semaphore::wait()<\/code> to prevent two threads from racing to increment <code>self.count<\/code> after checking that it&#8217;s less than <code>self.max_count<\/code>), which we&#8217;ll use to write our <code>try_release()<\/code> method now:<\/p>\n<pre><code class=\"language-rust\">impl Semaphore {\r\n    \/\/\/ Attempts to increment the `Semaphore`'s internal count,\r\n    \/\/\/ returning `false` if it would exceed the maximum allowed.\r\n    pub fn try_release(&amp;self) -&gt; bool {\r\n        let mut prev_count = self.count.load(Ordering::Relaxed);\r\n        loop {\r\n            if prev_count == self.max_count {\r\n                \/\/ Incrementing self.count would violate our core precondition\r\n                return false;\r\n            }\r\n\r\n            match self.count.compare_exchange_weak(\r\n                prev_count, \/\/ only exchange `count` if it's still `prev_count`\r\n                prev_count + 1, \/\/ what to exchange `count` with\r\n                Ordering::Release, Ordering::Relaxed)\r\n            {\r\n                \/\/ The CAS succeeded\r\n                Ok(_) =&gt; {\r\n                    if prev_count == 0 {\r\n                        \/\/ Wake a waiting thread\r\n                        
self.event.set();\r\n                    }\r\n                    return true;\r\n                }\r\n                \/\/ If it failed, refresh `prev_count` and retry, failing if \r\n                \/\/ another thread has caused `prev_count` to equal `max_count`.\r\n                Err(new_count) =&gt; prev_count = new_count,\r\n            }\r\n        }\r\n    }\r\n}\r\n<\/code><\/pre>\n<p>This <em>is<\/em> much better. We only increment <code>count<\/code> if we can guarantee that doing so won&#8217;t cause it to exceed <code>max_count<\/code>, and relay our results to the caller. We can now guarantee that a call to <code>Semaphore::try_release()<\/code> that wasn&#8217;t paired with a previous <code>Semaphore::wait()<\/code> won&#8217;t inadvertently violate the absolute limit on the allowed concurrency.<\/p>\n<p>But can you spot the problem? Look at the code sample carefully, and consider it not in the context of a single call to <code>Semaphore::try_release()<\/code> but as a whole.<\/p>\n<p>If you think you&#8217;ve figured it out or you just want to skip to the answers, read on to find out where the problem still lies. (Yes, this is just a filler paragraph to avoid your eyes immediately seeing the answer while you think this over. Hopefully you haven&#8217;t already scrolled down too much so that the answer is already in view. Sorry, I can&#8217;t really do better than this, my typesetting system doesn&#8217;t let me insert empty vertical whitespace very elegantly.)<\/p>\n<p>The issue is that we&#8217;ve prevented <em>this<\/em> call to <code>Semaphore::try_release()<\/code> from incrementing <code>count<\/code> past <code>max_count<\/code>, but we might already have extant threads chugging away in parallel that have already obtained the semaphore, oblivious to the fact that <code>count<\/code> has changed from under them. 
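<\/p>
<p>For reference, here&#8217;s a std-only sketch of roughly what the original, infallible <code>release()<\/code> looked like (the field layout is assumed, and the auto-reset event used for waking sleepers is omitted):<\/p>

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Assumed field layout; the article's Semaphore also holds an
// AutoResetEvent for sleeping/waking, omitted here to stay std-only.
struct Semaphore {
    count: AtomicU32,
    max_count: u32,
}

impl Semaphore {
    /// The infallible release(): blindly increment `count`, then panic
    /// after the fact if the core integrity predicate was violated.
    pub fn release(&self) {
        let prev_count = self.count.fetch_add(1, Ordering::Release);
        // Note the check happens *after* the increment: by the time we
        // panic, `count` has already exceeded `max_count`.
        assert!(
            prev_count < self.max_count,
            "semaphore released more times than the maximum count allows!"
        );
        // (The full implementation would also wake a waiting thread here
        // if prev_count was zero.)
    }
}

fn main() {
    let sem = Semaphore { count: AtomicU32::new(1), max_count: 2 };
    sem.release(); // count: 1 -> 2, at max_count but still legal
    // One release too many now panics, but only after the damage is done:
    assert!(std::panic::catch_unwind(|| sem.release()).is_err());
}
```

<p>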
The problem is easy to spot if we keep a copy of our old <code>Semaphore::release()<\/code> around and mix-and-match between it and <code>try_release()<\/code>, with calls we\u00a0<em>expect<\/em> to always succeed using the former and calls that may overflow the current count using the latter:<\/p>\n<pre><code class=\"language-rust\">fn main() {\r\n    \/\/ Create a semaphore with a count of 1 and max count of 2\r\n    let semaphore = Semaphore::new(1, 2); \r\n\r\n    \/\/ Create a number of threads that will use the semaphore \r\n    \/\/ to make sure concurrency limits are observed.\r\n    \r\n    for n in 0..NUM_CPUS {\r\n        std::thread::spawn(|| {\r\n            \/\/ Make sure to obtain a concurrency token before beginning work\r\n            semaphore.wait();\r\n\r\n            \/\/ TODO: Replace `sleep()` with actual work!\r\n            std::thread::sleep(Duration::from_secs(5));\r\n\r\n            \/\/ Release the concurrency token when we're done so another \r\n            \/\/ thread may enter this code block and do its thing.\r\n            semaphore.release();\r\n        });\r\n    }\r\n\r\n    while !work_finished {\r\n        \/\/ &lt;Read user input from keyboard here&gt;\r\n        \r\n        \/\/ In response to a user command, increase concurrency \r\n        if !semaphore.try_release() {\r\n            eprintln!(\"Cannot raise limit, max already reached!\");\r\n        }\r\n    }\r\n}\r\n<\/code><\/pre>\n<p>We&#8217;re using our semaphore to limit the number of simultaneous worker threads, originally operating at 50% of our maximum concurrency limit (<code>count == 1<\/code>). 
To start, one of <em>NUM_CPUS<\/em><sup id=\"rf10-4958\"><a href=\"#fn10-4958\" title=\"For the sake of this example, let&rsquo;s assume NUM_CPUS is a sufficiently large number like 4 or 8, so that enough worker threads will try to enter the semaphore-protected region.\" rel=\"footnote\">10<\/a><\/sup> worker threads obtains the semaphore (<code>count == 0<\/code>) to access the concurrency-limited region and the rest are locked out.<\/p>\n<p>In the main thread&#8217;s event loop, we wait for work to finish or a user command to be entered at the keyboard (omitted from the example).<sup id=\"rf11-4958\"><a href=\"#fn11-4958\" title=\"rsevent-extra&lsquo;s CountdownEvent might just be the perfect tool for this job, btw!\" rel=\"footnote\">11<\/a><\/sup> The user keys in an input that&#8217;s interpreted as &#8220;try to speed things up by increasing the concurrency limit,&#8221; and in response the main thread calls <code>Semaphore::try_release()<\/code> and succeeds in incrementing its internal <code>self.count<\/code> (now <code>count == 1<\/code> again), thereby allowing a second thread to enter the semaphore-protected region (giving <code>count == 0<\/code>) and contribute to progress.<\/p>\n<p>But what happens now that two threads are already in the semaphore-protected scope but the user triggers a second call to <code>Semaphore::try_release()<\/code>? As far as the semaphore knows, <code>count<\/code> (equal to 0) is less than <code>max_count<\/code>, and can be safely incremented, yielding <code>count == 1<\/code> and thereby unblocking yet another worker thread sleeping in <code>Semaphore::wait()<\/code> (reducing <code>count<\/code> to zero).<\/p>\n<p>But what does that mean? 
Even though our core precondition\u00a0<em>hasn&#8217;t<\/em> been violated in this instant (<code>Semaphore::count &lt;= Semaphore::max_count<\/code>), we now have\u00a0<em>three<\/em> worker threads in the concurrency-limited region, exceeding the hard-coded <code>max_count<\/code> provided during the semaphore initialization! In our example above, each worker thread, having first obtained a concurrency token via <code>Semaphore::wait()<\/code>, assumes that it can safely call the infallible <code>Semaphore::release()<\/code> method when it&#8217;s done; but when the last worker thread finishes its work, it&#8217;ll end up incrementing <code>count<\/code> past <code>max_count<\/code> and panicking in the process.<\/p>\n<p>Of course, the panic isn&#8217;t the problem &#8211; it&#8217;s just reporting the violation when it sees it. We could replace all <code>Semaphore::release()<\/code> calls with <code>Semaphore::try_release()<\/code> and we&#8217;d still have a problem: there are more worker threads in the semaphore-protected region than allowed, and one of the &#8220;shouldn&#8217;t fail&#8221; calls to <code>Semaphore::try_release()<\/code> will eventually return <code>false<\/code>, whether that triggers a panic or not.<\/p>\n<p>The crux of the matter is that we borrow from <code>Semaphore::count<\/code> but don&#8217;t have a way to track the total &#8220;live&#8221; concurrency count available (the remaining concurrency tokens in <code>self.count<\/code> plus any temporarily-unavailable-but-soon-to-be-returned concurrency tokens borrowed by the various worker threads). And, importantly, <strong>we don&#8217;t need this for any core semaphore functionality but rather only to protect against a user\/dev calling <code>Semaphore::release()<\/code> more times than they&#8217;re allowed to<\/strong>. 
In a perfect world, it would be the developer&#8217;s responsibility to make sure that there are never more outstanding concurrency tokens than allowed, and we could omit these safety checks altogether. She could statically ensure it&#8217;s the case by only ever pairing together <code>wait()<\/code> and <code>release()<\/code> calls (perhaps after issuing a fixed, verifiable number of <code>release()<\/code> calls upon init), or by tracking\u00a0<em>when and where needed<\/em> the number of free-standing calls to <code>release()<\/code> in an external variable to make sure the limit is never surpassed.<\/p>\n<h3>At last, a truly safe semaphore?<\/h3>\n<p>While we could offload the cost of verifying that there will never be more than <code>max_count<\/code> calls to <code>Semaphore::release()<\/code> to the developer, we&#8217;re rust library authors and we hold ourselves to a higher standard. We can most certainly do just that, but we would have to mark <code>Semaphore::release()<\/code> as an <code>unsafe<\/code> function to warn users that there are preconditions to calling this and dangers to be had if calling it willy-nilly without taking them into account. But what if we want a synchronization type that&#8217;s easier to use and worthy of including in the standard library, being both safe and easy-to-use? What then?<\/p>\n<p>There are actually several approaches we can take to solving this gnarly problem. 
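<\/p>
<p>For comparison, here&#8217;s roughly what that <code>unsafe<\/code>-and-documented route would look like (a sketch; the field layout is assumed):<\/p>

```rust
use std::sync::atomic::{AtomicU32, Ordering};

// Assumed minimal field layout, for illustration only.
struct Semaphore {
    count: AtomicU32,
    max_count: u32,
}

impl Semaphore {
    /// Increments the semaphore's available count without any checks.
    ///
    /// # Safety
    ///
    /// Across all threads, the total number of outstanding concurrency
    /// tokens (the initial count plus every unpaired call to this
    /// function) must never exceed `max_count`; violating this silently
    /// breaks the concurrency limit.
    pub unsafe fn release(&self) {
        // The precondition is offloaded to the caller, as the `unsafe`
        // marker warns; no predicate check happens here at all.
        self.count.fetch_add(1, Ordering::Release);
        // (A full implementation would also wake a sleeping waiter here.)
    }
}

fn main() {
    let sem = Semaphore { count: AtomicU32::new(0), max_count: 2 };
    // SAFETY: zero outstanding tokens plus one release <= max_count of 2.
    unsafe { sem.release() };
    assert_eq!(sem.count.load(Ordering::Relaxed), 1);
}
```

<p>This keeps <code>release()<\/code> maximally cheap, but pushes the entire burden of upholding the invariant onto every call site.<\/p>
<p>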
The easiest and safest solution would be to simply change our <code>Semaphore<\/code> definition, adding another field of the same integral count type, perhaps called <code>total_count<\/code> or <code>live_count<\/code>, that would essentially track <code>initial_count<\/code> plus the number of successful free-standing calls to <code>Semaphore::release()<\/code>,\u00a0<em>but not decrement it when a call to <code>Semaphore::wait()<\/code> is made<\/em>.<sup id=\"rf12-4958\"><a href=\"#fn12-4958\" title=\"Another way would be to pack count and max_count into a single double-width atomic (assuming such an atomic of this size exists) and to decrement&nbsp;both count and max_count when a call to Semaphore::wait() is made. This way, any calls to Semaphore::release() would compare the potential increase in count against a similarly decremented max_count and can catch any violations of our core precept. The issues described in the remainder of this article persist regardless of which method was chosen.\" rel=\"footnote\">12<\/a><\/sup> This way each time <code>try_release()<\/code> is called, we can check if <code>total_count<\/code> (and not <code>count<\/code>) would exceed <code>max_count<\/code> &#8212; and continue to use just <code>self.count<\/code> for the core semaphore functionality in <code>Semaphore::wait()<\/code>:<\/p>\n<pre><code class=\"language-rust\">struct Semaphore {\r\n    event: AutoResetEvent,\r\n    count: AtomicU32,\r\n    total_count: AtomicU32,\r\n    max_count: AtomicU32,\r\n}\r\n\r\nimpl Semaphore {\r\n    \/\/\/ Attempts to increment the `Semaphore`'s internal count,\r\n    \/\/\/ returning `false` if it would exceed the maximum allowed.\r\n    pub fn try_release(&amp;self) -&gt; bool {\r\n        \/\/ First try incrementing `total_count`:\r\n        let mut prev_count = self.total_count.load(Ordering::Relaxed);\r\n        loop {\r\n            if prev_count == self.max_count.load(Ordering::Relaxed) {\r\n                \/\/ Incrementing self.total_count would violate 
our core precondition\r\n                return false;\r\n            }\r\n\r\n            match self.total_count.compare_exchange_weak(\r\n                prev_count, \/\/ only exchange `total_count` if it's still `prev_count`\r\n                prev_count + 1, \/\/ what to exchange `total_count` with\r\n                Ordering::Relaxed, Ordering::Relaxed)\r\n            {\r\n                \/\/ If the CAS succeeded, continue to the next phase\r\n                Ok(_) =&gt; break,\r\n                \/\/ If it failed, refresh `prev_count` and retry, failing if \r\n                \/\/ another thread has caused `prev_count` to equal `max_count`.\r\n                Err(new_count) =&gt; prev_count = new_count,\r\n            }\r\n        }\r\n\r\n        \/\/ Now increment the actual available count:\r\n        let prev_count = self.count.fetch_add(1, Ordering::Release);\r\n        if prev_count == 0 {\r\n            \/\/ Wake up a sleeping waiter, if any\r\n            self.event.set();\r\n        }\r\n\r\n        return true;\r\n    }\r\n}\r\n<\/code><\/pre>\n<h3>But now, something else breaks!<\/h3>\n<p>Let&#8217;s go back to our last example with many worker threads attempting to access a concurrency-restricted region and a main event loop that reads user commands that can affect the available concurrency limit. To make this more relatable, let&#8217;s use a real-world example: we&#8217;re developing a command-line BitTorrent client and we want the user to control the number of simultaneous uploads or downloads, up to some limit (that might very well be <code>u32::MAX<\/code>). In the last example we were focused on user commands that <em>increased<\/em> the available concurrency limit, in our real-world example perhaps reflecting a user trying to speed up ongoing downloads or seeds by allowing more concurrent files or connections.<\/p>\n<p>But this isn&#8217;t a one-way street! 
Just as a user may wish to unthrottle BitTorrent downloads, they might very well wish to throttle them to make sure they&#8217;re just seeding in the background without saturating their Top Tier Cable Provider&#8217;s pathetic upload speed and killing their internet connection in the process. How do we do <em>that<\/em> safely?<\/p>\n<p>One way would be to introduce a method to directly decrement our semaphore&#8217;s <code>self.total_count<\/code> and <code>self.count<\/code> fields (down to zero), but what do we do if <code>total_count<\/code> was non-zero but <code>count<\/code> was zero (i.e. all available concurrency was currently in use)? If it weren&#8217;t for the fact that we are using unsigned atomic integers to store the counts, we could decrement <code>count<\/code> (but not <code>total_count<\/code>) past zero, for example to <code>-1<\/code>, and let the &#8220;live&#8221; concurrency eventually settle at the now-reduced <code>total_count<\/code> after a sufficient number of threads finish their work and leave the concurrency-limited region.<\/p>\n<p>But we don&#8217;t actually need to do any of that: by its nature a semaphore already provides a way to artificially reduce the available concurrency by means of a free-standing call to <code>Semaphore::wait()<\/code>, i.e. calling <code>wait()<\/code> without calling <code>release()<\/code> afterwards. It ensures that <code>count<\/code> isn&#8217;t reduced until it&#8217;s non-zero and that <code>count<\/code> never exceeds <code>max_count<\/code> or <code>total_count<\/code> at any time, not even temporarily.<\/p>\n<p>Unfortunately, herein lies a subtle problem. 
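<\/p>
<p>To keep that throttling pattern concretely in mind, here it is in miniature, sketched with a std-only toy semaphore (a <code>Mutex<\/code> and <code>Condvar<\/code> standing in for this article&#8217;s event-based implementation):<\/p>

```rust
use std::sync::{Condvar, Mutex};

// A minimal std-only counting semaphore, just to illustrate the
// free-standing wait() throttling pattern; it is NOT the article's
// AutoResetEvent-based implementation.
struct MiniSemaphore {
    count: Mutex<u32>,
    cond: Condvar,
}

impl MiniSemaphore {
    fn new(initial: u32) -> Self {
        MiniSemaphore { count: Mutex::new(initial), cond: Condvar::new() }
    }

    fn wait(&self) {
        let mut count = self.count.lock().unwrap();
        // Block until a concurrency token is available, then take it.
        while *count == 0 {
            count = self.cond.wait(count).unwrap();
        }
        *count -= 1;
    }

    fn release(&self) {
        *self.count.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

fn main() {
    let sem = MiniSemaphore::new(3);
    // Throttle: a free-standing wait() takes a token with no intention of
    // giving it back, permanently lowering the available concurrency
    // from 3 to 2 -- and it blocks (rather than going negative) if no
    // token is currently free.
    sem.wait();
    assert_eq!(*sem.count.lock().unwrap(), 2);
}
```

<p>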
With our revised semaphore implementation, we increase both <code>count<\/code> and <code>total_count<\/code> when the free-standing <code>release()<\/code> is called and assume that each call to <code>Semaphore::wait()<\/code> will have a matching call to some <code>Semaphore::special_release()<\/code> that increases <code>count<\/code> without touching <code>total_count<\/code>. This way, <code>total_count<\/code> tracks the &#8220;total available&#8221; concurrency, assuming that it&#8217;s equal to &#8220;remaining concurrency limit&#8221; plus &#8220;outstanding calls to <code>wait()<\/code> that haven&#8217;t yet called <code>xxx_release()<\/code>&#8221;.<\/p>\n<p>While free-standing calls to <code>Semaphore::release()<\/code> were our problem before, here we&#8217;ve shifted that to an issue with free-standing calls to <code>Semaphore::wait()<\/code> &#8211; an admittedly less hairy situation but, as we have seen, still not one that we can afford to ignore!<\/p>\n<p>More importantly, even if we weren&#8217;t using free-standing calls to <code>Semaphore::wait()<\/code> to artificially reduce the available concurrency, we actually <em>can&#8217;t<\/em> guarantee that <code>release()<\/code> is always called after <code>wait()<\/code>: it&#8217;s a form of the halting problem, and <em>even if<\/em> we ignore panics and have <code>wait()<\/code> return a scope guard that automatically calls <code>release()<\/code> when it&#8217;s dropped, it&#8217;s <em>still<\/em> completely safe for a user to call <code>std::mem::forget(scope_guard)<\/code> thereby preventing <code>release()<\/code> from being called!<sup id=\"rf13-4958\"><a href=\"#fn13-4958\" title=\"In rust parlance, memory leaks do not fall under the safety guarantees of the language and it&rsquo;s perfectly &ldquo;safe&rdquo; if not exactly cromulent to write code that doesn&rsquo;t drop() RAII resources.\" rel=\"footnote\">13<\/a><\/sup><\/p>\n<p>Fundamentally, we can&#8217;t really solve this problem. 
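<\/p>
<p>A tiny demonstration of why: leaking a value is entirely safe rust, and a leaked guard&#8217;s <code>Drop<\/code> simply never runs. No <code>unsafe<\/code>, no compiler error; this is exactly why a library cannot rely on <code>Drop<\/code> alone:<\/p>

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static RELEASES: AtomicU32 = AtomicU32::new(0);

// A stand-in for the scope guard returned by wait(): its only job is
// to "release" on drop.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        RELEASES.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    std::mem::forget(Guard); // leaked: Drop::drop never runs, in 100% safe code
    drop(Guard);             // normal path: Drop::drop runs exactly once
    assert_eq!(RELEASES.load(Ordering::Relaxed), 1);
}
```

<p>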
We either err on the side of potentially allowing too many free-standing calls to <code>release()<\/code> to be made, with safety checks delaying the overflow of <code>max_count<\/code> until the last borrowed concurrency token is returned after a call to <code>wait()<\/code>; or we err on the side of prudence and incorrectly prevent a free-standing call to <code>release()<\/code> from going through because we don&#8217;t know (and\u00a0<em>can&#8217;t<\/em> know) whether a thread which previously called <code>wait()<\/code> and took one of our precious concurrency tokens has decided it&#8217;s never going to return it.<\/p>\n<p>But don&#8217;t despair! Do you remember the old schoolyard riddle? You&#8217;re silently passed a pen and notebook, on which you see the following:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-4972 colorbox-4958\" src=\"https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2022\/09\/Make-this-line-shorter.png\" alt=\"\" width=\"525\" height=\"509\" srcset=\"https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2022\/09\/Make-this-line-shorter.png 525w, https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2022\/09\/Make-this-line-shorter-309x300.png 309w\" sizes=\"auto, (max-width: 525px) 100vw, 525px\" \/><\/p>\n<p>Like with the riddle,<sup id=\"rf14-4958\"><a href=\"#fn14-4958\" title=\"Give up? Just draw another, longer line next to it!\" rel=\"footnote\">14<\/a><\/sup> we can implement scope guards to make it more likely that every call to <code>wait()<\/code> is matched by a call to <code>release()<\/code>, but we can&#8217;t actually stop the user from calling <code>std::mem::forget(sem.wait())<\/code> &#8211; and we don&#8217;t have to. Without trying to think of ways to cause a compiler error when a scope guard is leaked and not dropped, we can still make it, if not hard then at least <em>harder<\/em> for the user to leak a scope guard and throw our internal count off. How? 
Not by hiding the ability to forget a scope guard but by highlighting it front and center!<\/p>\n<p>Let&#8217;s fast forward to our semaphore from above, but modified to return a scope guard to encourage returning concurrency tokens back to the semaphore when a thread has finished with a concurrency-limited operation:<\/p>\n<pre><code class=\"language-rust\">\/\/\/ This concurrency token is returned from a call to `Semaphore::wait()`.\r\n\/\/\/ It's automatically returned to the semaphore upon drop, incrementing\r\n\/\/\/ the semaphore's internal available concurrency counter once more.\r\nstruct ConcurrencyToken&lt;'a&gt; {\r\n    sem: &amp;'a Semaphore\r\n}\r\n\r\nimpl Semaphore {\r\n    pub fn wait&lt;'a&gt;(&amp;'a self) -&gt; ConcurrencyToken&lt;'a&gt; {\r\n        include!(\"earlier definition of Semaphore::wait() here\");\r\n\r\n        \/\/ Now instead of returning () we return a ConcurrencyToken\r\n\r\n        return ConcurrencyToken {\r\n            sem: self,\r\n        }\r\n    }\r\n\r\n    \/\/\/ Directly increments the internal concurrency count without touching \r\n    \/\/\/ `total_count` and without checking if it would exceed `max_count`.\r\n    unsafe fn release_internal(&amp;self) {\r\n        let prev_count = self.count.fetch_add(1, Ordering::Release);\r\n\r\n        \/\/ We only need to wake a sleeping waiter if the previous count\r\n        \/\/ was zero. 
In all other cases, no one will be asleep.\r\n        if prev_count == 0 {\r\n            self.event.set();\r\n        }\r\n    }\r\n}\r\n\r\nimpl Drop for ConcurrencyToken&lt;'_&gt; {\r\n    fn drop(&amp;mut self) {\r\n        unsafe { self.sem.release_internal(); }\r\n    }\r\n}\r\n<\/code><\/pre>\n<p>This was our initial attempt at &#8220;strongly encouraging&#8221; calls to <code>Semaphore::wait()<\/code> to always be paired with calls to <code>Semaphore::release_internal()<\/code> (called by the <code>ConcurrencyToken<\/code> on drop), which increments <code>count<\/code> without touching <code>total_count<\/code> so our logic in <code>Semaphore::try_release()<\/code> can continue to work.<\/p>\n<p>As we said though, if one were to call <code>std::mem::forget(sem.wait())<\/code>, the <code>ConcurrencyToken<\/code> would be forgotten without <code>release_internal()<\/code> ever being called, and the count we track in <code>total_count<\/code> would be off by one, preventing a free-standing call to <code>Semaphore::release()<\/code> that should have been allowed.<\/p>\n<p>So what if we just directly add a new method to our concurrency token? A <code>ConcurrencyToken::forget()<\/code> method makes it <em>easier<\/em> to just call <code>Semaphore::wait().forget()<\/code> directly than it is to reach for <code>std::mem::forget()<\/code>! (See, I really was going somewhere with that riddle!)<\/p>\n<pre><code class=\"language-rust\">impl ConcurrencyToken&lt;'_&gt; {\r\n    \/\/\/ It is a violation of this crate's contract to call `std::mem::forget()`\r\n    \/\/\/ on the result of `Semaphore::wait()`. 
To forget a `ConcurrencyToken`, \r\n    \/\/\/ use this method instead.\r\n    pub fn forget(self) {\r\n        \/\/ We're keeping `count` permanently reduced, but we need to decrement\r\n        \/\/ `total_count` to reflect this as well before forgetting ourselves.\r\n        self.sem.total_count.fetch_sub(1, Ordering::Relaxed);\r\n        std::mem::forget(self);\r\n    }\r\n}\r\n<\/code><\/pre>\n<p>And just like that, we now have something we can reasonably call a &#8220;safe&#8221; semaphore, worthy of rust!<\/p>\n<h2>The price we pay for safety<\/h2>\n<p>While I can&#8217;t say with complete confidence that this is the optimal implementation of a safe semaphore (exposing the same functionality), our journey above is still representative of the constant tug-of-war that takes place when trying to build an API as you juggle performance, the desire for zero-cost abstractions, and the imperative of surfacing a (within the boundaries of reasonable use) safe and misuse-resistant interface.<\/p>\n<p>We started with something completely simple: two words for the count and a single byte auto-reset event we could use to impose strict ordering and optimized waiting\/sleeping in cases of contention. Correctness (which, if you squint at it in a particular way, is just another kind of safety) mandated the use of atomics from the very start, preventing us from playing fast and loose with our integer math and imposing heavy penalties when it came to ensuring cache coherence and cross-core integrity. 
Then, just when we thought we had everything figured out, we needed to completely change our approach and even add a new struct member to boot (raising the size of the <code>Semaphore<\/code> struct by 33-45% depending on the integer width, which sounds really scary until you realize it&#8217;s still just a handful of bytes).<\/p>\n<p>There are of course other possible solutions to the same problem, all of which potentially have their own drawbacks.<sup id=\"rf15-4958\"><a href=\"#fn15-4958\" title=\"I still need to sit down and experiment with packing count and max_count into one atomic double-width word and see how it works to decrement both count and max_count after each call to wait() instead of tracking with an additional total_count field, but even there we&rsquo;d have a price to pay. We can no longer use AtomicXXX::fetch_add() and we&rsquo;d have to use compare_exchange_weak() in a loop, after fetching the initial value, separating it into its component fields, incrementing\/decrementing, then combining the fields into a single word again &ndash; although a quick godbolt attempt shows the compiler actually does a rather nice job.\" rel=\"footnote\">15<\/a><\/sup> And even if there are cost-free solutions here, the general picture isn&#8217;t unique to semaphores or even concurrency primitives in general: it&#8217;s a story that&#8217;s played on repeat and comes up every time an interface needs to be added that has some caveats the caller needs to keep in mind. Writing correct code is hard, writing\u00a0<em>safe<\/em> and correct code is harder. 
But, in my opinion,\u00a0<em>this<\/em> is what actually makes rust special.<\/p>\n<p>Rust&#8217;s concepts of ownership and exclusive RO\/RW semantics play a huge role in making it such a popular language with low-level developers, but I would argue that it&#8217;s this attention that&#8217;s paid when writing libraries that deal with intricate or abstract concepts that can&#8217;t be reduced to simple <code>&amp;foo<\/code> vs <code>&amp;mut foo<\/code> semantics that makes rust truly unique. As an old hat at C and C++ development, I&#8217;ve already worn my &#8220;low-level library developer&#8221; mantle thin, and it&#8217;s absolutely awesome to be able to write an abstraction that other developers can use without having to dive into syscalls and kernel headers as the only source of literature. But with rust, I&#8217;m experiencing a completely different kind of challenge in writing libraries and APIs: here there&#8217;s a bar even higher than just writing clean abstractions, and it&#8217;s being able to write these low-level abstractions in a way that not only can clever developers that have previously dealt with these system internals appreciate and use our libraries to their advantage, but that even others new to the scene can just read your documentation (or maybe not even that) and then let the shape of your API guide them, figuring out the <em>right<\/em> way of using it.<\/p>\n<p>In any language, a savvy enough developer can usually figure their way around even a completely new library dealing with concepts and mechanisms completely alien to them. But in the process of figuring it out, they&#8217;re bound to make mistakes, bump into leaky abstractions (&#8220;but\u00a0<em>why<\/em> shouldn&#8217;t I call <code>pthread_mutex_unlock<\/code> from another thread if I need to access the mutex and it&#8217;s currently locked? 
What is it there for, then?&#8221; &#8211; whether they&#8217;re asking it on SO or mulling it over quietly in their head as they figure out the black-box internals by poking and prodding away at the API surface), pull out what&#8217;s left of their hair, and bang their head against the wall some before arriving at a generally correct and workable solution.<\/p>\n<p>But it doesn&#8217;t have to be that way, and the burden is on us as developers of these libraries and crates to give our fellow devs a better option. Runtime errors (like the ones the pthread API doesn&#8217;t even bother returning!) are good, and other languages<sup id=\"rf16-4958\"><a href=\"#fn16-4958\" title=\"C# is a great example here, with extensive input and sanity checking for most APIs but almost all of it in the form of runtime exceptions &ndash; despite it being a strongly typed language with powerful and extensible compiler integration.\" rel=\"footnote\">16<\/a><\/sup> have demonstrated how they can be used with non-zero but still fairly minimal overhead, but with the benefits of a strongly typed language, powerful type abstractions, and perhaps a smattering of generics and ZSTs, we can and should do better.<\/p>\n<h2>The truly safe semaphore, for your benefit and review<\/h2>\n<p>The semaphore we iteratively designed in this article is available for you to use, study, or review as part of the 0.2 release of the <a href=\"https:\/\/github.com\/neosmart\/rsevents-extra\" rel=\"nofollow\">rsevents-extra crate<\/a>. 
<a href=\"https:\/\/docs.rs\/rsevents-extra\/0.2.0\/rsevents_extra\/struct.Semaphore.html#method.try_release\" rel=\"nofollow\">This is the current API exposed by our <code>Semaphore<\/code><\/a> type (<a href=\"https:\/\/github.com\/neosmart\/rsevents-extra\/blob\/master\/src\/semaphore.rs\" rel=\"nofollow\">source code here<\/a>), incorporating some of the ideas discussed above.<sup id=\"rf17-4958\"><a href=\"#fn17-4958\" title=\"It currently still uses atomic unsigned integers in the implementation and so does not implement the wait-free, eventually consistent API to artificially reduce the currently available concurrency without making a wait() call,&nbsp;waiting for it to return, and then forgetting the result. At the time of its initial development, I had started off with signed integers then realized I didn&rsquo;t need them for the core functionality and switched to unsigned atomic values instead. I may need to revisit that decision in another release if it can give us either a wait-free reduce() corollary to release() instead of the Semaphore::wait().forget() or the current modify() method which allows wait-free direct manipulation of the internal count, but only with an &amp;mut Semaphore reference (to guarantee that it isn&rsquo;t in use, eschewing eventual consistency for correctness), but feedback on whether a wait-free reduce()&nbsp;at the cost of eventual consistency is a win or a draw\/loss would be appreciated from anyone nerdy enough to read these footnotes!\" rel=\"footnote\">17<\/a><\/sup><\/p>\n<p>The <code>Semaphore<\/code> type in <code>rsevents-extra<\/code> <a href=\"https:\/\/github.com\/neosmart\/rsevents-extra\/blob\/master\/src\/semaphore.rs\" rel=\"nofollow\">actually includes even more safety<\/a> than we&#8217;ve demonstrated above, but it&#8217;s of the bog-standard variety (checking for integer overflows, making sure we&#8217;re not decrementing past zero, etc) and not something unique to the challenges presented by semaphores in 
particular. The <a href=\"https:\/\/docs.rs\/rsevents-extra\/latest\/rsevents_extra\/struct.Semaphore.html#example\" rel=\"nofollow\"><code>Semaphore<\/code> docs example<\/a> shows a more fleshed-out version of the &#8220;listen for user events to artificially throttle\/unthrottle the semaphore,&#8221; if you want to check it out.<\/p>\n<p>If you have an itch to try your own hand at writing concurrency primitives, I cannot encourage you enough: it&#8217;s all kinds of challenging and rewarding, and really opens your mind to what goes on behind-the-scenes with synchronization types. The <code>rsevents<\/code> crate was written to make doing just that a breeze, and I recommend at least starting off with either <a href=\"https:\/\/docs.rs\/rsevents\/latest\/rsevents\/struct.ManualResetEvent.html\" rel=\"nofollow\">manual-<\/a> or <a href=\"https:\/\/docs.rs\/rsevents\/latest\/rsevents\/struct.AutoResetEvent.html\" rel=\"nofollow\">auto-reset events<\/a> to take care of the intricacies of sleeping and the correctness of waking one and only one past\/future awaiter at a time. Rust generally uses channels and mutexes to take care of synchronization, but there&#8217;s always a time and place for lower level thread signalling constructs!<\/p>\n<h2>Show some love and be the first to get new rust articles!<\/h2>\n<p>Last but not least: a request for you, dear reader. I put a lot of effort into these rust writeups (and into the open source libraries and crates I author) for nothing but love. I&#8217;ve heard good things about Patreon, and have literally just now <a href=\"https:\/\/www.patreon.com\/mqudsi\" rel=\"follow\">put up a page<\/a> to see if anyone would be interested in sponsoring my open source work. 
If you can&#8217;t spare some change to support me and my work on Patreon, please consider following me and my work <a href=\"https:\/\/twitter.com\/mqudsi\/\" rel=\"follow\">on twitter<\/a>, and starring <a href=\"https:\/\/github.com\/neosmart\/rsevents\" rel=\"nofollow\">rsevents<\/a> and <a href=\"https:\/\/github.com\/neosmart\/rsevents-extra\" rel=\"nofollow\">rsevents-extra<\/a> on GitHub.<\/p>\n<p><strong>I&#8217;m currently looking for work opportunities as a senior engineer (rust is preferable, but I&#8217;m a polyglot). If you or your team is hiring, let me know!<\/strong><\/p>\n<p>If you liked this writeup, please share it with others on your favorite nerdy social media platform. I also set up a small mailing list that only sends out an email when I post about rust here on this blog; you can sign up below to join (no spam, double opt-in required, one-click unsubscribe, I never share your email, etc. etc.):<\/p>\n<blockquote class=\"twitter-tweet\">\n<p dir=\"ltr\" lang=\"en\">Just posted a ~longer writeup on what it takes to implement a truly safe Semaphore type in <a href=\"https:\/\/twitter.com\/hashtag\/rust?src=hash&amp;ref_src=twsrc%5Etfw\" rel=\"follow\">#rust<\/a>. Feedback welcome. <a href=\"https:\/\/t.co\/QZdeCACpnH\" rel=\"follow\">https:\/\/t.co\/QZdeCACpnH<\/a><\/p>\n<p>\u2014 Mahmoud Al-Qudsi (@mqudsi) <a href=\"https:\/\/twitter.com\/mqudsi\/status\/1577365674460151809?ref_src=twsrc%5Etfw\" rel=\"follow\">October 4, 2022<\/a><\/p><\/blockquote>\n<p><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<div class=\"sendy_widget\" style='margin-bottom: 0.5em;'>\n<p><em>If you would like to receive a notification the next time we release a rust library, publish a crate, or post some rust-related developer articles, you can subscribe below. Note that you'll only get notifications relevant to rust programming and development by NeoSmart Technologies. 
If you want to receive email updates for all NeoSmart Technologies posts and releases, please sign up in the sidebar to the right instead.<\/em><\/p>\n<iframe tabIndex=-1 onfocus=\"sendy_no_focus\" src=\"https:\/\/neosmart.net\/sendy\/subscription?f=BUopX8f2VyLSOb892VIx6W4IUNylMrro5AN6cExmwnoKFQPz9892VSk4Que8yv892RnQgL&title=Join+the+rust+mailing+list\" style=\"height: 300px; width: 100%;\"><\/iframe>\n<\/div>\n<script type=\"text\/javascript\">function sendy_no_focus(e) { e.preventDefault(); }<\/script>\n<p><strong>Update (10\/5\/2022):\u00a0<\/strong>The examples in this post have been updated to use <code>Ordering::Acquire<\/code> or <code>Ordering::Release<\/code> when reading\/writing <code>self.count<\/code> in the <code>wait()<\/code> and <code>release()<\/code> family of functions to synchronize memory\/cache coherence between threads and ensure correct instruction ordering. In the original version of this article, the examples all used <code>Ordering::Relaxed<\/code> and relied on <code>self.event<\/code> to take care of ordering, but as <code>self.event<\/code> is skipped as an optimization in cases where the semaphore has available concurrency, this was insufficient.<\/p>\n<hr class=\"footnotes\"><ol class=\"footnotes\"><li id=\"fn1-4958\"><p>In fact, in some languages\/apis a &#8220;<a href=\"https:\/\/learn.microsoft.com\/en-us\/windows\/win32\/sync\/critical-section-objects\" rel=\"follow\">critical section<\/a>&#8221; is another name for a mutex.&nbsp;<a href=\"#rf1-4958\" class=\"backlink\" title=\"Jump back to footnote 1 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn2-4958\"><p>For a mutex, they are one and the same as it mandates exclusive access to a region.&nbsp;<a href=\"#rf2-4958\" class=\"backlink\" title=\"Jump back to footnote 2 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn3-4958\"><p>Semaphores are generally non-reentrant so recursively obtaining a semaphore will count against its limit!&nbsp;<a href=\"#rf3-4958\" class=\"backlink\" 
title=\"Jump back to footnote 3 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn4-4958\"><p>The empty parentheses\/tuple <code>()<\/code> is rust-speak for <code>void<\/code>, for those of you not familiar with the language.&nbsp;<a href=\"#rf4-4958\" class=\"backlink\" title=\"Jump back to footnote 4 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn5-4958\"><p>In rust, there&#8217;s actually (out of necessity) an entire class of types or values that can be changed even through read-only references, a concept known as interior mutability and exposed via types like <code>AtomicXXX<\/code> and the various <code>Cell<\/code> types in <code>std::cell<\/code> &#8212; but those are generally to be used on a fine-grained level and <em>in general<\/em> you wouldn&#8217;t be using them to make entire objects writable via read-only references.&nbsp;<a href=\"#rf5-4958\" class=\"backlink\" title=\"Jump back to footnote 5 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn6-4958\"><p>Actually, the <code>AutoResetEvent<\/code> implementation only takes <a href=\"https:\/\/github.com\/neosmart\/rsevents\/blob\/master\/src\/lib.rs\" rel=\"nofollow\">all of two bits<\/a>, but let&#8217;s just call it a byte to make everything nice and easy.&nbsp;<a href=\"#rf6-4958\" class=\"backlink\" title=\"Jump back to footnote 6 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn7-4958\"><p>We would still have to keep <code>Semaphore::release()<\/code> around and make sure it can be publicly called so that a semaphore initialized with <code>{ count: m, max_count: n, ... 
}<\/code> with\u00a0<em>m \u2265 0<\/em> and\u00a0<em>n &gt; m<\/em> can be used.&nbsp;<a href=\"#rf7-4958\" class=\"backlink\" title=\"Jump back to footnote 7 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn8-4958\"><p>You can see a sample application testing for <code>pthread_mutex_unlock()<\/code> safety <a href=\"https:\/\/gist.github.com\/mqudsi\/5aac71d8177866ba2762bda01a3ff21a\" rel=\"follow\">here<\/a> and try it out for yourself online <a href=\"https:\/\/onlinegdb.com\/9hmNhBuEc\" rel=\"follow\">here<\/a>.&nbsp;<a href=\"#rf8-4958\" class=\"backlink\" title=\"Jump back to footnote 8 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn9-4958\"><p>After all, if we use an <code>AtomicU8<\/code> to represent the max\/initial count, they can be as small as three bytes each!&nbsp;<a href=\"#rf9-4958\" class=\"backlink\" title=\"Jump back to footnote 9 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn10-4958\"><p>For the sake of this example, let&#8217;s assume <code>NUM_CPUS<\/code> is a sufficiently large number like 4 or 8, so that enough worker threads will try to enter the semaphore-protected region.&nbsp;<a href=\"#rf10-4958\" class=\"backlink\" title=\"Jump back to footnote 10 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn11-4958\"><p><code>rsevents-extra<\/code>&#8217;s <code><a href=\"https:\/\/docs.rs\/rsevents-extra\/latest\/rsevents_extra\/struct.CountdownEvent.html\">CountdownEvent<\/a><\/code> might just be the perfect tool for this job, btw!&nbsp;<a href=\"#rf11-4958\" class=\"backlink\" title=\"Jump back to footnote 11 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn12-4958\"><p>Another way would be to pack <code>count<\/code> and <code>max_count<\/code> into a single double-width atomic (assuming an atomic of this size exists) and to decrement\u00a0<em>both<\/em> <code>count<\/code> and <code>max_count<\/code> when a call to <code>Semaphore::wait()<\/code> is made. 
This way, any calls to <code>Semaphore::release()<\/code> would compare the potential increase in <code>count<\/code> against a similarly decremented <code>max_count<\/code> and could catch any violations of our core precept. The issues described in the remainder of this article persist regardless of which method is chosen.&nbsp;<a href=\"#rf12-4958\" class=\"backlink\" title=\"Jump back to footnote 12 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn13-4958\"><p>In rust parlance, memory leaks do not fall under the safety guarantees of the language and it&#8217;s perfectly &#8220;safe&#8221; if not exactly cromulent to write code that doesn&#8217;t <code>drop()<\/code> RAII resources.&nbsp;<a href=\"#rf13-4958\" class=\"backlink\" title=\"Jump back to footnote 13 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn14-4958\"><p>Give up? Just draw another, longer line next to it!&nbsp;<a href=\"#rf14-4958\" class=\"backlink\" title=\"Jump back to footnote 14 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn15-4958\"><p>I still need to sit down and experiment with packing <code>count<\/code> and <code>max_count<\/code> into one atomic double-width word and see how it works to decrement both <code>count<\/code> and <code>max_count<\/code> after each call to <code>wait()<\/code> instead of tracking with an additional <code>total_count<\/code> field, but even there we&#8217;d have a price to pay. 
We can no longer use <code>AtomicXXX::fetch_add()<\/code> and we&#8217;d have to use <code>compare_exchange_weak()<\/code> in a loop, after fetching the initial value, separating it into its component fields, incrementing\/decrementing, then combining the fields into a single word again &#8211; although a quick godbolt attempt shows the compiler <a href=\"https:\/\/godbolt.org\/z\/4zba6js8j\" rel=\"follow\">actually does a rather nice job<\/a>.&nbsp;<a href=\"#rf15-4958\" class=\"backlink\" title=\"Jump back to footnote 15 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn16-4958\"><p>C# is a great example here, with extensive input and sanity checking for most APIs but almost all of it in the form of runtime exceptions &#8211; despite it being a strongly typed language with powerful and extensible compiler integration.&nbsp;<a href=\"#rf16-4958\" class=\"backlink\" title=\"Jump back to footnote 16 in the text.\">&#8617;<\/a><\/p><\/li><li id=\"fn17-4958\"><p>It currently still uses atomic unsigned integers in the implementation and so does not implement the wait-free, eventually consistent API to artificially reduce the currently available concurrency without making a <code>wait()<\/code> call,\u00a0<em>waiting for it to return<\/em>, and then forgetting the result. At the time of its initial development, I had started off with signed integers then realized I didn&#8217;t need them for the core functionality and switched to unsigned atomic values instead. 
I may need to revisit that decision in a future release if it can give us a wait-free <code>reduce()<\/code> corollary to <code>release()<\/code>, in place of either the <code>Semaphore::wait().forget()<\/code> workaround or <a href=\"https:\/\/docs.rs\/rsevents-extra\/0.2.0\/rsevents_extra\/struct.Semaphore.html#method.modify\" rel=\"nofollow\">the current <code>modify()<\/code> method<\/a>, which allows wait-free direct manipulation of the internal count but only through an <code>&amp;mut Semaphore<\/code> reference (to guarantee that the semaphore isn&#8217;t in use, eschewing eventual consistency for correctness). Feedback on whether a wait-free <code>reduce()<\/code>\u00a0<em>at the cost of eventual consistency<\/em> is a win or a draw\/loss would be appreciated from anyone nerdy enough to read these footnotes!&nbsp;<a href=\"#rf17-4958\" class=\"backlink\" title=\"Jump back to footnote 17 in the text.\">&#8617;<\/a><\/p><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>Discuss this article on r\/rust or on Hacker News. 
Low-level or systems programming languages generally strive to provide libraries and interfaces that enable developers, boost productivity, enhance safety, provide resistance to misuse, and more &#8212; all while trying to reduce &hellip; <a href=\"https:\/\/neosmart.net\/blog\/implementing-truly-safe-semaphores-in-rust\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":505,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[999],"tags":[1024,1023,52,1022,936,1025],"class_list":["post-4958","post","type-post","status-publish","format-standard","hentry","category-programming","tag-concurrency","tag-multithreading","tag-open-source","tag-release","tag-rust","tag-semaphore"],"aioseo_notices":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/p4xDa-1hY","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts\/4958","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/users\/505"}],"replies":[{"embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/comments?post=4958"}],"version-history":[{"count":38,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts\/4958\/revisions"}],"predecessor-version":[{"id":4998,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts\/4958\/revisions\/4998"}],"wp:attachment":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/
wp\/v2\/media?parent=4958"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/categories?post=4958"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/tags?post=4958"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}