{"id":4552,"date":"2019-06-07T00:14:57","date_gmt":"2019-06-07T05:14:57","guid":{"rendered":"http:\/\/neosmart.net\/blog\/?p=4552"},"modified":"2019-07-01T16:30:03","modified_gmt":"2019-07-01T21:30:03","slug":"sorted-list-vs-binary-search-tree","status":"publish","type":"post","link":"https:\/\/neosmart.net\/blog\/sorted-list-vs-binary-search-tree\/","title":{"rendered":"Faster lookups, insertions, and in-order traversal than a red-black or AVL tree"},"content":{"rendered":"<p><a href=\"https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2019\/06\/dot-NET-banner-square.png\" rel=\"follow\"><img loading=\"lazy\" decoding=\"async\" class=\"alignright wp-image-4558 colorbox-4552\" style=\"float: right;\" src=\"https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2019\/06\/dot-NET-banner-square.png\" alt=\"\" width=\"128\" height=\"128\" srcset=\"https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2019\/06\/dot-NET-banner-square.png 209w, https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2019\/06\/dot-NET-banner-square-150x150.png 150w\" sizes=\"auto, (max-width: 128px) 100vw, 128px\" \/><\/a><em>NeoSmart Technologies is pleased to announce the immediate availability of its open source <a href=\"https:\/\/github.com\/neosmart\/collections\" rel=\"nofollow\"><code>NeoSmart.Collections<\/code><\/a> library\/package of specialized containers and collections designed to eke out performance beyond that which is available in typical libraries and frameworks, by purposely optimizing for specific (common!) use cases at the cost of pretty much everything else.<\/em><\/p>\n<p>In many regards, the data structures\/containers\/collections that ship with pretty much any given framework or standard library are a study in compromise. Language and framework developers have no idea what sort of data users will throw at their collections, or in what order. They have no clue whether or not a malicious third party would ultimately be at liberty to insert carefully crafted values into a container, or even what the general ratio of reads to writes would look like. The shipped code must be resilient, fairly performant, free of any obvious pathological cases with catastrophic memory or computation complexities, and above all, dependable.<\/p>\n<p>It isn\u2019t just language and framework developers that are forced to make choices that don\u2019t necessarily align with your own use case. Even when attempting to identify what alternative data structure you could write up and use for your needs, you\u2019ll often be presented with <em>theoretical<\/em> <span class=\"math inline\">\ud835\udcaa<\/span> numbers that don\u2019t necessarily apply or even have any relevance at all in the real world. An algorithm thrown out the window for having a horrible <span class=\"math inline\">\ud835\udcaa<\/span> in <em>Algorithms and Data Structures 101<\/em> may very well be your best friend if you can reasonably confine it to a certain subset of conditions or input values \u2013 and that\u2019s without delving into processor instruction pipelines, execution units, spatial and temporal data locality, branch prediction, or SIMD processing.<\/p>\n<p><!--more--><\/p>\n<p>Which brings us to <code>NeoSmart.Collections.SortedList&lt;T&gt;<\/code>, the star of today\u2019s <code>NeoSmart.Collections<\/code> nuget package release. Most developers have learned to reach for their libraries equivalent of <code>std::set<\/code> when they need to keep an ordered list of values around for deduplication and querying. Whether it\u2019s <code>std::set&lt;T&gt;<\/code> or <code>System.Collections.SortedSet&lt;T&gt;<\/code>, the implementation is typically a red-and-black tree (a type of self-balancing binary search tree) that can guarantee <span class=\"math inline\">\ud835\udcaa(log\u2006<em>n<\/em>)<\/span> lookups without <em>too<\/em> much memory overhead or insert-time computation.<\/p>\n<p>But that <span class=\"math inline\">\ud835\udcaa(log\u2006<em>n<\/em>)<\/span> is extremely misleading. A binary search tree naturally implies a certain amount of memory fragmentation, and a self-balancing binary-search tree automatically incurs a fair bit of work on insertion to guarantee the lookup times. Data isn\u2019t contiguously allocated, and most tree implementations are not particularly cache friendly. Balancing itself also means you\u2019ll sometimes pull in data you don\u2019t want or need into the cache, evicting data you might be about to read next.<\/p>\n<p>At first blush, all that sounds downright pleasurable compared to what it costs to store always-sorted-at-rest data in a <code>T[]<\/code> or <code>List&lt;T&gt;<\/code>, needing to shift, on average, <span class=\"math inline\"><em>n<\/em>\/2<\/span> elements on each insert to place items in their correct location and possibly needing to copy the entire contents of the list or array over to a new memory location if the capacity is exceeded and a larger backing array is needed. But it\u2019s also fairly easy to see how, once populated, a sorted <code>T[]<\/code> array offers more preferable real-world lookup times with better data locality and cache friendliness as compared to your average binary search tree, while still technically making the same (theoretical) <span class=\"math inline\">\ud835\udcaa(log\u2006<em>n<\/em>)<\/span> guarantee.<\/p>\n<p>But that\u2019s on paper in a world of unknowns where anything can happen. In the here and now, we have data we need to load, data we need to insert, and often, a pretty good idea of what (dis)order that data will arrive in. We generally know whether we\u2019re more likely to read or write, and maybe even where those writes will occur. While a binary tree variant may offer better insertion times if you\u2019re dealing with a constant stream of new elements in no particular order, in the real world we often have a chunk of <em>n<\/em> elements to insert all at once or elements arriving in-order \u2013 or even just reasonably close to it! \u2013 past a certain point.<\/p>\n<p>Here\u2019s a scenario that\u2019s certainly familiar to anyone that\u2019s developed an application with mutable state loaded\/restored from disk: you need an always-sorted collection of <code>T<\/code>, once your initial data is loaded from whatever input source, new data <em>generally<\/em> arrives in-order except that sometimes events\/items arriving in close temporal proximity to one another can reach out-of-order due to a queuing mechanism that doesn\u2019t guarantee FIFO semantics. Maybe your event queue doesn\u2019t guarantee exactly-once semantics either, and the same message can be delivered twice in cases of congestion or network delays, or perhaps you may need to deduplicate data that arrives within a 5-second window \u2013 or maybe not.<\/p>\n<p>Whatever the case, you find yourself dealing with data that arrives more-or-less in order. Not only can a sorted list match the insertion performance of an RB or AVL tree if the value to be inserted is strictly greater than (or equal to, if that\u2019s what you want) the previous maximum value in the collection, it can even guarantee constant-time <span class=\"math inline\">\ud835\udcaa(1)<\/span> insertions.<sup id=\"rf1-4552\"><a href=\"#fn1-4552\" title=\"Presuming sufficient preallocation of capacity\" rel=\"footnote\">1<\/a><\/sup> And if the general monotonic trend holds, even when you <em>do<\/em> need to insert out-of-order data, you only need to shuffle around any recently arrived items that should have been inserted after the element being inserted: in other words, even when the going gets tough, the tough gets going; no more <span class=\"math inline\">\ud835\udcaa(<em>n<\/em>)<\/span> insertions here!<\/p>\n<p>It is for this reason that the .NET BCL includes <code>SortedList&lt;K,V&gt;<\/code> in addition to the red-and-black tree-backed <code>SortedDictionary&lt;K,V&gt;<\/code>: sometimes a list\/vector\/array <em>is<\/em> faster than a tree, even for sorted data. But unfortunately, there\u2019s no equivalent <code>System.Collections.SortedList&lt;T&gt;<\/code> structure to be found, not in .NET Framework 4.8 or in the upcoming .NET Core 3.0. But that&#8217;s OK, because we can write our own and even include some nice optimizations for even more specific access patterns that aren&#8217;t in <code>SortedList&lt;K,V&gt;<\/code>. There\u2019s a good chance if you\u2019ve needed a data structure optimized for this particular access pattern, you\u2019ve already written it and have been using it happily ever since. But in case you haven\u2019t, read on.<\/p>\n<p><a href=\"https:\/\/github.com\/neosmart\/collections\" rel=\"nofollow\"><code>NeoSmart.Collections<\/code><\/a> provides this missing piece, with our <code>SortedList&lt;T&gt;<\/code> and <code>UniqueSortedList&lt;T&gt;<\/code> offering <em>both<\/em> faster insertions and faster lookups than the <code>System.Collections<\/code> offerings\u2026 provided these conditions are met. <code>SortedSet<\/code> doesn\u2019t give you an option of keeping duplicates or equivalents around in the collection, but our <a href=\"https:\/\/github.com\/neosmart\/collections\/blob\/master\/docs\/SortedList.md\" rel=\"nofollow\"><code>SortedList&lt;T&gt;<\/code><\/a> can handle that without a problem. And if you <em>do<\/em> need to deduplicate your inputs, then <a href=\"https:\/\/github.com\/neosmart\/collections\/blob\/master\/docs\/UniqueSortedList.md\" rel=\"nofollow\"><code>UniqueSortedList&lt;T&gt;<\/code><\/a> is your friend, with no cost incurred for the deduplication.<\/p>\n<p><code>NeoSmart.Collections<\/code> is released open source under the MIT license, and contributions are both encouraged and welcome. Source is available <a href=\"https:\/\/github.com\/neosmart\/collections\" rel=\"nofollow\">on GitHub<\/a> and the package binaries and symbols are both available <a href=\"https:\/\/www.nuget.org\/packages\/NeoSmart.Collections\" rel=\"follow\">on nuget.org<\/a> as <code>NeoSmart.Collections<\/code>, and includes the previously mentioned <code>SortedList&lt;T&gt;<\/code> and <code>UniqueSortedList&lt;T&gt;<\/code> classes, as well as some other handy data structures that aren\u2019t your textbook go-to collections for storing collections, but tend to come in handy and perform really well in the real world where it counts.<\/p>\n<hr \/>\n<div class=\"sendy_widget\" style='margin-bottom: 0.5em;'>\n<p><em>If you would like to receive a notification the next time we release a nuget package for .NET or release resources for .NET Core and ASP.NET Core, you can subscribe below. Note that you'll only get notifications relevant to .NET programming and development by NeoSmart Technologies. If you want to receive email updates for all NeoSmart Technologies posts and releases, please sign up in the sidebar to the right instead.<\/em><\/p>\n<iframe tabIndex=-1 onfocus=\"sendy_no_focus\" src=\"https:\/\/neosmart.net\/sendy\/subscription?f=BUopX8f2VyLSOb892VIx6W4BB8V5K2ReYGLVwsfKUZLXCc892Ffz8rIgRyIGoE22cZVr&title=Join+the+dotnet+mailing+list\" style=\"height: 300px; width: 100%;\"><\/iframe>\n<\/div>\n<script type=\"text\/javascript\">function sendy_no_focus(e) { e.preventDefault(); }<\/script>\n<hr class=\"footnotes\"><ol class=\"footnotes\"><li id=\"fn1-4552\"><p><em>Presuming sufficient preallocation of capacity<\/em>&nbsp;<a href=\"#rf1-4552\" class=\"backlink\" title=\"Jump back to footnote 1 in the text.\">&#8617;<\/a><\/p><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>NeoSmart Technologies is pleased to announce the immediate availability of its open source NeoSmart.Collections library\/package of specialized containers and collections designed to eke out performance beyond that which is available in typical libraries and frameworks, by purposely optimizing for specific &hellip; <a href=\"https:\/\/neosmart.net\/blog\/sorted-list-vs-binary-search-tree\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":4577,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[212,995,963,52,11],"class_list":["post-4552","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-software","tag-development","tag-neosmart-collections","tag-nuget","tag-open-source","tag-programming"],"aioseo_notices":[],"jetpack_featured_media_url":"https:\/\/neosmart.net\/blog\/wp-content\/uploads\/2019\/06\/dot-NET-Standard-Logo-Square.png","jetpack_shortlink":"https:\/\/wp.me\/p4xDa-1bq","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts\/4552","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/comments?post=4552"}],"version-history":[{"count":19,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts\/4552\/revisions"}],"predecessor-version":[{"id":4640,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/posts\/4552\/revisions\/4640"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/media\/4577"}],"wp:attachment":[{"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/media?parent=4552"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/categories?post=4552"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/neosmart.net\/blog\/wp-json\/wp\/v2\/tags?post=4552"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}