Unicode.net: the Unicode (and emoji) library for .NET platforms

We are proud to present the latest addition to our open source portfolio, the Unicode.net library! We’ve extracted a number of encoding- and emoji-related namespaces and functions from a few of our projects going back many years and split them off to create Unicode.net: an open source library that can be used to aid in the safe processing and manipulation of (possibly) internationalized strings and non-ASCII characters (and then some).

Unicode.net is designed from the ground up as a modern approach to text processing and text encoding, supporting only the most popular Unicode encodings: UTF-8, UTF-16, and UTF-32. It is also designed to complement .NET’s existing (albeit extremely limited) Unicode support rather than supplant it, which primarily means embracing the System.String type wherever possible instead of shunning it. Unlike many other text-processing libraries, Unicode.net does not ask you to stop using the system types for string representation and switch over to custom datatypes 😁.

That emoji? It’s not just there because we were smiling when we wrote that. It’s there because Unicode.net is also a first-class emoji library for .NET! The sudden rush to embrace Unicode, even in legacy codebases, stems from the fact that users, even those in English-speaking countries, are finally eager to use non-ASCII text in their applications and environments. Emoji has taken over the world, and as far as Unicode is concerned, that’s absolutely great news. As such, Unicode.net recognizes that Unicode support in applications is no longer just a matter of text processing, but one of glyph processing, where text is just one (very popular) form of glyph.

Unicode.net includes an emoji library based on the Unicode Consortium’s UTR #51 spec, including a class containing all valid emoji in the latest UTR as of the time of this publication, plus helper methods and classes to aid in processing, parsing, generating, and selecting emoji in your newly minted, Unicode-aware applications. Huzzah! 🎉

Unicode.net is, of course, released as open source under the terms of the MIT license on GitHub. We are kindly requesting that you help us make this a first-class Unicode library for the .NET Platform, and invite you to contribute in any way you can. We are looking for contributors with unit testing experience to help us expand our current MSTest collection, and for developers with the time to spare to help extend the Unicode.Languages class to include support for more languages.

In an effort to encourage adoption of correct text-processing approaches in multilingual environments, Unicode.net is designed to minimize friction wherever possible. That’s why Unicode.net is now available on NuGet, supporting all .NET targets from .NET Framework 2.0 onward, including .NET Core for web/ASP.NET Core-based Unicode goodness and even UWP to get internationalization and emoji in Windows 10 metro applications, Windows 10 Mobile apps, and even on the Xbox.

Unicode.net provides a direct (and easy) mapping between a .NET string object and a UnicodeSequence, which is just a shallow wrapper over an IEnumerable<Codepoint>, where each Codepoint represents (unsurprisingly) a single Unicode codepoint. Unicode.net provides a two-way mapping between Unicode codepoints and the UTF-8, UTF-16, and UTF-32 encodings, and adds extension methods to the system string class to easily decompose a string into its codepoints or letters (instead of bytes or UTF-16 code units) via the string.Codepoints() and string.Letters() extension methods.
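To see why that distinction matters, here is a minimal sketch that uses only the built-in System APIs, not Unicode.net itself. The class and method names (CodepointSketch, EnumerateCodepoints, EnumerateTextElements) are made up for illustration, and treating the framework's "text elements" as the closest built-in analogue of "letters" is our assumption here, not a statement about how string.Letters() is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Illustrative only: built on System APIs, NOT Unicode.net's implementation.
public static class CodepointSketch
{
    // Yields one integer per Unicode codepoint, joining surrogate pairs.
    // char.ConvertToUtf32 throws ArgumentException on unpaired surrogates.
    public static IEnumerable<int> EnumerateCodepoints(string s)
    {
        for (int i = 0; i < s.Length; i++)
        {
            int cp = char.ConvertToUtf32(s, i);
            if (char.IsHighSurrogate(s[i]))
            {
                i++; // skip the low surrogate that was just consumed
            }
            yield return cp;
        }
    }

    // Yields user-perceived characters ("text elements" in .NET terms).
    public static IEnumerable<string> EnumerateTextElements(string s)
    {
        TextElementEnumerator e = StringInfo.GetTextElementEnumerator(s);
        while (e.MoveNext())
        {
            yield return e.GetTextElement();
        }
    }

    public static void Main()
    {
        // 'e' + U+0301 (combining acute accent) + U+1F601 (grinning emoji)
        string s = "e\u0301\U0001F601";

        Console.WriteLine(s.Length);                                       // UTF-16 code units
        Console.WriteLine(new List<int>(EnumerateCodepoints(s)).Count);    // codepoints
        Console.WriteLine(new List<string>(EnumerateTextElements(s)).Count); // text elements
        Console.WriteLine(Encoding.UTF8.GetByteCount(s));                  // UTF-8 bytes
        Console.WriteLine(Encoding.Unicode.GetByteCount(s));               // UTF-16 bytes
        Console.WriteLine(Encoding.UTF32.GetByteCount(s));                 // UTF-32 bytes
    }
}
```

Run as-is, this prints 4, 3, 2, 7, 8, and 12: the same string is four UTF-16 code units, three codepoints, and two user-perceived characters, and it occupies a different number of bytes in each of the three encodings. Indexing a string by char gives you only the first of those views.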

For emoji lovers, Unicode.net also brings to the table a list of all emoji as well as a few curated lists of emoji to help you navigate your way through the ever-growing list of emoji that the Unicode Consortium (thankfully) continues to churn out. Unicode.Emoji.Basic provides a more minimal list of emoji suitable for listing in an emoji picker control, removing duplicates and emoji not supported by the system font (those that have no glyphs or use multiple glyphs to draw one emoji). All emoji are indexed and sorted per the recommendations of the Unicode Consortium, to make it that much easier to get going. Each emoji also has a list of keywords attached to it, to make searching for a single emoji among thousands a more manageable chore.

Go ahead, give it a gander and let us know what you think. We are eager to incorporate community feedback to help take this library to the next level. The ultimate goal is to make the processing and manipulation of internationalized text a proverbial piece of cake for .NET developers everywhere. To be perfectly honest, we found it quite surprising that no such undertaking has surfaced in the last decade or so, and hope to plug that gap with Unicode.net. This is our gift to the .NET community and a thank you of sorts to a platform that has made a decade-plus of development a lot easier and less painful than it would have been otherwise. Enjoy!

Follow us on Twitter and GitHub to get more (free!) .NET and open source awesomeness!
