Scripting in rust with self-interpreting source code

I have a soft spot in my heart for rust and a passionate distrust (that has slowly turned into hatred) for interpreted, loosely typed languages, but it’s hard to deny the convenience of being able to bang out a bash script you can literally just write and run without having to deal with the edit-compile-run loop, let alone create a new project, worry about whether or not you’re going to check it into version control, and everything else that somehow tends to go hand-in-hand with modern strongly typed languages.

A nifty but scarcely known rust feature is that the language parser will ignore a shebang at the start of the source code file, meaning you can install an interpreter that will compile and run your rust code when you execute the .rs file – without losing the ability to compile it normally. cargo-script is one such interpreter, meaning you can cargo install cargo-script then execute your source code (after making it executable, :! chmod +x %) with something like this:

#!/usr/bin/env cargo-script

fn main() {
    println!("Hello, world!");
}

That’s pretty cool. But it’s bogged down by the inertia of an external dependency (even if it’s on crates.io), and more importantly, needing to install an interpreter just isn’t true to the hacker spirit. Fortunately, we can do better: it’s possible to write code that is simultaneously a valid (cross-platform!) shell script and valid rust code, which we can abuse to make the code run itself!

Rust already ignores a shebang line starting with #!/ at the very beginning of a source file, so the shebang itself doesn’t stop our code from being a valid, conformant rust file. But how do we inject a shell-scripted “interpreter” into the source code after it? Fortunately/unfortunately, # does not start a comment in rust and // does not start a comment in sh, so hiding a line from one language with that language’s comment syntax just makes the other one complain about invalid syntax.

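The trick is to abuse rust’s attribute syntax rather than its comments: a no-op crate attribute such as #![allow()] begins with #, so sh treats the whole line as a comment, while rustc parses it as perfectly valid code. Anything that follows the attribute on that same line is still invisible to sh, which lets us open a rust block comment (/*) there and hide the shell script that comes next from rustc. The rest, as they say, is history: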

#!/bin/sh
#![allow()] /*
# rust self-compiler by M. Al-Qudsi, licensed as public domain or MIT.
# See <https://neosmart.net/blog/self-compiling-rust-code/> for info & updates.
OUT=/tmp/$(printf "%s" $(realpath $(which "$0")) | md5sum | cut -d' '  -f1)
MD5=$(md5sum "$0" | cut -d' '  -f1)
(test "${MD5}" = "$(cat "${OUT}.md5" 2>/dev/null)" ||
(grep -Eq '^[[:space:]]*(\[.*\])*[[:space:]]*fn[[:space:]]+main[[:space:]]*\(' "$0" && (rm -f ${OUT};
rustc "$0" -o ${OUT} && printf "%s" ${MD5} > ${OUT}.md5) || (rm -f ${OUT};
printf "fn main() {//%s\n}" "$(cat "$0")" | rustc - -o ${OUT} &&
printf "%s" ${MD5} > ${OUT}.md5))) && exec ${OUT} "$@" || exit $? #*/

// Wrapping your code in `fn main() { … }` is altogether optional :)
fn main() {
    let name = std::env::args().skip(1).next().unwrap_or("world".into());
    println!("Hello, {}!", name);
}

The program above is simultaneously a valid rust program and a valid shell script that should run on most *nix platforms.¹ You can either compile it normally with rustc:

$ rustc ./script.rs -o script
$ ./script
Hello, world!

Or execute the source code itself directly:

$ chmod +x ./script.rs
$ ./script.rs friend
Hello, friend!
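If you want to convince yourself of the sh half of that claim without involving rustc at all, here is a minimal sketch. The file contents and the names demo and polyglot_output are made up for illustration; the only thing borrowed from the header above is the `#![allow()] /*` … `#*/` framing.

```shell
#!/bin/sh
# Hypothetical demo: write a tiny polyglot file and run it with sh only.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
#![allow()] /*
echo "sh ran this line"
exit 0 #*/
fn main() { println!("rustc would compile this part"); }
EOF

# sh sees two comment lines, one echo and one exit; it never parses line 5.
polyglot_output=$(sh "$demo")
echo "$polyglot_output"
rm -f "$demo"
```

sh stops at exit before it ever reaches the rust code, which is why the real header has to end by either exec-ing the compiled binary or exiting.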

The self-compiling header actually does a bit more than just compile the rust source code and run the result:

it caches the compiled output and will execute the previously built binary if it exists and the source file has not been changed, meaning subsequent runs are pretty much instantaneous;
the output does not clutter your working directory;
it supports running traditional rust programs with an fn main() entry-point;
it also supports a special “quick-and-dirty” mode where the contents of your script will be wrapped in fn main() { ... } for you, letting you skip even that boilerplate.
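The caching behavior boils down to comparing a hash of the source against a hash recorded at the last successful build. The sketch below isolates just that check; hash_file and is_cached are hypothetical helper names, not part of the header, and the openssl fallback is the one mentioned in the footnote.

```shell
#!/bin/sh
# Sketch of the cache check only; nothing is compiled here.

# Print just the hash of a file, falling back to openssl when md5sum is missing.
hash_file() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d' ' -f1
    else
        openssl md5 < "$1" | cut -d' ' -f2
    fi
}

# Succeed only if the recorded hash matches the current contents of the source.
# $1 is the source file, $2 the output path whose ".md5" companion holds the hash.
is_cached() {
    test "$(hash_file "$1")" = "$(cat "$2.md5" 2>/dev/null)"
}

work=$(mktemp -d)
printf 'fn main() {}\n' > "$work/script.rs"

is_cached "$work/script.rs" "$work/bin" && first=hit || first=miss
hash_file "$work/script.rs" > "$work/bin.md5"   # what a successful build would record
is_cached "$work/script.rs" "$work/bin" && second=hit || second=miss
printf '// edited\n' >> "$work/script.rs"
is_cached "$work/script.rs" "$work/bin" && third=hit || third=miss

echo "$first $second $third"   # prints "miss hit miss"
rm -rf "$work"
```

A missing .md5 file simply reads as an empty string, so the very first run counts as a miss without needing a separate existence check.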

The self-compiling/self-interpreting header above has been optimized for size, absolutely at the cost of legibility. But fear not, here’s a line-by-line annotated equivalent to explain what is going on:

#!/bin/sh
#![allow()] /*

# rust shebang by Mahmoud Al-Qudsi, Copyright NeoSmart Technologies 2020-2021
# See <https://neosmart.net/blog/self-compiling-rust-code/> for info & updates.
#
# This code is freely released to the public domain. In case a public domain
# license is insufficient for your legal department, this code is also licensed
# under the MIT license.

# Get an output path that is derived from the complete path to this self script.
# - `realpath` makes sure if you have two separate `script.rs` files in two
#   different directories, they get mapped to different binaries.
# - `which` makes that work even if you store this script in $PATH and execute
#   it by its filename alone.
# - `cut` is used to print only the hash and not the filename, which `md5sum`
#   always includes in its output.
OUT=/tmp/$(printf "%s" $(realpath $(which "$0")) | md5sum | cut -d' '  -f1)

# Calculate hash of the current contents of the script, so we can avoid
# recompiling if it hasn't changed.
MD5=$(md5sum "$0" | cut -d' '  -f1)

# Check if we have a previously compiled output for this exact source code.
if !(test -f "${OUT}.md5" && test "${MD5}" = "$(cat "${OUT}.md5")"); then
	# The script has been modified or is otherwise not cached.
	# Check if the script already contains an `fn main()` entry point.
	if grep -Eq '^[[:space:]]*(\[.*\])*[[:space:]]*fn[[:space:]]+main[[:space:]]*\(' "$0"; then
		# Compile the input script as-is to the previously determined location.
		rustc "$0" -o ${OUT}
		# Save rustc's exit code so we can compare against it later.
		RUSTC_STATUS=$?
	else
		# The script does not contain an `fn main()` entry point, so add one.
		# We don't use `printf 'fn main() { %s }'` because the shebang must
		# come at the beginning of the line, and we don't use `tail` to skip
		# it because that would result in incorrect line numbers in any errors
		# reported by rustc. Instead, we just comment out the shebang but leave
		# it on the same line as `fn main() {`.
		printf "fn main() {//%s\n}" "$(cat "$0")" | rustc - -o ${OUT}
		# Save rustc's exit code so we can compare against it later.
		RUSTC_STATUS=$?
	fi

	# Check if we compiled the script OK, or exit bubbling up the return code.
	if test "${RUSTC_STATUS}" -ne 0; then
		exit ${RUSTC_STATUS}
	fi

	# Save the MD5 of the current version of the script so we can compare
	# against it next time.
	printf "%s" ${MD5} > ${OUT}.md5
fi

# Execute the compiled output. This also ends execution of the shell script,
# as it actually replaces its process with ours; see exec(3) for more on this.
exec ${OUT} "$@"

# At this point, it's OK to write raw rust code as the shell interpreter
# never gets this far. But we're actually still in the rust comment we opened
# on line 2, so close that: */


fn main() {
    let name = std::env::args().skip(1).next().unwrap_or("world".into());
    println!("Hello, {}!", name);
}
 

// vim: ft=rust
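To see what the quick-and-dirty branch actually hands to rustc, you can run the wrapping printf on its own. This is only a sketch: the three-line input below is a stand-in for a real script (the header is abbreviated to a single comment), and the variable names are hypothetical.

```shell
#!/bin/sh
# Demo of the wrapping step on a made-up three-line script (no rustc needed).
src=$(mktemp)
printf '%s\n' '#!/bin/sh' '#![allow()] /* ...header... */' 'println!("hi");' > "$src"

# Same printf as in the header: the shebang ends up behind `//` on line 1.
wrapped=$(printf "fn main() {//%s\n}" "$(cat "$src")")
printf '%s\n' "$wrapped"

first_line=$(printf '%s\n' "$wrapped" | head -n 1)
src_lines=$(wc -l < "$src" | tr -d ' ')
wrapped_lines=$(printf '%s\n' "$wrapped" | wc -l | tr -d ' ')
rm -f "$src"
```

The first line comes out as `fn main() {//#!/bin/sh` and the only extra line is the closing brace at the very end, so line N of the script is still line N of what rustc compiles.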


¹ If you don’t have md5sum, you can use openssl md5 | cut -d' ' -f2 instead. The other runtime dependencies are which, cat, realpath, cut, test, and grep, all of which are almost universally available.

9 thoughts on “Scripting in rust with self-interpreting source code”

Fabian on March 16, 2020 at 6:35 pm said:

Cool snippet!

`test` was upset for me until I put quotes around ${MD5}

Also, if you use /bin/bash instead you can pass on $0 to the output via the `-a` flag to exec.

Mahmoud Al-Qudsi on March 17, 2020 at 6:41 pm said:

@fabian: Ah, yes. I forgot about the arcane quoting requirements of some flavors of test (it’s almost always a built-in and some shells abuse that to give it extra privileges). Thanks for the correction!

I’ve updated the post with more quoting, and I’ve also updated the samples to include support for passing through provided arguments. You’re right about exec -a being a bash-only feature; I wasn’t able to find an option to mimic its behavior with dash.

(Sidenote: I’ve learned to always prefer /usr/bin/env bash over /bin/bash, as the latter may fail on some non-GNU platforms, e.g. FreeBSD, even when bash is installed.)

Slava on May 12, 2020 at 2:47 am said:

What about putting this whole script into a separate file and just using “#!/usr/bin/env rust-script” for the shebang in your *.rs files, to avoid copying and pasting all that boilerplate in every script file?

I tried doing that, it almost worked: I had to replace “$0” with “$1” though… and I’m pretty sure it’s not a very good idea, because “$0” is not just a zeroth argument – it contains a complete file name, even if the file name has whitespace (so “$1” probably will not capture correct file names in such cases).

And there’s something to be done with $@ in the line with “exec” then – to exclude the “$1” from it. I think I saw examples where they literally did “$0 $2 $3 $4 $5…”

Any sh-wizards around? 🙂

Anyway, thank you for the script, it works great otherwise!

James on May 24, 2020 at 12:14 pm said:

You should redirect stderr to /dev/null on the `cat` at line 7 to suppress warnings about md5 sums that don’t exist yet.

@Slava:
Call the bash command `shift` to move the arguments one to the left, dropping `$1`. Also, just ensure that “$1” is quoted as you pass the path to the script, and if you call it on a rust source file with spaces in the path from the terminal, quote it there, too. Quoting in a shell (and a shell script) will enable passing the whitespace-containing bits as a single argument, without splitting on IFS.
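As a rough sketch of the shift-and-quote approach described here, assuming a standalone wrapper that receives the path of the .rs file as "$1" (which is what happens when the wrapper is named in the script's shebang line). The function name run_wrapped and the sample paths are made up for illustration, and it only prints what it received instead of compiling anything.

```shell
#!/bin/sh
# Hypothetical argument handling for a standalone wrapper script.
run_wrapped() {
    script=$1   # path to the .rs file, kept as one word even with spaces
    shift       # drop it, so "$@" now holds only the user's arguments
    printf 'script=[%s]\n' "$script"
    for arg in "$@"; do
        printf 'arg=[%s]\n' "$arg"
    done
}

wrapper_output=$(run_wrapped "/tmp/my scripts/hello.rs" "dear friend" second)
echo "$wrapper_output"
```

This prints script=[/tmp/my scripts/hello.rs], arg=[dear friend] and arg=[second] on separate lines: quoting "$1" and "$@" is what keeps the whitespace-containing pieces together.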
Slava on June 17, 2020 at 9:34 am said:

Thank you @James! That worked great!

Mahmoud Al-Qudsi on February 18, 2021 at 11:00 am said:
@James that was actually caused by a different mistake on my part: when minifying the script by hand, I wrote it assuming test foo && test bar was equivalent to test foo -a bar but I missed that it was also invoking shell substitution which would take place (and possibly complain) prior to the actual execution of test.
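The ordering issue is easy to reproduce in isolation. The following is a sketch only: the .md5 path points into a fresh temporary directory so the file is guaranteed not to exist, MD5 is a dummy value, and the three variable names are hypothetical.

```shell
#!/bin/sh
# Compare what each form of the check writes to stderr when the .md5 file is missing.
dir=$(mktemp -d)
md5_file="$dir/out.md5"
MD5=abc123

# Single test with -a: the shell expands $(cat ...) before test runs, so cat complains.
combined_stderr=$( { test -f "$md5_file" -a "$MD5" = "$(cat "$md5_file")"; } 2>&1 || true )

# Two tests joined by &&: the second command, substitution included, is never evaluated.
chained_stderr=$( { test -f "$md5_file" && test "$MD5" = "$(cat "$md5_file")"; } 2>&1 || true )

# One test with cat's stderr discarded: the substitution runs but stays quiet.
silenced_stderr=$( { test "$MD5" = "$(cat "$md5_file" 2>/dev/null)"; } 2>&1 || true )

echo "combined: ${combined_stderr:-nothing printed}"
echo "chained: ${chained_stderr:-nothing printed}"
echo "silenced: ${silenced_stderr:-nothing printed}"
rmdir "$dir"
```

Only the first form prints a complaint from cat; the other two stay silent, and all three evaluate to false for a missing file.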

However, for minification purposes it is indeed better to beg forgiveness than ask permission and
```
test "${MD5}" = "$(cat "${OUT}.md5" 2>/dev/null)"
```
is shorter than
```
test -f "${OUT}.md5" && test "${MD5}" = "$(cat "${OUT}.md5")"
```
(although not by much)

The minified script has been updated accordingly.

Christof Petig on February 20, 2021 at 8:49 am said:

Hi, I like the idea!
Why not make the sha sum part of the temporary file name instead of creating a companion file which needs to be verified?
Keep up the good work, Christof

Mahmoud Al-Qudsi on February 22, 2021 at 9:47 pm said:

@Christof Thanks! The reason for the separate MD5 file is cleanup: if the MD5 is part of the file name, we can only ever know the path to the compiled output of the current version of the code, and can never delete/overwrite existing old versions, which will add up over time. The way it’s done now, the path to both the compiled executable and the file containing the MD5 does not change with changes to the source code, so you’ll never end up with a mountain of unused executables filling up your /tmp.
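The trade-off can be simulated without compiling anything. In this sketch the directory names are hypothetical, touch stands in for an actual rustc invocation, and md5sum is assumed to be available.

```shell
#!/bin/sh
# Simulate three edits of one script under both naming schemes and count the leftovers.
cache=$(mktemp -d)
mkdir "$cache/stable" "$cache/by_hash"
src="$cache/script.rs"

for version in 1 2 3; do
    printf '// version %s\nfn main() {}\n' "$version" > "$src"
    content_hash=$(md5sum < "$src" | cut -d' ' -f1)
    # Scheme used in the post: fixed output path, hash kept in a companion file.
    touch "$cache/stable/out"
    printf '%s' "$content_hash" > "$cache/stable/out.md5"
    # Alternative scheme: the content hash is part of the output file name.
    touch "$cache/by_hash/out-$content_hash"
done

stable_count=$(ls "$cache/stable" | wc -l | tr -d ' ')
by_hash_count=$(ls "$cache/by_hash" | wc -l | tr -d ' ')
echo "stable: $stable_count files, by hash: $by_hash_count files"
rm -rf "$cache"
```

After three edits the stable scheme still holds two files (the binary and its .md5), while the hash-in-name scheme has accumulated one binary per version.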
avi on April 19, 2023 at 6:39 pm said:

I get the following on FreeBSD, maybe this relies on gnu grep?

`grep: repetition-operator operand invalid`
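That error is consistent with the pattern leaning on non-POSIX constructs (\s, a lazy .*? and a repeated \b) that GNU grep tolerates. One possible workaround, offered as an untested-on-FreeBSD sketch rather than an official fix, is to restrict the pattern to POSIX bracket expressions; MAIN_RE and has_main are hypothetical names, and requiring an opening parenthesis after main is an extra assumption on top of the original pattern.

```shell
#!/bin/sh
# Hypothetical replacement pattern built from [[:space:]] classes and plain repetition.
MAIN_RE='^[[:space:]]*(\[.*\])*[[:space:]]*fn[[:space:]]+main[[:space:]]*\('

# Succeed if the given line looks like the start of an `fn main(` definition.
has_main() {
    printf '%s\n' "$1" | grep -Eq "$MAIN_RE"
}

has_main 'fn main() {'               && echo "plain: match"
has_main '    fn main () {'          && echo "indented: match"
has_main 'println!("no entry");'     || echo "statement: no match"
has_main '// fn main() in a comment' || echo "comment: no match"
```

Swapping a pattern like this into the grep call of the header should leave the behavior on GNU systems unchanged for ordinary scripts.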