May 19, 2023

The Journey To a Faster Sandpack Transpiler

How we rewrote the Sandpack transpiler in Rust to get a 2x performance gain.

Jasper de Moor

The Starting Point

Throughout Sandpack's history, we have always faced a big performance challenge, both in runtime and in bundle size. This set us on a path towards squeezing every little bit of performance out of our transpilers, starting with Babel and later moving to meriyah with a bunch of custom logic on top.

Recently, we announced Nodebox in our Sandpack 2.0 release, which came with an even larger performance challenge: now we're transpiling a lot more code on the client, as we need to ensure our node_modules are nearly identical to a local install. This set us on a path to finding or building an even faster transpiler.

Originally, we looked for an existing solution that would fit our needs, and we ended up with Sucrase, a Babel fork that is significantly faster than Babel and considerably smaller than both esbuild and SWC.

Sucrase was great but had trouble processing larger files; a good benchmark is transpiling tsserver from the TypeScript package. So we ended up optimizing and eventually rewriting Sucrase to be a lot faster…

Benchmark: lodash (all JS files in the lodash node_modules folder)
Testing against 132,153 LOC

Sandpack      0.12 seconds     1,072,139 lines per second
Sucrase       0.15 seconds       876,411 lines per second
swc           0.68 seconds       193,490 lines per second
esbuild       0.81 seconds       163,682 lines per second
TypeScript    4.07 seconds        32,465 lines per second
Babel         3.37 seconds        39,181 lines per second

Benchmark: TypeScript (all JS files in the typescript node_modules folder)
Testing against 2,877,336 LOC

Sandpack      4.56 seconds       631,048 lines per second
Sucrase       8.25 seconds       348,763 lines per second
swc          15.63 seconds       184,120 lines per second
esbuild       7.29 seconds       394,802 lines per second
TypeScript  143.26 seconds        20,085 lines per second
Babel        77.93 seconds        36,924 lines per second

Optimizing Sucrase

As we needed near-instant transforms of pretty much everything we resolve, we profiled Sucrase and found that its parser used classes, which added an immense amount of overhead.

Luckily, those classes don't really contain any logic, so refactoring them into plain objects was an easy win that made the parser a lot faster: up to 10x faster than before the refactor.

This made Sucrase faster than the fastest transformer (esbuild) for smaller files and on par with esbuild for large files like tsserver.

After this simple fix, we hit a bit of a roadblock: we couldn't find any more low-hanging fruit and feared we'd have to do some serious performance hacking to get any further significant improvements.

Instead of chasing these small wins, we thought: why not rewrite this in Rust so we have more control over everything? Especially memory management, as we also noticed Sucrase was stressing the garbage collector, sometimes resulting in inconsistent timings (peaks of 5x slower than the fastest transform when combined with sandpack-bundler).

These issues became even more apparent because sandpack-bundler already uses a lot of memory, which made Sucrase even slower.

So let's just get rid of the garbage collector, right? That way, our lab benchmarks would be a lot closer to the real-world scenario, and we would have full control over every single byte we allocate.

The Journey Into Rust

Rewriting the Parser

Our first goal was to get the parser running without any issues in Rust.

For this, we copied the original TS code into a Rust file and started rewriting it line by line into its Rust equivalent. This mostly went pretty well: we ended up debugging a couple of minor issues, but generally it was done relatively quickly and easily.
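
To give a flavor of that mechanical translation, here is an illustrative pair (not an actual excerpt from the code base): a typical TypeScript parser line, kept as a comment, and its direct Rust equivalent.

struct State {
    pos: usize,
    done: bool,
}

struct Tokenizer {
    len: usize, // input length, precomputed for this example
    state: State,
}

impl Tokenizer {
    // TypeScript: if (this.state.pos >= this.input.length) { this.state.done = true; }
    fn check_end(&mut self) {
        if self.state.pos >= self.len {
            self.state.done = true;
        }
    }
}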

Now the kicker: this made the parser significantly slower (it took around 6 minutes to parse tsserver 🤯).

How is this even possible? Isn't Rust supposed to be the fast language? Well, not if you do something that traverses a 6.5-million-character file for every single character. Admire this really, really slow piece of code:

fn get_char(&self, i: usize) -> char {
    // chars() re-walks the UTF-8 bytes from the start on every call,
    // so this lookup is O(n) per character.
    self.input.chars().nth(i).unwrap()
}

For the non-rustaceans, let me walk you through how this works.

Strings in Rust are stored as a vector of bytes (not characters, as in JavaScript), and since we're converting this code from JS, all our logic still relies on character indices… We weren't planning on a ground-up rewrite.

So every time we called this util to get a character, it walked over the byte array, collecting characters as it went. This is quite a slow operation once you start doing it for millions of characters.
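
The underlying issue is easy to demonstrate: byte length and character count diverge as soon as the input contains multibyte characters.

fn main() {
    let s = "héllo";
    // 'é' takes two bytes in UTF-8, so byte length and
    // character count disagree:
    assert_eq!(s.len(), 6); // bytes
    assert_eq!(s.chars().count(), 5); // characters
}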

There is this amazing package available for these kinds of use cases called Ropey, so we gave it a try.

After introducing Ropey, our times dropped from 6 minutes to about 150 ms. That's a huge difference! Ropey takes slices of the string and stores them in a tree structure with cached character lengths, so it only does the computation on the slice that contains the character. This is still not perfect, but already a lot faster.
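
As a minimal sketch of what the swap looked like (the surrounding Tokenizer struct is illustrative, not our real one), Ropey's char accessor replaces the linear scan:

use ropey::Rope;

struct Tokenizer {
    input: Rope,
}

impl Tokenizer {
    fn get_char(&self, i: usize) -> char {
        // Rope::char walks the tree using cached character counts,
        // so only the chunk containing `i` is actually scanned.
        self.input.char(i)
    }
}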

Wanna make it even faster? Let’s do it!

So what if, instead of Ropey, we made it do the same thing JS does and used a vector of characters?

This resulted in this piece of code:

// Collected once, when the tokenizer is created:
let chars: Vec<char> = input.chars().collect();

fn get_char(&self, i: usize) -> char {
    self.chars[i] // plain O(1) indexing
}

This got our code running at around 50 ms. That’s a huge improvement and about 2x faster than the JS version! We did it 🥳

Rewriting the Transformers

Unfortunately, we also need transformers. So let’s get cracking on that as well!

We took the same approach that worked for the parser: copying over the TypeScript code file by file and translating it into Rust. It mostly went pretty well, until we actually tested it…

After testing, we ran into a consequence of our performance work in the parser: since the parser stores a vector of chars, we can't really reuse it for the string operations we want to do during transforming.

During transforming, we use utilities like this:

pub fn identifier_name_for_token(&mut self, token: &Token) -> String {
    self.parser.tokenizer.get_code_slice(token.start..token.end)
}

pub fn get_code_slice(&mut self, range: std::ops::Range<usize>) -> String {
    String::from_iter(&self.chars[range])
}

This meant we had to come up with a different way to do it. The first thing that came to mind was bringing back Ropey and using it as a source to take string slices from, but that turned out to be really slow: since every string we wanted to insert had to be converted into a rope, it was actually no faster than creating a new String from each character range we needed.

After some thinking, we came up with the idea of using a stateful character iterator that could go back and forward through the byte vector.

#[derive(Clone, Debug, PartialEq)]
pub struct CharIterator<'a> {
    value: &'a str,
    len: usize,
    curr_char_idx: usize,
    curr_byte_idx: usize,
}
 
impl<'a> CharIterator<'a> {
    pub fn from_string(value: &'a str) -> CharIterator<'a> {
        let len = count_chars(value.as_bytes());
        CharIterator {
            value,
            len,
            curr_char_idx: 0,
            curr_byte_idx: 0,
        }
    }
 
    pub fn len(&self) -> usize {
        self.len
    }
 
    fn decode_char(&mut self) -> u32 {
        let bytes = self.value.as_bytes();
 
        // Decode UTF-8
        let x = bytes[self.curr_byte_idx];
        if x < 128 {
            return x as u32;
        }
 
        // Multibyte case follows
        // Decode from a byte combination out of: [[[x y] z] w]
        // NOTE: Performance is sensitive to the exact formulation here
        let init = utf8_first_byte(x, 2);
        // SAFETY: `bytes` produces an UTF-8-like string,
        // so the iterator must produce a value here.
        let y = bytes[self.curr_byte_idx + 1];
        let mut ch = utf8_acc_cont_byte(init, y);
        if x >= 0xE0 {
            // [[x y z] w] case
            // 5th bit in 0xE0 .. 0xEF is always clear, so `init` is still valid
            // SAFETY: `bytes` produces an UTF-8-like string,
            // so the iterator must produce a value here.
            let z = bytes[self.curr_byte_idx + 2];
            let y_z = utf8_acc_cont_byte((y & CONT_MASK) as u32, z);
            ch = init << 12 | y_z;
            if x >= 0xF0 {
                // [x y z w] case
                // use only the lower 3 bits of `init`
                // SAFETY: `bytes` produces an UTF-8-like string,
                // so the iterator must produce a value here.
                let w = bytes[self.curr_byte_idx + 3];
                ch = (init & 7) << 18 | utf8_acc_cont_byte(y_z, w);
            }
        }
 
        ch
    }
 
    fn get_byte_idx(&mut self, char_idx: usize) -> usize {
        // If character is out of bound return the byte length
        if char_idx == self.len {
            return self.value.len();
        } else if char_idx > self.len {
            panic!(
                "Character out of bound, length is {} and requested char idx {}",
                self.len, char_idx
            );
        }
 
        let bytes = self.value.as_bytes();
        if self.curr_char_idx < char_idx {
            // Next
            while self.curr_char_idx < char_idx {
                self.curr_byte_idx += 1;
                let byte = bytes[self.curr_byte_idx];
                if is_utf8_char_boundary(byte) {
                    self.curr_char_idx += 1;
                }
            }
        } else if self.curr_char_idx > char_idx {
            // Previous
            while self.curr_char_idx > char_idx {
                self.curr_byte_idx -= 1;
                if self.curr_byte_idx < bytes.len() {
                    let byte = bytes[self.curr_byte_idx];
                    if is_utf8_char_boundary(byte) {
                        self.curr_char_idx -= 1;
                    }
                }
            }
        }
 
        self.curr_byte_idx
    }
 
    pub fn get_char(&mut self, char_idx: usize) -> char {
        // If character is out of bound return a space
        if char_idx == self.len {
            return ' ';
        }
 
        self.get_byte_idx(char_idx);
 
        let char_value: u32 = self.decode_char();
        let ch: char = unsafe { char::from_u32_unchecked(char_value) };
        ch
    }
}

It’s not the safest or cleanest solution, but it is fast!

Unfortunately, it’s slightly slower than our original character vector solution, but a lot faster than Ropey. And given that the character iterator is backed by a native String, we can take string slices from it, which means zero-copy comparing.

Something like this:

pub fn slice(&mut self, range: std::ops::Range<usize>) -> &'a str {
    let start_byte_idx = self.get_byte_idx(range.start);
    let end_byte_idx = self.get_byte_idx(range.end);
    &self.value[start_byte_idx..end_byte_idx]
}
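
As a hypothetical usage (the source string and token range are made up for illustration), comparing a token's text against a known identifier now allocates nothing:

let source = "const pkg = require(\"lodash\");";
let mut iter = CharIterator::from_string(source);

// Borrow the identifier's text straight out of the source string.
let name = iter.slice(12..19);
assert_eq!(name, "require"); // zero-copy &str comparison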

Now there was just one more slow spot: we sometimes store or create new strings, which still got cloned a bit too much. For that, we came up with a way to store strings and get back a string reference with the correct lifetime.

use std::{collections::HashSet, mem::transmute, rc::Rc};
 
#[derive(Clone, Debug, Hash, Eq, PartialEq)]
enum CachedItem<'a> {
    Ref(&'a str),
    Value(Rc<String>),
}
 
impl<'a> CachedItem<'a> {
    pub fn to_str(&self) -> &'a str {
        match self {
            CachedItem::Ref(val) => val,
            // This is totally safe, Rust just doesn't really support lifetimes on String values
            CachedItem::Value(val) => unsafe { transmute::<&str, &'a str>(val.as_str()) },
        }
    }
}
 
#[derive(Clone, Debug, PartialEq)]
pub struct StaticStringStorage<'a> {
    storage: HashSet<CachedItem<'a>>,
}
 
impl<'a> StaticStringStorage<'a> {
    pub fn new() -> StaticStringStorage<'a> {
        StaticStringStorage {
            storage: HashSet::with_capacity(10),
        }
    }
 
    pub fn add_string(&mut self, value: Rc<String>) -> &'a str {
        let cached: CachedItem<'a> = CachedItem::Value(value);
        let found_value = self.storage.get(&cached);
        if let Some(existing_val) = found_value {
            existing_val.to_str()
        } else {
            self.storage.insert(cached.clone());
            cached.to_str()
        }
    }
}
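
As a quick illustration of the deduplication this gives us (this check is ours, not from the code base): adding the same string value twice hands back a reference into the same stored allocation.

let mut storage = StaticStringStorage::new();
let a = storage.add_string(Rc::new(String::from("foo")));
let b = storage.add_string(Rc::new(String::from("foo")));

// Both calls return a reference into the same stored String.
assert_eq!(a.as_ptr(), b.as_ptr());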

We use those string slices in places like our name manager, ensuring we never have a duplicate string value in memory. It also simplifies our code, as we now always work with string references rather than actual String values.

use std::{collections::HashSet, rc::Rc};
 
use super::string_storage::StaticStringStorage;
 
#[derive(Clone, Debug, PartialEq)]
pub struct NameManager<'a> {
    pub used_names: HashSet<&'a str>,
    pub str_storage: StaticStringStorage<'a>,
}
 
impl<'a> NameManager<'a> {
    pub fn new() -> NameManager<'a> {
        NameManager {
            used_names: HashSet::with_capacity(100),
            str_storage: StaticStringStorage::new(),
        }
    }
 
    pub fn add_name(&mut self, name: &'a str) {
        self.used_names.insert(name);
    }
 
    pub fn add_new_name(&mut self, name: String) -> &'a str {
        let str = self.str_storage.add_string(Rc::new(name));
        self.used_names.insert(str);
        return str;
    }
 
    pub fn find_free_name(&mut self, name: &str) -> &'a str {
        if !self.used_names.contains(name) {
            return self.add_new_name(String::from(name));
        }
 
        let mut suffix_num = 2;
        let mut result = format!("{}{}", name, suffix_num);
        while self.used_names.contains(result.as_str()) {
            // Bump the suffix first, then re-format, so each iteration
            // tries a new candidate instead of re-checking the old one.
            suffix_num += 1;
            result = format!("{}{}", name, suffix_num);
        }
        self.add_new_name(result)
    }
}
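
To illustrate how this behaves (this snippet is ours, for demonstration): once a name is taken, find_free_name hands back the next numbered variant.

let mut names = NameManager::new();
names.add_new_name(String::from("_default"));

// "_default" is taken, so we get the next free numbered variant.
let fresh = names.find_free_name("_default");
assert_eq!(fresh, "_default2");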

Now that we’ve optimized and written all of this, it’s time to test it properly and ship it!

After testing on tsc, it appears to be 2x faster than our previously optimized Sucrase in a simple environment, and probably more in a memory-intensive environment like sandpack-bundler and Nodebox, but that is unfortunately hard to measure.

This is Only The Beginning

It might seem like this is the end of it, but now that everything is in Rust and has a proper test suite, we can experiment with more performance improvements.

And we definitely will! There’s a ton we can try, like:

  • Migrating our parser to a byte-based approach, removing the need for all character-based logic and likely speeding up our code a lot (see the sketch after this list)
  • Optimizing certain code paths to reduce how often we go into the character iterator
  • Reducing the last couple of utils that still allocate new Strings
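
As a rough sketch of the first idea (nothing here is implemented yet; the helper is purely illustrative): most source code is ASCII, so the scanner could classify raw bytes directly and only fall back to full character decoding for non-ASCII input.

fn is_ascii_identifier_start(byte: u8) -> bool {
    // Classify directly on the raw byte; no char decoding needed
    // on the ASCII fast path.
    byte.is_ascii_alphabetic() || byte == b'_' || byte == b'$'
}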

But for now, we’ll leave it at this stage and see if we need any additional performance gains.

If you find this interesting and want to explore Sandpack and Nodebox further, check out the Sandpack and Nodebox repositories on GitHub.


