alxhill.dev

Building a fast tar replacement in Rust

Context

Background: using the standard tar tool can be frustrating - the arguments are hard to remember, and it is single-threaded and pretty slow by default.

At Palantir, I wrote a tool that built docker layers from files on disk. Users had fairly large conda environments (~4GB) that needed to be copied into a docker-compatible tar.gz file at a specific path. Our initial implementation used conda-pack and took 10-15 minutes, disrupting what should be a fairly iterative flow. Doing the gzip in parallel dropped the time down to about 6 seconds for the same 4GB conda pack.

Technically, this is a sliiightly different file format than .tar.gz because it’s made up of separately gzipped chunks, but docker and all modern tar utilities are able to untar it with no complaints (the same cannot be said about parallel zips, unfortunately). Sadly, unix tar doesn’t support producing parallel gzips, and pigz, the alternative tool, isn’t widely known about or used.

Inspired by the excellent ripgrep, I figure there’s a gap in the market for a solid, parallel-first, performance-focussed version of the standard tar tool, written in Rust (of course), with ease-of-use as an additional benefit.

xkcd tar comic

Design Decisions

Like ripgrep, riptar is not intended to be a drop-in replacement for tar itself. Instead, it should be a spiritual successor with a more modern interface.

Generally, riptar should follow a “do what I mean” philosophy: it needs only a single input (a compressed file) and automatically figures out which argument is which. For example, if you pass in a directory name and a compressed file, and the compressed file exists, it’ll assume you want to extract the file into the directory, no matter the order of the arguments.
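
To make the “do what I mean” idea concrete, here is a rough sketch of how that argument inference could look. None of this is riptar's actual code: the `Action` enum, `looks_like_archive`, `infer_action`, and the list of recognised extensions are all hypothetical names invented for illustration. The `Create` branch and the "extract into the current directory" default for a single argument are also assumptions rather than anything decided above.

```rust
use std::path::{Path, PathBuf};

/// What the tool should do, inferred from the arguments (hypothetical type).
#[derive(Debug, PartialEq)]
enum Action {
    /// Unpack `archive` into `dest`.
    Extract { archive: PathBuf, dest: PathBuf },
    /// Pack `source` into a new file at `archive` (assumed behaviour).
    Create { source: PathBuf, archive: PathBuf },
}

/// Assumption: an argument is treated as the compressed file purely by its
/// file name. The extension list here is illustrative, not exhaustive.
fn looks_like_archive(path: &Path) -> bool {
    let name = path
        .file_name()
        .and_then(|n| n.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();
    [".tar", ".tar.gz", ".tgz"].iter().any(|ext| name.ends_with(ext))
}

/// Infer the action from one or two arguments given in any order.
/// `exists` is injected so the logic can be exercised without touching the
/// disk; real code would pass something like `|p| p.exists()`.
fn infer_action(args: &[&str], exists: impl Fn(&Path) -> bool) -> Option<Action> {
    let (archives, others): (Vec<PathBuf>, Vec<PathBuf>) = args
        .iter()
        .map(|a| PathBuf::from(*a))
        .partition(|p| looks_like_archive(p));

    // Exactly one argument must look like an archive; anything else is ambiguous.
    if archives.len() != 1 || others.len() > 1 {
        return None;
    }
    let archive = archives.into_iter().next()?;
    let other = others.into_iter().next();

    if exists(archive.as_path()) {
        // The archive is already there, so the other argument (if any) is the
        // destination. Defaulting to "." is an assumption.
        let dest = other.unwrap_or_else(|| PathBuf::from("."));
        Some(Action::Extract { archive, dest })
    } else {
        // The archive does not exist yet, so the other argument must be an
        // existing source to pack.
        let source = other.filter(|p| exists(p.as_path()))?;
        Some(Action::Create { source, archive })
    }
}
```

With this sketch, `infer_action(&["out", "env.tar.gz"], |p| p.exists())` and the same call with the arguments swapped resolve to the same `Extract` action whenever `env.tar.gz` exists. Returning `None` for ambiguous input leaves room to print a helpful error instead of guessing.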

Dev Log

This is mostly a stream-of-consciousness log of my work on the tool for my own records and reflections.

2024-11-02

Prior to starting the log, I’ve got a simple (private) repo with the necessary dependencies/etc. Quick thoughts so far: