Automated MKV Cleanup With mkvmerge, SABnzbd & Cron

Q: Will this re-encode my video and lose quality?

No. mkvmerge is a remuxer, not a transcoder. It copies the video and chosen audio and subtitle streams into a new container untouched, so there’s zero quality loss and the operation finishes in seconds rather than the hours a re-encode would take.

Q: Will it ever leave a movie silent?

Not if you build the safety logic in. The rules that prevent it are simple: never strip the only audio track, and when no audio matches your preferred languages, keep everything and log a notice instead of dropping anything. Undetermined (und) tracks only get dropped when a properly tagged English track exists alongside them. The output is also re-probed before the original is replaced, so a broken remux can never overwrite a good file.

Q: Will it delete forced subtitles?

No. Forced subtitles are always kept regardless of language, detected through the Matroska forced_track flag or a “forced” marker in the track name. Those are the subs that translate the one line of Elvish in an otherwise-English film, and losing them genuinely breaks the movie.

Q: Does the sweep re-process files it already cleaned?

No. mkvclean.py records every handled file’s path, size, and modification time in a checkpoint file and skips matching files instantly on later runs. If Radarr upgrades a movie, the new size and mtime no longer match, so the replacement gets cleaned automatically on the next sweep.

Q: Does this work with Sonarr too, not just Radarr?

Yes. You bind the SABnzbd post-processing script to a tv category in the SABnzbd UI the same way you bind it to movies. The library sweep doesn’t know or care about Radarr or Sonarr at all - it walks the filesystem.

Q: Why does apk add mkvtoolnix fail in my Docker image?

apk is Alpine’s package manager. If your base image is Debian or Ubuntu, you need apt-get install mkvtoolnix instead. The reverse trips people up too: apt-get: not found means your base is Alpine. Always check the base image distro before picking a package manager.

Q: How do I list MKV tracks and their languages from the command line?

Run mkvmerge -i file.mkv for a quick human-readable list of track IDs, types, and languages. For scripting, use mkvmerge -J file.mkv to get machine-readable JSON you can parse safely, reading tracks[].properties.language for each track.

Every media library eventually runs into the same problem. You open a movie in Jellyfin, it starts playing in a French dub you didn’t ask for, and you spend twenty seconds digging through the audio menu while the family gives you the side-eye. Multiply that by a library full of releases stuffed with five audio dubs, a commentary track, and a dozen subtitle tracks, and you have a real problem. My library had grown to roughly 2,200 MKV files, and each one was carrying around junk it did not need.

The good news? This is automatable, and the right tool does it losslessly in seconds rather than hours. The bad news, which I learned the hard way, is that the obvious approach quietly destroys foreign films and anime. This post walks through how I built an automated MKV cleanup tool with mkvmerge, including every mistake I made before it actually worked. The whole thing is now open source: two self-contained Python scripts in the mkv-track-stripper repo on GitHub, MIT licensed, ready to drop into your own setup. You’ll get the script logic, the SABnzbd hook, the cron sweep, and the reasoning behind each safety decision. The mistakes are where the real lessons live.

💭

TL;DR: Use mkvmerge to losslessly remux MKVs and strip unwanted audio and subtitle tracks, then hook it in at two points: a Python SABnzbd post-processing hook (mkv_strip_pp.py) cleans new downloads before Radarr and/or Sonarr import them, and a Python library sweep (mkvclean.py) handles the back catalog and can run from cron as an unattended catch-all. The mkvclean script tracks processed files in a checkpoint file so nothing gets processed twice, every remux is verified before the original is replaced, and forced subtitles always survive. Both scripts live in the mkv-track-stripper repo.

I ran this entire pipeline on Debian 13 with MKVToolNix 92.0 & 96.0, SABnzbd 5.0.3, and Radarr 6.11, validating the output in Jellyfin 10.11.10 before letting it loose on the full library. Everything in this post is a real lesson learned, scars included.

Why mkvmerge Beats ffmpeg for Stripping Tracks

For anything video-related, most people reach for ffmpeg or Tdarr first. I did too. They’re capable tools, but for stripping tracks out of MKVs at scale, mkvmerge from the MKVToolNix suite is the better fit. The reason comes down to what each tool is built for.

mkvmerge is a lossless remuxer, not a transcoder. It copies streams as-is into a new MKV while letting you drop, reorder, or relabel tracks, with no re-encoding involved. The operation is fast and there’s zero quality loss. ffmpeg is designed around decode and encode processes. Copying a subset of tracks usually means more verbose -map expressions, and some Matroska features don’t copy 1:1 without surprises.

With mkvmerge you can keep or drop tracks by ID or by language code with --audio-tracks and --subtitle-tracks, relabel a track’s language with --language, and set default and forced flags per track. MKVToolNix is dedicated to Matroska, so it handles chapters, tags, editions, and segment linking cleanly. ffmpeg’s Matroska muxer is good but not tuned for every edge case, and users report occasional player quirks with ffmpeg-muxed MKVs on ordered chapters and dual-subtitle anime.
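To make the flags concrete, here is a minimal sketch of how a remux command could be assembled in Python. build_remux_cmd is a hypothetical helper written for this post, not a function from the repo; the mkvmerge options it emits (-o, --audio-tracks, --subtitle-tracks, --no-subtitles) are real.

```python
def build_remux_cmd(src, dst, keep_audio=None, keep_subs=None, mkvmerge="mkvmerge"):
    """Return an mkvmerge argv that copies only the listed track IDs.

    None means "leave that track type untouched". An empty subtitle list
    drops all subtitles; an empty audio list is refused outright.
    """
    cmd = [mkvmerge, "-o", dst]
    if keep_audio is not None:
        if not keep_audio:
            raise ValueError("refusing to build a command that removes all audio")
        cmd += ["--audio-tracks", ",".join(str(t) for t in keep_audio)]
    if keep_subs is not None:
        if keep_subs:
            cmd += ["--subtitle-tracks", ",".join(str(t) for t in keep_subs)]
        else:
            cmd += ["--no-subtitles"]
    cmd.append(src)  # per-file options must come before the input file
    return cmd
```

The resulting list can be handed straight to subprocess.run, which avoids shell quoting problems with awkward filenames.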

I Started With Tdarr, and It Was the Wrong Tool

Before I discovered MKVToolNix, I reached for Tdarr. It’s the tool everyone points you at: a slick web UI, a node-and-worker architecture, and a plugin library that promises hands-off library maintenance. I spent an evening standing up the server, attaching a node, and wiring a flow to drop the tracks I didn’t want.

It was the wrong tool for two reasons. First, Tdarr is built around transcoding, and its flows kept nudging me toward re-encoding the video to “process” a file. I didn’t want a new encode. I wanted the exact same video and one or two audio tracks copied into a fresh container, untouched. Running a lossy, hours-long transcode only to delete a subtitle track is the opposite of what this job needs.

Second, the whole tool was wildly oversized for the task. A server, a worker node, a database, and a huge list of plugins is a lot of moving parts to own and debug when the actual operation is “remove these track IDs.” Tdarr is genuinely good at what it’s for - bulk transcoding and health-checking a library on dedicated hardware - but for lossless track stripping it buried a one-line mkvmerge call under a stack of infrastructure I’d have to maintain forever. I tore it down and went back to a script.

To be clear, none of this is a knock on Tdarr for the job it’s built for. If you actually want to re-encode your library, say converting H.264 to H.265 to reclaim storage, Tdarr’s node-and-worker setup is the right tool. I wrote a full walkthrough for that:

Tdarr Setup Guide: Install, Configure & Transcode


Inspect Before You Touch Anything

Before automating, you need to see what’s actually inside a file. Two commands do everything:

mkvmerge -i movie.mkv

This lists tracks with their IDs, types, languages, and names in human-readable form. For scripting, you want the JSON version:

mkvmerge -J movie.mkv

This is a machine-readable description you can parse safely. The fields that matter are tracks[].id, tracks[].type, tracks[].properties.language, and tracks[].properties.track_name. When a release is genuinely strange (ordered chapters, segment linking), mkvinfo movie.mkv gives even deeper details for debugging.

One thing I had to learn early on: mkvmerge -J exits with code 1 when it has warnings about a file, but it still emits perfectly valid JSON. An early version of my script treated that as a probe failure and skipped the file. Only exit codes of 2 or higher mean the probe actually failed.
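A small sketch of that exit-code handling, assuming the probe output has already been captured. parse_probe and track_summary are illustrative names of my own, not the repo's functions.

```python
import json

def parse_probe(returncode, stdout):
    """Interpret an `mkvmerge -J` result: codes 0 and 1 are usable, 2+ is a failure."""
    if returncode >= 2:
        return None
    try:
        return json.loads(stdout)
    except json.JSONDecodeError:
        return None

def track_summary(probe):
    """Yield (id, type, language, name) for every track in the probe output."""
    for t in probe.get("tracks", []):
        props = t.get("properties", {})
        yield t["id"], t["type"], props.get("language", "und"), props.get("track_name", "")

# Typical call site (needs mkvmerge installed):
#   result = subprocess.run(["mkvmerge", "-J", path], capture_output=True, text=True)
#   probe = parse_probe(result.returncode, result.stdout)
```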

The First Script, and the Trap Hiding in It

My first script was embarrassingly simple: keep only English audio and English subtitles, drop everything else. It looked completely reasonable. It ran fine against a folder of Hollywood blockbusters. Then I pointed it at my anime folder.

Every single file came out silent.

The script had stripped the only audio track in each file because the language tag said jpn, not eng. There was no error and no warning. Just dead silence when I started movie night while everyone stared at me. That moment is when I stopped writing naive language filters and started reasoning from what tracks are actually present, rather than what I assumed should be there.

The fix is conditional logic, and it’s conservative on purpose. Breaking one film is worse than failing to clean five.

If a file has a single audio track, keep it no matter the language. A movie with one audio stream is never a candidate for audio removal.
If a file has multiple audio tracks but none match your preferred languages, keep them all and log a notice rather than risk going silent.
Never drop all subtitle tracks when the audio is in a language you don’t understand.

There’s no reliable original-language flag to lean on. Real-world files have audio mistagged as und, fansubs with multiple eng subtitle tracks (full subs versus signs-and-songs), and scene releases that label things inconsistently.

That und (undetermined) tag deserves its own rule, because it’s a coin flip: it might be the English track a sloppy release group forgot to label, or it might be a dub you’ll never play. The selection logic in both scripts treats und as a preferred language only until a real English track shows up. If the file has properly tagged English audio, und tracks are no longer given the benefit of the doubt:

has_eng_audio = any(t.get("properties", {}).get("language") == "eng" for t in audio)
target_audio_langs = list(audio_langs)
if has_eng_audio and "und" in target_audio_langs:
    target_audio_langs.remove("und")
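Putting the three audio rules and the und rule together, the selection could be sketched like this. select_audio is a hypothetical stand-in for the real selection function, and treating a missing language tag as und is my assumption.

```python
def select_audio(audio, preferred):
    """Return (track IDs to keep, notice or None) for a list of audio tracks."""
    def lang(t):
        return t.get("properties", {}).get("language", "und")

    all_ids = [t["id"] for t in audio]
    if len(audio) <= 1:
        return all_ids, None  # never strip the only audio track

    targets = list(preferred)
    if "und" in targets and any(lang(t) == "eng" for t in audio):
        targets.remove("und")  # tagged English exists, so und loses its free pass

    keep = [t["id"] for t in audio if lang(t) in targets]
    if not keep:
        # nothing matched: keep everything rather than risk a silent file
        return all_ids, "no preferred audio language found; keeping all tracks"
    return keep, None
```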

⚠️

Warning: Lessons learned: Filtering blindly by language is the single most dangerous thing you can do to a media library. A silent movie is a broken movie. Always handle the single-track, no-preferred-match, and und cases before you let any removal logic run.

Safety as a First Principle

Once the language logic was sane, the next set of mistakes was about how I handled the files themselves. Three rules emerged, and all of them are non-negotiable.

Write a new file, verify it, then and only then replace the source. Both scripts remux to a hidden temp file (via tempfile.mkstemp) in the same directory as the original, then call os.replace to swap it atomically onto the original path. Same-directory output is deliberate twice over: a rename within one filesystem is atomic, so Radarr, Sonarr or Jellyfin never catches a half-written file mid-swap, and it sidesteps cross-device link errors when your storage is a MergerFS or ZFS pool. Before any of that, a pre-flight check confirms the directory has at least 105% of the original’s size free, so a full disk can’t produce a truncated output.
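A rough sketch of the pre-flight check and the same-directory swap, using only the standard library. The helper names and the .mkvtmp_ prefix are made up for illustration; tempfile.mkstemp, shutil.disk_usage, and os.replace are the real calls.

```python
import os
import shutil
import tempfile

def has_room(src, factor=1.05):
    """True if src's directory has at least factor * size(src) bytes free."""
    free = shutil.disk_usage(os.path.dirname(os.path.abspath(src))).free
    return free >= os.path.getsize(src) * factor

def make_temp_beside(src):
    """Create a hidden temp file in src's own directory and return its path."""
    fd, tmp = tempfile.mkstemp(prefix=".mkvtmp_", suffix=".mkv",
                               dir=os.path.dirname(os.path.abspath(src)))
    os.close(fd)  # mkvmerge will write to the path; we only needed a unique name
    return tmp

def swap_into_place(tmp, src):
    """Atomically replace src with tmp (both live on the same filesystem)."""
    os.replace(tmp, src)
```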

Verify means actually verify. A non-empty output file and a happy exit code aren’t proof of a good remux. Before the swap, verify_remux() re-probes the temp file with mkvmerge -J and confirms it still contains a video track and exactly the audio and subtitle track counts that were requested. If the re-probe fails or the numbers don’t add up, the original stays put and the error is logged. This is the guard against a truncated-but-nonzero output silently overwriting a perfectly good source.
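The count check at the heart of that verification might look roughly like this. remux_looks_right is an illustrative name rather than the repo's verify_remux(), and it expects the already-parsed JSON from a re-probe of the temp file.

```python
def remux_looks_right(probe, want_audio, want_subs):
    """Check a re-probe of the temp file against the requested track counts."""
    if not probe:
        return False  # the re-probe itself failed
    types = [t.get("type") for t in probe.get("tracks", [])]
    return ("video" in types
            and types.count("audio") == want_audio
            and types.count("subtitles") == want_subs)
```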

Read exit codes correctly. This one trips people up (including me). mkvmerge does not use the standard “0 good, anything else bad” convention. It returns 0 on full success, 1 on success with warnings, and 2 for a genuine error. A warning often means a track had a minor inconsistency that mkvmerge handled fine, so the scripts accept both:

result = subprocess.run(cmd, capture_output=True, text=True)
ok = (result.returncode in (0, 1) and os.path.exists(tmp) and os.path.getsize(tmp) > 0)

Only code 2, or a zero-byte output file, or a failed verification re-probe, means stop and preserve the original intact.

There’s one more thing the swap has to get right that I didn’t anticipate: the cleaned file is a new file, so it arrives owned by whoever ran the script, with fresh permissions and timestamps. On a library shared between Radarr, Sonarr, Jellyfin, and an NFS export, that’s a quiet way to break things days later. After a successful remux, preserve_metadata() copies the original’s ownership, mode, timestamps, extended attributes, and POSIX ACLs onto the cleaned file before the swap. If ownership can’t be restored (you’re running as neither root nor the user who owns the files), it logs a warning and keeps going rather than dying - the cleanup still worked, the file just has a new owner.
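A simplified sketch of the idea, assuming a Linux host. copy_basic_metadata is a hypothetical cut-down version of what the post calls preserve_metadata(); it leans on shutil.copystat for mode, timestamps, and extended attributes, and leaves explicit ACL handling out entirely.

```python
import os
import shutil

def copy_basic_metadata(src, dst):
    """Copy mode, timestamps and (where permitted) ownership from src to dst.

    Returns a warning string if ownership could not be restored, else None.
    """
    st = os.stat(src)
    shutil.copystat(src, dst)  # permission bits, atime/mtime, xattrs where supported
    try:
        os.chown(dst, st.st_uid, st.st_gid)
    except PermissionError as exc:
        # not fatal: the cleaned file is fine, it just has a different owner
        return f"Could not preserve ownership: {exc}"
    return None
```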

The subtitle judgment call. Subtitles get a more aggressive policy than audio, and I want to be honest that this is a choice rather than an obvious truth. A missing subtitle doesn’t break playback the way missing audio does - worst case, you re-add a sub later. So: keep your preferred languages, drop the rest, and drop SDH/hearing-impaired tracks along the way. But there’s one exception that’s absolute. Forced subtitles - the ones that translate the single line of Elvish in an otherwise English film - are always kept, no matter what language they’re tagged as:

is_forced = props.get("forced_track") or "forced" in track_name
is_sdh = props.get("flag_hearing_impaired") or "sdh" in track_name

if is_forced:
    keep_subs.append(t["id"])
elif lang in sub_langs and not is_sdh:
    keep_subs.append(t["id"])

Note the fallback to a "forced" substring in the track name. An earlier version trusted the Matroska flag alone, and plenty of real-world releases name the track “Forced” without ever setting the flag. If you rely on SDH subtitles, add them back by removing the is_sdh check or adjusting the config to your needs.
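For context, here is how that fragment could sit inside a complete function. This is a sketch under my own assumptions: select_subs and the keep_sdh switch are illustrative, and track names are lowercased before matching so that “Forced” and “SDH” are caught.

```python
def select_subs(subs, preferred, keep_sdh=False):
    """Return the IDs of subtitle tracks to keep."""
    keep = []
    for t in subs:
        props = t.get("properties", {})
        name = (props.get("track_name") or "").lower()
        lang = props.get("language", "und")
        is_forced = bool(props.get("forced_track")) or "forced" in name
        is_sdh = bool(props.get("flag_hearing_impaired")) or "sdh" in name
        if is_forced:
            keep.append(t["id"])  # forced subs survive in any language
        elif lang in preferred and (keep_sdh or not is_sdh):
            keep.append(t["id"])
    return keep
```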

The Bash Era, and Why It Didn’t Survive

The first working version of all this was a pair of Bash scripts gluing mkvmerge -J to jq, and they taught me a lot of lessons that anyone scripting against a real library will hit.

Filenames with spaces, apostrophes, and brackets will quickly break a naive loop. A movie called Amelie (2001) [1080p].mkv breaks for f in $(find ...) instantly, because the shell splits on whitespace. The fix is NUL-safe handling: find ... -print0 piped into IFS= read -r -d ''. Then the progress counter I added always reported zero at the end, because piping find into a while loop runs the loop in a subshell where incremented variables die on exit - the fix is process substitution, done < <(find ...). And to make re-runs skip finished files, the Bash version renamed every cleaned file to movie.cleaned.mkv and excluded that pattern from the next find. The name was the marker.

It all worked. I still retired the whole thing, for three reasons:

The marker rename was the wrong kind of clever. Renaming every file in the library means renaming it out from under Radarr, Sonarr and Jellyfin, which all track files by path.
The logic outgrew Bash. Once I wanted forced-subtitle detection, junk-track flags, metadata edits, and output verification, the jq one-liners turned into the hardest-to-read part of the system. The same selection logic in Python is named functions you can actually reason about.
Two scripts, one brain. The cron sweep and the bulk-pass script were 90% copy-paste of each other, drifting apart with every fix. The Python rewrite collapsed them into one script, mkvclean.py, that handles both jobs.

The shell lessons still stand - NUL-safe loops and subshell scoping will bite you in any Bash project. But the shipped versions of these tools are Python 3.10+, end to end, with no jq dependency at all.

⚠️

Warning: Lessons learned: Write the quick Bash version to learn the problem, but notice the moment the logic outgrows it. For me that moment was the third nested jq expression. Every safety feature that now guards my library - verification, forced-sub detection, metadata healing - would have been miserable to bolt onto the Bash scripts and was straightforward in Python.

The Checkpoint That Makes the Whole Thing Self-repeating

Here’s something worth calling out: mkvmerge doesn’t skip files it already cleaned. The tool has no memory whatsoever. It will happily re-remux a file you cleaned yesterday, every single run. Your script has to do all the skipping.

mkvclean.py solves this with a checkpoint file instead of the old rename-marker. Every successfully handled file is appended to ~/.mkvclean_checkpoint as a JSON line recording its path, modification time, and size. On the next run, any file whose path, mtime, and size all match its checkpoint entry is skipped instantly - no probe, no remux, nothing. Filenames never change, so Radarr, Sonarr, and Jellyfin are none the wiser.

The mtime-and-size part does something the rename-marker never could: it detects upgrades. When Radarr replaces a movie with a better release, the new file has a new size and mtime, the checkpoint entry no longer matches, and the sweep cleans the new file automatically on its next pass.

The checkpoint is append-only during a run, so it compacts itself at startup - duplicate entries collapse to one line per file and entries for files that no longer exist under the scanned root are pruned. Combined with fcntl file locking (a manual run and a cron run can physically never overlap) and an os.scandir-based traversal that pulls file metadata during the walk instead of stat-ing everything twice, the sweep stays fast and safe on a multi-terabyte array.
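A minimal sketch of a JSON-lines checkpoint along those lines. The field names (path, mtime, size) and the function names are assumptions for illustration; loading into a dict is what collapses duplicate entries to one per file.

```python
import json
import os

def load_checkpoint(checkpoint):
    """Read a JSON-lines checkpoint into {path: (mtime, size)}; the last entry wins."""
    done = {}
    if os.path.exists(checkpoint):
        with open(checkpoint, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:
                    entry = json.loads(line)
                    done[entry["path"]] = (entry["mtime"], entry["size"])
    return done

def already_done(done, path):
    """True when path's current mtime and size match its checkpoint entry."""
    st = os.stat(path)
    return done.get(path) == (st.st_mtime, st.st_size)

def record(checkpoint, path):
    """Append one finished file to the checkpoint."""
    st = os.stat(path)
    entry = {"path": path, "mtime": st.st_mtime, "size": st.st_size}
    with open(checkpoint, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
```

An upgraded file fails the already_done check simply because its size or mtime changed, which is the upgrade detection described above.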

The --batch flag is the right way to start cautiously and build confidence before committing to the whole library:

./mkvclean.py /media/Storage/Movies --batch 20    # first run: 20 files only
./mkvclean.py /media/Storage/Movies --batch 50    # next run: 50 more (checkpointed files are skipped)
./mkvclean.py /media/Storage/Movies --batch 0     # 0 = no limit: finish everything remaining

Stop at any point with Ctrl+C - the script traps the signal, removes the temp file it was building, and logs that it was interrupted. Thanks to the checkpoint, re-running always picks up exactly where you left off.

The SABnzbd hook doesn’t need a checkpoint at all. It runs once on each download’s folder right after the download completes, and it skips the remux entirely when there’s nothing to remove. A DRY_RUN constant at the top (and a --dry-run flag on the sweep) lets you see exactly what would happen before anything touches real files.

While you’re still learning your own language rules, run in small batches and spot-check a handful of files in Jellyfin before continuing.

What Gets Stripped Beyond Languages

Language filtering was the original goal, but once the scripts were reading every track’s properties anyway, a second category of junk became impossible to ignore.

Commentary and descriptive audio. Director’s commentary, “descriptive video service” tracks, and audio descriptions are junk for most libraries, and they’re often tagged eng - so a pure language filter keeps them. The scripts detect them through the explicit Matroska flags first (flag_commentary, flag_visual_impaired), which catch untitled or non-English commentary tracks, then fall back to track-name matching:

is_junk = (
    props.get("flag_commentary")
    or props.get("flag_visual_impaired")
    or any(x in track_name for x in JUNK_AUDIO_NAME_PATTERNS)
)

where the name patterns are commentary, description, director, and dvs.

Default-flag enforcement. This fixes the single most common Jellyfin complaint - the wrong audio track playing by default. The first kept audio track becomes the sole default, and the default flag is explicitly cleared on every other kept track, so a stale flag carried over from the source can’t leave two defaults fighting each other:

for i, tid in enumerate(keep_audio):
    cmd += ["--default-track-flag", f"{tid}:{1 if i == 0 else 0}"]

Subtitle default flags are cleared across the board, so nothing forces subs on by default.

Cosmetic metadata. Release groups leave a trail: the global container title set to the release filename, tag blocks, track names like “Commentary by…”. The scripts strip the global title, wipe tags, and clear junk track names by default (each pass individually toggleable). They’ll also fill in an undefined track language when the track’s name gives it away - a track named “English” tagged und gets relabeled eng, and crucially this inference runs before the language filter so the corrected tag actually affects what’s kept. Attachment removal (cover art, embedded fonts) exists but is off by default, because embedded fonts can matter for styled ASS/SSA subtitles.
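The language inference can be sketched in a few lines. NAME_HINTS below is a deliberately tiny, made-up table and infer_language is an illustrative name; the real scripts may recognise more names and codes.

```python
# Hypothetical hint table: lowercase name fragment -> language code
NAME_HINTS = {"english": "eng", "japanese": "jpn"}

def infer_language(track):
    """Return the track's language, guessing from its name only when tagged und."""
    props = track.get("properties", {})
    lang = props.get("language", "und")
    if lang != "und":
        return lang  # a real tag is never overridden
    name = (props.get("track_name") or "").lower()
    for hint, code in NAME_HINTS.items():
        if hint in name:
            return code
    return "und"
```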

The mkvpropedit fast-path. Here’s a neat optimization that fell out of all this: when a file’s tracks are already exactly what you want but its header is still wrong - a stale default flag, a junk title - there’s no reason to remux gigabytes. mkvpropedit (also from MKVToolNix) edits headers in place on the existing file, in milliseconds, with no temp file, no disk-space requirement, and ownership/permissions preserved for free because the file never moves. The scripts plan header-only edits whenever track selection comes back unchanged, so a second pass over a clean library is nearly instant and still fixes metadata.
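As a sketch of what a header-only edit could look like, the helper below builds an mkvpropedit command. build_propedit_cmd is a hypothetical name; the options it emits (--edit info --delete title, --edit track:@N --set flag-default=..., --tags all:) are standard mkvpropedit syntax. Note that the track:@N selector uses the 1-based track number (properties.number in the mkvmerge -J output), not the 0-based track ID.

```python
def build_propedit_cmd(path, default_flags, strip_title=True, strip_tags=True,
                       mkvpropedit="mkvpropedit"):
    """Return an mkvpropedit argv for in-place header fixes.

    default_flags maps a 1-based track number to True/False.
    """
    cmd = [mkvpropedit, path]
    if strip_title:
        cmd += ["--edit", "info", "--delete", "title"]
    for number, is_default in sorted(default_flags.items()):
        cmd += ["--edit", f"track:@{number}",
                "--set", f"flag-default={1 if is_default else 0}"]
    if strip_tags:
        cmd += ["--tags", "all:"]  # an empty file name after "all:" removes all tags
    return cmd
```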

Moving Cleanup Into the Pipeline With SABnzbd

Sweeping the entire library on a schedule works, but the smarter move is to clean each file once, at download time, before Radarr or Sonarr ever imports it. SABnzbd makes this possible with post-processing scripts. That’s the job of mkv_strip_pp.py: drop it into your SABnzbd scripts directory, make it executable, then assign it to your movies (and/or tv) category in Settings → Post-Processing in the SABnzbd UI so only those downloads trigger cleanup.

SABnzbd exports job details as both positional arguments and environment variables. The hook reads the job folder from SAB_COMPLETE_DIR (falling back to argv[1] for manual testing on the command line) and the download status from SAB_PP_STATUS (falling back to argv[7]). If the download was already marked failed, the hook exits immediately without touching anything.
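A sketch of that lookup, with the environment and argument list passed in so it can be exercised without SABnzbd. read_job is an illustrative name, and defaulting the status to "0" (SABnzbd's value for a successful job) when nothing is supplied is my assumption for manual test runs.

```python
import os
import sys

def read_job(environ=None, argv=None):
    """Return (job_dir, status) from SABnzbd's environment, falling back to argv."""
    environ = os.environ if environ is None else environ
    argv = sys.argv if argv is None else argv
    job_dir = environ.get("SAB_COMPLETE_DIR") or (argv[1] if len(argv) > 1 else None)
    status = environ.get("SAB_PP_STATUS") or (argv[7] if len(argv) > 7 else "0")
    return job_dir, str(status)

# In the hook: bail out early when the download already failed.
#   job_dir, status = read_job()
#   if status != "0": leave the files alone and exit
```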

Because the cleaned file is swapped atomically onto the original path, the filename Radarr or Sonarr expects to import never changes. Neither one ever sees a seam - each imports the file it expected, already trimmed.

The most important design principle here: cleanup must never block an import. If stripping a file fails, leave the original intact, log it, and still let the import proceed. A working movie that didn’t get cleaned is fine. A broken import because your cleanup script choked is not.

Two Different Exit-Code Systems (Don’t Mix Them)

This trips people up because two separate exit-code conventions show up in the same context, and they mean different things.

SABnzbd’s exit codes tell SABnzbd what to do with the job after your post-processing script runs:

Exit code	SABnzbd interpretation
0	Success
1	Job failed
2	Retry the download

mkvmerge’s exit codes (covered earlier) describe the remux itself: 0 success, 1 success with warnings, 2 real error. Same numbers, completely different meanings. If you set up your SABnzbd hook to pass mkvmerge’s exit code straight through, a remux warning (mkvmerge 1) would mark a perfectly good download as failed in SAB. That’s exactly the kind of silent breakage you don’t want.

The hook keeps these rigorously separated. Any cleanup failure - mkvmerge returning 2, a failed verification, an unexpected exception on one file - is handled as “log it, leave the original, move on.” The hook still exits 0 in all those cases, because a movie that wasn’t cleaned is still importable. The only time it exits 1 is for genuine setup problems (mkvmerge or mkvpropedit not found, job directory missing), because those mean nothing can work at all.

log.info("Found %d MKV file(s).", len(mkvs))
for path in mkvs:
    log.info("Inspecting: %s", os.path.basename(path))
    try:
        result = process_file(path, AUDIO_LANGS, SUB_LANGS, resolved_mkvmerge,
                              resolved_mkvpropedit, DRY_RUN, CLEANUP)
        if result == "nothing":
            log.info("  Nothing to strip, leaving file as-is.")
        elif result == "stripped":
            log.info("  Done.")
        elif result == "fixed":
            log.info("  Metadata fixed in place (no remux).")
    except Exception as e:  # never let one bad file break the import
        log.error("Unexpected error on %s: %s. Original kept.", os.path.basename(path), e)

log.info("Finished. Exiting 0 so import proceeds.")
return 0

Even the logging follows the never-block rule. SABnzbd captures the script’s stdout into its job history, and the file log (defaulting to /config/mkv_strip_pp.log, which lands on the persistent volume in Docker) is best-effort: an unwritable log path produces a warning instead of a crash, because an earlier version managed to fail an entire download over a log file it couldn’t open.

Configuration is a block of constants at the top of the script - preferred languages, dry-run, and toggles for each cleanup pass:

AUDIO_LANGS = ["eng", "jpn", "und"]
SUB_LANGS = ["eng", "und"]
LOG_FILE = "/config/mkv_strip_pp.log"
DRY_RUN = False

STRIP_TITLE = True
STRIP_TAGS = True
INFER_LANGUAGE = True
CLEAR_JUNK_TRACK_NAMES = True
STRIP_ATTACHMENTS = False

Installing mkvmerge Inside a Docker Container

If you run SABnzbd in Docker, you’ll hit an error the first time you try to use mkvmerge inside it: the binary isn’t there. The tempting fix is to docker exec into the running container and install it by hand. Don’t do this. The moment you pull a new image, your install vanishes.

The correct fix is a tiny custom Dockerfile that extends the base image, and the repo ships one:

FROM lscr.io/linuxserver/sabnzbd:latest

# Add mkvmerge (from MKVToolNix) and the optional ACL so the post-processing script can run.
# The LinuxServer.io image is Alpine-based, so use apk.
RUN apk add --no-cache mkvtoolnix acl

That apk line hides my actual debugging moment. My first Dockerfile used apt-get, and the build died with apt-get: not found. The base image was Alpine, not Debian. Alpine uses apk, not apt.

In docker-compose.yaml, point the service at the Dockerfile with build: instead of a plain image:, and mount the hook into the container’s scripts directory:

services:
  sabnzbd:
    build: .                 # builds the Dockerfile in this repo
    container_name: sabnzbd
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Etc/UTC
    ports:
      - "8080:8080"
    volumes:
      - /path/to/appdata/sabnzbd:/config
      - ./mkv_strip_pp.py:/config/scripts/mkv_strip_pp.py
      - /path/to/downloads:/downloads
    restart: unless-stopped

One habit changes after adding packages to a custom Dockerfile. Updating the container is now docker compose up -d --build --pull always rather than a plain docker compose pull, so your added packages get rebuilt on top of the fresh base.

mkvclean.py runs directly on the host, so the host itself needs Python 3.10+ and MKVToolNix, plus the optional acl package if you want POSIX ACLs preserved on cleaned files:

apt-get install -y mkvtoolnix python3 acl

The Cron Sweep, or Why You Still Need One

With per-download cleanup in place, you might think the sweep is redundant. It’s not. The SABnzbd hook only sees files SABnzbd downloads. It misses manual rips, files copied in from elsewhere, media you reorganized, and anything from a category the hook isn’t attached to. A scheduled sweep is your catch-all for eventual consistency across the whole library.

The sweep is mkvclean.py again - the same checkpoint-aware script from the bulk pass, run unattended. Install it somewhere on the PATH:

sudo cp mkvclean.py /usr/local/bin/
sudo chmod +x /usr/local/bin/mkvclean.py

Locking is built in: the script takes a kernel-level fcntl lock on /tmp/mkvclean.lock at startup and exits quietly if another run already holds it, so a cron firing while a manual run is still chewing through the library can never cause two sweeps to step on each other. Logging is built in too - everything goes to the path given by --log (default ~/mkvclean.log) as well as stdout - so the crontab line stays clean with no redirection:
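A non-blocking exclusive lock of this kind can be sketched with Python's fcntl module. This version uses fcntl.flock, which is my assumption about the exact call, and acquire_lock is an illustrative name; it is Unix-only.

```python
import fcntl

def acquire_lock(path):
    """Return an open file holding an exclusive lock, or None if someone else has it."""
    fh = open(path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None  # another run holds the lock: exit quietly
    return fh  # keep this object alive; closing it releases the lock
```

Because the kernel drops the lock when the process exits, a crashed run never leaves a stale lock behind the way a plain PID file can.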

0 4 * * * /usr/local/bin/mkvclean.py /media/Storage/Movies --batch 0 --log ~/mkvclean-cron.log

Thanks to the checkpoint, a typical nightly run skips almost everything instantly and only touches whatever’s new or upgraded since the last pass. Safe to run as often as you like.

The One-Time Bulk Pass

The third job is the one you run once: cleaning the back catalogue that existed before any of this automation. Same script, run by hand, starting with a dry run.

Always do the dry run first on a new library. It reports exactly what would be stripped from every file without modifying a single byte:

./mkvclean.py /media/Storage/Movies --dry-run

Read the log. Confirm it’s keeping what you expect - especially on anime and foreign films. Then start small, with a batch spanning different languages and sources:

./mkvclean.py /media/Storage/Movies --batch 20

Spot-check a handful of files in Jellyfin to confirm audio and subtitle selection looks right. If everything looks good, continue in larger batches:

./mkvclean.py /media/Storage/Movies --batch 100
./mkvclean.py /media/Storage/Movies --batch 0    # 0 = no limit: finish everything remaining

The language preferences are flags rather than edits to the script - --audio eng,jpn,und --subs eng,und are the defaults. If you catch a bad rule, stop with Ctrl+C, adjust the flags, and re-run; checkpointed files won’t be touched again. There’s also a --prefer-audio-channels flag for the hoarder special: releases carrying both a lossless 7.1 track and an AC3 5.1 track in the same language. Set it and the sweep keeps only the best English track (exact channel-count match wins, otherwise most channels) instead of all of them.
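The channel preference boils down to a small tie-break, sketched here under the assumption that each track carries the audio_channels property from mkvmerge -J. pick_best_audio is an illustrative name, not the repo's.

```python
def pick_best_audio(tracks, prefer_channels):
    """Pick one track ID: an exact channel match wins, otherwise the most channels."""
    def channels(t):
        return t.get("properties", {}).get("audio_channels", 0)

    exact = [t for t in tracks if channels(t) == prefer_channels]
    best = exact[0] if exact else max(tracks, key=channels)
    return best["id"]
```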

Files a bad run already processed may need re-downloading; that’s exactly why you dry-run first and verify in batches rather than letting it run unattended overnight.

How Jellyfin Sees the Result

A quick note on the playback side, because it’s where you confirm the whole thing worked. Jellyfin uses ffmpeg for transcoding, and that path behaves the same whether the file was muxed with ffmpeg or mkvmerge, as long as the MKV is spec-compliant. Cleaning with mkvmerge therefore causes Jellyfin no trouble.

The two issues Jellyfin users hit most are the wrong default audio track and missing subtitles. The default-track enforcement described earlier handles the first one directly: every cleaned file comes out with exactly one default audio track (the first kept one) and no default-flagged subtitles, so Jellyfin’s track picker starts from a sane baseline instead of whatever flags the release group left behind. And when a file needs only that flag fixed, the mkvpropedit fast-path patches the header in place without remuxing at all, which is instant.
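As a rough sketch of what that fast path amounts to, the function below builds an mkvpropedit command line that marks the first audio track as default and clears the default flag on every other audio and subtitle track. The track:aN and track:sN selectors and the flag-default property are standard mkvpropedit syntax. The function name and the idea of passing in track counts are assumptions for the example, not the script's actual interface.

```python
def build_default_flag_command(path, audio_count, subtitle_count):
    """Build an mkvpropedit argv that enforces a single default audio track.

    Audio track 1 gets flag-default=1, every other audio track and every
    subtitle track gets flag-default=0. The header is edited in place,
    so no remux is needed.
    """
    cmd = ["mkvpropedit", str(path)]
    for n in range(1, audio_count + 1):
        value = 1 if n == 1 else 0
        cmd += ["--edit", f"track:a{n}", "--set", f"flag-default={value}"]
    for n in range(1, subtitle_count + 1):
        cmd += ["--edit", f"track:s{n}", "--set", "flag-default=0"]
    return cmd


# To apply it, pass the list to subprocess.run(cmd, check=True).
example_cmd = build_default_flag_command("movie.mkv", audio_count=2, subtitle_count=1)
```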

Missing subtitles are guarded by the forced-track rule - the subtitle that translates the alien dialogue in an otherwise-English film survives every cleanup, even when its language tag is wrong.
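A check along these lines can be written against the track objects that mkvmerge -J returns. This is a sketch rather than the script's own code: is_forced_subtitle is a hypothetical helper, and the name match is a deliberately loose substring test.

```python
def is_forced_subtitle(track):
    """Return True for a subtitle track that should always be kept.

    track is one entry from the "tracks" array of mkvmerge -J output.
    A track counts as forced if the Matroska forced flag is set, or if
    its name contains the word "forced" in any letter case.
    """
    if track.get("type") != "subtitles":
        return False
    props = track.get("properties", {})
    if props.get("forced_track"):
        return True
    name = props.get("track_name") or ""
    return "forced" in name.lower()
```

Checking the name as well as the flag is what lets a forced track survive when a release sets only one of the two.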

The Finished System

What started as a one-off cleanup grew into two scripts covering three jobs, now public as mkv-track-stripper on GitHub under the MIT license:

Per-download cleanup via the SABnzbd hook (mkv_strip_pp.py, runs at download time, keeps the original filename so Radarr imports cleanly).
Scheduled catch-all sweep via cron (mkvclean.py --batch 0, locked with fcntl, checkpoint-aware, catches everything the download path misses).
One-time bulk pass over the back catalogue (the same mkvclean.py, dry-run first, batched, resumable, verified in Jellyfin).
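The fcntl locking mentioned for the cron sweep can be sketched as follows. This is one common way to keep two runs from overlapping, not necessarily how mkvclean.py does it. The helper name and the lock-file path in the comment are assumptions, and fcntl is only available on Unix-like systems.

```python
import fcntl


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object while the lock is held, or None if
    another run already holds it. The lock is released when the file
    is closed or the process exits.
    """
    fh = open(lock_path, "w")
    try:
        fcntl.flock(fh, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        fh.close()
        return None
    return fh


# Typical use at the top of a cron-driven script (path is illustrative):
#   lock = acquire_single_instance_lock("/tmp/mkvclean.lock")
#   if lock is None:
#       raise SystemExit("another sweep is still running")
```

A non-blocking lock means a sweep that overruns its schedule makes the next invocation exit immediately instead of queueing up behind it.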

The repo carries the Dockerfile for the SABnzbd image, a changelog, and a README covering setup and troubleshooting - including the one warning everyone asks about, Could not preserve ownership ... Operation not permitted, which means the script isn’t root and couldn’t chown the cleaned file back to its original owner. The cleanup itself succeeded.
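To make that warning concrete, here is a hedged sketch of the general pattern: try to restore the owner, and downgrade a permission failure to a log line instead of failing the whole cleanup. The function name and the exact log wording are assumptions for illustration, not copied from the repo.

```python
import logging
import os


def restore_ownership(path, original_stat):
    """Try to give the cleaned file its original owner and group.

    original_stat is the os.stat() result taken before the file was
    replaced. Returns True on success. A PermissionError (typical when
    not running as root) is logged as a warning and reported as False,
    because the cleaned file itself is still valid.
    """
    try:
        os.chown(path, original_stat.st_uid, original_stat.st_gid)
        return True
    except PermissionError as exc:
        logging.warning("Could not preserve ownership of %s: %s", path, exc)
        return False
```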

In practice the logs show three patterns: a file with nothing to strip and a clean header is checkpointed in milliseconds, a file needing only a flag or title fix gets an in-place mkvpropedit edit, and a bloated release with 30 subtitle tracks gets trimmed down in about 30 seconds, losslessly.

Frequently Asked Questions

➤ Will this re-encode my video and lose quality?

No. mkvmerge is a remuxer, not a transcoder. It copies the video and chosen audio and subtitle streams into a new container untouched, so there’s zero quality loss and the operation finishes in seconds rather than the hours a re-encode would take.

➤ Will it ever leave a movie silent?

Not if you build the safety logic in. The rules that prevent it are simple: never strip the only audio track, and when no audio matches your preferred languages, keep everything and log a notice instead of dropping anything. Undetermined (und) tracks only get dropped when a properly tagged English track exists alongside them. The output is also re-probed before the original is replaced, so a broken remux can never overwrite a good file.
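Those three rules fit in a few lines. The sketch below is an illustration under stated assumptions rather than the script's own logic: select_audio is a hypothetical helper, each track is a plain dict with a language key holding a three-letter code, and the default preference list mirrors the --audio default.

```python
import logging

log = logging.getLogger("audio-select")


def select_audio(tracks, preferred=("eng", "jpn", "und")):
    """Return the audio tracks to keep. Never empties a non-empty list."""
    tracks = list(tracks)
    # Rule 1: a lone audio track is never stripped, whatever its language.
    if len(tracks) <= 1:
        return tracks
    keep = [t for t in tracks if t["language"] in preferred]
    # Rule 2: nothing matched, so fail safe: keep everything and log a notice.
    if not keep:
        log.warning("no audio in %s; keeping all %d tracks",
                    ",".join(preferred), len(tracks))
        return tracks
    # Rule 3: und is only dropped when a tagged English track sits alongside it.
    if any(t["language"] == "eng" for t in keep):
        keep = [t for t in keep if t["language"] != "und"]
    return keep
```

For example, a file with eng, und, and fra audio keeps only eng, while a file with just fra and deu keeps both and logs the notice.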

➤ Will it delete forced subtitles?

No. Forced subtitles are always kept regardless of language, detected through the Matroska forced_track flag or a “forced” marker in the track name. Those are the subs that translate the one line of Elvish in an otherwise-English film, and losing them genuinely breaks the movie.

➤ Does the sweep re-process files it already cleaned?

No. mkvclean.py records every handled file’s path, size, and modification time in a checkpoint file and skips matching files instantly on later runs. If Radarr upgrades a movie, the new size and mtime no longer match, so the replacement gets cleaned automatically on the next sweep.

➤ Does this work with Sonarr too, not just Radarr?

Yes. You bind the SABnzbd post-processing script to a tv category in the SABnzbd UI the same way you bind it to movies. The library sweep doesn’t know or care about Radarr or Sonarr at all - it walks the filesystem.

➤ Why does apk add mkvtoolnix fail in my Docker image?

apk is Alpine’s package manager. If your base image is Debian or Ubuntu, you need apt-get install mkvtoolnix instead. The reverse trips people up too: apt-get: not found means your base is Alpine. Always check the base image distro before picking a package manager.

➤ How do I list MKV tracks and their languages from the command line?

Run mkvmerge -i file.mkv for a quick human-readable list of track IDs, types, and languages. For scripting, use mkvmerge -J file.mkv to get machine-readable JSON you can parse safely, reading tracks[].properties.language for each track.
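A minimal Python sketch of the scripted route, assuming mkvmerge is on PATH. The helper names probe_tracks and summarize_tracks are hypothetical. The id, type, and properties.language fields are part of mkvmerge's JSON output.

```python
import json
import subprocess


def summarize_tracks(info):
    """Reduce parsed mkvmerge -J output to (id, type, language) tuples.

    language is None when the field is missing from a track.
    """
    return [
        (t["id"], t["type"], t.get("properties", {}).get("language"))
        for t in info.get("tracks", [])
    ]


def probe_tracks(path):
    """Run mkvmerge -J on path and return its track summary."""
    proc = subprocess.run(
        ["mkvmerge", "-J", str(path)],
        capture_output=True,
        text=True,
    )
    # mkvmerge exits with 0 on success, 1 on warnings, 2 on errors.
    if proc.returncode >= 2:
        raise RuntimeError(f"mkvmerge could not identify {path}")
    return summarize_tracks(json.loads(proc.stdout))
```

Keeping the parsing separate from the subprocess call makes the selection logic testable against saved JSON without needing real media files.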

Wrapping Up

A few themes carried through every layer of this build, and they’re worth restating: verify before you replace, make every operation idempotent, let failures fail safe, and choose your tools for where the project is heading rather than where it started. The single most important lesson, the one that cost me a movie night, is that you reason from the tracks a file actually contains, never from what you assume it should contain.

A good automation is rarely the first script you write. It’s the system you arrive at after the task has taught you what it actually needs. Mine started as a ten-line “keep only English” filter, went through a Bash-and-jq era, and ended as two Python scripts with checkpointing, verification, and safety defaults baked in - now maintained in the open at github.com/KryptikWurm/mkv-track-stripper.

If you want to adopt this, you don’t have to rebuild it: clone the repo, run mkvclean.py --dry-run against a test folder, read the log, then layer in the SABnzbd hook and the cron sweep once you trust the selection rules. Keep the verify-before-replace defaults in place, and you can roll this out across thousands of files without ever losing a movie night.