Looking for help crunching a MacOS ARM 64kb bin
category: code [glöplog]
Hey all! Long time demoscene admirer, first time poster.
TL;DR: noob looking for advice on making a 69kb macOS ARM binary smaller
I literally only have this triangle. It's just a triangle.
I've been developing a hobby microengine, and I currently have a Vulkan 1.4 triangle weighing in at 51kb when signed locally and 69kb when codesigned with my Apple development cert. The triangle is rendered with basic SPIR-V compiled via Slang. I'm super proud of this triangle, but until I can get the audio and sizing issues resolved, I'm hesitant to move on to more complex scenes.
I have a build here, not that I'm saying you should run unknown foreign code. In this archive is the same bin with symbols and locally signed as well as stripped and signed against my apple dev profile. https://drive.google.com/file/d/1rroIxNGpprSzwGUYZPFMynHSJgySRHtA/view?usp=sharing
My biggest problem is that I'm losing 16kb to dynamic linking requirements: the first __DATA __got section bloats out to a full 16kb. The next biggest is that the codesigning data is appended to the program rather than placed within it, so being unable to shrink this section or the second __DATA const section is doubly painful. I'm hoping someone has a magic golden-bullet dyld replacement that would let me drop this section altogether.
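For anyone who wants to check where the signature actually sits, here is a small C sketch. Assumptions: a little-endian host, a thin (non-fat) 64-bit Mach-O already read into memory, and constants copied from `<mach-o/loader.h>` so it also builds off-Mac; `find_code_signature` is a hypothetical helper name. It walks the load commands looking for `LC_CODE_SIGNATURE`, whose `dataoff`/`datasize` give the byte range of the signature blob. codesign normally puts that blob at the end of `__LINKEDIT`, which is why signing grows the file instead of filling existing padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MACHO_MAGIC_64     0xfeedfacfu /* MH_MAGIC_64 */
#define MACHO_LC_CODE_SIG  0x1du       /* LC_CODE_SIGNATURE */
#define MACHO_HEADER_SIZE  32u         /* sizeof(struct mach_header_64) */

/* Returns 1 and fills dataoff/datasize if an LC_CODE_SIGNATURE is present.
   Assumes a little-endian host reading a little-endian (arm64/x86_64) image. */
int find_code_signature(const uint8_t *buf, size_t len,
                        uint32_t *dataoff, uint32_t *datasize) {
    uint32_t magic, ncmds;
    if (len < MACHO_HEADER_SIZE) return 0;
    memcpy(&magic, buf, 4);
    if (magic != MACHO_MAGIC_64) return 0;
    memcpy(&ncmds, buf + 16, 4);            /* mach_header_64.ncmds */
    size_t off = MACHO_HEADER_SIZE;         /* load commands follow the header */
    for (uint32_t i = 0; i < ncmds; i++) {
        uint32_t cmd, cmdsize;
        if (off + 8 > len) return 0;
        memcpy(&cmd, buf + off, 4);
        memcpy(&cmdsize, buf + off + 4, 4);
        if (cmdsize < 8 || off + cmdsize > len) return 0; /* malformed */
        if (cmd == MACHO_LC_CODE_SIG && cmdsize >= 16) {
            /* linkedit_data_command: cmd, cmdsize, dataoff, datasize */
            memcpy(dataoff, buf + off + 8, 4);
            memcpy(datasize, buf + off + 12, 4);
            return 1;
        }
        off += cmdsize;
    }
    return 0;
}
```

Comparing `dataoff + datasize` with the file length shows directly how much of the final size is signature.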
I've been working on my own jank disassembler and analyzer so I can visualize my program. I don't know if it helps, but this is its current memory layout, badly labeled:
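In the same spirit as that analyzer, here is a minimal C sketch that lists the `LC_SEGMENT_64` commands of an in-memory 64-bit Mach-O. The `seg_summary` struct and `list_segments` function are hypothetical; field offsets follow `segment_command_64` from `<mach-o/loader.h>`, and a little-endian host is assumed. Printing `fileoff` and `filesize` per segment makes the 16kb steps easy to spot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of one LC_SEGMENT_64 command. */
struct seg_summary {
    char name[17];                /* segname is 16 bytes, not always NUL-terminated */
    uint64_t vmaddr, vmsize, fileoff, filesize;
};

/* Fills up to `max` entries; returns how many segments were found. */
size_t list_segments(const uint8_t *buf, size_t len,
                     struct seg_summary *out, size_t max) {
    uint32_t magic, ncmds;
    size_t n = 0, off = 32;       /* load commands start after mach_header_64 */
    if (len < 32) return 0;
    memcpy(&magic, buf, 4);
    if (magic != 0xfeedfacfu) return 0;        /* MH_MAGIC_64 */
    memcpy(&ncmds, buf + 16, 4);
    for (uint32_t i = 0; i < ncmds && n < max; i++) {
        uint32_t cmd, cmdsize;
        if (off + 8 > len) break;
        memcpy(&cmd, buf + off, 4);
        memcpy(&cmdsize, buf + off + 4, 4);
        if (cmdsize < 8 || off + cmdsize > len) break;
        if (cmd == 0x19u && cmdsize >= 72) {   /* LC_SEGMENT_64 */
            struct seg_summary *s = &out[n++];
            memcpy(s->name, buf + off + 8, 16);
            s->name[16] = '\0';
            memcpy(&s->vmaddr,   buf + off + 24, 8);
            memcpy(&s->vmsize,   buf + off + 32, 8);
            memcpy(&s->fileoff,  buf + off + 40, 8);
            memcpy(&s->filesize, buf + off + 48, 8);
        }
        off += cmdsize;
    }
    return n;
}
```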

I've tried all sorts of modifications and re-signing, and all kinds of changes to the program header and section offsets, but I can't produce a binary that isn't aligned to these 16kb chunks and therefore this wasteful.
I know I could reduce code by implementing Metal instead of Vulkan, but my challenge to myself is to keep Vulkan so I can target Windows/Linux/Mac more easily. Either way, reducing code won't reduce the final binary size in its current state, because of these 16kb chunks.
I'm at my wits' end with this and feel like I've gone as far as I can by myself. I'd greatly appreciate any insight and help on this matter.
since nobody else has answered here:
I don't think there's a ready-to-use packer/compressor for macOS 4k intros these days. There's https://github.com/temisu/BR4096 but it's probably bitrotted now (plus, it only supports x86). What you could do is use a gzip shell dropper, or maybe port https://github.com/temisu/oneKpaq to AArch64. However, this will only do basic compression, without trying to fudge the Mach-O file format to achieve much smaller file sizes. For that, you'll probably have to do your own research, I'm afraid.
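A minimal sketch of the gzip shell dropper idea, in C to match the other examples here. It assumes the payload was already compressed externally (e.g. with `gzip -9`); the one-line stub, including the `/tmp/t` path, is a hypothetical choice and not a tested macOS 4k dropper. The shell runs line 1, which pipes everything from line 2 onward through `gunzip` into a temp file, marks it executable, runs it, and exits before the shell ever reaches the binary data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical one-line stub; the gzip payload starts on line 2. */
static const char DROPPER_STUB[] =
    "a=/tmp/t;tail -n+2 \"$0\"|gunzip>$a;chmod +x $a;$a;exit\n";

/* Writes stub + gzip payload into out.
   Returns the total length, or 0 if out is too small. */
size_t build_dropper(const unsigned char *gz, size_t gz_len,
                     unsigned char *out, size_t cap) {
    size_t stub_len = sizeof DROPPER_STUB - 1;  /* drop the C string terminator */
    if (stub_len + gz_len > cap) return 0;
    memcpy(out, DROPPER_STUB, stub_len);
    memcpy(out + stub_len, gz, gz_len);
    return stub_len + gz_len;
}
```

Since arm64 code on macOS has to be signed, the binary would need to be signed before it is gzipped; the extracted bytes are identical, so the signature should stay valid. How Gatekeeper treats the extracted file has not been checked here.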
now with actual working link: https://github.com/temisu/oneKpaq
i gave https://upx.github.io a try (running on Windows):
upx triangle-unsigned --force-macos
Ultimate Packer for eXecutables
Copyright (C) 1996 - 2025
UPX 5.0.2 Markus Oberhumer, Laszlo Molnar & John Reiser Jul 20th 2025
File size Ratio Format Name
-------------------- ------ ----------- -----------
51272 -> 32784 63.94% macho/arm64 triangle-unsigned
So I've tried the above. I don't think I can port oneKpaq to arm64, and I seem to be stuck here.
I went down an absurd path: I removed my dependency on libSystem and reimplemented my own libc stubs.
This immediately triggers a runtime check on macOS 26; they added a sanity check that all C executables must link against libSystem.
It seems removing it is the only way I can get rid of the 16kb __got page in the Mach-O format.
I'm now trying to get something targeting macOS 15 to run on macOS 26... my libc replacement is definitely not finished and doesn't run... iunno
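For context on what "own libc stubs" can look like, a hedged C sketch. The `stub_*` names are hypothetical; in a real no-libc build they would be named `memset`/`memcpy`/`strlen` themselves, because compilers emit calls to those even when you never write them. The `write` wrapper uses the macOS arm64 convention of putting the BSD syscall number in `x16` and trapping with `svc #0x80`; Apple does not treat raw syscalls as a stable ABI, so this is an assumption that can break between releases.

```c
#include <assert.h>
#include <stddef.h>

/* Freestanding stand-ins for common libc routines (hypothetical names). */
void *stub_memset(void *dst, int c, size_t n) {
    unsigned char *d = dst;
    while (n--) *d++ = (unsigned char)c;
    return dst;
}

void *stub_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

size_t stub_strlen(const char *s) {
    const char *p = s;
    while (*p) p++;
    return (size_t)(p - s);
}

#if defined(__APPLE__) && defined(__aarch64__)
/* Raw write(2): syscall number in x16, trap via svc #0x80.
   Errors (signalled by the carry flag) are not decoded here. */
long stub_write(int fd, const void *buf, size_t n) {
    register long x0 __asm__("x0") = fd;
    register const void *x1 __asm__("x1") = buf;
    register size_t x2 __asm__("x2") = n;
    register long x16 __asm__("x16") = 4;   /* SYS_write */
    __asm__ volatile("svc #0x80"
                     : "+r"(x0), "+r"(x1)
                     : "r"(x2), "r"(x16)
                     : "memory", "cc");
    return x0;
}
#endif
```

One known trap: optimizers may recognize the byte loops above and turn them back into calls to `memset`/`memcpy`, which recurses forever if the stub *is* `memset`. Flags such as `-fno-builtin` (clang) or `-fno-tree-loop-distribute-patterns` (GCC) are commonly used to prevent that.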
Er, I didn't make it clear in my last post: UPX-packed arm64 binaries refuse to run at all on my machine, and I'm unsure why.
It is absolutely cursed, but I have a static macOS arm64 executable, codesigned with my developer ID, that correctly renders the triangle.
It took over a month and a half of spare time, but the bin now implements enough macOS syscalls to run my jank version of dlopen. That has to parse the macOS dyld shared cache to open libdyld, which gets me the real dlopen, which in turn lets me open libSystem, and that gets me what I need.
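A very rough C sketch of the "find libdyld in the shared cache" step, under loud assumptions: image table entries are laid out like dyld's `dyld_cache_image_info` (address, modTime, inode, pathFileOffset, pad); the caller already knows the table's offset and count (the header field that holds them has moved between dyld versions, e.g. `imagesOffsetOld` vs `imagesOffset`); and the path strings live in the same buffer. `cache_find_image` is a hypothetical name. It returns the unslid address, so the shared-region slide still has to be added, and it ignores the split sub-cache files that recent macOS versions use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed to mirror dyld's dyld_cache_image_info (32 bytes, no padding). */
struct cache_image_info {
    uint64_t address;           /* unslid address of the image's mach header */
    uint64_t mod_time;
    uint64_t inode;
    uint32_t path_file_offset;  /* offset of the install path within the cache */
    uint32_t pad;
};

/* Returns the unslid address of the image whose path equals `path`, or 0. */
uint64_t cache_find_image(const uint8_t *cache, size_t cache_len,
                          uint32_t images_off, uint32_t images_count,
                          const char *path) {
    size_t plen = strlen(path);
    for (uint32_t i = 0; i < images_count; i++) {
        size_t off = (size_t)images_off + (size_t)i * sizeof(struct cache_image_info);
        if (off + sizeof(struct cache_image_info) > cache_len) return 0;
        struct cache_image_info info;
        memcpy(&info, cache + off, sizeof info);   /* avoids unaligned reads */
        size_t p = info.path_file_offset;
        /* compare including the NUL so "/usr/lib/a" doesn't match "/usr/lib/ab" */
        if (p + plen + 1 <= cache_len && memcmp(cache + p, path, plen + 1) == 0)
            return info.address;
    }
    return 0;
}
```

Usage would be along the lines of `cache_find_image(cache, len, off, count, "/usr/lib/system/libdyld.dylib")`, with `off`/`count` read from whichever header field the running dyld version uses.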
Unfortunately, the code for this alone is the same size as all my previous code, so the bin now occupies a third 16kb page instead of two. But successfully removing the __got section also removes a completely unused 16kb page, so this is still a huge win: it currently means 8kb more free space for code in a sub-96kb bin than before.
The quest continues... I need to be able to open libdyld with less code. After that, and as long as Apple's servers still allow notarizing macOS 15 bins, this should result in a sub-64kb bin codesigned and notarized by Apple... hopefully.
I had to modify the Zig compiler so that it is willing to produce static bins for macOS arm64 without linking libc. This was an ordeal.
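Once libdyld's mach header is located, reaching `dlopen` means looking it up in the image's export trie (found via `LC_DYLD_INFO_ONLY` or `LC_DYLD_EXPORTS_TRIE`). A compact C sketch of that lookup, assuming the trie layout described in `<mach-o/loader.h>`: each node is a ULEB128 terminal size, optional terminal info (flags, address), then a child count and `label\0` + ULEB128 child-offset edges. `trie_lookup` is a hypothetical name, and only plain exports are handled; re-exports report not-found.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXPORT_FLAG_REEXPORT 0x08u  /* EXPORT_SYMBOL_FLAGS_REEXPORT */

/* Reads an unsigned LEB128 value and advances *p; clears *ok on overrun. */
static uint64_t read_uleb(const uint8_t **p, const uint8_t *end, int *ok) {
    uint64_t v = 0;
    unsigned shift = 0;
    while (*p < end) {
        uint8_t b = *(*p)++;
        if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
        if (!(b & 0x80)) return v;
        shift += 7;
    }
    *ok = 0;
    return 0;
}

/* Returns 1 and the image-relative address in *out if `sym` is a plain export. */
int trie_lookup(const uint8_t *trie, size_t len, const char *sym, uint64_t *out) {
    const uint8_t *end = trie + len, *node = trie;
    int ok = 1;
    for (int depth = 0; depth < 128; depth++) {   /* guard against malformed tries */
        const uint8_t *p = node;
        uint64_t term_size = read_uleb(&p, end, &ok);
        if (!ok || term_size > (uint64_t)(end - p)) return 0;
        if (*sym == '\0') {                         /* whole name consumed */
            if (term_size == 0) return 0;           /* prefix only, not an export */
            uint64_t flags = read_uleb(&p, end, &ok);
            if (!ok || (flags & EXPORT_FLAG_REEXPORT)) return 0;
            *out = read_uleb(&p, end, &ok);
            return ok;
        }
        const uint8_t *c = p + term_size;           /* children follow terminal info */
        if (c >= end) return 0;
        uint8_t nchildren = *c++;
        const uint8_t *next = NULL;
        for (uint8_t i = 0; i < nchildren && next == NULL; i++) {
            const uint8_t *label = c;
            while (c < end && *c) c++;              /* edge label is a C string */
            if (c >= end) return 0;
            size_t label_len = (size_t)(c - label);
            c++;
            uint64_t child = read_uleb(&c, end, &ok);
            if (!ok || child >= len) return 0;
            if (strncmp((const char *)label, sym, label_len) == 0) {
                sym += label_len;                   /* consume the matched edge */
                next = trie + child;
            }
        }
        if (next == NULL) return 0;
        node = next;
    }
    return 0;
}
```

Adding the returned offset to the (slid) mach header address of libdyld would then give a callable `dlopen`, assuming it is exported directly rather than re-exported.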
I'm now in the testing phase, trying to get this bin tested on a variety of Mac ARM chips, because I'm doing some extremely cursed stuff with the dyld cache.
If anyone's interested: afaik this is the only sub-64kb macOS arm64 bin that is codesigned and notarized.
This should "just load" and not prompt any security prompts.
The bin, zipped:
https://drive.google.com/file/d/1eKm7b0YniE9AiRli6i6b6ik8nkkmFj9y/view?usp=sharing

regarding upx, go to https://github.com/upx/upx/issues
i see some macOS issues, but it looks like these are from the pre-arm64 era...
Quote:
I'm unable to produce a binary that isn't aligned to these 16kb chunks and is this wasteful.
Operating systems typically load executables by memory mapping, and segments need to be aligned to the virtual memory page size for that to work. It's less of a problem on PC, where the page size is 4 kB instead of 16 kB.
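To put numbers on the page-size point, a small C sketch under a simplified model: every segment is assumed to occupy whole pages in the file (in practice the linker does not always pad the last segment, so treat this as an upper bound). The segment sizes in the usage note are made-up illustrations, not measurements of the triangle binary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds size up to a multiple of page; page must be a power of two. */
static uint64_t round_up_pow2(uint64_t size, uint64_t page) {
    assert(page != 0 && (page & (page - 1)) == 0);
    return (size + page - 1) & ~(page - 1);
}

/* Simplified model: total file bytes if every segment is padded to whole pages. */
uint64_t page_footprint(const uint64_t *seg_sizes, size_t n, uint64_t page) {
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += round_up_pow2(seg_sizes[i], page);
    return total;
}
```

With hypothetical segment sizes of 9000, 40 and 2000 bytes, 16 kB pages give 3 × 16384 = 49152 bytes while 4 kB pages give 12288 + 4096 + 4096 = 20480 bytes; a 40-byte `__got` alone still costs a full 16384 bytes on arm64 macOS.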
