
News from pouet-mirror.sesse.net

category: general [glöplog]
 
The previous thread used for this was really about something else, so I'm starting a new one to share the occasional tidbit. Hopefully only good news. :-)

First, the JSON now has a “mirror” field for each URL; it lists, if known and relevant, a better-known place (scene.org, amigascne.org, my static ftp.untergrund.net dump, etc.) where you can find the same file. The idea is that if the file is lost upstream, you don't have to pick it out of my mirror, but can just link directly to the copy elsewhere. If there are multiple candidates, some heuristics pick a “preferred” one (e.g., the scene.org parties section is preferred over compilations, scene.org is preferred over most other sites, etc.).

This is a content match, not a filename match; it can detect repacks (looks at hash of the bytes inside the archive, not the archive itself), filters out scene.org's ad files (both old and new format), and looks recursively inside archives. It supports most common and a few uncommon formats used across platforms. (If you're now firing up your unpacker exploits, don't bother; the unpacking runs in a sandbox :-) ) So if you're missing a .dms, don't be surprised if it tells you that the only available copy is an .adf inside a .zip inside an .iso inside a multi-part .rar.
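In sketch form, the idea is roughly this (zip-only, with a made-up ad filename; the real thing handles many formats, recurses into nested archives, and knows the actual scene.org ad files):

```python
# Sketch of content matching: hash the bytes of each file *inside* the
# archive, so a repack (different compression, timestamps, or archive
# format) of the same files still matches. zip-only here; the ad filename
# below is a placeholder, not the real filter list.
import hashlib, io, zipfile

AD_FILES = {"scene_org_ad.txt"}  # placeholder name

def content_hashes(archive_bytes: bytes) -> frozenset[str]:
    hashes = set()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        for info in zf.infolist():
            if info.is_dir() or info.filename.lower() in AD_FILES:
                continue
            hashes.add(hashlib.sha256(zf.read(info)).hexdigest())
    return frozenset(hashes)

def pack(files: dict[str, bytes], compress: int) -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compress) as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return buf.getvalue()

a = pack({"demo.exe": b"payload"}, zipfile.ZIP_STORED)
b = pack({"demo.exe": b"payload", "scene_org_ad.txt": b"ad"},
         zipfile.ZIP_DEFLATED)  # repacked, with an ad file injected
print(content_hashes(a) == content_hashes(b))  # True: same content
```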

There is now a browser extension (works with anything supporting MV3, e.g. Chrome or Firefox) that consumes status.json; it shows the archive status and a mirror link (if one exists) for each indexed URL on Cardboard and Pouët. It's really unpolished, so I haven't bothered publishing it, but contact me if you want a copy. :-)

As an offshoot from this, I made a makeshift search engine. It simply does case-insensitive substring searches in all the filenames I have available (1M+ archives, 20M+ files after unpacking). It has already recovered a fair number of prods that Pouët had been missing for a decade or more; perhaps good searchers can squeeze out yet more. Searches hit disk (rotating rust) and there's the occasional bug, but they should be pretty fast. If you want to do something more heavy-duty, the database it uses is downloadable (/files.db, in plocate format).
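Conceptually the search is just this (a toy in-memory version; the real index is the plocate database, which you could also query locally with something like `plocate -d files.db -i <pattern>`, assuming the usual plocate flags):

```python
# Toy version of the search: case-insensitive substring match over a list
# of filenames. The real thing runs against a plocate database on disk.
def search(filenames: list[str], pattern: str) -> list[str]:
    p = pattern.lower()
    return [f for f in filenames if p in f.lower()]

filenames = [
    "parties/1993/assembly/second_reality.zip",
    "mirrors/amiga/2nd_real.dms",
    "docs/readme.txt",
]
print(search(filenames, "REAL"))  # matches the two prods, not readme.txt
```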

However… all of these can only find things where my machine can actually see the content. That means I must either have crawled it, or have a mirror of it. I have a pretty good collection of those by now (especially since I mirror scene.org's mirror section, too), but there are definitely things I'm missing; e.g., there are 200k+ CSDB releases that I cannot easily get my hands on, and post-untergrund.net, I don't think I will be getting Amigamega or Fujiology updates either. So if you want your stuff searchable, please open up rsync (or whatever) and let me pull a mirror from you. :-)

Sorry, that got long. Go rescue a demo about it.
added on the 2026-05-01 08:48:37 by Sesse
Oh, and another thing I forgot: I now crawl git archives. It's just a single one-off dump like everything else (so not a continuous mirror), but if you have a GitHub (non-single-file) or git:// URL, I will send git to download the repository and make a bundle file (which you can download and then clone from) instead of just the HTML front page. Will of course update with support for Codeberg etc. when/if Pouët allows that.
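For the curious, the bundle step is roughly the following (my guess at the exact git invocations based on the description; the real crawler may differ):

```python
# Rough sketch of the bundle workflow (the exact git commands are an
# assumption): fetch the repository once, then pack every ref into a
# single .bundle file that can later be downloaded and cloned from.
import os, subprocess, tempfile

def bundle_repo(url: str, out_bundle: str) -> None:
    with tempfile.TemporaryDirectory() as tmp:
        # --mirror gives a bare copy with all refs
        subprocess.run(["git", "clone", "--quiet", "--mirror", url, tmp],
                       check=True)
        subprocess.run(["git", "-C", tmp, "bundle", "create",
                        os.path.abspath(out_bundle), "--all"], check=True)

# Rescuing it later:  git clone prod.bundle restored-repo
```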
added on the 2026-05-01 09:05:05 by Sesse
