AI-generated Tracker Music

category: music [glöplog]
I'm sure many of us have seen AI-generated live music on the internet. I thought, "how about AI-generated tracker music?" What I envision is a neural network, with OpenMPT-assisted technologies, that scans each sample and/or instrument properties (what sound a sample makes, etc.).

I also envision a model where we give the NN a set of samples and instruments for it to work with, and over the course of time, it yields an original song that uses said sample/instrument set.

Maybe reduce the problem a little, restrict it to classic 4 channel mods. There should be plenty of data available for training :)

I'd also restrict it to chiptuney stuff (short samples, more emphasis on melodies and tracker techniques) to further reduce the complexity and broadness in a first attempt.

From what I know, this is not something you can try out quickly. The people I know who work in this field need to buy computation time, and then training will take a while. Too expensive for just messing around.

No one is stopping you from giving this a shot though. I wouldn't be surprised if everything is available (if you aren't afraid of python and weird toolchains). I'd love to see the results.
added on the 2021-08-17 10:10:21 by jco jco
I'd love to see the results.

Just feed it Wrecklamation fifty times! \o/
added on the 2021-08-17 10:11:36 by Bombe Bombe
I think it could be feasible. You might restrict the genre and type of instruments. If you try an ML solution, perhaps you could train it with WOTW or Estrayk modules. They made a lot of chiptunes with a similar set of instruments (but never exactly the same samples) and a (relatively) limited set of compositional resources and scales.

I am talking about 4ch MODs. The usual arrangement of bass, percussion, melody and arpeggios.

You can use the instruments (samples) of a hundred MODs (classified as basses, leads, etc.) and train four AIs, one per channel, to generate (for instance) a bass that complements a set of three channels (lead, arpeggios, percussion). Then you start randomly with one channel in a couple of patterns, generate the complementary channels, and set some rules for transitions between sections. The system will need some memory of sections and previous patterns, basically to generate the general structure of the music.
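The "one AI per channel" idea boils down to training data of the form (three context channels, one target channel). A minimal sketch of building those pairs, assuming each pattern cell has already been turned into a text token; the token format and the choice of channel 0 as the bass are made up for illustration:

```python
from typing import List, Tuple

Cell = str                           # hypothetical cell token, e.g. "C-2:01" (note:sample)
Row = Tuple[Cell, Cell, Cell, Cell]  # one row of a 4-channel pattern


def complement_pairs(pattern: List[Row], target: int):
    """Split every row into (the three other cells, the target channel's cell)."""
    pairs = []
    for row in pattern:
        context = tuple(cell for i, cell in enumerate(row) if i != target)
        pairs.append((context, row[target]))
    return pairs


# Two rows of an invented pattern; "---" marks an empty cell.
pattern = [
    ("C-2:01", "---", "C-3:03", "C-3:05"),
    ("---", "C-3:02", "E-3:03", "---"),
]

# Training pairs for a model that writes channel 0 given channels 1 to 3.
bass_pairs = complement_pairs(pattern, target=0)
```

Calling the same function with target=1, 2 and 3 would give the data for the other three models. The memory of earlier patterns and sections is not covered here.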

It could be a nice experiment. Will Pseudo-Estrayk or Pseudo-WOTW surpass the real Estrayk and/or WOTW?

A Pseudo-Virgill would be nice too, I always like his chord-progressions. However, it could be difficult to achieve in 4-channel MODs. Perhaps it's doable with a generator of chords in the style of aKlang. Maybe the real Virgill could code it! :D
added on the 2021-08-17 10:44:01 by ham ham
Interesting project (idea)! I was thinking about something similar but never got around to investing time into the whole preliminary work .. But I would love to hear your results!
added on the 2021-08-17 22:31:35 by BSC BSC
Nice idea, but don't forget to train it with the sample data as well, since note and effect data alone will confuse it.
added on the 2021-08-18 20:35:53 by d vibe d vibe
I've noticed a pattern in MIDI/note generator thingies, they don't take into account the actual sound that comes out. One of the main things in playing any instrument is, you listen to what it sounds like, and you adjust your playing accordingly. If you play a note, make it a nice note.
added on the 2021-08-18 20:51:35 by yzi yzi
Yup. The difference with MIDI files is that if they follow the General MIDI standard, the sound patches sound roughly the same (or try to sound the same). Mods are a totally different story.
added on the 2021-08-18 22:08:14 by d vibe d vibe
(meaning that if a neural network trains on GM files it can more easily categorize what type of patch (instrument) is used)
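To make that concrete: in General MIDI the 128 program numbers are grouped into 16 families of 8, so the instrument type falls straight out of the number. A small sketch, assuming zero-based program numbers (0 to 127) as they appear in the MIDI program change message; the function name is my own:

```python
# The 16 General MIDI instrument families, 8 programs each.
GM_FAMILIES = [
    "Piano", "Chromatic Percussion", "Organ", "Guitar",
    "Bass", "Strings", "Ensemble", "Brass",
    "Reed", "Pipe", "Synth Lead", "Synth Pad",
    "Synth Effects", "Ethnic", "Percussive", "Sound Effects",
]


def gm_family(program: int) -> str:
    """Return the GM family for a zero-based program number (0..127)."""
    if not 0 <= program <= 127:
        raise ValueError("GM program numbers run from 0 to 127")
    return GM_FAMILIES[program // 8]
```

A MOD has nothing like this: sample slot 3 can be a bass in one tune and a snare in the next.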
added on the 2021-08-18 22:10:19 by d vibe d vibe
After a lot of computation the result might be more mods inspired by Space Debris.
added on the 2021-08-18 22:27:23 by noname noname
jco made a couple of good points. But I think it would be feasible to start training the network on your local machine (given it is no Atari ST :-D). I got some pretty quick results using Keras (Python) a while ago on my Thinkpad. One interesting approach would be to use LSTM networks; there were some quite impressive C code generator experiments on the interwebs. LSTMs kind of learn what comes after what. It would probably yield rather funny sounding tracks, but that would be the path I'd choose if I were you. Some links:


added on the 2021-08-18 23:19:04 by BSC BSC
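On "LSTMs kind of learn what comes after what": in practice that means next-step prediction over a sequence of tokens. A rough sketch of the data preparation only, assuming one channel has been flattened to a list of row tokens; the tokens and the window size are invented, and the actual Keras model is left out:

```python
def encode_rows(rows):
    """Map each distinct row token to an integer id, in order of first appearance."""
    vocab = {}
    ids = []
    for row in rows:
        if row not in vocab:
            vocab[row] = len(vocab)
        ids.append(vocab[row])
    return ids, vocab


def next_token_windows(ids, window):
    """Build (previous `window` ids, id that follows) training examples."""
    return [(ids[i:i + window], ids[i + window])
            for i in range(len(ids) - window)]


# One channel of an invented pattern; "---" marks an empty row.
rows = ["C-2", "---", "E-2", "---", "C-2", "---", "G-2", "---"]
ids, vocab = encode_rows(rows)
examples = next_token_windows(ids, window=3)
```

Each example pairs a short history with the token that followed it, which is the shape a sequence model would be trained on.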
I actually wanted to do this for a course project a few years ago. It ended up being outside of the scope of the project especially considering I couldn't find any library that could write MOD files. So, what I did instead was use an LSTM to measure similarity of patterns in 4CH MODs. Write-up is here: https://byte.observer/tunesimilarity.pdf
Here's the code as well. Maybe something in here is useful to someone https://byte.observer/tunesimilarity.zip
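On the missing MOD writer: the pattern data itself is simple enough to pack by hand. In the classic 4-channel ProTracker layout every cell is 4 bytes holding a 12-bit period, an 8-bit sample number split across two nibbles, a 4-bit effect and an 8-bit effect parameter. A sketch of packing and unpacking one cell (function names are mine; the header, sample table and order list are not covered):

```python
def pack_cell(period: int, sample: int, effect: int, param: int) -> bytes:
    """Pack one ProTracker pattern cell into its 4-byte form."""
    return bytes([
        (sample & 0xF0) | ((period >> 8) & 0x0F),  # sample high nibble, period high nibble
        period & 0xFF,                             # period low byte
        ((sample & 0x0F) << 4) | (effect & 0x0F),  # sample low nibble, effect command
        param & 0xFF,                              # effect parameter
    ])


def unpack_cell(cell: bytes):
    """Inverse of pack_cell: return (period, sample, effect, param)."""
    b0, b1, b2, b3 = cell
    period = ((b0 & 0x0F) << 8) | b1
    sample = (b0 & 0xF0) | (b2 >> 4)
    return period, sample, b2 & 0x0F, b3


# Period 428 is C-2 in the standard ProTracker period table;
# effect 0xC with parameter 0x40 sets the volume to 64.
cell = pack_cell(period=428, sample=1, effect=0xC, param=0x40)
```

A 64-row pattern is then just 64 x 4 of these cells in row order.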
While you could in theory train an AI with notes and effects, just as if it were a MIDI file, there are some aspects that will make the effort harder, if not impossible. Examples?
- samples containing loops
- samples with chords
- samples with different tuning
- samples with sounds at different filter levels
- sweeping samples played with sample offset
- pattern jumps and loops
While you can indeed feed a list of pattern notes/commands into an AI, incorporating the aspects above would take the challenge to a whole different level.
added on the 2021-08-19 08:05:48 by dixan dixan
Exactly my point, dixan
added on the 2021-08-19 09:09:44 by d vibe d vibe
Even if it's chip music, it can't tell whether something is, for example, a drum sample, a lead instrument or a bass instrument just by looking at the note data.
added on the 2021-08-19 09:16:29 by d vibe d vibe
BSC: I have a strong suspicion that modeling the task has to incorporate more than sequences, in line with what dixan wrote.

Two things have already been done (a lot):
- training ai with pure sample sequences (e.g. that nonstop death metal generator is hilarious). fascinating, because it's the most radical approach to do this. the examples for speech (results in asmr mumbling, very uncanny) and piano i remember too.
- training ai with notes. typically midi. yields interesting results too, works great with piano sheet music, bach and so forth.

with tracker music, both aspects need to be combined somehow. the ai has to be somehow aware of what the samples sound like, how they're being used, as well as the note-sequence stuff within the patterns. the use and abuse of tracker effects makes it more complicated.
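one crude way to picture "combining both aspects": give every note event a small descriptor of the sample it triggers. a sketch only, assuming 8-bit signed sample data as in MOD files and a loop length given in bytes (where 2 bytes or less is treated as "no loop"); the chosen features and function names are my own guesses, not a tested recipe:

```python
def sample_descriptor(pcm: bytes, loop_length: int):
    """Return [length, zero-crossing rate, peak level, looped flag] for one sample."""
    values = [b - 256 if b > 127 else b for b in pcm]  # bytes -> signed 8-bit
    n = len(values)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0]
    # Sign changes between neighbours: high for noisy hats, low for basses.
    crossings = sum(1 for a, b in zip(values, values[1:]) if (a < 0) != (b < 0))
    peak = max(abs(v) for v in values) / 128.0
    looped = 1.0 if loop_length > 2 else 0.0
    return [float(n), crossings / max(n - 1, 1), peak, looped]


def event_vector(note_index: int, descriptor):
    """Concatenate a note number with the descriptor of the sample it plays."""
    return [float(note_index)] + list(descriptor)


# A tiny looped square wave, the kind of thing a chiptune lead is made of.
square = bytes([64] * 4 + [192] * 4) * 2
desc = sample_descriptor(square, loop_length=len(square))
vec = event_vector(24, desc)
```

a real attempt would probably want a learned embedding of the waveform instead of hand-picked numbers, but the input shape would be similar.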

I would like to investigate the whole topic myself at some point, so thanks for all recommendations for tools to look into. I'd love to do it with c++, but if there is no way around python, so be it.
added on the 2021-08-19 10:40:10 by jco jco
Dixan's examples are spot on. You'd additionally have to classify each sample with another (I'll call it that for now) layer that can tell what kind or class of instrument it contains, because even if you looked at each channel individually, there could and would still be more than one instrument / sample per channel. Otoh I also think that the model might be able to learn e.g. the patterns in which bassdrum, hi-hat and snare often share one channel. Maybe assigning the actual samples to the final track would need to be done manually. And the samples problem mentioned by dixan could maybe be taken care of by not using tunes that make heavy use of some problematic kinds of samples or effects (loops, chords, tuning, offset) in the first place. I must admit I was thinking chiptunes when I wrote my first reply. But in general, starting really simple and going from there is probably helpful to stay sane.
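Filtering the training set by effect usage could look roughly like this. It assumes the pattern data has already been decoded into (effect, parameter) pairs; the set of "problematic" effects and the 5% threshold are arbitrary picks. The effect numbers themselves are the standard ProTracker ones (9xx sample offset, Bxx position jump, Dxx pattern break, E6x pattern loop):

```python
PROBLEM_EFFECTS = {0x9, 0xB, 0xD}  # sample offset, position jump, pattern break


def is_problematic(effect: int, param: int) -> bool:
    """True for effects we would rather not have in the training set."""
    if effect in PROBLEM_EFFECTS:
        return True
    return effect == 0xE and (param >> 4) == 0x6  # E6x: pattern loop


def problem_ratio(cells) -> float:
    """Share of used effect slots that hold a problematic effect."""
    used = [(e, p) for e, p in cells if e or p]  # 0/00 means "no effect"
    if not used:
        return 0.0
    return sum(is_problematic(e, p) for e, p in used) / len(used)


def keep_module(cells, max_ratio: float = 0.05) -> bool:
    """Keep a module only if problematic effects are rare enough."""
    return problem_ratio(cells) <= max_ratio


# Invented example: 10 empty slots, 9 volume commands, 1 sample offset.
cells = [(0x0, 0x00)] * 10 + [(0xC, 0x40)] * 9 + [(0x9, 0x10)]
```

Sample-level problems (chords, odd tuning) can't be caught this way and would still need a look at the sample data.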

JCO: If you like, we could collaborate on this if you start investigating. I have a bit of hands-on experience using the Python-based tools.
added on the 2021-08-19 14:08:21 by BSC BSC
Maybe reduce the problem a little, restrict it to classic 4 channel mods. There should be plenty of data available for training :)

I agree with jco. Start with something small, then move on to something larger over time. For example, after Soundtracker/Noisetracker/Protracker, move on to Scream Tracker 3 or Fasttracker 2, and vice versa. Right?
In historical order. ST-01 tunes first.
added on the 2022-09-20 10:43:15 by yzi yzi
I second that!
Don't forget we need a market for prompts, too.
added on the 2022-09-21 06:58:19 by novel novel
That too.
In historical order. ST-01 tunes first.

What do you mean? Modules with ST-01 samples or giving this NN the entire ST-01 sample set?
OpenAI's Whisper converts human speech to text. Maybe it can be modified to convert audio into individual instruments/notes.
added on the 2022-09-22 02:03:00 by neoneye neoneye