pouët.net

Math+asm: LODA, an assembly language for OEIS integer sequences

category: code [glöplog]
 
Idea: train an AI using OEIS as training data. Does this AI output meaningful programs?

https://loda-lang.org/

Yes, so far LODA has discovered 35k programs!

The entire OEIS database holds 350k integer sequences, so there are still a long way to go.

LODA shuffles the instructions and tries out random programs, until there is a match with a sequence in OEIS. If there already exist a LODA program, then it picks the fastest version.

It's possible to manually write LODA programs.

The LODA executable runs on macOS and Ubuntu. Doesn't run on Windows (yet).

LODA can be compiled to run on Raspberry Pi.

Some LODA programs:
https://loda-lang.org/list0/

What do you think about this?

Disclaimer: I have contributed a little to this project.
added on the 2021-09-22 22:25:59 by neoneye neoneye
Looks interesting! Is this related to Linear Genetic Programming in some way?
Dresdenboy I'm a little bit shocked you know about this.
added on the 2021-09-23 10:04:03 by rac rac
The c++ implementation, not sure how it's made.

I have experimented generating programs using the LGP approach, but most of the generated programs correspond to other programs already in the repository. I tried combining LGP with bigram/trigrams when picking the next instruction, that didn't make much of a difference. I think my approach is terrible. My code is here, hopefully I can improve it.
added on the 2021-09-23 17:14:03 by neoneye neoneye
In fact I'm so surprised because many years ago I was kindly asked to review and classify (in the sense of: determining which mathematical categories it belongs to) the monography "Linear Genetic Programming" by Brameier, & Banzhaf. I found the approach a bit surprising; I read the book from cover to cover (to my best abilities) and did a review. Weird that years later, the topic pops up again on Pouet.
added on the 2021-09-23 19:45:54 by rac rac
Similarly there were Picbreeder (fork) which was awesome to play with.
added on the 2021-09-23 20:30:41 by neoneye neoneye
Hi rac, yep, I do! ;)

I'm following several topics in EA related fields. LGP and CGP have their advantages in doing performant evaluation on lots of data.

At work I sometimes used GA. For hobby projects I used LGP.

Yeah, Brameier and Banzhaf are the important names here. :) Banzhaf also did something about doing LGP on GPUs.
@Dresdenboy: What kind of hobby projects do you do?
added on the 2021-09-26 13:44:03 by neoneye neoneye
neoneye: really cool project!

Just out of curiosity: so obviously they are validating the results of generated programs only against finite subsequences (OeisSequence::DEFAULT_SEQ_LENGTH = 100 ?). I wonder how many false positives you encounter during the search e.g. programs that are able to generate first N terms in the sequence but fail for N+1 (e.g. for N = 99 or N = 100). This is intriguing question to me actually, as I've heard there is theorem that from all programs that can generate N terms in any given sequence the shortest/fastest one is the one that is most likely (statistically) to correctly predict N+1 term. So intuitively, it makes sense to keep only the shortest/fastest versions of the program not just for optimization but also for correctness. Of course this would be only statistical correctness in a set of all possible sequences, I bet there are some sequences where you need larger and larger program when N increases, but those should be extremely rare.
added on the 2021-09-28 22:46:24 by tomkh tomkh
The 100 term length, is only for programs that use indirect memory access. This is for programs such as the Ackermann function. There are around 60 programs that uses it.

For normal programs that doesn't use indirect memory access, then it compares with the entire OEIS B-file. I think I have seen b-files with more than 16000 terms.

Rule30 has 1000 terms in its b-file. There is no loda program for Rule30, yet. (damn, the only demo'ish name I could think of as an example, and there is no loda program, sorry).

Sometimes lots and lots of terms needs to be checked.
added on the 2021-09-29 17:05:44 by neoneye neoneye
False positive rate in my own experiments are horrible. I use a bloom filter with 40 terms of the sequences (300k) without a corresponding program. Whenever there is a program that satisfy the bloom filter, then it's saved as a candidate program. Incredibly few of these candidate programs can compute the rest of the sequence.

I don't know stats about the c++ implementation.
added on the 2021-09-29 17:16:12 by neoneye neoneye
My program for A006577:

Code: ; A006577 by tomkh add $0,1 ; n = $0, start from n = 1 ; Find approx. upper-bound for sequence length until it reaches 1. ; Obviously nobody knows what's the actual upper-bound or if there is one ; as Collatz was not proven ;) so it's most likely wrong, but should check ; for the first 100000 terms. ; That's the best I can do for non-Turing complete language, where infinite loops ; are not supported. mov $1,$0 seq $1,196 add $1,270 ; $1 = 270 + floor(sqrt(n)) mov $4, 0 ; sequence length counter until "n" reaches 1 lpb $1 mov $2, 2 sub $2, $0 ; $2 = 2-n lpb $2 ; executed if 2-n > 0 <=> n <= 1 mov $1, 0 ; stop main loop mov $2, 0 lpe mov $2, $0 mod $2, 2 ; $2 = (n%2) mov $3, 1 sub $3, $2 ; $3 = 1-(n%2) lpb $2 ; executed if (n%2) == 1 mul $0, 3 add $0, 1 ; n = n*3 + 1 sub $2, 1 lpe lpb $3 ; executed if (n%2) == 0 div $0, 2 ; n = n/2 sub $3, 1 lpe add $4, 1 sub $1, 1 lpe mov $0, $4


It should check for the first n=100000 terms, but nobody can guarantee it (as for today) it will work for all "n". So it's a case when we cannot even know if it's false-positive easily :P

Anyway, I'm don't know how to contribute to the LODA database. Feel free to add it. It was also my first program with using those weird lpb..lpe instructions, so I'm sure it can be optimized/shortened.
added on the 2021-09-30 00:36:07 by tomkh tomkh
Well commented, easy readable. The first 10k terms in A006577 b-file in checks out.

Thank you for such a nice contribution.

Optimizations. Your code looks awesome. I doubt it's possible optimize it further.
Registers are always zero when the program starts, so no need for the `mov $4,0` line.
added on the 2021-09-30 17:22:50 by neoneye neoneye
Haha cool, thanks!
added on the 2021-09-30 20:13:43 by tomkh tomkh
@neoneye:
I worked on decision logic for stock trading.

A more recent idea was to have some code generator for size coding projects. Not for the full intro, but more for shorter code which is meant to accomplish specific tasks (with known results).
For size coding projects, how could that work? like making an emulator with a subset of the x86 instructionset?
added on the 2021-10-02 10:39:15 by neoneye neoneye
Yep, something like that. As usual it needs fast ways of evaluation. This could even happen by creating actual machine code. IIRC this already has been done with LGP, chaining code macro blocks (of the same size, so with some NOPs sometimes for padding).

Alternatively an emulator with just a subset of the instructions, or simply without more complex stuff like protected mode, 16b/32b/64b mode, interrupts etc. could nicely run in parallel. Sizecoding often uses code bytes as data, or sometimes even executes data as code, or jumps into the bytes of one instruction to interpret it as another. It's not that simple a problem. ;)

But even finding the shortest way of computing specific sequences is already helpful, e.g. to create some desired data patterns.
Using the raw machine code block, as input for the program. That's meta. awesome.

Do you have limits on the number of times it can loop before discarding the mutation? Iirc I have max 1000 iterations for computing a term, otherwise the program gets discards.
added on the 2021-10-03 09:14:28 by neoneye neoneye
@tomkh:
Thanks for your LODA program. It is in the "loda-programs" repo and was already used in a couple of other programs. I also added it as a template to the miner and it immediately matched a couple of similar sequences.

The new version of the tool has a new command "loda submit" which allows you to easily integrate your programs into the programs repository.
added on the 2021-10-09 12:58:34 by ckrause ckrause

login