What is the research goal of LODA miner

Questions and Answers : LODA Language : What is the research goal of LODA miner
Message board moderation

To post messages, you must log in.

AuthorMessage
fwjmath

Send message
Joined: 14 May 22
Posts: 2
Credit: 0
RAC: 0
Message 72 - Posted: 14 May 2022, 8:33:25 UTC

As a researcher who use OEIS whenever meeting an integer sequence, I am quite interested in this project, but there are also some doubt. LODA seems to be a versatile language for computing integer sequences. It seems to encode primitive recursive functions, which is largely sufficient for all computations, but mind that Knuth's up arrow notation is also primitive recursive, so even though LODA programs will eventually halt, it may halt after a long time. Some of the sequences on OEIS are also out of reach for LODA, for instance A079365 of the Chaitin constant, but that is less important.

But that's not my point. As a researcher, I am much more interested in the interpretation of these programs. In mathematics, not only do we want a program that computes a sequence, we also want to understand it, ususally by giving a proof. That is the only way that we can be sure that the program really computes the sequence. However, the LODA language seems to be quite difficult to interprete into meaningful mathematical content. Do you have concrete ideas on how to exploit the mined programs?

fwjmath.
ID: 72 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Christian Krause
Project administrator

Send message
Joined: 9 May 22
Posts: 251
Credit: 450,553
RAC: 194
Message 82 - Posted: 14 May 2022, 10:14:55 UTC - in response to Message 72.  

The goal is to find new and faster programs / algorithms and conjectures for integer sequences. Sometimes they can be translated into a mathemetical notation using sum and products or recursions. But we don't want to be too restricted. Yes, it can take some effort to understand what the programs are actually doing. But often this also gives some insights into the nature of the sequence.

PS: In OEIS, the defining criteria for a sequence are the terms. For us (in LODA), it's different. We identify (infinite) seqences by a program that computes it.
ID: 82 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Simon Strandgaard
Project administrator

Send message
Joined: 8 May 22
Posts: 5
Credit: 4,969
RAC: 0
Message 94 - Posted: 14 May 2022, 12:32:57 UTC
Last modified: 14 May 2022, 12:34:12 UTC

Disclaimer: I know nothing about proving math and little about AI.

Only the OEIS terms data are being used as training data for LODA. The OEIS name and formulas are currently being ignored.

If there are few terms (less than 40) registered in OEIS, then the LODA program will output those terms, but may likely diverge after the last term.

If there are many terms (more than 10000) registered in OEIS and the LODA program outputs these terms, then the chance of correctness is higher.

A program satisfies all the many terms in OEIS and compute things in a novel way. Proving correctness of a such program, that is the challenge.

Currently it's being done manually.
https://oeis.org/search?q=loda&language=english&go=Search

Can this be automated?

Approach A. Transformer architecture, that takes LODA programs+terms as input. Given a prompt of 20 terms and see what program is being generated. Is it correct/incorrect and decide on what to do about it. Are there patterns that can be proven.

Approach B. Huge AI trained with OEIS terms + OEIS metadata + documents being referenced. Lots of math training data.
ID: 94 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
fwjmath

Send message
Joined: 14 May 22
Posts: 2
Credit: 0
RAC: 0
Message 143 - Posted: 15 May 2022, 14:43:47 UTC

I think that, at least for me, one of the key is how to know if the way a mined LODA program computes a sequence is new in some sense. It should be different from know formulas, it should be different from the definition, and it should have a proof. All these three parts are difficult and requires human input for the moment. So the best we may have for the moment is that people interested in a given sequence may take time (or not!) to look at such programs. This is not likely to happen for sequences with "nice" tag though, as many of them may already have nice formulas. I think that the best bet may be those with "hard" and/or "more" tags, and preferably without "less" tag and/or with "nice" tag. But not many such sequences have enough term for LODA program mining. Some tests may be needed.

fwjmath.
ID: 143 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Christian Krause
Project administrator

Send message
Joined: 9 May 22
Posts: 251
Credit: 450,553
RAC: 194
Message 150 - Posted: 15 May 2022, 18:35:47 UTC - in response to Message 143.  

Sequences with too few terms are in fact an issue. We had the idea to use jOEIS to compute more terms. This was also successfully done by the author of Sequence Machine (https://sequencedb.net/). Yes, you're right: manual validation is a major effort/task. But the good thing is: we have can automated way of determining whether one program is "better" than another. This is mainly decided based on complexity (one program needs less computation steps to compute the same number of terms). In addition: all programs are stored in a repository (including its history). So at any point in time, if someone is interested in a particular sequence or program, it is there for analysis. The manual reviewing process that is done at the OEIS requires major human effort and does not really scale. We take a different approach and try to automated as muss as possible. That being said, we also did manual coding and validation of programs.
ID: 150 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
ML1

Send message
Joined: 19 Jun 22
Posts: 6
Credit: 57,484
RAC: 2
Message 410 - Posted: 19 Jun 2022, 15:08:21 UTC - in response to Message 94.  
Last modified: 19 Jun 2022, 15:10:18 UTC

... If there are many terms (more than 10000) registered in OEIS and the LODA program outputs these terms, then the chance of correctness is higher...

I would expect not... (Unless you have some proof?)

All that a coincidence of two sequences that are 10000 integers long shows, is that they are coincident for that part of their sequence. The very next number can be different.

Nothing can be said about what that next number is unless you already know what that number is, or there is some consistent pattern that is being followed that continues to hold true for both sequences to remain coincident.

... Just like the games played trying to follow stock markets. The next valuation can have no bearing of what went on before...


Now... If there are some stats available for what is actually seen?...

Keep searchin',
Martin
ID: 410 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote
Christian Krause
Project administrator

Send message
Joined: 9 May 22
Posts: 251
Credit: 450,553
RAC: 194
Message 453 - Posted: 29 Jun 2022, 18:06:50 UTC - in response to Message 410.  

I wrote "the chance of correctness is higher", which is true. Of course, in the end it requires a proof. But this is how maths works: come up with a hypothesis, collect enough evidence that supports it, then come up with a proof. In LODA, the hypotheses are the programs. The evidence is the number of correctly terms computed. The proof requires manual inspection of the program.
ID: 453 · Rating: 0 · rate: Rate + / Rate - Report as offensive     Reply Quote

Questions and Answers : LODA Language : What is the research goal of LODA miner

©2024 LODA Language