AbstractPhil posted an update · Sep 29
I've hit the ground running on the geometric lattice vocab system. Everything I've built will be housed in the repo.
https://github.com/AbstractEyes/lattice_vocabulary/tree/dev
Including all of David's model structure.
Through the development cycle I'll be integrating everything myself; little AI help can actually be offered here in general, since AI tends to hallucinate and decimate large structures.
I will be using AI assistance for formula expansion and integration, which means those pieces will be imperfect until every single one has been gone over with a fine-toothed comb.
Deployment will be as rapid as I can make it, and the output will yield results at every step, with small main tests on individual scripts and files.

EVERYTHING was built almost independently of everything else, so integration is going to need a configuration hierarchy that has to be smoothed out - but it will be smoothed out.

I believe I've picked a good foundational shape for the expansive program scripts, one that will enable robust iteration and progression similar to how I design game engine elements and systemic accessors.
This will be mostly hand-coded for the integration process, so it won't be as quick as if I could just dump GPT Pro on it - but GPT Pro can't handle anywhere near this many lines of code, so it's on me.

After integration I can run the agentic forms of AI over it and introduce tons of bugs for me to fix. That will be fun. After that it should work as a proper caching vocabulary, formula synthesizer, tensor creator, multi-device trainer, and a few other elements.

I simply lack the expertise to hit machines like pyring today, but that will change as I learn more. I'm building the system specifically with growth and progress in mind, so the structure is intentionally built to be rapidly iterated, fixed, and altered within reasonable constraints.

The engineering elements are deliberately built shallower and more overridable in many areas, specifically for experimental purposes.

I chose this route because I can have David in here almost immediately, versus trying to make David standalone-functional and getting massive headaches running him over and over, watching crash after crash, because my old system was heavily AI-generated instead of hierarchically created in a reasonably debuggable format.

geovocab2 houses the changes. The largest one is an INSTANT vocabulary load time, versus the old one taking minutes to prepare the vocabulary. The LAZY loading with pyarrow support is far more powerful than any of the earlier iterations, and I advise switching to the concept if you haven't yet.

AI ritualistically defaults to iterative row-by-row loading, even though pyarrow with columnar access is considerably faster.
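For anyone who hasn't tried the columnar route, here is a minimal sketch of what lazy, column-projected loading looks like with pyarrow. The file name and column names are illustrative assumptions, not the actual geovocab2 schema or API.

```python
# Minimal sketch of lazy, columnar vocabulary loading with pyarrow.
# The file name and column names are illustrative, not the geovocab2 schema.
import pyarrow.parquet as pq

VOCAB_PATH = "vocab_crystals.parquet"  # hypothetical parquet export of the vocabulary

# Memory-map the file and read only the columns we need; nothing is
# materialized row-by-row, so "load time" is essentially the cost of
# opening the file and reading its metadata.
table = pq.read_table(
    VOCAB_PATH,
    columns=["token", "crystal"],  # columnar projection: skip unused columns
    memory_map=True,               # avoid copying the whole file into RAM up front
)

# Columnar access: take a whole column as an Arrow array and slice it
# lazily instead of iterating Python-side over rows.
tokens = table.column("token")
first_k = tokens.slice(0, 1024).to_pylist()

# Row-group-level laziness: decode only the chunk that is actually needed.
pf = pq.ParquetFile(VOCAB_PATH)
chunk = pf.read_row_group(0, columns=["crystal"])
print(table.num_rows, chunk.num_rows, len(first_k))
```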

The trie structure was established while preparing the ngram structural trainer, and it will be included directly in the lookup as an optional sorter/comparator. The load time is nearly instant and the lookup time rapid. There are better formats for smaller processes, but this is meant to house hundreds of thousands or even hundreds of millions of ngrams, not just a few hundred. This structure operates really well on TPU, which is how I'll be training the upcoming vocabulary 5pair geometric feature structures - which will contain highly advanced and enriched learned structures spanning 2d through 9d shapes instead of JUST 5d shapes.
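As a rough illustration of the lookup side, here is a minimal trie sketch keyed on token ngrams. The class names and the (anchor, theta) payload are assumptions for illustration, not the actual geovocab2 structures.

```python
# Minimal sketch of a trie keyed on token ngrams, assuming each stored
# ngram carries an (anchor, theta) payload for its lexical order.
# Names are illustrative, not the actual geovocab2 classes.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)  # token -> TrieNode
    payload: Optional[tuple] = None               # e.g. (anchor, theta) once trained

class NgramTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, ngram, payload):
        node = self.root
        for tok in ngram:                          # descend one layer per token
            node = node.children.setdefault(tok, TrieNode())
        node.payload = payload

    def lookup(self, ngram):
        node = self.root
        for tok in ngram:
            node = node.children.get(tok)
            if node is None:
                return None                        # no such lexical order stored
        return node.payload

trie = NgramTrie()
trie.insert(("geometric", "lattice"), payload=("anchor_0", 0.37))
print(trie.lookup(("geometric", "lattice")))
```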

The rapid synthesis in the new system and the robust response from the test formulas show that these are highly enriched. The structural awareness of these crystals is more intelligent and robust than before by a large margin, and the theta rotation only helps them rather than hurts them.

The next geometry will be trained entirely in fp64, established from numpy random crystals. The primary anchor of each is specifically oriented based on lexical frequency within the dataset, and each is given a fully shaped object based entirely on its lexical order.
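A minimal sketch of what that initialization could look like, assuming a 5-vertex crystal and a simple frequency-to-magnitude rule for orienting the primary anchor; the shapes and the orientation rule here are illustrative, not the actual training code.

```python
# Sketch: fp64 crystal initialization from numpy random values, with the
# primary anchor oriented by lexical frequency. The 5-vertex, d-dim shape
# and the frequency -> magnitude rule are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def make_crystal(freq_rank: int, total: int, n_vertices: int = 5, dim: int = 5):
    # Random vertices in float64, centered so the centroid sits at the origin.
    verts = rng.standard_normal((n_vertices, dim)).astype(np.float64)
    verts -= verts.mean(axis=0, keepdims=True)

    # Orient the primary anchor (vertex 0): more frequent tokens get a
    # larger-magnitude anchor along the vertex's own direction.
    freq_weight = 1.0 - freq_rank / max(total, 1)   # 1.0 = most frequent
    direction = verts[0] / (np.linalg.norm(verts[0]) + 1e-12)
    verts[0] = direction * (1.0 + freq_weight)
    return verts

crystal = make_crystal(freq_rank=10, total=100_000)
print(crystal.dtype, crystal.shape)  # float64 (5, 5)
```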

Each layer of ngram tree traversal is meant to receive the parent's anchor with a theta rotation applied, allowing the internal structure of that lexical order not only to be applied as a semantic and symbolic state, but also to retain lexical complexity. This is a large step forward in cohesion.
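A small sketch of that inheritance step, assuming the theta rotation happens in a fixed 2D plane of the embedding and the parent anchor is composed as a translation; both are illustrative choices, not the actual traversal code.

```python
# Sketch: propagating a parent's anchor down one ngram-tree layer with a
# theta rotation applied. Rotating in a fixed (i, j) plane and adding the
# parent anchor as an offset are illustrative assumptions.
import numpy as np

def rotate_plane(vec: np.ndarray, theta: float, i: int = 0, j: int = 1) -> np.ndarray:
    """Rotate `vec` by `theta` radians in the (i, j) coordinate plane."""
    out = vec.astype(np.float64)
    c, s = np.cos(theta), np.sin(theta)
    out[i], out[j] = c * vec[i] - s * vec[j], s * vec[i] + c * vec[j]
    return out

def child_anchor(parent_anchor: np.ndarray, child_offset: np.ndarray, theta: float) -> np.ndarray:
    # The child inherits the parent's frame: its own offset is rotated by
    # theta, then translated by the parent anchor, so lexical order
    # accumulates as a composition of rotations along the traversal path.
    return parent_anchor + rotate_plane(child_offset, theta)

root = np.zeros(5)
a = child_anchor(root, np.array([1.0, 0.0, 0.0, 0.0, 0.0]), theta=np.pi / 8)
b = child_anchor(a, np.array([1.0, 0.0, 0.0, 0.0, 0.0]), theta=np.pi / 8)
print(a.round(3), b.round(3))
```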

Everything will be fully transparent. I'll hide nothing and reserve nothing moving forward; it'll be either Apache or MIT licensed.
