Looped-Transformer-24M

A compact 24M-parameter language model built on a Looped Transformer architecture, trained with the Muon optimizer, and able to produce Chain-of-Thought (CoT) reasoning when prompted. It specializes in story generation and basic math reasoning, which makes it a good fit for lightweight experimentation, educational projects, and rapid prototyping.

Model Overview

Parameters: 24 Million
Architecture: Looped Transformer
Optimizer: Muon
Reasoning: Chain-of-Thought (CoT)
Training Data: Story-focused corpus
Math Skills: Simple arithmetic
Hardware: Dual NVIDIA T4 GPUs
Training Time: ~30 minutes
Language: English
License: AGPL-3.0
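
To illustrate the general idea behind a looped architecture, here is a minimal sketch in plain Python. It is not this model's implementation: `SharedBlock` is a hypothetical toy residual layer standing in for a full transformer block, and `looped_forward` is an illustrative name. The point it shows is that one set of weights is applied repeatedly, so effective depth grows with the loop count while the parameter count stays fixed.

```python
import math
import random


class SharedBlock:
    """Toy residual block x + tanh(W x); a stand-in for a transformer layer."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # A single weight matrix that every loop iteration reuses.
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(dim)]

    def num_parameters(self):
        return sum(len(row) for row in self.weight)

    def __call__(self, x):
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)))
                  for row in self.weight]
        return [v + h for v, h in zip(x, hidden)]


def looped_forward(block, x, n_loops):
    """Apply the same block n_loops times (weights are shared across loops)."""
    for _ in range(n_loops):
        x = block(x)
    return x


block = SharedBlock(dim=4)
x = [1.0, 0.5, -0.5, 0.0]
shallow = looped_forward(block, x, n_loops=1)
deep = looped_forward(block, x, n_loops=6)

# The parameter count does not depend on how many times the block is looped.
print(block.num_parameters())
```

Raising `n_loops` adds compute but no new weights, which is one way a small parameter budget can be stretched.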

What This Model Does Well

  • Story generation — coherent, imaginative, character-driven narratives
  • Dialogue writing — natural conversational flow
  • Basic math — simple arithmetic and step-by-step reasoning
  • CoT reasoning — improved logical flow when prompted
  • Lightweight inference — runs smoothly on consumer GPUs and many CPUs
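
As a rough check on the lightweight-inference point, weight memory can be estimated from the 24M parameter count alone. The sketch below assumes common dtype sizes (4, 2, and 1 bytes per parameter) and counts weights only, ignoring activations and any generation cache. The function name is illustrative.

```python
def weight_memory_mb(num_params, bytes_per_param):
    """Approximate weight storage in decimal megabytes (weights only)."""
    return num_params * bytes_per_param / 1_000_000


NUM_PARAMS = 24_000_000  # parameter count stated for this model

fp32_mb = weight_memory_mb(NUM_PARAMS, 4)  # 32-bit floats
fp16_mb = weight_memory_mb(NUM_PARAMS, 2)  # 16-bit floats
int8_mb = weight_memory_mb(NUM_PARAMS, 1)  # 8-bit integers

print(f"fp32: {fp32_mb} MB, fp16: {fp16_mb} MB, int8: {int8_mb} MB")
```

Even at full 32-bit precision the weights come to under 100 MB, which is consistent with running on modest hardware.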

Training Details

The model was trained for about 30 minutes on two NVIDIA T4 GPUs, using a curated dataset of short stories, narrative prompts, character interactions, and basic math word problems.

In this run the Muon optimizer converged quickly and stably, which suggests it is well suited to small models.
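
For readers unfamiliar with Muon, its core idea is to orthogonalize the momentum of each weight matrix before applying it as an update. The sketch below is a simplified illustration in plain Python, not the training code used here. It swaps in the classic cubic Newton-Schulz iteration in place of the tuned polynomial used by reference Muon implementations, and it omits details such as Nesterov momentum and shape-dependent scaling. The `lr` and `beta` values and all function names are assumptions for illustration.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthogonalize(m, steps=15):
    """Approximate the nearest (semi-)orthogonal matrix via cubic Newton-Schulz."""
    norm = math.sqrt(sum(v * v for row in m for v in row)) + 1e-12
    x = [[v / norm for v in row] for row in m]  # singular values are now <= 1
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # X <- 1.5 X - 0.5 (X X^T) X pushes every singular value toward 1.
        x = [[1.5 * v - 0.5 * w for v, w in zip(rx, rw)]
             for rx, rw in zip(x, xxt_x)]
    return x


def muon_step(weight, grad, momentum, lr=0.02, beta=0.95):
    """One simplified Muon-style update; returns (new_weight, new_momentum)."""
    momentum = [[beta * m + g for m, g in zip(rm, rg)]
                for rm, rg in zip(momentum, grad)]
    update = orthogonalize(momentum)
    weight = [[w - lr * u for w, u in zip(rw, ru)]
              for rw, ru in zip(weight, update)]
    return weight, momentum


weight = [[1.0, 0.0], [0.0, 1.0]]
grad = [[3.0, 1.0], [1.0, 2.0]]        # toy gradient for a 2x2 weight matrix
momentum = [[0.0, 0.0], [0.0, 0.0]]
weight, momentum = muon_step(weight, grad, momentum)
```

Because the update is orthogonalized, its size depends on `lr` and not on the raw gradient magnitude.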

Intended Use

Designed For

  • Creative writing
  • Story generation
  • Dialogue simulation
  • Educational demos
  • Lightweight reasoning tasks

Not Recommended For

  • Factual retrieval
  • Complex mathematics
  • Safety-critical applications

Example Usage

Load the model and generate text with the Hugging Face transformers library:

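from transformers import AutoTokenizer, AutoModelForCausalLM

# "your-username/looped-transformer-24m" is a placeholder; substitute the real repository ID.
model_id = "your-username/looped-transformer-24m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a short story about a robot learning to dream."

# Tokenize the prompt into PyTorch tensors.
inputs = tokenizer(prompt, return_tensors="pt")
# max_length counts the prompt tokens as well as the generated ones.
outputs = model.generate(**inputs, max_length=200)

# Decode the first generated sequence, dropping special tokens.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))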
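
The prompt above targets story generation. For arithmetic questions, a step-by-step cue may help elicit Chain-of-Thought output. The helper below is a minimal sketch: the "Question/Answer" layout and the cue sentence are assumptions, not a documented prompt format for this model, and `build_cot_prompt` is a hypothetical name.

```python
def build_cot_prompt(question, cue="Let's think step by step."):
    """Return a prompt that asks for reasoning before the final answer.

    The layout here is an assumed, illustrative format.
    """
    return f"Question: {question.strip()}\nAnswer: {cue}"


prompt = build_cot_prompt(
    "Tom has 3 apples and buys 4 more. How many apples does he have?"
)
print(prompt)
```

The resulting string can be used as `prompt` in the generation example above.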