Sovereign AI engineering log

Running AI at home.
No cloud. No compromises.

A self-hosted AI stack built and documented in public. Every article comes from a real system, a real error, or a real fix. Nothing is speculative.

For people running, scoping, or considering self-hosted LLMs on DGX Spark, Strix Halo, multi-3090, or similar hardware.

host: DGX Spark
cloud: none
posts: 176

Latest Articles

strategy, lightning

Frontier AI on Bitcoin: ppq.ai as the No-KYC Cloud Fallback for a Sovereign Stack (2026)

A self-hosted stack still hits two or three tasks where a frontier model wins. Buying that access from Anthropic means a KYC account and a card. ppq.ai is the other door: an OpenAI-compatible proxy to Claude, GPT and others, paid per query over Bitcoin Lightning, no account. Here is what it is good for, where it betrays the sovereign premise, and exactly how I wired it as the fallback behind local Qwen.

strategy, dgx-spark

GLM-4.7-Flash on a Single DGX Spark: the Repo Says AWQ, the Model Says MLA

GLM-4.7-Flash is a 30B-A3B MoE coding model that fits on a single 128 GB DGX Spark with room to spare. Bringing it up on Blackwell sm_121 meant working through two failures that every published recipe gets wrong: the 'AWQ' build is actually in compressed-tensors format, and the model uses MLA (multi-head latent attention), so flash_attn is not a valid attention backend. Here is the working recipe, the single-stream decode number nobody reports, and what it does to my coding agent.
