Soft Axiom is a full-stack AI knowledge management platform with RAG (retrieval-augmented generation) capabilities. Ingest documents, query them through a chat interface backed by language models, and manage your personal knowledge base. The Next.js frontend is deployed on Vercel and the FastAPI backend runs on AWS App Runner. Live at softaxiom.com.
Motivation
There is no shortage of ChatGPT wrappers. The barrier to shipping an AI-powered product has dropped to the point where most of them are thin skins over an API call—prompt in, completion out, deploy to Vercel, done. The result is a flood of nearly identical apps that outsource every hard problem to a single provider and call it a day.
I wanted to go in the opposite direction: build a full-stack AI application end to end, from training and fine-tuning models with PyTorch and LoRA, to implementing the retrieval and embedding pipeline, to standing up the backend (FastAPI, PostgreSQL with pgvector), the frontend (Next.js), and the infrastructure (Docker, AWS App Runner) for actual cloud deployment. Not because wrapping an API is wrong, but because I wanted to understand every layer of the stack myself—how embeddings are generated and stored, how hybrid search and reranking work under the hood, how a model serving pipeline fits together, and what it actually takes to go from a training script to a running service. Soft Axiom is the vehicle for that.
Engineering and Product Lessons
Soft Axiom is also an exercise in working across the whole stack with modern tooling, cloud infrastructure included. The project treats application development, infrastructure, model integration, and deployment workflows as one system rather than separate concerns.
The biggest lesson has been that the quality of outcomes depends on the quality of the question. AI is most useful when the right issue is identified first, then broken down into testable hypotheses and fixed with fast feedback loops.
It also forced explicit tradeoff analysis: understanding how Vercel and AWS differ, comparing price-to-performance by workload, deciding what belongs in an MVP, and choosing when a feature is ready for public release. Security and privacy are treated as core product constraints, not post-launch add-ons.
Document Ingestion and RAG
The ingestion pipeline loads documents from multiple formats, chunks them for retrieval, and stores the chunks with embeddings in PostgreSQL with pgvector. At query time, Soft Axiom retrieves relevant chunks, optionally re-ranks them, and sends only the current query plus the retrieved context to the configured LLM provider.
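At its simplest, the chunking step can be sketched as fixed-size windows with overlap, so that context spanning a chunk boundary is still retrievable from at least one chunk. The actual chunk size, overlap, and splitting strategy in Soft Axiom aren't public; the numbers below are placeholders:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with overlap.

    Illustrative only: real pipelines often split on sentence or
    section boundaries rather than raw character offsets.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    step = chunk_size - overlap  # how far the window advances each iteration
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks
```

Each chunk would then be embedded and stored alongside its source metadata, which is what makes the retrieval step in the next paragraph possible.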
LLM Providers
Soft Axiom supports multiple LLM providers: Ollama for fully local inference, plus OpenAI, Anthropic, and Google for cloud-backed queries.
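One way to keep several providers behind a single interface is a small registry keyed by provider name. This is an illustrative sketch of the pattern, not Soft Axiom's actual abstraction (the code is closed source):

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical registry: each provider exposes a completion callable.
@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]

PROVIDERS: dict[str, Provider] = {}

def register(name: str, complete: Callable[[str], str]) -> None:
    """Register a provider (e.g. 'ollama', 'openai') under a stable name."""
    PROVIDERS[name] = Provider(name, complete)

def query(provider: str, prompt: str) -> str:
    """Dispatch a prompt to the named provider, failing loudly on typos."""
    if provider not in PROVIDERS:
        raise KeyError(f"unknown provider: {provider}")
    return PROVIDERS[provider].complete(prompt)
```

The point of the indirection is that local (Ollama) and cloud (OpenAI, Anthropic, Google) backends become interchangeable at the call site.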
Deployment Stack
The frontend is a Next.js 14 app deployed on Vercel. It communicates with the backend through API route rewrites that proxy requests to the backend origin, keeping the browser pointed at a single domain. Authentication is handled by AWS Cognito using the OIDC authorization-code flow via oidc-client-ts and react-oidc-context on the client side.
The backend is a FastAPI application running on AWS App Runner with the Python 3.11 runtime. App Runner performs source-based deployments from the main branch: the build phase installs dependencies from a pinned requirements file and downloads the AWS RDS SSL certificate bundle, and the run phase starts Uvicorn on port 8000. JWTs issued by Cognito are verified on every authenticated request using the pool's JWKS.
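Cognito publishes each user pool's signing keys as a JWKS document at a well-known URL, so per-request verification reduces to fetching the matching key and checking signature, issuer, and audience. A sketch using PyJWT; the helper names are illustrative and the actual middleware is not public:

```python
def cognito_jwks_url(region: str, user_pool_id: str) -> str:
    """The JWKS endpoint Cognito publishes for a user pool."""
    return (
        f"https://cognito-idp.{region}.amazonaws.com/"
        f"{user_pool_id}/.well-known/jwks.json"
    )

def verify_cognito_jwt(token: str, region: str, user_pool_id: str,
                       client_id: str) -> dict:
    """Verify signature, issuer, and audience of a Cognito-issued JWT.

    Illustrative flow only, not Soft Axiom's actual code.
    """
    import jwt  # PyJWT; imported here so the URL helper stays dependency-free

    issuer = f"https://cognito-idp.{region}.amazonaws.com/{user_pool_id}"
    jwk_client = jwt.PyJWKClient(cognito_jwks_url(region, user_pool_id))
    signing_key = jwk_client.get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],  # Cognito signs with RS256
        audience=client_id,
        issuer=issuer,
    )
```

In FastAPI this would typically be wrapped in a dependency so every protected route gets the decoded claims or a 401.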
Data lives in a PostgreSQL instance on AWS RDS with the pgvector extension enabled. Document chunks and their 1536-dimensional embeddings (generated by OpenAI text-embedding-3-small) are stored together, and retrieval uses a hybrid of HNSW vector similarity search and full-text search with tsvector indexes. Connections go through asyncpg with a pool of 2–10 connections and SSL enforced via the RDS CA bundle.
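The two retrieval signals can be sketched as a pair of queries whose ranked results are then merged. Table and column names here (`chunks`, `embedding`, `tsv`) are hypothetical, and reciprocal rank fusion is shown as one common way to combine the lists; the project's actual schema and fusion strategy aren't public:

```python
# Vector leg: pgvector's `<=>` operator is cosine distance, and an HNSW
# index on `embedding` makes this an approximate nearest-neighbor scan.
# With asyncpg, the query vector can be passed as a string cast to ::vector.
VECTOR_SQL = """
SELECT id FROM chunks
ORDER BY embedding <=> $1::vector
LIMIT 20
"""

# Full-text leg: a tsvector column with a GIN index, ranked by ts_rank.
TEXT_SQL = """
SELECT id FROM chunks
WHERE tsv @@ plainto_tsquery('english', $1)
ORDER BY ts_rank(tsv, plainto_tsquery('english', $1)) DESC
LIMIT 20
"""

def rrf_merge(vector_hits: list[str], text_hits: list[str],
              k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score each id by 1/(k + rank) per list.

    Rewards ids that rank well in either leg, with a boost for both.
    """
    scores: dict[str, float] = {}
    for hits in (vector_hits, text_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

The merged id list is what would feed the optional reranking step before the context is handed to the LLM.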
The project is closed source.
What's Next
- Finalize MVP boundaries and public launch milestones
- Expand security and privacy controls (data retention, access policies, and auditability)
- Continue Vercel/AWS price-performance profiling as usage scales