AWB Investment Research Strategy

"The Painting" — A progressive, layered approach to finding best-idea investments with downside protection. Last updated: 2026-03-11.

The Goal
Current State — What We Have
Storage Strategy — The Single Source of Truth
The Layers — Progressive Deepening
The Funnel — From 8,049 to ~10
Prioritization Logic
Execution Pipeline — What to Do Next
Cost Budget
Phase 3: The Watchlist
Operating Principles

1. The Goal

We are looking for:

Companies where we can buy at a price that gives us (a) very little chance of losing money AND (b) enough upside to be one of our best available ideas.

This means looking in two main buckets:

AGI beneficiaries — companies or assets that become more valuable because AGI is coming: compute, power, physical bottlenecks, scarce assets, and businesses AGI structurally enables.
Punished and too-cheap opportunities — companies hit for AGI or non-AGI reasons where the market may be over-punishing them relative to the real downside.

Every candidate still needs downside protection: tangible assets, guaranteed cash flows, liquidation value, buybacks, or a floor we understand with high confidence.

The output is a small watchlist (5-15 companies) with specific entry prices. We monitor prices and act when they hit our targets.

2. Current State — What We Have

Data Assets

Asset	Status	Location	Coverage
Company Universe	DONE	awb/data/universe/universe.db	8,049 companies (CIK, ticker, name, exchange)
10-K Extracts	DONE	awb/data/exports/10k_all/	3,648 companies (10-K text extracts, up to 20K chars each)
AGI Impact Scores	DONE	agi_scores/results/{TICKER}.json; agi_scores/agi_scores_all.json	3,648 companies scored (5 dimensions + holistic score 1-10 + category + reasoning)
Returns Backtest	DONE	agi_scores/returns_data.json	59 of 68 score 9-10 companies (1yr, 2yr, 3yr returns vs SPY)
Leopold Portfolio	DONE	/tmp/salp_13f/	5 quarterly 13F filings parsed. Full position analysis with P&L.
Investor Tracker	DONE	outputs/investor-tracker-*.html	5 investors profiled (Li Lu, Weschler, Combs, Pabrai, Leopold)
META Deep Dive	DONE	awb/data/exports/META_analysis/	Full 6-agent analysis + 3 price targets. Template for future deep dives.
Floor Price Pilot	DONE	outputs/floor-price-analysis-10-companies.html	10 random companies. Proof of concept only.
AWB SQLite DB	EMPTY	awb/data/awb.db	Schema exists (companies, financials, facts, analyses) but only 7 companies, 0 financials. Not being used.
Structured Financials	NOT STARTED	—	No structured balance sheet / income statement data at scale

Problem: Data is Scattered

Results are spread across JSON files, SQLite databases (2 of them, both underused), HTML reports, markdown files, and /tmp. There is no single place where "everything we know about Company X" lives. This means:

We risk re-doing work we already did
We can't easily query "show me all companies with AGI score 8+ AND P/B < 1.5"
Context gets lost between Claude Code sessions

3. Storage Strategy — The Single Source of Truth

Decision: Use `awb.db` as the master database

Every piece of analysis we produce gets stored in the SQLite database at awb/data/awb.db. One database. One source of truth. Every session reads from and writes to this database.

Database Schema

The existing schema is close to what we need but requires these changes:

-- 1. COMPANIES TABLE (exists, needs population)
--    Import all 8,049 companies from universe.db
--    Add columns for quick filtering

-- 2. NEW: company_metrics — the "painting" table
--    Each row = one metric for one company
--    This is where ALL layers of analysis accumulate
CREATE TABLE company_metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ticker TEXT NOT NULL,
    cik TEXT,
    metric_name TEXT NOT NULL,     -- e.g. 'agi_score', 'tangible_book_value', 'floor_price'
    metric_value REAL,             -- numeric value (NULL if text-only)
    metric_text TEXT,              -- text value (for reasoning, category, etc.)
    metric_json TEXT,              -- JSON for complex data (timelines, breakdowns)
    source TEXT NOT NULL,          -- 'agi_scoring_v1', 'yfinance', 'sec_xbrl', 'deep_dive_v1'
                                   -- NOT NULL: SQLite treats NULLs as distinct in UNIQUE,
                                   -- so a NULL source would slip past the constraint below
    confidence TEXT,               -- 'high', 'medium', 'low'
    computed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- UTC, 'YYYY-MM-DD HH:MM:SS'
    cost_usd REAL,                 -- what it cost to compute this
    UNIQUE(ticker, metric_name, source)  -- no duplicates
);

-- 3. NEW: watchlist — the final output
CREATE TABLE watchlist (
    ticker TEXT PRIMARY KEY,
    entry_price REAL,              -- our target buy price
    current_price REAL,
    floor_price REAL,
    bull_case_5yr REAL,            -- 5-year bull case price
    agi_score INTEGER,
    conviction TEXT,               -- 'high', 'medium', 'low'
    thesis TEXT,                   -- 1-2 sentence investment thesis
    last_updated TIMESTAMP
);

What Gets Stored Where

Data Type	Storage	Why
All metrics (scores, financials, valuations)	`company_metrics` table	Queryable, deduplicated, versionable
Deep-dive analyses (long text)	`analyses` table (existing)	Already designed for this
Key facts/signals	`facts` table (existing)	Structured findings from analyses
Final watchlist	`watchlist` table	The output — what we actually monitor
Raw data files (10-K text, XML)	Flat files on disk	Too large for SQLite, used as inputs
HTML reports	`outputs/` directory	Human-readable outputs, synced to Google Drive

Deduplication Rule

Before computing any metric, check if it already exists:

-- computed_at must be stored as UTC text ('YYYY-MM-DD HH:MM:SS') for this comparison to work
SELECT * FROM company_metrics
WHERE ticker = ? AND metric_name = ?
AND computed_at > datetime('now', '-30 days');

If a result exists and is less than 30 days old, skip. If older than 30 days, recompute (prices change, filings update). Each metric has a source field so we can track which version of the analysis produced it.
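A minimal sketch of this rule using Python's standard sqlite3 module. The function names `is_fresh` and `store_metric` are hypothetical, the schema is trimmed to the columns the example needs, and the upsert syntax assumes SQLite 3.24 or newer. The ticker and value are placeholders, not real results.

```python
import sqlite3

# Trimmed version of the company_metrics table (only the columns used here).
SCHEMA = """
CREATE TABLE IF NOT EXISTS company_metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ticker TEXT NOT NULL,
    metric_name TEXT NOT NULL,
    metric_value REAL,
    source TEXT NOT NULL,
    computed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE(ticker, metric_name, source)
);
"""

def is_fresh(conn, ticker, metric_name, max_age_days=30):
    """Return True if any source has a value newer than max_age_days."""
    row = conn.execute(
        "SELECT 1 FROM company_metrics "
        "WHERE ticker = ? AND metric_name = ? "
        "AND computed_at > datetime('now', ?) LIMIT 1",
        (ticker, metric_name, f"-{int(max_age_days)} days"),
    ).fetchone()
    return row is not None

def store_metric(conn, ticker, metric_name, value, source):
    """Insert a metric, or refresh it in place if the same source already has a row."""
    conn.execute(
        "INSERT INTO company_metrics "
        "(ticker, metric_name, metric_value, source, computed_at) "
        "VALUES (?, ?, ?, ?, datetime('now')) "
        "ON CONFLICT(ticker, metric_name, source) DO UPDATE SET "
        "metric_value = excluded.metric_value, "
        "computed_at = excluded.computed_at",
        (ticker, metric_name, value, source),
    )
    conn.commit()

conn = sqlite3.connect(":memory:")  # the real database would be awb/data/awb.db
conn.executescript(SCHEMA)

# Compute-and-store only when no recent value exists.
if not is_fresh(conn, "EXAMPLE", "agi_score"):
    store_metric(conn, "EXAMPLE", "agi_score", 9.0, "agi_scoring_v1")
```

The upsert matters because of the UNIQUE(ticker, metric_name, source) constraint: recomputing a stale metric with the same source tag would otherwise fail on insert. A new source tag creates a new row and leaves the old one intact.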

Migration Plan (One-Time)

Import 8,049 companies from universe.db into awb.db companies table
Import 3,648 AGI scores from agi_scores_all.json into company_metrics
Import Leopold portfolio data into company_metrics (as a "smart_money_signal" metric)
Import the 10-company floor price pilot results
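As an illustration of the AGI score import, the sketch below loads records into company_metrics in an idempotent way. The record shape (`ticker`, `score`, `reasoning`) is an assumption; the actual layout of agi_scores_all.json is not specified here and may differ. The tickers are placeholders and the function name `import_agi_scores` is hypothetical.

```python
import json
import sqlite3

# ASSUMED record shape; adjust to the real agi_scores_all.json layout.
SAMPLE_JSON = """
[
  {"ticker": "AAA", "score": 9, "reasoning": "placeholder"},
  {"ticker": "BBB", "score": 4, "reasoning": "placeholder"}
]
"""

def import_agi_scores(conn, records, source="agi_scoring_v1"):
    """Insert one 'agi_score' row per record and return how many rows were added.

    INSERT OR IGNORE skips rows that already exist for this source,
    so the migration can be re-run safely.
    """
    before = conn.execute("SELECT COUNT(*) FROM company_metrics").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO company_metrics "
        "(ticker, metric_name, metric_value, metric_text, source) "
        "VALUES (?, 'agi_score', ?, ?, ?)",
        [(r["ticker"], float(r["score"]), r.get("reasoning"), source)
         for r in records],
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM company_metrics").fetchone()[0]
    return after - before

conn = sqlite3.connect(":memory:")  # stand-in for awb/data/awb.db
conn.execute(
    "CREATE TABLE company_metrics ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, ticker TEXT NOT NULL, "
    "metric_name TEXT NOT NULL, metric_value REAL, metric_text TEXT, "
    "source TEXT NOT NULL, computed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, "
    "UNIQUE(ticker, metric_name, source))"
)

records = json.loads(SAMPLE_JSON)
first_run = import_agi_scores(conn, records)
second_run = import_agi_scores(conn, records)  # re-running adds nothing
```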

4. The Layers — Progressive Deepening

Each layer adds more detail to our picture of each company. Earlier layers are cheap and broad; later layers are expensive and narrow. We paint the most important parts first.

#	Layer	Coverage	Cost	Metrics Produced	Status
0	Universe (Who exists?)	8,049	Free	ticker, name, CIK, exchange, SIC code	DONE
1	AGI Score (Does AGI help or hurt?)	3,648	$355	agi_score (1-10), 5 dimension scores, category, confidence, reasoning	DONE
2	Financial Snapshot (What do the numbers say?)	~350 (score 7+)	~$0 (yfinance API)	market_cap, revenue, net_income, total_assets, total_liabilities, total_debt, cash, book_value, tangible_book_value, fcf, pe_ratio, pb_ratio, ev_ebitda, dividend_yield, shares_outstanding	NEXT
3	Asset-Based Floor (What's the downside?)	~350	~$35 (Haiku batch)	floor_price, floor_confidence, floor_methodology, tangible_book_per_share, net_current_asset_value, liquidation_value_estimate	NEXT
4	Smart Money Signal (Who else owns this?)	~350	~$0 (SEC EDGAR 13F)	leopold_holds, leopold_shares, leopold_avg_cost, top_13f_holders, insider_buying_ratio, institutional_ownership_pct	NEXT
5	Quick Valuation (What's cheap vs expensive?)	~100 (passed filters)	~$10 (Haiku)	price_to_tangible_book, ev_to_fcf, price_vs_floor, margin_of_safety_pct, cheapness_rank	LATER
6	5-Year Cash Flow Model (What's the upside?)	~30	~$30 (Sonnet)	bull_revenue_2031, bull_fcf_2031, bull_price_2031, base_price_2031, bear_price_2031, attractive_entry_price, current_price_vs_entry_range	LATER
7	Deep Dive (Full 6-agent analysis)	~10	~$50 (Opus)	Full META-style analysis: business moat, management, risks, 3 price targets, AGI impact deep-dive	LATER
8	Watchlist (Monitor and act)	5-15	~$0	entry_price, current_price, alert_threshold, thesis, conviction	LATER

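Total estimated cost of the remaining layers (2-8) to reach the watchlist: ~$125

Layers 0-1 are already done ($355 spent). Layers 2-8 are estimated at roughly $75-125 more. The per-layer estimates above sum to ~$125; the Cost Budget section itemizes ~$115 because it does not include the ~$10 Layer 5 Haiku pass. The funnel narrows aggressively, so expensive analysis is only done on a handful of companies.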

5. The Funnel — From 8,049 to ~10

8,049 companies — Full universe

▼

3,648 — Have 10-K filings (scored for AGI impact)

▼ Filter: AGI score ≥ 7

~350 — AGI beneficiaries (Layers 2-4 applied here)

▼ Filter: trades near/below floor OR P/TB < 2 OR smart money signal

~50-100 — Cheap + AGI tailwind (Layer 5 applied here)

▼ Filter: margin of safety ≥ 30% AND compelling expected value

~20-30 — Candidates for cash flow modeling (Layer 6)

▼ Filter: current price near attractive range and thesis still intact

5-15 companies — WATCHLIST (Deep Dive + Monitor)

6. Prioritization Logic

Within each layer, we prioritize companies using a composite score:

Priority Score = weighted combination of:

Signal	Weight	Why
AGI Score (1-10)	30%	Higher score = more AGI tailwind = more upside potential
Proximity to Floor (current price / floor price)	25%	Closer to floor = more downside protection = safer entry
Smart Money Signal (held by tracked investors)	20%	Leopold, Li Lu, etc. holding = validation of thesis
Asset Density (tangible book / market cap)	15%	Higher = more asset backing per dollar of market cap
Simplicity (can we understand the business?)	10%	Simpler businesses = more reliable floor estimates

Special Priority Boost

Leopold holds + AGI score 8+ — automatic top-50 priority. Smart money + our framework agree = high signal.
Leopold holds + underwater — automatic top-30 priority. These are his newest, most contrarian bets. If the thesis is right, these are the cheapest entry points.
Tracked investor overlap — if 2+ tracked investors hold the same stock, priority boost.
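One possible way to turn the weight table into a number is sketched below. The weights come from the table; everything else is an assumption made for illustration: each signal is normalized to the range 0-1, proximity to floor is expressed as floor / price so that higher is better, two or more tracked holders count as a full smart money signal, and simplicity is a manual 0-1 judgment. The special priority boosts are not modeled, and the function name `priority_score` is hypothetical.

```python
# Weights from the prioritization table.
WEIGHTS = {
    "agi": 0.30,
    "floor": 0.25,
    "smart_money": 0.20,
    "asset_density": 0.15,
    "simplicity": 0.10,
}

def clamp01(x):
    """Clip a value into the range [0, 1]."""
    return max(0.0, min(1.0, x))

def priority_score(agi_score, price, floor_price, tracked_holders,
                   tangible_book, market_cap, simplicity):
    """Weighted combination of five signals, each normalized to 0-1.

    All normalizations are ASSUMPTIONS for this sketch, not part of the strategy.
    """
    signals = {
        "agi": clamp01(agi_score / 10),                      # 1-10 scale
        "floor": clamp01(floor_price / price),               # 1.0 = at or below floor
        "smart_money": clamp01(tracked_holders / 2),         # 2+ holders = full signal
        "asset_density": clamp01(tangible_book / market_cap),
        "simplicity": clamp01(simplicity),                   # manual 0-1 judgment
    }
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)

# Placeholder inputs: AGI score 8, price 2x floor, one tracked holder,
# tangible book at 40% of market cap, moderately simple business.
score = priority_score(8, 20.0, 10.0, 1, 40.0, 100.0, 0.5)
```

With these normalizations the score always lies between 0 and 1, so companies can be ranked within a layer by sorting on it.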

7. Execution Pipeline — What to Do Next

Step 0: Database Migration (DO FIRST)

One-time setup. Consolidate all existing data into awb.db.

Create company_metrics and watchlist tables
Import 8,049 companies from universe.db
Import 3,648 AGI scores from JSON into company_metrics
Import Leopold holdings as smart money signals

Cost: $0 Time: ~30 minutes

Step 1: Financial Snapshot (Layer 2) (DO SECOND)

Pull structured financial data for all ~350 score-7+ companies via yfinance.

Market cap, revenue, net income, total assets, total liabilities, cash, debt
Tangible book value (total equity minus goodwill minus intangibles)
FCF, P/E, P/B, EV/EBITDA
Store all in company_metrics

Cost: $0 Time: ~15 minutes (yfinance API calls)
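The tangible book value definition above can be sketched as two small functions. The parameter names are hypothetical; how they map to the fields returned by yfinance is not shown and would need to be checked against the actual data. The numbers in the example are placeholders.

```python
def tangible_book_value(total_equity, goodwill=0.0, intangibles=0.0):
    """Total equity minus goodwill minus other intangible assets."""
    return total_equity - goodwill - intangibles

def price_to_tangible_book(price, tbv, shares_outstanding):
    """P/TB ratio, or None when tangible book is zero or negative.

    A non-positive tangible book makes the ratio meaningless, so the
    sketch returns None instead of a misleading number.
    """
    tbv_per_share = tbv / shares_outstanding
    if tbv_per_share <= 0:
        return None
    return price / tbv_per_share

# Placeholder figures (in millions, except price).
tbv = tangible_book_value(total_equity=1000.0, goodwill=200.0, intangibles=100.0)
p_tb = price_to_tangible_book(price=21.0, tbv=tbv, shares_outstanding=50.0)
```

Both values would then be written to company_metrics as tangible_book_value and price_to_tangible_book.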

Step 2: Asset-Based Floor Screening (Layer 3) (DO THIRD)

For each of the ~350 companies, compute a rough floor price. Two approaches in parallel:

Quantitative floor (automated): tangible book value per share, net current asset value (NCAV), cash per share. Flag anything trading below 2x tangible book.
AI-assisted floor (Haiku batch): feed financial snapshot + AGI score + 10-K excerpt to Haiku. Ask: "What is the approximate floor price for this company? Consider real asset value, contracted revenue, owned subsidiaries, cash, and debt. Output a number and confidence level."

Cross-reference both approaches. Where they disagree, flag for manual review.

Cost: ~$35 Time: ~1 hour
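A rough sketch of the quantitative side and the cross-check. NCAV is computed in the usual way, as current assets minus total liabilities. The 25% disagreement tolerance and the function names are assumptions for illustration; the strategy does not specify how large a gap should trigger manual review.

```python
def quantitative_floor(current_assets, total_liabilities, cash,
                       tangible_book, shares):
    """Per-share asset measures used as rough floor estimates."""
    return {
        "tangible_book_per_share": tangible_book / shares,
        "ncav_per_share": (current_assets - total_liabilities) / shares,
        "cash_per_share": cash / shares,
    }

def below_2x_tangible_book(price, tangible_book_per_share):
    """Flag stocks trading below 2x tangible book (requires positive book)."""
    return tangible_book_per_share > 0 and price < 2 * tangible_book_per_share

def needs_manual_review(quant_floor, ai_floor, tolerance=0.25):
    """True when the two floor estimates disagree by more than `tolerance`.

    The tolerance is an ASSUMED threshold. Non-positive estimates are
    always sent to review.
    """
    if quant_floor <= 0 or ai_floor <= 0:
        return True
    gap = abs(quant_floor - ai_floor) / max(quant_floor, ai_floor)
    return gap > tolerance

# Placeholder balance sheet figures.
measures = quantitative_floor(current_assets=500.0, total_liabilities=300.0,
                              cash=100.0, tangible_book=400.0, shares=100.0)
flagged = below_2x_tangible_book(7.0, measures["tangible_book_per_share"])
review = needs_manual_review(measures["tangible_book_per_share"], ai_floor=8.0)
```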

Step 3: Smart Money Overlay (Layer 4)

For the ~350 companies, check if any tracked investors hold them.

Leopold's current 25 positions (already have this data)
Pull 13F filings for Li Lu, Weschler, Combs, Pabrai (4 more investors)
Flag any overlaps (multiple tracked investors holding same stock)
Compute insider buy/sell ratio from Form 4 data

Cost: $0 Time: ~30 minutes (SEC EDGAR API)
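The overlap flag can be sketched as a simple count of how many tracked investors hold each ticker. The holdings below are made-up placeholder tickers, not real positions, and the function name `tracked_overlaps` is hypothetical. Parsing the 13F filings themselves is not shown.

```python
from collections import Counter

def tracked_overlaps(holdings, min_holders=2):
    """Map each ticker held by at least `min_holders` investors to its holder count.

    `holdings` maps investor name -> iterable of tickers.
    """
    counts = Counter(
        ticker
        for tickers in holdings.values()
        for ticker in set(tickers)  # count each investor once per ticker
    )
    return {t: n for t, n in counts.items() if n >= min_holders}

# Placeholder holdings for illustration only.
holdings = {
    "Leopold": {"AAA", "BBB"},
    "Li Lu": {"BBB", "CCC"},
    "Pabrai": {"AAA", "BBB"},
}
overlaps = tracked_overlaps(holdings)
```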

Step 4: First Cut — Filter to ~50-100 candidates

Apply the priority score. Keep companies that meet ANY of:

Trading at or below 1.5x floor price, i.e. no more than 50% above the floor (cheap on assets)
Held by 2+ tracked investors
AGI score 9+ AND P/TB < 3
Leopold holds AND currently underwater

Cost: $0 Time: 5 minutes (SQL query)
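The four criteria can be expressed as a single predicate. This sketch reads "within 50% of floor price" as a price no more than 50% above the floor, which is an interpretation. The dictionary keys are hypothetical names for values that would be pulled from company_metrics.

```python
def passes_first_cut(c):
    """Return True if the candidate meets ANY of the four first-cut criteria."""
    # ASSUMPTION: "within 50% of floor" means price <= 1.5 x floor.
    near_floor = c["floor_price"] > 0 and c["price"] <= 1.5 * c["floor_price"]
    multi_holder = c["tracked_holders"] >= 2
    high_agi_cheap = (
        c["agi_score"] >= 9
        and c["p_tb"] is not None
        and 0 < c["p_tb"] < 3
    )
    leopold_underwater = bool(
        c["leopold_holds"] and c["price"] < c["leopold_avg_cost"]
    )
    return near_floor or multi_holder or high_agi_cheap or leopold_underwater

# Placeholder candidate: not near its floor, but AGI score 9 with P/TB 2.5.
candidate = {
    "price": 30.0, "floor_price": 10.0, "tracked_holders": 0,
    "agi_score": 9, "p_tb": 2.5,
    "leopold_holds": False, "leopold_avg_cost": None,
}
keep = passes_first_cut(candidate)
```

In practice the same logic could run as one SQL query over company_metrics; the Python form is used here only to make each condition explicit.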

Step 5: 5-Year Cash Flow Models (Layer 6)

For the ~20-30 highest priority candidates, build bull/base/bear scenarios.

Model revenue growth under "AGI buildout plays out" scenario
Project FCF margins 5 years out
Apply terminal multiple (15-25x FCF)
Compute: "At what price is the reward/risk compelling enough to invest?"
Use Sonnet for this — good enough quality, reasonable cost

Cost: ~$30 Time: ~2-3 hours
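The arithmetic behind one scenario can be sketched as follows: grow revenue for five years, apply an FCF margin, apply a terminal multiple, and divide by shares. This is a deliberately simplified illustration that ignores dilution, buybacks, net cash, and discounting. The reward-to-risk ratio is one assumed way to frame the "compelling enough" question, not a rule from this strategy, and all inputs are placeholders.

```python
def price_in_5_years(revenue, annual_growth, fcf_margin,
                     terminal_multiple, shares, years=5):
    """Scenario price per share = future FCF x multiple / shares."""
    future_revenue = revenue * (1 + annual_growth) ** years
    future_fcf = future_revenue * fcf_margin
    return future_fcf * terminal_multiple / shares

def reward_to_risk(price, target_price, floor_price):
    """Upside to target divided by downside to floor.

    Returns infinity when the price is at or below the floor,
    since the estimated downside is then zero or negative.
    """
    downside = price - floor_price
    if downside <= 0:
        return float("inf")
    return (target_price - price) / downside

# Placeholder scenario: revenue 100, 10% growth, 20% FCF margin,
# 20x terminal multiple, 10 shares outstanding.
bull_price = price_in_5_years(100.0, 0.10, 0.20, 20, 10.0)

# Using the illustrative watchlist row: entry $12.50, floor $10.00, bull $125.
ratio = reward_to_risk(price=12.5, target_price=125.0, floor_price=10.0)
```

Running the same function with bull, base, and bear inputs gives the three scenario prices; the entry price is then whatever level makes the ratio acceptable.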

Step 6: Deep Dives (Layer 7)

Full META-style 6-agent analysis for the final ~10 companies.

Business & competitive analysis
Financial deep dive (every balance sheet line item)
Management & governance assessment
Risk & regulatory analysis
AGI impact deep dive
Valuation with 3 price targets

Cost: ~$50 Time: ~4-6 hours

Step 7: Build Watchlist (Layer 8)

Compile the final watchlist with entry prices. Set up monitoring.

One-page summary per company: thesis, entry price, floor price, bull case, risks
Weekly price check script (automated)
Alert when any watchlist stock drops within 10% of entry price

Cost: $0 Time: 1 hour

8. Cost Budget

Phase	What	Model	Companies	Est. Cost
Already spent	AGI scoring (Layer 1)	Sonnet	3,648	$355
Step 0	DB migration	—	—	$0
Step 1	Financial snapshot	yfinance	~350	$0
Step 2	Floor screening	Haiku batch	~350	$35
Step 3	Smart money overlay	SEC API	~350	$0
Step 4	Filter / prioritize	SQL query	—	$0
Step 5	5-year cash flow models	Sonnet	~30	$30
Step 6	Deep dives	Opus	~10	$50
Step 7	Watchlist setup	—	5-15	$0
Total remaining				~$115
Grand total (including Layer 1)				~$470

9. Phase 3: The Watchlist

What the watchlist looks like

Ticker	Entry Price	Current	Floor	Bull 5yr	Upside	Conviction	Thesis
EXAMPLE	$12.50	$18.40	$10.00	$125	Compelling	High	AGI infrastructure with $X/share in tangible assets...
Populated after Steps 5-7 complete

Monitoring Cadence

Daily: Automated price check. Alert if within 10% of entry price.
Weekly: Quick scan of watchlist. Any material news? Any 8-K filings?
Quarterly: Re-run financial snapshot after earnings. Update floor price and 5-year model.
On trigger: When a stock hits entry price, re-read the deep dive. Confirm thesis still holds. If yes, initiate position.
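The daily alert condition is a one-line comparison. The sketch below assumes "within 10% of entry price" means the current price is at most 10% above the entry price (or already below it). Fetching the current price is not shown, and the function name `near_entry` is hypothetical.

```python
def near_entry(current_price, entry_price, band=0.10):
    """True when the price is at most `band` above entry, or already below it."""
    return current_price <= entry_price * (1 + band)

# Illustrative watchlist row: entry $12.50, current $18.40.
# The alert level is 12.50 * 1.10 = 13.75, so no alert yet.
alert_now = near_entry(current_price=18.40, entry_price=12.50)
alert_later = near_entry(current_price=13.50, entry_price=12.50)
```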

Position Sizing

When we do buy, position sizing follows the floor price confidence:

Very High confidence floor: Up to 10% of portfolio
High confidence: Up to 7%
Moderate confidence: Up to 4%
Low confidence: Up to 2%
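These caps can be kept as a lookup table so the sizing rule is applied consistently. The key names and the function name `max_position` are hypothetical; the percentages are the ones listed above, and the portfolio value is a placeholder.

```python
# Maximum portfolio weight by floor-price confidence (from the list above).
MAX_WEIGHT = {
    "very_high": 0.10,
    "high": 0.07,
    "moderate": 0.04,
    "low": 0.02,
}

def max_position(portfolio_value, floor_confidence):
    """Largest position allowed, in currency units, for a confidence level.

    Raises KeyError for an unknown confidence label, which is
    preferable to silently sizing a position.
    """
    return portfolio_value * MAX_WEIGHT[floor_confidence]

# Placeholder portfolio of 100,000.
cap = max_position(100_000, "high")
```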

10. Operating Principles

Never Lose Work

Every analysis result goes into awb.db. Every metric has a source tag, timestamp, and cost. If we redo an analysis with a better model, the old result stays (different source tag). We can always compare versions.

Never Redo Work

Before computing anything, check if it exists in the database. If it's less than 30 days old, skip. If it's older, we can choose to recompute or keep. The UNIQUE(ticker, metric_name, source) constraint prevents accidental duplicates.

Cheap Before Expensive

Free data (yfinance, SEC EDGAR) before paid data (API calls). Haiku ($0.01/company) before Sonnet ($0.10/company) before Opus ($5/company). Only escalate model quality when the cheaper model's output is the bottleneck.

Broad Before Deep

Score 3,648 companies at low depth before deep-diving 10 companies. The funnel ensures we don't waste expensive analysis on companies that fail basic screens. Each layer narrows the funnel by 3-5x.

Quantitative Before Qualitative

Pull numbers first (financials, ratios, prices). Then apply judgment (floor estimates, thesis evaluation). Numbers are fast to compute and easy to filter. Judgment is expensive and should only be applied to the survivors.

Trust but Verify

AI-generated floor prices are estimates. Before putting real money in, we manually verify the deep dive: read the 10-K ourselves, check the balance sheet line by line, validate the thesis. The AI does the screening; we do the final diligence.

What About RL Fine-Tuning?

Considered and rejected for now. The core problem: RL needs a reward signal, and for investments, the reward is whether the investment makes money — which we only know years later. Any proxy reward would just encode our existing beliefs, gaining nothing over good prompting.

The real bottleneck is not model capability (Claude is already excellent at reading 10-Ks and reasoning about businesses). The bottleneck is having the right data in the right structure and applying disciplined screening. That's what this strategy addresses.

If we wanted to revisit later, the most promising angle would be fine-tuning on historical 10-K → 5-year-return pairs (supervised learning, not RL). But we'd need ~20 years of 10-K data mapped to outcomes, and the resulting model would still be backward-looking. Better to invest the time in fundamental analysis.

Immediate Next Session: Start with Steps 0-2

The next Claude Code session should:

Step 0: Create tables, migrate data into awb.db (30 min)
Step 1: Pull financial snapshots for 350 companies via yfinance (15 min)
Step 2: Compute quantitative floors (tangible book, NCAV) for 350 companies (15 min)
Generate: HTML report showing the intersection of "AGI score 7+ AND cheap on assets"

By the end of that session, we'll have the first filtered list of candidates. Total cost: ~$0, since Step 0, Step 1, and the quantitative part of Step 2 use only free data sources (yfinance) and need no paid model API calls.