How to Prompt Effectively — The Mortgage LLM

1. Be Specific

The more specific you make your question, the more likely you are to get a relevant answer.

Use commonly used abbreviations and mortgage terminology (LTV, DTI, EPD, GSE, R&W, MSR, CPR, LLPA).
Include the loan-level features or aggregate filters you care about (vintage, source, FICO band, channel, agency).

Consider the following two prompts, a vague one followed by a specific one:

Performance of 2020 loans

For Freddie Mac 2020 fixed-rate originations, what is the cumulative 36-month
60+ DQ rate, broken out by origination FICO band (740+, 700-739, 660-699, <660)?

The second prompt includes the source, product type, vintage, target metric, and segmentation. It will yield a far more actionable answer than the vague version, which forces the model to pick defaults for every missing dimension.
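The sketch below makes two things the specific prompt pins down concrete: the FICO band boundaries, and what a cumulative 36-month 60+ DQ rate measures. It assumes the metric is the share of originated loans that were ever 60+ days delinquent by loan age 36, with loans that pay off early left in the denominator. The platform’s actual SQL definition may differ, and the input format and toy cohort are hypothetical.

```python
from collections import defaultdict


def fico_band(fico: int) -> str:
    """Map an origination FICO score to the four bands used in the example prompt."""
    if fico >= 740:
        return "740+"
    if fico >= 700:
        return "700-739"
    if fico >= 660:
        return "660-699"
    return "<660"


def cumulative_60dq_rate(loans, horizon_months: int = 36) -> dict:
    """Share of loans per FICO band that were ever 60+ days delinquent
    within the first `horizon_months` months after origination.

    `loans` is an iterable of (fico, first_60dq_month) tuples, where
    first_60dq_month is the loan age at first 60+ DQ, or None if never.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for fico, first_60dq_month in loans:
        band = fico_band(fico)
        totals[band] += 1
        if first_60dq_month is not None and first_60dq_month <= horizon_months:
            hits[band] += 1
    return {band: hits[band] / totals[band] for band in totals}


# Hypothetical toy cohort, not real data.
toy_cohort = [(760, None), (745, 30), (710, None), (705, 40), (650, 12), (640, None)]
print(cumulative_60dq_rate(toy_cohort))
# {'740+': 0.5, '700-739': 0.0, '<660': 0.5}
```

The vague prompt leaves every one of these choices (source, product, band edges, horizon, delinquency threshold) for the model to guess.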

2. Specify the Output Format

Specify the structure and format of the answer you want. There is a difference between asking for a “table,” a “chart,” and a “brief explanation.”

For Freddie Mac 2020 fixed-rate originations, what is the cumulative 36-month
60+ DQ rate by FICO band? Return as a table sorted by FICO band descending.

Show me the top 10 conforming-loan originators in 2024 as a chart of total
loan volume in dollars.

The platform currently renders results across four tabs (Summary, Table, Graph, SQL). You can ask for a specific emphasis: “return just the SQL so I can review the query before running,” “summarize in two sentences,” or “rank the top 10 only.”

3. Clarify the Desired Outcome

Be clear about the outcome you want. Are you ranking lenders, comparing vintages, attributing losses, identifying outliers, or sanity-checking a portfolio? Stating the use case helps the model focus on the relevant metric and bring the right caveats.

Which sellers should I add to my repurchase-risk watch list based on 2024
EPD trends? Rank by 12-month EPD rate among sellers with 10,000+ loans
in the data.

What is the most recent 60+ DQ rate for 2022 GSE originations, and how does
that compare to 2021 originations at the same seasoning?

Characterize the 2022 origination vintage: cohort size, average FICO/LTV/DTI
at origination, and 24-month cumulative EPD vs. the 2020-2021 average. Note
any vintage-specific anomalies.
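The 10,000-loan floor in the watch-list prompt does real work: a small seller with a handful of defaults can top a raw-rate ranking. Here is a minimal sketch of that ranking logic. The `SellerCohort` type, seller names, and counts are hypothetical, and reporting the rate in basis points (defaults per 10,000 loans) is a presentation choice, not a documented platform behavior.

```python
from dataclasses import dataclass


@dataclass
class SellerCohort:
    name: str
    loans: int    # loans originated in the 2024 cohort
    epd_12m: int  # loans with an early payment default within 12 months


def watch_list(sellers, min_loans: int = 10_000, top_n: int = 10):
    """Rank sellers by 12-month EPD rate, skipping sellers below the volume floor.

    Returns (name, rate_in_bps) pairs, highest rate first.
    """
    eligible = [s for s in sellers if s.loans >= min_loans]
    ranked = sorted(eligible, key=lambda s: s.epd_12m / s.loans, reverse=True)
    return [(s.name, round(s.epd_12m / s.loans * 10_000, 1)) for s in ranked[:top_n]]


# Hypothetical sellers and counts, for illustration only.
sellers = [
    SellerCohort("Seller A", loans=50_000, epd_12m=150),
    SellerCohort("Seller B", loans=12_000, epd_12m=60),
    SellerCohort("Seller C", loans=800, epd_12m=20),  # highest raw rate, but too small
]
print(watch_list(sellers))  # [('Seller B', 50.0), ('Seller A', 30.0)]
```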

4. Set the Scope

The details you include determine how broad or narrow the answer’s scope is. Compare the broad and narrow versions of each question:

What are the most common channel mixes for FHA originations?

What is the most effective channel mix for FHA originations — ranked by
12-month EPD rate — among 2024 cohorts?

What is the evidence for a Fannie-Freddie repurchase-rate differential
across vintages?

What is the most recent vintage-matched evidence for a Fannie-Freddie
repurchase-rate differential, using 2025 reporting-year data?

The platform’s data currently spans three sources with different scopes; name the one you want where it matters. We’re actively adding new datasets, so the list below will continue to grow:

HMDA — all loan types, application + origination only, 2013-2025.
Fannie + Freddie (gse_loans) — conventional GSE-purchased only, with delinquency / prepayment / repurchase history, 2013-2026.
GNMA — FHA / VA / USDA-RD / PIH, monthly snapshots from 2017-12 onward.
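One way to decide which source to name is to treat it as a coverage lookup. The sketch below encodes the list above, plus one point from section 8: HMDA is annual, so monthly trends need GSE or GNMA data. The `Source` type and `candidate_sources` helper are hypothetical, and the year ranges will drift as datasets are added.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Source:
    name: str
    loan_types: Optional[FrozenSet[str]]  # None means "all loan types"
    first_year: int
    last_year: Optional[int]              # None means "ongoing"
    monthly: bool                         # False means annual only


# Coverage as described in the list above. GNMA loan-level snapshots start
# in 2017-12, so a 2017 match covers December only.
SOURCES = [
    Source("HMDA", None, 2013, 2025, monthly=False),
    Source("gse_loans", frozenset({"conventional"}), 2013, 2026, monthly=True),
    Source("GNMA", frozenset({"FHA", "VA", "USDA-RD", "PIH"}), 2017, None, monthly=True),
]


def candidate_sources(loan_type: str, year: int, need_monthly: bool = False) -> list:
    """Names of the sources that cover a loan type in a given year."""
    matches = []
    for s in SOURCES:
        if s.loan_types is not None and loan_type not in s.loan_types:
            continue
        if year < s.first_year or (s.last_year is not None and year > s.last_year):
            continue
        if need_monthly and not s.monthly:
            continue
        matches.append(s.name)
    return matches


print(candidate_sources("FHA", 2020))                     # ['HMDA', 'GNMA']
print(candidate_sources("FHA", 2020, need_monthly=True))  # ['GNMA']
```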

5. Experiment

If you wonder whether The Mortgage LLM can do something, the best way to find out is to ask. Don’t be afraid to try different prompts and explore new functionalities. Every response shows the SQL the platform ran and the raw rows it returned — if a phrasing didn’t produce what you expected, the SQL tab will tell you exactly which interpretation the model picked, and you can rephrase from there.

New features and additional data sources are landing regularly — a question that didn’t have a good answer last month may have one now.

6. Iterating within a conversation

The Mortgage LLM keeps conversation context. After the platform answers a question, your next prompt can build on it directly — the model remembers the prior cohort scope, source, and metric, so you can refine without restating everything. Each turn renders into a visual transcript that stacks above the prompt input, so you can scroll up and re-read what you asked and what the platform returned without losing your place.

Use follow-ups to drill in:

User:  Top 10 originators of conforming purchase loans in 2024
Model: [returns Rocket #1, UWM #2, ...]

User:  Now break that out by state — which states is Rocket strongest in?
Model: [keeps the 2024 conforming-purchase scope, adds state breakdown]

User:  How does Rocket’s 2024 picture compare to 2020?
Model: [holds the lender + state structure, swaps the time scope]

Context cap. The model is fed the last 3 turns (question + summary) as context, so in long threads the oldest turn silently drops out of context starting with the 4th follow-up. If you’re building on something deep in the thread and the model starts losing track, click New Chat in the left sidebar and restate the relevant baseline in one prompt.
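The context cap behaves like a fixed-size sliding window. The sketch below assumes the window holds exactly the three most recent completed turns. `ConversationContext` is a hypothetical name, not the platform’s code.

```python
from collections import deque


class ConversationContext:
    """Sliding window over the most recent (question, summary) turns."""

    def __init__(self, max_turns: int = 3):
        # A deque with maxlen silently discards the oldest item when full.
        self.turns = deque(maxlen=max_turns)

    def record(self, question: str, summary: str) -> None:
        self.turns.append((question, summary))

    def context(self) -> list:
        return list(self.turns)


ctx = ConversationContext()
ctx.record("Top 10 conforming purchase originators in 2024", "Rocket #1, UWM #2, ...")
ctx.record("Break that out by state", "State breakdown")
ctx.record("Compare Rocket 2024 to 2020", "2020 comparison")
ctx.record("Which of those states grew fastest?", "Growth ranking")
# The next (4th) follow-up sees only these three turns; the opening
# question, and its 2024 conforming-purchase scope, is gone.
print([q for q, _ in ctx.context()])
```

Once the window is full, every new turn pushes the oldest one out. This is why restating the baseline in a fresh chat is more reliable than leaning on a long thread.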

Switching topics. Start a fresh thread (click New Chat in the left sidebar) when you’re moving to an unrelated subject. Carrying stale context across unrelated questions can subtly bias the model toward the prior cohort or source.

Reopening a past chat. Signed-in users can click Search Chats in the sidebar to browse every saved conversation (grouped Today / Yesterday / This week / etc., searchable by title). Clicking a row rebuilds the transcript and drops you back into the active prompt, so you can keep going from where you left off.

7. Bad → good rewrites

Concrete examples of tightening a vague prompt into one the platform can answer cleanly:

❌ Vague: ambiguous everything

Tell me about mortgages

✅ Specific:

In HMDA 2024, what was the total origination volume of conforming
purchase loans nationwide, and which were the top 10 originators?

❌ Vague: no time scope, no source

Which lenders have the highest defect rate?

✅ Specific:

For Fannie + Freddie repurchases recorded in 2025, which sellers had the
highest defect-repurchase basis-point rate among lenders with 10,000+ loans
in the data?

❌ Vague: which kind of comparison?

Compare FHA and conventional

✅ Specific:

For the 2022 vintage, compare 60+ DPD delinquency rates between FHA loans
(in GNMA) and PMI-required conventional loans (in Fannie + Freddie),
observed at the latest available snapshot.

❌ Vague: implicit metric & scope

How is Rocket doing?

✅ Specific:

For Rocket Mortgage’s 2024 GSE originations, show me:
  - Total loan count and aggregate UPB
  - Average borrower FICO, LTV, DTI
  - Distribution by state (top 10)
  - 12-month EPD rate observed so far

❌ Vague: “recent”

Show me recent prepayment trends

✅ Specific:

For Freddie STACR 2020-vintage loans, show the annualized prepayment rate (CPR)
observed in 2025 by refinance-incentive band (note rate minus prevailing market
rate): out-of-the-money, 0-50 bps, 51-100 bps, 101-200 bps, and 200+ bps in-the-money.
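For prepayment questions like this one, the annualized rate is conventionally quoted as CPR, derived from the single monthly mortality (SMM) rate. Rate-incentive bands are usually defined on the note rate minus the prevailing market rate. The sketch below shows both calculations; the band edges (for example, which bucket exactly 50 or 100 bps falls into) are assumptions.

```python
def smm(prepaid_principal: float, beginning_balance: float, scheduled_principal: float) -> float:
    """Single monthly mortality: unscheduled principal paid as a share of the
    balance that was still scheduled to be outstanding at month end."""
    return prepaid_principal / (beginning_balance - scheduled_principal)


def cpr_from_smm(monthly_rate: float) -> float:
    """Annualize a monthly prepayment rate: CPR = 1 - (1 - SMM)^12."""
    return 1 - (1 - monthly_rate) ** 12


def incentive_band(note_rate_pct: float, market_rate_pct: float) -> str:
    """Bucket refinance incentive (note rate minus market rate, in bps)."""
    bps = round((note_rate_pct - market_rate_pct) * 100, 2)
    if bps < 0:
        return "out-of-the-money"
    if bps <= 50:
        return "0-50 bps"
    if bps <= 100:
        return "51-100 bps"
    if bps <= 200:
        return "101-200 bps"
    return "200+ bps"


m = smm(prepaid_principal=1_000, beginning_balance=101_000, scheduled_principal=1_000)
print(round(cpr_from_smm(m), 4))  # 0.1136
print(incentive_band(7.5, 6.0))    # 101-200 bps
```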

8. What the platform can do today

The Mortgage LLM is improving rapidly — both the data behind it and the model itself. Today the chat surface supports four kinds of questions:

Aggregate data analytics. Natural-language queries over billions of HMDA, Fannie, Freddie, and Ginnie Mae loan-level records — counts, trends, distributions, market shares, denial rates, lender benchmarks, geographic flows, vintage performance. The chat writes the SQL, dry-runs it under a byte cap, executes it, summarizes the result, and shows the SQL it ran.
Loan-level scoring. Describe a loan in chat (FICO, DTI, LTV, occupancy, etc.) to get a calibrated probability from any of the models: Credit Approval / Denial, Pull-through, HPML, 12 / 24 / 36-mo Early Payment Default, GNMA EPD, Repurchase Risk, Prepayment, Appraisal Waiver, and Fannie-vs-Freddie Channel Choice. Each model carries its own risk-band recommendation and a calibration-quality reading.
GSE eligibility checks. Describe a loan and get a side-by-side Fannie + Freddie verdict against the codified Eligibility Matrix (Fannie) and Chapter 4203 (Freddie). The response shows every LTV / TLTV / FICO / DTI check each matrix ran, with Selling Guide section citations on any failures. This is useful for loan-officer triage before a Desktop Underwriter or Loan Product Advisor submission. See the GSE Eligibility Check use-case page for sample prompts and full responses, including two real cases where the Fannie and Freddie verdicts diverge.
Insights articles. Long-form analytical pieces published at /insights with interactive charts and per-piece methodology footnotes. Articles are cross-referenced from the chat (the chat can cite an Insights piece when a question maps directly to a published analysis).
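A side-by-side eligibility verdict comes down to computing a few ratios and comparing each one against a matrix limit. The sketch below shows that shape for LTV, TLTV (first lien plus all subordinate liens, over value), FICO, and DTI. The limits are PLACEHOLDERS chosen to produce one divergence. They are not the actual Eligibility Matrix or Chapter 4203 values, which vary by product, occupancy, and unit count.

```python
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    value: float
    limit: float
    passed: bool


def run_checks(loan: dict, limits: dict) -> list:
    """Run LTV / TLTV / FICO / DTI checks against one set of limits."""
    ltv = 100 * loan["first_lien"] / loan["value"]
    tltv = 100 * (loan["first_lien"] + sum(loan["subordinate_liens"])) / loan["value"]
    return [
        Check("LTV", ltv, limits["max_ltv"], ltv <= limits["max_ltv"]),
        Check("TLTV", tltv, limits["max_tltv"], tltv <= limits["max_tltv"]),
        Check("FICO", loan["fico"], limits["min_fico"], loan["fico"] >= limits["min_fico"]),
        Check("DTI", loan["dti"], limits["max_dti"], loan["dti"] <= limits["max_dti"]),
    ]


# PLACEHOLDER limits for illustration; NOT actual Eligibility Matrix or
# Chapter 4203 values.
matrix_a = {"max_ltv": 80, "max_tltv": 90, "min_fico": 620, "max_dti": 45}
matrix_b = {"max_ltv": 80, "max_tltv": 85, "min_fico": 620, "max_dti": 45}

loan = {"first_lien": 300_000, "subordinate_liens": [30_000], "value": 375_000,
        "fico": 720, "dti": 38}
for label, limits in (("A", matrix_a), ("B", matrix_b)):
    failures = [c.name for c in run_checks(loan, limits) if not c.passed]
    print(label, "eligible" if not failures else f"fails {failures}")
# A eligible
# B fails ['TLTV']
```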

Beyond those, today’s scope has a few well-defined edges worth knowing:

No PII or borrower-identifying data — ever. HMDA and GSE disclosure data are de-identified by design. Don’t paste loan numbers, SSNs, or any borrower-identifying information into prompts.
HMDA doesn’t currently include numeric FICO. CFPB redacts FICO from the Modified LAR. For credit scores, query the GSE tables (which include numeric FICO at origination) or use the hmda_lar_with_quality view’s quality proxies.
HMDA is annual today. No monthly or quarterly breakouts — for monthly trends, use GSE or GNMA. (Sub-annual HMDA support is on the roadmap.)
No jumbo or portfolio-held conventional yet. The GSE tables only contain Fannie/Freddie-purchased conventional loans. The platform can offer proxies (high-LTV PMI as a stand-in) and states the limitation clearly when it does.
GNMA loan-level data before 2017-12 isn’t loaded yet. Monthly GNMA loan-level coverage currently starts at 2017-12. Pool-level data goes back to 2013-10 (use it for pool / issuance analytics). The earlier loan-level backfill is on the roadmap once the legacy fixed-width extractor lands.
Per-query scan budget: 25 GB today. Each query is capped based on on-demand pricing. If your question would exceed it, the platform tells you and suggests how to narrow. We’ll raise the cap over time.
Rate limit: 10 queries per hour per IP, currently. Heavy / batch workloads should use the scoring-model APIs directly. Per-user tiers with higher limits are on the roadmap.
The model can be wrong — verify the SQL. Every response shows the SQL it ran and the raw rows. Look at them. If the SQL doesn’t match what you asked, the summary won’t either.
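Many warehouses can estimate how many bytes a query would scan without running it (BigQuery’s dry-run mode is one example); the platform doesn’t say which engine it uses. Given such an estimate, the cap check itself is simple. The sketch below assumes the 25 GB cap is in decimal gigabytes, and `check_scan_budget` is a hypothetical helper.

```python
GB = 10**9  # assumption: decimal gigabytes; the platform's unit isn't stated
SCAN_CAP_BYTES = 25 * GB


def check_scan_budget(estimated_bytes: int, cap_bytes: int = SCAN_CAP_BYTES):
    """Return (ok, message) for a dry-run byte estimate against the cap."""
    if estimated_bytes <= cap_bytes:
        return True, f"OK: ~{estimated_bytes / GB:.1f} GB of {cap_bytes / GB:.0f} GB budget"
    return False, (
        f"Estimated scan of {estimated_bytes / GB:.1f} GB exceeds the "
        f"{cap_bytes / GB:.0f} GB cap; narrow by vintage, source, state, or loan type."
    )


print(check_scan_budget(12 * GB))
print(check_scan_budget(40 * GB))
```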

9. Scoring-model prompts (loan-level)

Fourteen loan-level scoring models are available today. Each one can be reached two ways: through the chat using natural language, or through the JSON-payload panel at the bottom of the home page for batch and power use. The chat path uses the same calibrated models under the hood; it translates a free-text loan description into the right JSON and routes it to the right model.

The 14 models split into three input-schema families.

HMDA-application schema (4 models)

Credit Approval Probability
Credit Denial Probability
Pull-through (application → origination)
HPML (Higher-Priced Mortgage Loan, Reg Z)

All four share a 31-field payload covering loan type and purpose, occupancy, lien status, dollar amounts, banded DTI, geography, lender, and demographic disclosures. HMDA does not include numeric FICO — if you mention a FICO number in chat, it gets flagged as unsupported for these four models. Use the GSE family below when you need a numeric-FICO model.
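Because the HMDA family takes banded DTI rather than a number, a numeric DTI has to be mapped onto the bands used in the public HMDA LAR. The sketch below does that mapping using the public `debt_to_income_ratio` values. Whether the platform’s 31-field payload uses these exact strings, and how it treats fractional ratios between 36 and 49, are assumptions.

```python
def hmda_dti_band(dti_pct: float) -> str:
    """Map a numeric DTI (percent) to the public HMDA LAR debt_to_income_ratio values."""
    if dti_pct < 20:
        return "<20%"
    if dti_pct < 30:
        return "20%-<30%"
    if dti_pct < 36:
        return "30%-<36%"
    if dti_pct < 50:
        # 36 through 49 are published as individual whole-number values;
        # truncating fractional DTIs here is an assumption.
        return str(int(dti_pct))
    if dti_pct <= 60:
        return "50%-60%"
    return ">60%"


print([hmda_dti_band(d) for d in (18, 36, 48, 55, 62)])
# ['<20%', '36', '48', '50%-60%', '>60%']
```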

GSE-acquisition schema (9 models)

Repurchase Risk (v4)
Early Payment Default at 12-, 24-, and 36-month horizons
Prepayment at 12-, 24-, and 36-month horizons
Appraisal Waiver Probability (PIW / Value Acceptance / ACE)
Fannie-vs-Freddie Channel Choice

All nine share a common payload built from Fannie / Freddie acquisition records: numeric FICO, origination LTV, note rate, seller name, and so on. Once you have a payload for one model, the same JSON works for the others; just switch models via the dropdown (or via natural language in chat). Channel Choice has a different output shape: per-GSE probabilities, expected losses, and a 12 / 24 / 36-month delinquency-curve comparison. It also accepts per-request overrides for the per-event loss assumptions (defaults: $200K per repurchase, $10K per EPD).
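The sketch below shows how per-event loss assumptions could turn per-GSE probabilities into an expected-loss comparison. It assumes expected loss is simply probability times loss, summed over repurchase and EPD, using the default loss amounts stated above. The platform’s actual Channel Choice math may weigh more factors (for example, the delinquency curves). All probabilities and function names are hypothetical.

```python
DEFAULT_LOSS_PER_REPURCHASE = 200_000  # default stated above
DEFAULT_LOSS_PER_EPD = 10_000          # default stated above


def expected_loss(p_repurchase, p_epd,
                  loss_per_repurchase=DEFAULT_LOSS_PER_REPURCHASE,
                  loss_per_epd=DEFAULT_LOSS_PER_EPD):
    """Probability-weighted loss, assuming a simple sum of the two event costs."""
    return p_repurchase * loss_per_repurchase + p_epd * loss_per_epd


def choose_gse(probs: dict, **loss_overrides):
    """Pick the GSE with the lower expected loss; returns (choice, losses)."""
    losses = {gse: expected_loss(p["repurchase"], p["epd"], **loss_overrides)
              for gse, p in probs.items()}
    return min(losses, key=losses.get), losses


# Hypothetical per-GSE probabilities for one loan.
probs = {"Fannie": {"repurchase": 0.004, "epd": 0.02},
         "Freddie": {"repurchase": 0.003, "epd": 0.03}}
print(choose_gse(probs)[0])                              # Freddie ($900 vs $1,000)
print(choose_gse(probs, loss_per_repurchase=50_000)[0])  # Fannie ($400 vs $450)
```

The override shows why per-request loss assumptions matter. With a cheaper repurchase, the GSE with the lower EPD probability wins instead.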

GNMA loan-level schema (1 model)

GNMA EPD — FHA / VA / USDA-RD / PIH loans, 12-month Early Payment Default

Uses different field names from the GSE schema: credit_score (not borrower_fico), ltv (not original_ltv), plus a required agency code (F = FHA, V = VA, R = USDA, N = PIH). The chat translates between “FHA” and the right agency code automatically; the JSON panel’s example loader knows the right shape if you pick GNMA EPD before loading.
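The field-name differences are mechanical enough to sketch. The snippet below renames the two fields listed above and attaches the agency code. The `agency` key and the pass-through of other fields (such as `dti`) are hypothetical, since the full GNMA schema isn’t given here.

```python
# Agency codes as listed above; "USDA-RD" is accepted as an alias for USDA.
AGENCY_CODES = {"FHA": "F", "VA": "V", "USDA": "R", "USDA-RD": "R", "PIH": "N"}

# GSE field name -> GNMA field name, per the differences noted above.
GSE_TO_GNMA_FIELDS = {"borrower_fico": "credit_score", "original_ltv": "ltv"}


def to_gnma_payload(gse_payload: dict, program: str) -> dict:
    """Rename GSE-schema fields for GNMA EPD and add the agency code.

    Fields not listed in GSE_TO_GNMA_FIELDS pass through unchanged, and the
    'agency' key name is a hypothetical placeholder.
    """
    code = AGENCY_CODES.get(program.strip().upper())
    if code is None:
        raise ValueError(f"not a GNMA program: {program!r}")
    payload = {GSE_TO_GNMA_FIELDS.get(k, k): v for k, v in gse_payload.items()}
    payload["agency"] = code
    return payload


print(to_gnma_payload({"borrower_fico": 660, "original_ltv": 95, "dti": 42}, "fha"))
# {'credit_score': 660, 'ltv': 95, 'dti': 42, 'agency': 'F'}
```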

Examples of well-formed scoring prompts. Each example below shows how to phrase a scoring request in natural language so the classifier routes to the right model and extracts the relevant features. The patterns are simple: start with the scoring intent (“Score this for X”, “What is the X probability?”, “Will this loan get X?”), name the loan type / channel up front, then list the loan and borrower features in any order.

HMDA-application family (Approval, Denial, Pull-through, HPML):

“What’s the approval probability for this application? Conventional purchase, primary residence, 1 unit, $380K loan on $400K value, DTI 36.”
“Score this for credit denial probability: FHA purchase, 1-unit primary, $285K loan on $300K value, DTI 48, income $62K.”
“What is the pull-through probability for a VA application? $350K loan, $75K income, owner-occupied first-lien purchase.”
“Score this loan for HPML probability: conventional refinance, primary residence, 1 unit, $300K loan on $375K value, DTI 42.”

GSE-acquisition family (Repurchase, EPD 12/24/36-mo, Prepayment 12/24/36-mo, Appraisal Waiver, Channel Choice):

“Will this loan get an appraisal waiver? Conventional rate-and-term refi, primary residence, 1 unit, $300K loan on $500K value, FICO 760, DTI 32.”
“Score repurchase risk for a conforming loan: FICO 720, DTI 35, LTV 80, note rate 6.5%, Fannie pathway.”
“Score this loan for 12-month prepayment risk: FICO 760, original LTV 70, note rate 7.5%, Fannie pathway.”
“What is the 12-month EPD risk for a high-LTV loan? FICO 700, DTI 40, LTV 95, note rate 7.0%, Fannie pathway.”
“Should we deliver this loan to Fannie or Freddie? FICO 720, DTI 38, LTV 80, note rate 6.5%.” (returns the channel-choice comparison with per-GSE expected loss and 12 / 24 / 36-month delinquency curves)

GNMA family (Government EPD — FHA / VA / USDA / PIH):

“Score 12-month EPD risk for an FHA loan: credit score 660, LTV 95, DTI 42, note rate 7.0%.”
“What’s the early-default probability for a VA loan with FICO 680, LTV 100, DTI 38?”

What makes these prompts work. They (i) lead with the scoring verb, so the classifier doesn’t mis-route to SQL; (ii) name the program / channel (conventional / FHA / VA / Fannie pathway) early, which anchors the model family; and (iii) include 5 to 8 features covering the key signals (FICO for the GSE and GNMA families, DTI, LTV, loan amount and / or value, loan purpose, occupancy, note rate where applicable). Missing fields are fine: the models handle missing (NaN) inputs natively and absorb the extra uncertainty into the prediction.
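To see why the feature list matters, here is a minimal regex sketch that turns a prompt into a feature dictionary, with NaN for anything not mentioned. The platform’s extractor isn’t described and almost certainly handles far more phrasings; the field names and patterns here are hypothetical.

```python
import math
import re

# Hypothetical patterns for a few features only.
FEATURE_PATTERNS = {
    "fico": r"(?:FICO|credit score)\s+(\d{3})",
    "dti": r"DTI\s+(\d+(?:\.\d+)?)",
    "ltv": r"LTV\s+(\d+(?:\.\d+)?)",
    "note_rate": r"note rate\s+(\d+(?:\.\d+)?)%",
}


def extract_features(prompt: str) -> dict:
    """Pull numeric features out of a free-text prompt; missing ones become NaN."""
    features = {}
    for name, pattern in FEATURE_PATTERNS.items():
        match = re.search(pattern, prompt, flags=re.IGNORECASE)
        features[name] = float(match.group(1)) if match else math.nan
    return features


print(extract_features(
    "What's the early-default probability for a VA loan with FICO 680, LTV 100, DTI 38?"
))
# {'fico': 680.0, 'dti': 38.0, 'ltv': 100.0, 'note_rate': nan}
```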

Pitfalls to avoid:

Don’t mix HMDA and GSE features. HMDA doesn’t carry numeric FICO; the GSE schema doesn’t carry banded DTI strings. If you mention “FICO 700” the classifier knows to route to the GSE family (or to GNMA for FHA / VA loans). If you mention “DTI >60” (a HMDA-style band) it knows to route to a HMDA model.
Don’t bury the scoring verb. “I’m looking at a loan that’s FICO…” reads ambiguously; “Score this loan…” or “What’s the X probability for…” routes cleanly.
Skip the lender unless the analysis requires it (see “Lender name is optional” below).
Don’t paste full loan-application JSON into chat. The chat path is for natural-language extraction; for batch scoring or finer-grained feature control, use the JSON-payload panel at the bottom of the home page or POST directly to the model’s endpoint.
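The routing pitfalls are easier to see with a toy router. The sketch below is a hypothetical keyword heuristic, not the platform’s classifier. HMDA-only intents win first, a numeric FICO points to the GSE family (or to GNMA when a government program is named), and an HMDA-style DTI band points back to HMDA.

```python
import re

HMDA_INTENTS = ("approval", "denial", "pull-through", "hpml")
GOVERNMENT_PROGRAMS = ("fha", "va", "usda", "pih")


def route_family(prompt: str) -> str:
    """Guess the schema family for a scoring prompt (hypothetical heuristic)."""
    text = prompt.lower()
    # 1. HMDA-only model names in the scoring intent win outright.
    if any(intent in text for intent in HMDA_INTENTS):
        return "HMDA"
    # 2. A numeric FICO implies an acquisition-data model.
    if re.search(r"\b(?:fico|credit score)\s+\d{3}\b", text):
        government = any(re.search(rf"\b{p}\b", text) for p in GOVERNMENT_PROGRAMS)
        return "GNMA" if government else "GSE"
    # 3. An HMDA-style DTI band such as "DTI >60".
    if re.search(r"\bdti\s*[<>]\s*\d+", text):
        return "HMDA"
    return "unknown"


print(route_family("Score repurchase risk for a conforming loan: FICO 720, DTI 35"))  # GSE
print(route_family("Score 12-month EPD risk for an FHA loan: credit score 660"))      # GNMA
```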

Tips that apply across all three families:

Click “load example” in the JSON panel for a starting payload with valid field names — faster than typing from scratch.
Missing fields are fine. The models handle missing (NaN) inputs natively, so correct field names matter more than completeness. A loan profile with 6 to 8 specified features still returns a calibrated probability; the prediction just absorbs more uncertainty from the missing inputs.
Lender name is optional. Each model accepts the originating lender as a feature, but it isn’t required. If you omit it, the model uses the population-baseline pattern — the right answer for any LO at a small or regional lender. If you include it, the model scores against that specific lender’s training-data pattern, which is useful for analysts benchmarking a competitor or for wholesale teams analyzing their own channel. Do NOT borrow a major lender’s name (UWM, Rocket, Chase, etc.) if you’re not at that lender — the score it returns will reflect that lender’s channel mix, not yours. The sample prompts on the home page omit the lender for this reason.
When you DO include a lender, casual forms are normalized. “Rocket Mortgage”, “UWM”, and “Chase” map to their canonical training forms automatically, so you don’t need to write the full legal entity name in chat.
Every model is isotonic-calibrated. The probability returned matches the historical empirical rate for similar loans, not a raw classifier output. Calibration is what makes the probability usable for reserve setting and pricing math, not just rank-ordering.
Approval and Denial are sibling models, not complements. They score the same HMDA application universe with opposite-direction targets; their probabilities do not sum to 1.0 because the universe includes withdrawn and incomplete files where no decision was rendered (typically 15 to 20 percent of HMDA applications). Use Denial for risk-management and fair-lending workflows; use Approval for loan-officer triage and consumer-facing positive framing.
Every model returns a risk band (P99, P95, P90, or BASE) plus a one-paragraph operational recommendation in plain English. The recommendation is model-specific: the action a P99 denial signals is different from a P99 prepay signal.
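Isotonic calibration fits a non-decreasing step function from raw model scores to observed event rates, usually with the pool-adjacent-violators (PAV) algorithm. PAV sorts by score, then merges neighboring groups whenever a higher-score group has a lower event rate. The sketch below is a minimal pure-Python version. The platform’s own calibration code isn’t shown. In practice, scikit-learn’s `IsotonicRegression` does this job, interpolating linearly between fitted points rather than using the step lookup here.

```python
import bisect
from collections import defaultdict


def isotonic_fit(scores, labels):
    """Fit a non-decreasing map from raw score to event rate with PAV.

    Returns (upper_score_bounds, rates): one entry per pooled block.
    """
    # Pool tied scores first so each distinct score starts as one block.
    agg = defaultdict(lambda: [0.0, 0])
    for score, label in zip(scores, labels):
        agg[score][0] += label
        agg[score][1] += 1

    blocks = []  # each block: [label_sum, count, highest_score_in_block]
    for score in sorted(agg):
        label_sum, count = agg[score]
        blocks.append([label_sum, count, score])
        # Merge backwards while the previous block's mean exceeds the last one's
        # (cross-multiplied to compare means without dividing).
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            label_sum, count, high = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
            blocks[-1][2] = high
    return [b[2] for b in blocks], [b[0] / b[1] for b in blocks]


def calibrate(score, bounds, rates):
    """Step-function lookup: rate of the first block whose upper bound >= score."""
    i = bisect.bisect_left(bounds, score)
    return rates[min(i, len(rates) - 1)]


# Hypothetical raw scores and outcomes (1 = event occurred).
bounds, rates = isotonic_fit([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], [0, 1, 0, 0, 1, 1])
print(bounds)                          # [0.1, 0.4, 0.5, 0.6]
print(calibrate(0.25, bounds, rates))  # 1/3, from the pooled 0.2-0.4 block
```

The pooled middle block is where the raw scores disagreed with the outcomes. Averaging them is what keeps the calibrated probability monotone in the score while matching the observed rate.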