
Show HN: LLM Sanity Checks – A Practical Guide to Avoiding Over-Engineering Your AI Stack

Hacker News · about 1 month ago · AI-generated summary

This Hacker News 'Show HN' post introduces a GitHub repository offering a practical guide and decision tree to help developers avoid over-engineering their AI stack by selecting appropriate LLM sizes for their tasks.


GitHub - NehmeAILabs/llm-sanity-checks



LLM Sanity Checks

A practical guide to not over-engineering your AI stack.

Before you reach for a frontier model, ask yourself: does this actually need a trillion-parameter model?

Most tasks don't. This repo helps you figure out which ones.

The Decision Tree

Quick Checks

Check 1: Can you describe the task in one sentence?

If yes → probably a small model task.

If no → you might have an architecture problem, not a model problem.

Check 2: What's your accuracy requirement?

Scaling to frontier models rarely buys you more than 5% accuracy on simple tasks. That 5% costs 50x more.

Check 3: How many output tokens do you need?

Output tokens are the bottleneck. They determine latency and cost.
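Check 3 is easy to quantify. A back-of-envelope sketch; the price and throughput numbers below are illustrative placeholders, not real vendor rates:

```python
# Back-of-envelope model of output-token cost and latency.
# All numbers here are made-up placeholders, not real pricing.
def estimate(output_tokens: int, price_per_1k: float, tokens_per_sec: float):
    """Return (cost_in_dollars, seconds) for generating `output_tokens`."""
    cost = output_tokens / 1000 * price_per_1k
    latency = output_tokens / tokens_per_sec
    return cost, latency

# A 50-token classification answer vs. a 2,000-token report from the same model:
print(estimate(50, price_per_1k=0.01, tokens_per_sec=80))    # fractions of a cent, sub-second
print(estimate(2000, price_per_1k=0.01, tokens_per_sec=80))  # 40x the tokens, 40x both numbers
```

Input tokens matter too, but they are processed in parallel during prefill; output tokens are generated one at a time, which is why they dominate latency.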

The JSON Tax

Everyone defaults to JSON for structured output. But JSON has overhead: every key name, quote, and brace is an output token the model has to generate.


When to use JSON: nested structures, optional fields, API contracts.
When to use delimiters: simple extraction, high-volume pipelines.
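As a rough illustration of the tax (the pipe-delimited format below is one common alternative, not a scheme this repo prescribes):

```python
import json

# One extracted record. With JSON, every key, quote, and brace
# is an output token the model has to generate.
record = {"name": "Ada Lovelace", "email": "ada@example.com", "company": "Analytical Engines"}
as_json = json.dumps(record)

# Pipe-delimited: the same fields in a fixed, documented order.
as_delimited = "|".join(record.values())

print(as_json)       # the full quoted-and-braced object
print(as_delimited)  # Ada Lovelace|ada@example.com|Analytical Engines
print(len(as_json), len(as_delimited))  # the delimited form is markedly shorter
```

Per the guidance above: keep JSON when you need nesting or optional fields; delimiters only work when the schema is flat and fixed.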

Read more: The JSON Tax →

Model Selection Cheat Sheet

Tiny (1B-4B params)

Best for: classification, yes/no, simple extraction

Small (8B-17B params)

Best for: most production tasks, RAG, extraction, summarization

Medium (27B-70B params)

Best for: complex reasoning, long context, multi-step tasks

Frontier (100B+ dense params)

Best for: novel tasks, complex reasoning, when nothing else works

Before you use these, ask: have you tried a smaller model?
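Read as a routing table, the cheat sheet is a few lines of code. A sketch; the task categories and the `pick_model` helper are illustrative, not part of this repo:

```python
# Route each task type to the smallest size class that usually suffices.
# Categories mirror the cheat sheet above; the mapping is illustrative.
SIZE_FOR_TASK = {
    "classification": "tiny",
    "yes_no": "tiny",
    "simple_extraction": "tiny",
    "rag": "small",
    "summarization": "small",
    "extraction": "small",
    "complex_reasoning": "medium",
    "long_context": "medium",
    "multi_step": "medium",
}

def pick_model(task_type: str) -> str:
    # Fall back to a frontier model only for tasks nothing smaller handles.
    return SIZE_FOR_TASK.get(task_type, "frontier")

print(pick_model("classification"))  # tiny
print(pick_model("novel_task"))      # frontier
```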

Anti-Patterns

❌ "We use GPT-5 for everything"

That's not a flex. That's a $50K/month cloud bill waiting to happen.

❌ "We need the best model for our enterprise customers"

Your enterprise customers care about latency, reliability, and cost. Not model prestige.

❌ "Small models aren't accurate enough"

Did you test? With the right prompt? On your actual data?

❌ "We'll optimize later"

You'll optimize never. The technical debt compounds. Start right-sized.

❌ "JSON output is industry standard"

For simple extraction, it's industry waste. See: The JSON Tax.

❌ "We need RAG for our documents"

For small document sets? No, you don't.

Context windows are now 2M-10M tokens. That's thousands of pages. If your knowledge base is <100 pages, just stuff it in context. Preprocess, convert to markdown, include directly.

RAG adds complexity: chunking strategies, embedding models, vector databases, retrieval tuning, reranking. All that infrastructure for documents that fit in a single prompt.

When RAG makes sense: corpora far too large for any context window, or documents that change too often to re-stuff into every prompt.

When to skip RAG: small, stable document sets that fit in a single prompt.
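The "just stuff it in context" path is genuinely short. A sketch, assuming the documents have already been converted to markdown files on disk (the directory layout and prompt template are illustrative):

```python
from pathlib import Path

def build_prompt(question: str, docs_dir: str = "docs/") -> str:
    """Concatenate a small, stable document set directly into the prompt."""
    sections = []
    for path in sorted(Path(docs_dir).glob("*.md")):
        sections.append(f"## {path.name}\n{path.read_text()}")
    corpus = "\n\n".join(sections)
    return f"Answer using only these documents:\n\n{corpus}\n\nQuestion: {question}"
```

No chunking strategy, no embedding model, no vector database to operate.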

Patterns

✅ Cascade Architecture

Start with smallest model. Verify output. Escalate only on failure.

Verifier can be: format validation, a classifier, or FlashCheck for grounding checks.

See examples/cascade.py for a working extraction example.
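For flavor, a minimal sketch of the cascade shape (this is not the repo's examples/cascade.py; `call_model` is a stand-in for your actual client, and the verifier here is plain format validation):

```python
import re

def call_model(size: str, prompt: str) -> str:
    """Stand-in for a real LLM client; wire up your own SDK here."""
    raise NotImplementedError

def looks_like_email(output: str) -> bool:
    # The verifier: cheap format validation, no second LLM required.
    return re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", output.strip()) is not None

def extract_email(prompt: str, sizes=("tiny", "small", "medium")):
    # Start with the smallest model; escalate only when verification fails.
    for size in sizes:
        candidate = call_model(size, prompt)
        if looks_like_email(candidate):
            return candidate
    return None  # every tier failed; flag for review instead of guessing
```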

✅ Task-Specific Models

One model per task type, sized appropriately.

✅ Measure First, Scale Never

Before adding a bigger model: measure the current model's accuracy on your actual data, quantify the gap against your requirement, and price the upgrade against that gap.
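Measuring can be one helper function. A sketch; the commented-out comparison assumes a hypothetical `call_model` client:

```python
def accuracy(predict, labeled_examples) -> float:
    """Fraction of (text, expected) pairs the predictor gets exactly right."""
    hits = sum(1 for text, expected in labeled_examples if predict(text) == expected)
    return hits / len(labeled_examples)

# Compare tiers on the same held-out set before paying for the bigger one:
# small_acc = accuracy(lambda t: call_model("small", t), examples)
# big_acc   = accuracy(lambda t: call_model("frontier", t), examples)
# Upgrade only if (big_acc - small_acc) justifies the cost multiple.
```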

✅ Simple Tools Over Browser Automation

For research tasks, don't reach for computer use or Puppeteer.

Three tools. No browser. No screenshots. No vision model.

Browser automation is only for: login walls, dynamic forms, actions (booking, purchasing).
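A fetch-page-text tool needs only the standard library. A sketch of one such tool (the repo's actual three tools are not specified here; this is an illustrative stand-in):

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collect visible text, skipping script and style blocks."""
    def __init__(self):
        super().__init__()
        self.parts, self._skip = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def fetch_text(url: str) -> str:
    # One HTTP GET. No browser, no screenshots, no vision model.
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)
```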

See patterns/agents.md for the full agent decision tree.

More Patterns

Tools

RightSize

Test your prompts against multiple model sizes. See what's actually needed.

→ Try RightSize

FlashCheck

Verify LLM outputs with tiny specialized models. Sub-10ms verification.

→ Learn about FlashCheck

Contributing

Found a pattern that works? Open a PR.

Keep it practical. Keep it measured. No vibes-based claims.

License

MIT. Use it. Share it. Don't over-engineer it.

Built by Nehme AI Labs — AI architecture consultancy.
