Frequently Asked Questions

Honest answers about what SentimentWiki is, how it works, and what it isn't yet.

Why all this effort?

The inversion catalog is the visible part. The real goal is to build the training data flywheel for domain-specific, per-asset financial NLP models: one fine-tuned adapter per security, trained on community consensus labels. Every headline labeled, every inversion confirmed, every phrase annotated is a data point toward a model that genuinely understands what news means for a specific market. The catalog is how we get there.

What is it?

What is SentimentWiki?

An open, community-maintained catalog of financial sentiment inversions: phrases where a generic NLP model predicts the wrong direction for a specific asset. The catalog powers a free public API that returns asset-specific sentiment (direction, magnitude, relevance, reasoning) for any headline you submit.
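
A response for a headline might look like the JSON below. The field names follow the description above (direction, magnitude, relevance, reasoning), but the exact schema shown here is an assumption, not the documented contract:

```python
import json

# Hypothetical response body. Field names beyond direction/magnitude/
# relevance/reasoning are illustrative guesses, not the real schema.
raw = """
{
  "security": "OIL",
  "headline": "OPEC cuts output",
  "direction": "bullish",
  "magnitude": 0.7,
  "relevance": 0.9,
  "reasoning": "A supply cut supports crude prices despite the negative surface reading."
}
"""

result = json.loads(raw)
print(result["direction"])  # bullish
```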

What is a sentiment inversion?

A phrase where the naive sentiment direction is wrong for a specific asset. "OPEC cuts output" reads as negative (cutting = bad). For crude oil it's bullish: less supply means higher prices. Generic models don't know the asset. The catalog does.
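
As a sketch, an inversion entry can be modeled as a mapping from an (asset, phrase) pair to the corrected direction. The field names here are illustrative, not the catalog's actual schema:

```python
# Illustrative inversion entries; field names are assumptions, not the
# catalog's real schema.
INVERSIONS = {
    ("OIL", "opec cuts output"): {
        "naive_direction": "negative",   # what a generic model reads
        "asset_direction": "bullish",    # what it means for crude oil
        "rationale": "Less supply means higher prices.",
    },
}

def lookup(asset, phrase):
    """Return the inversion entry for an asset/phrase pair, if any."""
    return INVERSIONS.get((asset, phrase.lower()))

entry = lookup("OIL", "OPEC cuts output")
print(entry["asset_direction"])  # bullish
```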

Why did you build this?

I kept seeing FinBERT misread crude oil headlines in ways that were obvious to anyone who understands the market. The fix isn't a better generic model; it's a catalog of what phrases mean for each specific asset. I figured the community was better positioned to build that than any single model.

How does the API work?

How does the inference work?

Two layers: Claude Haiku provides base sentiment, then re-evaluates with the inversion catalog for your specific security injected as context. The catalog tells it what phrases mean for that asset, so "inventory draw" maps to bullish for OIL even though it sounds negative.
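
The two-layer flow can be sketched as below. A keyword heuristic stands in for the base model and a hard-coded dict stands in for the injected catalog context; the real system uses Claude Haiku for both passes, so this is a minimal sketch of the control flow, not the implementation:

```python
NEGATIVE_WORDS = {"cuts", "draw", "falls"}

def base_sentiment(headline):
    """Layer 1: naive, asset-blind reading (keyword stand-in for the model)."""
    words = headline.lower()
    return "negative" if any(w in words for w in NEGATIVE_WORDS) else "positive"

# Stand-in for the per-security inversion catalog injected as context.
CATALOG = {"OIL": {"inventory draw": "bullish", "opec cuts output": "bullish"}}

def asset_sentiment(security, headline):
    """Layer 2: re-evaluate with the security's inversions applied."""
    for phrase, direction in CATALOG.get(security, {}).items():
        if phrase in headline.lower():
            return direction  # catalog overrides the naive reading
    return base_sentiment(headline)

print(asset_sentiment("OIL", "Weekly inventory draw surprises market"))  # bullish
```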

Is the API free?

Yes: 100 requests/day for anonymous users, no signup required. Above that, contact us directly. There's no self-serve paid tier yet. At this stage I want to talk to anyone who needs more than 100/day.

What securities are supported?

35+ assets across energy (OIL, NATGAS, LNG, BRENT), metals (GOLD, SILVER, COPPER, PLATINUM, PALLADIUM), agriculture (WHEAT, CORN, SOYBEANS, SUGAR, COFFEE, COTTON), forex (EURUSD, GBPUSD, USDJPY, USDCAD, USDCHF, AUDUSD), crypto (BTC, ETH), equity indices, and macro. See the full catalog.

What happens if I submit a security not in the catalog?

The API returns a 404. We don't do generic sentiment for out-of-catalog assets: the inversion awareness is the value, and without a catalog entry we can't provide it accurately. You can request a new security.

The catalog

How do you know the catalog entries are correct?

Multi-layer consensus: AI-generated hypotheses are seeded first, then community members confirm or reject them. A hypothesis requires 3+ confirms at a 2:1 confirm/reject ratio to become active. Maintainers can lock consensus entries. New accounts go through a labeling CAPTCHA on signup. That said, the community is young and most entries are still hypotheses awaiting human validation. That's partly why we launched publicly.
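
The activation rule above is mechanical, so a sketch helps pin it down. The thresholds (3 confirms, 2:1 ratio) come straight from the text; the function name is ours:

```python
def is_active(confirms, rejects):
    """A hypothesis becomes active with 3+ confirms at a 2:1 confirm/reject ratio."""
    return confirms >= 3 and confirms >= 2 * rejects

print(is_active(3, 0), is_active(4, 3))  # True False
```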

Is the catalog open?

Yes: CC BY 4.0. Available on GitHub and HuggingFace. The catalog stays open regardless of what happens to the platform.

What stops someone from polluting the catalog with bad data?

The consensus threshold makes it expensive: you need 3+ independent confirms at a 2:1 ratio. Maintainers review and can lock entries. Abnormal voting patterns get flagged. It's not bulletproof, especially with a young community, but coordinated attacks are costly relative to what an attacker gains. We'll harden this as the community grows.

Business & roadmap

How do you make money?

Not yet. The catalog needs depth before the API is worth paying for. The model is open catalog → free API tier → paid tiers for high-volume commercial use once the per-security model adapters are ready. The catalog stays open regardless.

Won't a well-funded competitor just copy this?

They can copy the infrastructure in a week. They can't copy community consensus from domain experts across 35+ securities. Every label submitted makes the next model better, which attracts more contributors. The catalog is the moat, not the code.

What's the roadmap?

Short term: deepen the catalog, grow the contributor community, publish an accuracy benchmark. Medium term: LoRA fine-tuned adapters per security, trained on community consensus labels: one small model per asset, fully self-hostable. Long term: paid API tiers for high-volume users, Python SDK, arXiv benchmark paper.

Are you using Claude/Anthropic? What happens if they change pricing?

Claude Haiku is the current inference layer, not the endgame. The roadmap is LoRA fine-tuned adapters per security: fully self-hostable, no API dependency. Haiku is cheap enough right now that it's not a business risk at current traffic levels. If pricing becomes a problem, there are open alternatives. The catalog is the asset, not the inference engine.

Contributing

How do I contribute?

Three ways: label headlines in the label queue, highlight phrases in the articles tab of any security, or vote on inversion hypotheses on any security page. No domain expertise required: if you know what a headline means for a market, your label is valuable.

Do I need an account?

For the API, no: anonymous use is allowed up to 100 requests/day. For labeling and voting, yes: an account creates a contributor record so your labels build reputation over time.

Something not answered here? Email multidude@sentimentwiki.io or open an issue on GitHub.