Understanding Your AI Visibility Score
Bingly distils your site's AI citation performance into a single 0–100 score. Here is exactly what it measures, how it is calculated, and how to interpret it.
What is the visibility score?
The Bingly visibility score is a single number between 0 and 100 that answers one question: when a user asks an AI model about a topic related to your keyword, how likely is it that your website gets cited?
Traditional SEO rank tracking tells you where your page appears in a list of blue links. AI visibility tracking is different — there is no ranked list. A model either includes your site in its answer or it does not, and if it does, it may feature you prominently or bury you in a footnote. The score captures both dimensions.
A score of 100 means every model you tested cited your site as the primary reference for this keyword in every run. A score of 0 means no model mentioned your site at all.
How the score is calculated
The aggregate score is a weighted average across all models you selected, combining two components:
Citation Rate
The fraction of model runs in which your domain was mentioned at all. Because LLM outputs are non-deterministic, Bingly sends each prompt multiple times and averages the results.
Weight: 60%
Prominence
How prominently the citation appeared when it did occur — primary reference, secondary list item, or incidental mention. Being the first source named in a direct answer scores higher than appearing fifth in a bullet list.
Weight: 40%
Each model contributes equally to the aggregate unless you have applied custom model weights in your account settings. The final score is normalised to a 0–100 scale.
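The description above can be sketched in a few lines of Python. The article states the 60/40 weights and the equal per-model default, but not Bingly's exact formula, so the linear combination below is an assumption, and `ModelResult`, `model_score`, and `aggregate_score` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

CITATION_WEIGHT = 0.6    # stated in the article
PROMINENCE_WEIGHT = 0.4  # stated in the article


@dataclass
class ModelResult:
    citation_rate: float  # fraction of runs that cited the domain, 0.0 to 1.0
    prominence: float     # average prominence multiplier over cited runs, 0.0 to 1.0
    weight: float = 1.0   # custom model weight; equal by default


def model_score(result: ModelResult) -> float:
    """Combine the two components for one model (assumed to be a linear blend)."""
    return (CITATION_WEIGHT * result.citation_rate
            + PROMINENCE_WEIGHT * result.prominence)


def aggregate_score(results: list[ModelResult]) -> float:
    """Weighted average of per-model scores, normalised to a 0-100 scale."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(r.weight * model_score(r) for r in results)
    return round(100 * weighted_sum / total_weight, 1)
```

Under this assumption, a site cited as the primary reference in every run of every model scores 100, and a site never cited scores 0, which matches the endpoints described earlier.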
Score ranges explained
Use these bands as a guide for prioritising your optimisation work. They reflect thresholds observed across thousands of Bingly checks — not arbitrary cutoffs.
Your site is being cited consistently and prominently by most models. AI answer engines treat your content as an authoritative source for this keyword. Focus on maintaining this position and expanding to related queries.
Your site appears in some responses but not all — and when it does, it may not be in the most prominent position. There is clear room to improve. Work through Bingly's high-priority recommendations to close the gap.
AI models are largely not citing your site for this keyword. This could mean your content is too thin, lacks clear entity definitions, or isn't structured in a way LLMs can extract and reference reliably. Treat the recommendations as a starting point.
What "prominence" means
Not all citations are equal. When an AI model answers a question, sources mentioned first — or named as the definitive answer — carry far more weight than sources buried in a long list. Bingly tracks the position and context of every citation and converts it to a prominence multiplier.
| Level | Multiplier |
|---|---|
| Primary citation | 1.0× |
| Secondary mention | 0.7× |
| Incidental mention | 0.3× |
| Not mentioned | 0× |
Improving prominence, for example moving from a secondary mention to a primary citation, can raise your score even when the number of models citing you stays the same. A primary citation carries more than three times the prominence of an incidental mention (1.0× versus 0.3×).
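As a rough illustration of how the multipliers in the table might be applied, the sketch below averages them over the runs in which the domain was cited. The multiplier values come from the table; the averaging rule and the names `PROMINENCE_MULTIPLIER` and `mean_prominence` are assumptions, not Bingly's documented implementation.

```python
# Multipliers taken from the table above.
PROMINENCE_MULTIPLIER = {
    "primary": 1.0,
    "secondary": 0.7,
    "incidental": 0.3,
    "none": 0.0,
}


def mean_prominence(levels: list[str]) -> float:
    """Average multiplier over the runs where the domain was cited.

    Assumption: uncited runs are excluded here, because they are already
    reflected in the citation rate component.
    """
    cited = [PROMINENCE_MULTIPLIER[level] for level in levels if level != "none"]
    return sum(cited) / len(cited) if cited else 0.0
```

For example, four runs labelled `["primary", "secondary", "none", "incidental"]` give a mean prominence of (1.0 + 0.7 + 0.3) / 3, or about 0.67, with the uncited run ignored.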
Interpreting the per-model breakdown
The aggregate score tells you your overall position, but the per-model breakdown is where the actionable insights live. Each model has its own panel showing:
- Citation status — Cited / not cited for this run, plus a run-average citation rate.
- Prominence level — The highest prominence achieved across sampled runs for this model.
- Competitors cited instead — Which other domains the model mentioned when yours was absent — useful for competitive intelligence.
- "How the model sees your page" — The model's own characterisation of your content: what it thinks the page is about, what topics it associates with your domain, and what gaps it identified.
Look for patterns across models. If ChatGPT and Gemini both cite you but Claude does not, that points to a model-specific issue, such as differences in training data or in how Anthropic's models retrieve and interpret your content, rather than a universal problem. Address the universal issues first (those flagged by all models), then work on model-specific gaps.
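One way to separate universal issues from model-specific ones is a set intersection over the issues each model flags. This is a minimal sketch: the function name `split_gaps` and the issue labels in the example are hypothetical, and the article does not describe how Bingly exports per-model findings.

```python
def split_gaps(
    flags_by_model: dict[str, set[str]],
) -> tuple[set[str], dict[str, set[str]]]:
    """Split flagged issues into universal ones and model-specific ones.

    Returns (issues flagged by every model, {model: issues only some models flag}).
    """
    if not flags_by_model:
        return set(), {}
    # An issue is universal only if every model flagged it.
    universal = set.intersection(*flags_by_model.values())
    specific = {
        model: flags - universal
        for model, flags in flags_by_model.items()
        if flags - universal
    }
    return universal, specific


# Hypothetical issue labels, for illustration only.
universal, specific = split_gaps({
    "chatgpt": {"thin_content", "missing_schema"},
    "gemini": {"thin_content"},
    "claude": {"thin_content", "unclear_entity"},
})
```

Here `universal` holds the issue to fix first, and `specific` lists what remains for each model afterwards.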
Why scores vary between runs and models
Large language models are non-deterministic: even with the same prompt, they do not produce identical responses every time. This is by design. Models sample each response from a probability distribution, and the "temperature" parameter controls how much variety that sampling produces, which is part of what makes LLMs useful for open-ended tasks.
For AI visibility tracking, this means a single check is a data point, not a verdict. Bingly mitigates this by:
- Sampling each model multiple times per check and averaging the citation rate.
- Flagging when a result is based on a small sample so you can interpret it accordingly.
- Storing all historical checks so trends over time are more reliable than individual snapshots.
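The first two mitigations can be sketched as follows. The article does not say how many runs Bingly samples or what counts as a small sample, so `MIN_RUNS` is a hypothetical threshold and `estimate_citation_rate` is an illustrative name.

```python
from dataclasses import dataclass

MIN_RUNS = 5  # hypothetical cutoff below which a result is flagged as a small sample


@dataclass
class CitationEstimate:
    rate: float         # fraction of runs in which the domain was cited
    runs: int           # number of runs sampled
    small_sample: bool  # True when the estimate rests on too few runs


def estimate_citation_rate(
    cited_per_run: list[bool], min_runs: int = MIN_RUNS
) -> CitationEstimate:
    """Average per-run citation outcomes and flag small samples."""
    runs = len(cited_per_run)
    rate = sum(cited_per_run) / runs if runs else 0.0
    return CitationEstimate(rate=rate, runs=runs, small_sample=runs < min_runs)
```

A result of two citations in three runs would come back with a rate of about 0.67 and `small_sample=True`, signalling that it should be read with caution.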
Different models also differ in training data, knowledge cutoffs, and citation behaviour. A model trained primarily on academic text will cite differently from one optimised for conversational search. This is not noise: it is a real signal about how your content performs on each AI surface your audience uses.
If you see a large swing (10+ points) between two consecutive weekly checks without making any content changes, this is likely natural LLM variance rather than a real change in visibility. The trend over 4–6 weeks is the reliable signal.
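If you export your weekly scores, a trailing average makes the longer trend easier to see than any single week-to-week change. The sketch below assumes a plain list of weekly scores. The 10-point threshold and the 4-week window come from the guidance above, while the function names are hypothetical.

```python
from statistics import mean


def rolling_trend(weekly_scores: list[float], window: int = 4) -> list[float]:
    """Trailing mean over each `window` consecutive weekly scores."""
    return [
        mean(weekly_scores[i - window + 1 : i + 1])
        for i in range(window - 1, len(weekly_scores))
    ]


def large_swings(weekly_scores: list[float], threshold: float = 10.0) -> list[int]:
    """Indices of weeks whose score moved by `threshold` or more from the previous week."""
    return [
        i
        for i in range(1, len(weekly_scores))
        if abs(weekly_scores[i] - weekly_scores[i - 1]) >= threshold
    ]


scores = [52.0, 64.0, 50.0, 55.0, 53.0, 56.0]  # made-up example data
trend = rolling_trend(scores)
swings = large_swings(scores)
```

In this made-up series, weeks 1 and 2 show swings of 12 and 14 points, yet the 4-week trailing average stays between 53.5 and 55.5, which suggests variance rather than a real shift.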