How to Track AI Visibility in Google Sheets (A Free DIY Method)

You can track AI visibility in Google Sheets before you pay for any dedicated tool, and for a small site it is a reasonable place to start. AI visibility is how often ChatGPT, Perplexity, Gemini, and other answer engines mention or cite your brand when someone asks a question in your category. A spreadsheet will not automate the hard parts, but it will give you a structured, repeatable baseline, which is far better than checking once and guessing.

This guide lays out a practical Google Sheets workflow: the columns that matter, how to sample prompts so your numbers mean something, a light way to semi-automate the checks with Apps Script, and the point where a spreadsheet stops being enough. The goal is honesty about what a DIY method can and cannot do, so you spend your effort where it pays off.

Set Up the Tracking Sheet

The sheet is only as useful as its structure, so design the columns before you collect a single data point.

Prompt. One row per question a customer would actually ask without naming you, such as "best project management tool for agencies." Avoid branded prompts: any engine can describe you once you name yourself, so they say nothing about whether you surface unprompted.

Engine. ChatGPT, Perplexity, Gemini, Copilot, Claude. Each engine retrieves and cites differently, so a single combined number hides the truth. Give each engine its own row per prompt.

Run date. Because answers drift, you will sample the same prompt repeatedly. Date every check.

Mentioned (1/0). Did your brand appear at all in the answer? Record 1 for yes and 0 for no so the column can be summed.

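Position. If mentioned, were you named first, mid-list, or in passing? A simple 1 to 3 scale works (for example, 1 for first, 2 for mid-list, 3 for in passing). Leave it blank when you were not mentioned.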

Competitors cited. List the brands that showed up instead. This column becomes your most valuable asset over time, because it tells you who the engines currently trust more than you.

Notes. Anything unusual: a wrong claim about you, a surprising source, a featured snippet.
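If you later log rows from a script rather than by hand, it helps to fix the column order in one place. The sketch below is one possible layout, not a required one: the COLUMNS order, the buildCheckRow name, the field names, and the meaning of the 1 to 3 scale (1 for first, 3 for in passing) are all assumptions made for illustration.

```javascript
// Assumed column order for the raw-checks tab, matching the columns described above.
const COLUMNS = ['Prompt', 'Engine', 'Run date', 'Mentioned', 'Position', 'Competitors cited', 'Notes'];

// Build one row in column order and reject values that would skew the summary later.
// buildCheckRow and its field names are hypothetical, chosen for this example.
function buildCheckRow(check) {
  const mentioned = check.mentioned ? 1 : 0;
  // Position only applies when the brand was mentioned.
  // Assumed scale: 1 = named first, 2 = mid-list, 3 = in passing.
  const position = mentioned ? check.position : '';
  if (mentioned && ![1, 2, 3].includes(position)) {
    throw new Error('Position must be 1, 2, or 3 when the brand is mentioned');
  }
  return [
    check.prompt,
    check.engine,
    check.runDate, // date of the check, e.g. "2025-01-15"
    mentioned,
    position,
    (check.competitors || []).join(', '),
    check.notes || '',
  ];
}
```

Inside Apps Script, a row built this way could be written with the Sheet appendRow method, for instance on a tab you have named for raw checks. The tab name is up to you.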

Add a summary tab with a pivot table that computes mention rate per engine (mentions divided by checks) and a running share of voice against the competitors you keep seeing. That pivot is your dashboard.
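To make the two summary numbers concrete, here is a minimal sketch of the arithmetic the pivot performs. It assumes each raw row has already been read into an object with engine, mentioned (1 or 0), and competitors (an array of brand names split out of the "Competitors cited" cell). The function names are hypothetical, and share of voice is taken here to mean each brand's mentions as a fraction of all brand mentions logged, which is one common definition rather than the only one.

```javascript
// Mention rate per engine: mentions divided by checks.
function mentionRateByEngine(rows) {
  const totals = Object.create(null);
  for (const r of rows) {
    if (!totals[r.engine]) totals[r.engine] = { checks: 0, mentions: 0 };
    totals[r.engine].checks += 1;
    totals[r.engine].mentions += r.mentioned;
  }
  const rates = Object.create(null);
  for (const engine of Object.keys(totals)) {
    rates[engine] = totals[engine].mentions / totals[engine].checks;
  }
  return rates;
}

// Share of voice (assumed definition): a brand's mentions over all brand mentions logged.
function shareOfVoice(rows, ownBrand) {
  const counts = Object.create(null);
  for (const r of rows) {
    if (r.mentioned) counts[ownBrand] = (counts[ownBrand] || 0) + 1;
    for (const c of r.competitors) counts[c] = (counts[c] || 0) + 1;
  }
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  const share = Object.create(null);
  for (const brand of Object.keys(counts)) share[brand] = counts[brand] / total;
  return share;
}
```

A pivot table gives you the same mention rate without code. The sketch is mainly useful for checking that the pivot is dividing by checks per engine and not by total rows.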

Sample Prompts So the Numbers Mean Something

The single biggest mistake in DIY tracking is checking each prompt once. AI answers are non-deterministic, so one response is an anecdote, not a measurement. A disciplined sampling method is what separates a useful sheet from a misleading one, and the same principles apply whether you use a spreadsheet or software, as covered in how to measure AI visibility.

Freeze a prompt panel. Pick 20 to 50 representative prompts and stop changing them. A drifting list makes trends meaningless.

Run each prompt several times across different days. Your metric is the share of runs in which you appear, averaged across the panel, not a one-off yes or no. A prompt where you appear in 3 of 5 runs scores 60 percent.

Use a clean session each time. Logged-in chat histories personalize answers and contaminate your reading. Use a fresh or signed-out session so you measure the engine, not your own footprint.

Keep the run count constant. If you sample each prompt five times this week, do five next week too. Consistency is what lets you compare periods.
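The sampling rules above reduce to a small calculation: a share of runs per prompt, then an average across the panel, with a guard that every prompt was sampled the same number of times. The sketch below is one way to express it for a single engine and a single period. The panelVisibility name and the input shape are assumptions for illustration.

```javascript
// runs: objects like { prompt, mentioned } where mentioned is 1 or 0,
// all from the same engine and the same period.
function panelVisibility(runs, expectedRunsPerPrompt) {
  const byPrompt = new Map();
  for (const r of runs) {
    const s = byPrompt.get(r.prompt) || { runs: 0, hits: 0 };
    s.runs += 1;
    s.hits += r.mentioned;
    byPrompt.set(r.prompt, s);
  }
  const perPrompt = {};
  let sum = 0;
  for (const [prompt, s] of byPrompt) {
    // Enforce a constant run count so periods stay comparable.
    if (s.runs !== expectedRunsPerPrompt) {
      throw new Error(`"${prompt}" has ${s.runs} runs, expected ${expectedRunsPerPrompt}`);
    }
    perPrompt[prompt] = s.hits / s.runs;
    sum += perPrompt[prompt];
  }
  // Average of per-prompt shares, so every prompt in the panel carries equal weight.
  return { perPrompt, average: byPrompt.size ? sum / byPrompt.size : 0 };
}
```

Averaging per-prompt shares rather than pooling all runs is a design choice. With a constant run count the two give the same number, and the per-prompt breakdown shows which questions you never appear for.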

Semi-Automate With Apps Script (Optional)

Manual checks get tedious past a handful of prompts, so you can reduce the typing with a little code. Google Apps Script (Extensions, then Apps Script) can call an LLM API and write the response back into a cell, which removes the copy-paste step.

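The realistic version. You can script calls to a model API that supports web search, pass each prompt, and log the returned text plus whether your domain appears. A simple formula such as =IF(ISNUMBER(SEARCH("yourdomain.com", A2)), 1, 0), where A2 holds the logged response text, then flags mentions automatically. SEARCH is case-insensitive, so capitalization in the response does not matter. It only matches the literal domain string, though, so add a second check on your brand name if engines tend to name you without linking.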

The honest limitations. An API call is not identical to the consumer ChatGPT or Gemini app a real user sees; retrieval, system prompts, and grounding differ. Perplexity and the in-app experiences are hard to replicate exactly through an API. So treat scripted results as directional and spot-check against the real apps. You are also responsible for API costs and for respecting each provider's terms.
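A minimal sketch of the scripted check might look like the following. The mention flag and the logging logic are plain JavaScript. Only askModelViaApi touches Apps Script services (UrlFetchApp and PropertiesService), and everything provider-specific inside it is a placeholder: the URL, the request body, the response field, and the API_KEY property name are all hypothetical and must be replaced from your provider's documentation.

```javascript
// Returns 1 if the domain appears in the answer (case-insensitive), else 0.
// Same idea as the SEARCH-based formula, applied before the text reaches the sheet.
function flagMention(answerText, domain) {
  return String(answerText).toLowerCase().includes(domain.toLowerCase()) ? 1 : 0;
}

// Runs one prompt and returns a record to log. askModel is passed in so the
// logic does not depend on any particular provider.
function runCheck(prompt, engine, domain, askModel, runDate) {
  const answer = askModel(prompt);
  return { prompt, engine, runDate, mentioned: flagMention(answer, domain), answer };
}

// Apps Script only. HYPOTHETICAL provider call: the endpoint, payload shape,
// and "text" response field are placeholders, not a real API.
function askModelViaApi(prompt) {
  const key = PropertiesService.getScriptProperties().getProperty('API_KEY');
  const response = UrlFetchApp.fetch('https://api.example.com/v1/answer', {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer ' + key },
    payload: JSON.stringify({ prompt: prompt }),
    muteHttpExceptions: true, // inspect the status code instead of throwing
  });
  if (response.getResponseCode() !== 200) {
    throw new Error('API returned ' + response.getResponseCode());
  }
  return JSON.parse(response.getContentText()).text;
}
```

In use, you would call runCheck(prompt, 'ChatGPT API', 'yourdomain.com', askModelViaApi, today) for each prompt in the panel and write the result to the raw tab. Labeling the engine as an API variant keeps scripted rows distinguishable from checks made in the consumer apps.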

This is the stage where most people realize the automation they are rebuilding is the actual product. Sampling many prompts across several engines repeatedly, normalizing the results, and tracking competitor share is exactly what bing.ly does, which is why a spreadsheet is a fine baseline but a poor long-term home.

When a Spreadsheet Stops Being Enough

A sheet is the right tool until it is not, and the boundary is predictable.

You add engines and prompts. Five prompts across two engines is manageable. Forty prompts across five engines is 200 prompt-engine pairs: sampled weekly, that is 200 checks a week at a single run each and 1,000 at five runs each, which you do not want to do by hand.

You need trend charts and share of voice. Pivot tables can do basic trends, but stakeholder-ready reporting (annotated trend lines, competitor share over time) is painful to maintain manually.

You want the why, not just the whether. A sheet tells you that you were not cited. It does not tell you what the model understood your page to be about or what to fix. Pairing measurement with diagnosis is the point of a structured audit, as in how to run an AI visibility audit.

If you are tracking one site occasionally, stay in Sheets. If you are tracking regularly, across engines, or for clients, the manual cost quickly exceeds the price of a tool.

Frequently Asked Questions

Q: Can I really check ChatGPT visibility from Google Sheets? Partly. You can log manual checks in a sheet, and with Apps Script you can call an LLM API and flag whether your domain appears. But an API is not identical to the consumer app, so scripted results are directional and should be spot-checked against the real ChatGPT, Perplexity, and Gemini interfaces.

Q: How many prompts and runs do I need? Aim for 20 to 50 frozen prompts that represent your category, and sample each several times across different days. Your trusted metric is the share of runs you appear in, averaged, not a single response. Keep the prompt panel and run count constant so periods are comparable.

Q: What should I track besides whether I was mentioned? Position or prominence (first, mid, or passing), the competitors cited instead of you, and any inaccurate claims the model made. The competitor column is especially valuable, since it reveals which sources the engines currently trust in your category.

Q: Is a spreadsheet good enough long term? For a single site checked occasionally, yes. Once you add multiple engines, dozens of prompts, weekly sampling, or client reporting, the manual workload outgrows a sheet and a dedicated tool becomes cheaper than your time.

Getting Started

Build the sheet today: one tab for raw checks (prompt, engine, date, mentioned, position, competitors, notes) and one summary tab with a pivot for mention rate and share of voice. Freeze a 20-prompt panel, sample each prompt a few times this week across the engines your audience uses, and record everything. Once the manual effort starts to hurt, point bing.ly at the same panel to automate the sampling and trend tracking, and keep the sheet as your archive.