Dear CIO,

Chris Roberts, whom I featured in my book Deming’s Journey to Profound Knowledge, recently shared a post that hit on something many security leaders are dealing with right now. How do you know whether a vendor is actually good, or whether they are just riding the latest wave of security marketing? The cybersecurity industry has long struggled with overpromising. Every vendor claims better visibility, faster response times, reduced risk, fewer alerts, stronger compliance, simpler operations, and, now, of course, ā€œAI-poweredā€ everything. For leaders in regular companies, healthcare, manufacturing, finance, local government, the midmarket, and smaller teams without unlimited staff, this creates a real problem. They are not just buying tools. They are making risk decisions with limited time, limited budget, limited people, and a market full of vendors who often sound very similar.Ā 

Best Regards,
John, Your Enterprise AI Advisor

Dear CIO

Asking Better Vendor Questions

Guide for Evaluating AI and Agentic Systems in the Enterprise

That is why I think Chris’s idea is a good one. He is talking about building practical sets of questions leaders can ask vendors across areas like EDR, XDR, MDR, SIEM, SOAR, IAM, PAM, cloud security, GRC, threat intelligence, SASE, backup, third-party risk, cyber insurance, managed services, and the growing pile of ā€œAI-poweredā€ security products. That is useful work. And to be clear, Chris knows this space. I respect his domain knowledge here. My view is that his list is a strong starting point, but the AI and agentic shift gives us an opportunity to expand it. The questions we asked vendors five or ten years ago are still useful — they are just no longer sufficient.

The Traditional Vendor Questions Still Matter

We still need to ask vendors the fundamentals:

  • What do you actually do?

  • What problem do you solve?

  • What telemetry do you need?

  • What systems do you integrate with?

  • What does deployment look like?

  • What does support look like?

  • What does success look like after 30, 60, or 90 days?

  • How do you prove the product is working?

  • What happens when something breaks?

Those questions are not outdated. They still matter because most organizations are still buying in recognizable categories: endpoint, identity, cloud, email, compliance, vulnerability management, detection and response, managed services, backup, and so on. So yes, Chris’s original list is relevant. But AI and agentic systems change the conversation.

The AI and Agentic Shift

A traditional security tool usually observes, alerts, blocks, reports, or enforces a known policy. That was not always simple, but the model was familiar. Now we are seeing tools that summarize, prioritize, classify, recommend, generate, automate, and in some cases act. That is a different thing. A product may not just tell an analyst what happened. It may decide what matters. It may not just open a ticket. It may enrich it, route it, recommend the fix, trigger a playbook, close the loop, and write the executive summary. It may not just detect a risky identity. It may recommend disabling the user, revoking a session, rotating credentials, or changing access. It may not just identify a cloud misconfiguration. It may offer to remediate it. That means the evaluation changes. The question is no longer only: Is this vendor good? The better question is: What authority are we giving this system? What can it influence, what can it change, and who owns the outcome when it is wrong? That is where I think Chris’s list can be expanded.

The Old Rules Are Not Gone. They Need to Stretch.

I have heard people say that the old IT security rules no longer apply. I understand the point, but I would say it a little differently. The old rules still matter. Least privilege still matters. Logging still matters. Change control still matters. Segmentation still matters. Identity governance still matters. Incident response still matters. Backups still matter. Vendor risk still matters. But those rules were mostly built around people, applications, infrastructure, networks, and cloud services. Now they also need to cover models, prompts, agents, memory, retrieval systems, tool calls, API permissions, automated workflows, and generated recommendations. That is the change. The old rules are not dead. They are incomplete.

What I Would Add to the List

I would keep the original categories, but I would add a universal AI and agentic layer at the top. Any vendor using language like AI-powered, autonomous, copilot, agent, self-healing, intelligent remediation, AI analyst, automated SOC, or adaptive response should have to answer a separate set of questions before we even get to the product category. Here are the areas I would add.

1. Delegated Authority

Once a system can act, we are no longer just evaluating software. We are evaluating delegated operational authority. Questions should include:

  • What is the system allowed to do?

  • Can it only observe?

  • Can it summarize or recommend?

  • Can it open or close tickets?

  • Can it suppress alerts?

  • Can it trigger response actions?

  • Can it isolate endpoints?

  • Can it disable accounts?

  • Can it modify firewall rules?

  • Can it change cloud configurations?

  • Can it rotate credentials?

  • Can it touch production systems?

2. Blast Radius

AI systems combines broad access to information, machine-speed execution, and delegated authority. That combination creates entirely new failure scenarios. Questions should include:

  • What is the worst thing this system can do if it is wrong?

  • How many endpoints can it isolate?

  • How many users can it disable?

  • How many alerts can it suppress?

  • How many systems can it modify?

  • How quickly can bad actions spread?

  • Can we stop it?

  • Can we roll it back?

  • Are sensitive actions rate-limited?

  • Do high-impact actions require approval?

3. Evidence Over Assertions

If we ask vendors, ā€œDo you reduce alert fatigue?ā€ the answer will be yes. If we ask, ā€œDo you use AI?ā€ the answer will be yes. If we ask, ā€œDo you integrate with our tools?ā€ the answer will probably be yes. But those answers do not tell us much. The better approach is to ask vendors to show the workflow, prove the claim, and demonstrate exactly how the product works in a real operating environment. The best vendor questions force demonstration, not marketing answers.

  • What data comes in

  • What the AI sees

  • What gets automated

  • What requires approval

  • What gets logged

  • What happens when recommendations are wrong

  • What analysts actually see

  • What evidence proves the system works

4. Data Rights

AI-era vendor evaluations require much deeper scrutiny of data usage. Questions should include:

  • Is customer data used to train models?

  • Is it used for tuning or evaluation?

  • Is it used to improve products for other customers?

  • Are prompts retained?

  • Are outputs retained?

  • Are embeddings created?

  • Can embeddings be deleted?

  • Is data shared with third-party model providers?

  • Can vendor staff access prompts, logs, alerts, or outputs?

  • What happens to customer data after contract termination?

5. Prompt Injection and Manipulation

If an AI system reads untrusted content, it can be manipulated by untrusted content. That content could be in emails, tickets, log entries, PDFs, webpages, Slack messages, Teams messages, threat reports, knowledge-base articles, or vulnerability descriptions. So we should ask:

Questions should include:

  • Can attacker-controlled content influence the system?

  • Does the system separate instructions from data?

  • Can malicious tickets override policy?

  • Can phishing content manipulate AI analysis?

  • Has the system been tested for prompt injection?

  • Can customers define forbidden actions?

  • Do risky actions require approval?

6. Human-in-the-Loop Clarity

Vendors love saying ā€œhuman in the loop.ā€ That phrase is becoming almost meaningless unless we define it. We must distinguish between 'nominal' and 'meaningful' human oversight. If a system generates 1,000 recommendations per minute, a human cannot 'approve' them in any real sense; they are simply rubber-stamping machine-speed decisions. We need to ask:

  • At what volume and velocity does the human loop actually break?

  • Where is the human before the action and after the action?

  • Only for critical incidents?

  • Only during business hours?

  • Is approval required, or just available?

  • What context does the human see?

  • Can the human override the recommendation?

  • Can the human roll back the action?

  • Is the human part of the customer’s team, the vendor’s team, or a third party?

7. Explainability and Auditability

If an AI system says an alert is low priority, a user is risky, a vulnerability is critical, or a control is satisfied, we want to know why. Not in hand-wavy terms. We need to be able to reconstruct the decision. ā€œThe AI decidedā€ is not an acceptable answer.

  • What input did it use?

  • What context did it retrieve?

  • What recommendation did it make?

  • Was there uncertainty?

  • Was there human approval?

  • What action was taken?

  • What systems were affected?

  • Can the decision be reviewed later?

  • Can the action be reversed?

8. Change Control for AI Behavior

In traditional IT, we care about software, configuration, and infrastructure changes. Now we also need to consider behavioral changes. A vendor may update a model, or use a public model that changes, change a prompt, modify scoring logic, alter an automation workflow, or adjust a playbook. Any of those changes could affect security operations.Ā  Governance must also account for 'under the hood' changes. If a vendor relies on a public third-party API, a model update or a change in the provider's weighting can cause 'model drift'. A prompt or playbook that worked yesterday may fail today because of an upstream update the vendor does not directly control. Ask: How do you shield our operations from third-party model updates? So we should ask:

  • How are model changes tested?

  • Are prompt changes reviewed?

  • Are automation changes versioned?

  • Are customers notified when behavior changes?

  • Can customers opt out?

  • Can prior behavior be reconstructed?

  • How do vendors shield customers from upstream model drift?

9. Failure Modes

Do not only ask vendors what their product does well. Ask them how it fails. The shift from deterministic logic to probabilistic models changes how systems fail. A traditional tool fails loudly by crashing or throwing an error. An AI system fails silently; it may confidently provide a wrong answer or an unsafe recommendation without any indication that it has deviated from the intended logic. We must ask:

  • What does the system commonly get wrong?

  • What assumptions does it make?

  • What happens when telemetry is incomplete?

  • What happens when sources disagree?

  • Can the system say ā€œI don’t knowā€?

  • Can it stop itself?

  • When should customers not trust the output?

  • What should customers not use the product for?

How I Would Reframe the Whole Exercise

I would add a front-end section that applies to every vendor claiming AI, automation, autonomy, or agentic capability. Something like:

Universal AI and Agentic Vendor Questions

  1. Where exactly is AI used in the product?

  2. What decisions does it influence?

  3. What actions can it take?

  4. What systems and data can it access?

  5. What permissions does it require?

  6. What is deterministic, and what is probabilistic?

  7. What requires human approval?

  8. What is fully autonomous?

  9. What gets logged?

  10. Can we reconstruct why it made a recommendation or took an action?

  11. What happens when it is wrong?

  12. What is the maximum blast radius?

  13. How do we constrain or disable automation?

  14. Can humans override and roll back actions?

  15. Is customer data used for training, tuning, evaluation, or product improvement?

  16. How do you defend against prompt injection?

  17. How are model, prompt, scoring, and workflow changes governed?

  18. What should customers not use this product for?

Then keep the original categories underneath that layer. That way, AI is not treated as one more product category. It becomes part of the way every product category is evaluated.

A Few Examples

For EDR, XDR, and MDR, the question is not only: How do you detect and respond to threats?

It becomes: What endpoint actions can your AI or automation initiate without human approval, and how do we constrain, audit, and reverse those actions?

For SIEM, SOAR, and log management, the question is not only: What integrations and playbooks do you support?

It becomes: Can your system generate, modify, or execute playbooks dynamically, and what prevents unsafe automation?

For identity, IAM, PAM, and ITDR, the question is not only: How do you detect identity risk?

It becomes: Can your system automatically disable users, revoke sessions, alter privileges, rotate credentials, or trigger step-up authentication, and what prevents overreaction?

For cloud security, the question is not only: What misconfigurations do you detect?

It becomes: Can your system remediate cloud configurations automatically, and how do we know it will not break production or violate architecture intent?

For GRC, the question is not only: What frameworks do you support?

It becomes: Does your AI generate control mappings, audit responses, or evidence summaries, and how do you detect hallucinations, outdated mappings, or unsupported claims?

Chris is absolutely right that security leaders need better vendor questions. His list is relevant, practical, and grounded in real buying decisions. My only addition is that AI and agentic systems force us to go one level deeper. A traditional vendor checklist asks: What does your product do? An AI and agentic vendor checklist has to ask: What can your product decide, what can it change, what can it break, how do we know, and who is accountable?

That is the learning opportunity. The next version should not replace the list. It should expand it. Because in this new era, we are not just evaluating tools. We are evaluating systems that may reason, recommend, and act inside the environment. That deserves a different class of questions.

Login or Subscribe to participate

Deep Learning
  • Artificial Intelligence didn’t start in Silicon Valley. It began with centuries of thinkers who refused to treat intelligence as something mystical. Inspired by Rebels of Reason, this live 8-part biweekly course (The second session is on May 20th) traces AI as a long intellectual journey, not a hype cycle, exploring how machines learned to count, reason, search, learn, and ultimately generate language through the ideas that made it possible. With no heavy math or prerequisites, it focuses on the breakthroughs that shaped modern AI, perfect for technologists, leaders, students, and anyone trying to understand what’s really behind today’s AI moment. Register here.

  • The Artificially Intelligent Enterprise looks at the impact of OpenAI’s Workspace Agents.

Regards,

John Willis

Your Enterprise IT Whisperer

Follow me on X

Follow me on Linkedin

Dear CIO is part of the AIE Network. A network of over 250,000 business professionals who are learning and thriving with Generative AI, our network extends beyond the AI CIO to Artificially Intelligence Enterprise for AI and business strategy, AI Tangle, for a twice-a-week update on AI news, The AI Marketing Advantage, and The AIOS for busy professionals who are looking to learn how AI works.

Keep Reading