AI chatbots kept handing users fake or dead login URLs
Netcraft found in July 2025 that when users asked AI chatbots for official login pages for major brands, the answers were wrong about a third of the time. In tests covering 50 brands, 34% of the returned hostnames were not controlled by the brands at all: nearly 30% were unregistered, parked, or inactive, and another 5% pointed to unrelated businesses. In one Wells Fargo test, the model surfaced a fake page already tied to phishing. A chatbot that confidently invents login URLs is not a search engine with quirks. It is a phishing assistant with good manners.
"What Is the Login URL?" Should Be an Easy Question
One of the more depressing side effects of AI becoming a default interface layer is that people now ask chatbots things they used to ask a browser. "What is the official login page for Wells Fargo?" "Can you tell me where to sign in to this service?" "My bookmark is broken. What's the real URL?"
Those are not hard questions in the human sense. They are narrow, factual lookups. The answer should be either correct or withheld. But Netcraft's July 2025 research showed that large language models treated them as an opportunity for improvisation.
Netcraft asked a GPT-4.1-family model for login URLs for 50 major brands. Across those tests, the model returned 131 hostnames. Only about two-thirds of them (66%) actually belonged to the right brand. The remaining third were the dangerous part: roughly 29% were unregistered, parked, or otherwise inactive, and another 5% belonged to unrelated businesses. Netcraft's framing was blunt and accurate. More than one in three suggested destinations were not controlled by the brand the user had asked for.
Wrong in Exactly the Way Scammers Need
A normal hallucination is bad enough. A hallucinated login URL is better understood as target acquisition for a phisher.
If a chatbot invents a plausible but unregistered domain for a popular brand, an attacker can register it. If the model points to an existing but unrelated domain, that is at least confusing and at worst an opening for impersonation. If it happens to surface a known fake page, the model has skipped the setup phase and gone straight to delivering a trap.
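The unregistered case is the one a deployer could catch mechanically before the answer ever reaches a user. A minimal sketch of that first filter, assuming nothing about Netcraft's methodology (the function name and the example hostnames are illustrative):

```python
import socket

def resolves(hostname: str) -> bool:
    """Return True if the hostname currently resolves to any address.

    Failure to resolve is not proof the domain is unregistered (DNS can
    be misconfigured), but it is a cheap flag for model-suggested
    hostnames that nobody currently controls, i.e. exactly the ones an
    attacker could register.
    """
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False

# "localhost" resolves locally; a reserved .invalid name never will.
print(resolves("localhost"))                 # True
print(resolves("brand-login.invalid"))       # False
```

A check like this only screens out the dead-domain class; the unrelated-business and already-weaponized cases in the paragraph above still pass it, which is why resolution alone is a filter, not a verdict.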
The Wells Fargo example made the risk easy to grasp. As The Register reported from Netcraft's findings, one test asking for the bank's login URL caused ChatGPT to produce a well-crafted fake page that had already been used in phishing campaigns. That is not just a model being a little off on an address. That is a chatbot sending a user toward credential theft while sounding helpful.
Search Engine Journal's write-up highlighted another version of the same failure: when asked for official login pages, models can return a mix of correct domains, dead domains, and scam-adjacent lookalikes with no reliable signal to the user that anything is wrong. The confidence of the answer becomes part of the problem. People ask a chatbot because they want ambiguity removed. The model removes it by sounding certain.
This Failure Mode Has a Business Model Attached
Netcraft's data matters because it turns the issue from an anecdote into an attack surface. Once wrong URLs show up often enough, they become monetizable. Criminals do not need every query to fail. They only need predictable failures around valuable brands or popular prompts.
The Register quoted Netcraft's Rob Duncan on the obvious next step: if attackers can see what wrong hostname the model tends to generate, they can register it and build a phishing site there. That turns a model error into an intake funnel. Instead of waiting for users to mistype a domain, the attacker lets the AI do the typo generation at scale.
There is an even uglier variation, and Netcraft had already seen signs of it. Threat actors can flood the web with content designed to shape what models retrieve or infer. In related work, Netcraft described a fake Solana blockchain interface backed by GitHub repositories, tutorials, Q&A pages, and fake social accounts meant to make the bogus resource appear legitimate to AI systems. The same basic logic applies to login pages. If models are weak at source verification and strong at pattern completion, scammers will happily provide the patterns.
Why Chatbots Fail at URLs
This is one of those incidents where the underlying technical explanation is less impressive than the product decision. Large language models do not understand domains the way a browser, password manager, or carefully built search system can. They are predicting text. They can notice that "bank," "signin," "secure," and a brand name often occur together. That does not mean they are verifying ownership or checking whether the hostname is controlled by the actual institution.
Humans are not perfect at this either, which is why phishing works. The difference is that a human giving a friend directions to a website is usually expressing uncertainty. A chatbot tends to compress uncertainty into a polished sentence. It offers the answer as if it has checked.
Providers often market these systems as assistants that reduce friction. That framing is exactly what makes this failure severe. A user whose bookmark broke might think they are doing the cautious thing by asking the AI instead of clicking a random search result. In Netcraft's test conditions, that caution could move them from ordinary search risk into model-generated phishing risk.
This Is an AI Product Failure, Not Just a Security Story
The temptation with stories like this is to describe them as ordinary phishing risk with AI somewhere in the loop. That undersells the mechanism. The problem here is not that scammers used AI to write better lures. It is that consumer AI products themselves produced the bad directions.
That matters for scope. The site is not cataloging criminals using tools competently. It is cataloging systems that fail in ways that create harm. A chatbot that invents or misroutes login URLs is failing at a direct user-facing task. The failure is concrete, reproducible, and connected to a well-understood class of harm: credential theft.
It also scales in an unpleasantly elegant way. Traditional phishing usually requires the attacker to reach the user first by email, text, ads, or search poisoning. Here the user can walk into the trap by voluntarily asking the model for help. The AI is not merely vulnerable to poisoning in the background. It is actively mediating trust at the moment the user wants to authenticate with a bank, utility, retailer, or tech platform.
The Boring Alternative Was Better
There is a product lesson here that many AI companies keep relearning. Some tasks should not be answered by free-form generation unless there is a hard verification layer underneath. Login URLs fall into that category. If a model cannot confirm the domain against a trusted source, the correct response is not a plausible guess. It is "I cannot verify that safely."
Browsers, bookmarks, password managers, verified app directories, and even old-fashioned search results with clear domain display all have flaws. But at least they are not pretending to know the answer by assembling one from token probabilities. Replacing those systems with conversational certainty is not progress.
Netcraft's findings gave this issue numbers, examples, and a clear abuse path. A third of suggested hostnames were not brand-controlled. One Wells Fargo result surfaced a phishing page. Attackers could register dead domains or poison future outputs. That is enough to move the story from "AI has quirks" into something more useful: a documented case where chatbots turned a basic security question into an own goal.