26 Comments
Joseph B's avatar

My issue with ChatGPT is when you have to double check the AI’s work to make sure everything is accurate. At that point, why waste time using the generative AI in the first place? I’m glad that we have real human reporters such as yourself, Stephen! Unless you’re a robot in disguise. Surely you’d be upfront about that, of course.

Austin's avatar

"In fact it would likely land near the very top (possibly top 3)"

This is the thing I can't stand about these chatbots. Do you not know this? Is it in the top 3 or not? If so, just say that! The conversational chatty tone does a lot of legwork to cover up the fact that it's not actually smart, or as magical as OpenAI wants you to believe.

Vixolus's avatar

A few more trillion will correct this mishap

Eric's avatar

That said, ChatGPT correctly pointed out that Fandom pages are bad ;)

Michael Kelehan's avatar

I asked Gemini for a game that has four words in the title, each beginning with the letter "K" (for a 4K joke that's not important now). It told me there weren't any released ones that it knew about, but there was King K. Rool's Krazy Kartoon Kapers, a "well-known" canceled SNES game. I didn't believe that was a thing, so I asked for sources mentioning it. It said it was "well documented," and gave me three real-looking links. Every one of them was broken.

Not only did it hallucinate the game, it hallucinated entire sources on the spot. When I called it out on that, it apologized and said it failed.

In my mind, the danger isn't the lack of accuracy. It's the confidence in the wrong answers, and the fact that it won't say "I don't know" or even "maybe it's this, but I'm not sure" unless you prove that you know it's wrong.

Don's avatar

I'd ask that you don't start using ChatGPT. Its information as shown here is wrong, and it's fueled by plagiarism, environmental destruction and many, many other problems. In fact, if you are using it going forward, please tell me so that I can cancel my subscription, because I avoid supporting generative AI work - which involves avoiding a number of games as well as different outlets and work.

Willektron's avatar

The tone throughout the article was very tongue-in-cheek, and I didn't get the impression that he was happy with the results. More of a "oh, alright, since everyone's bugging me about it, I suppose I'll see if it's as useful as they say." And it decidedly wasn't.

Don's avatar

Ya, I know, but I'm hoping this isn't a case of him starting to experiment with it everywhere to see if it's useful somewhere.

I have to spend enough time watching and writing about the BS stuff that I don't want to see it here.

Stephen Totilo's avatar

You don't have anything to worry about, though I appreciate you expressing your concern and being clear where you stand.

As I mentioned in the article, I've been skeptical that ChatGPT could help with my reporting. When it's been recommended to me, I've assumed that any well-meaning encouragement was potentially misguided and that the tech couldn't meet my standards of accuracy. Worse, I worried it would simply regurgitate others' work, potentially poorly, potentially without credit (or compensation) and with errors.

Thus, this piece. In my email intro to readers (not visible to people reading online, I realize), I explained that this was an unplanned edition:

"As of an hour ago, I wasn’t planning to send out an edition of Game File today. I was deep into working on Thursday’s newsletter.

"But something just happened that was too absurd not to share."

This afternoon, with an idea in my mind (Was this new Mario Wonder title the longest Nintendo game title ever?) and without enough time in the day to confirm it, I remembered the urging I'd gotten from some people to try to use ChatGPT to help me out with research.

So, to test that theory, I asked ChatGPT about a specific webpage. Notably, I was already asking it to do a task that involved a specific, known webpage of work; I don't think I'd even have had this instinct if my idea had involved ChatGPT producing a plagiarized result. I was curious if the AI could basically take data from a page I was looking at and swiftly put it in a useful order to answer my question. That it immediately failed to do that well did not really surprise me. But, since I like to share as much of my reporting experiences with Game File readers as possible, my instinct was not to keep that experience to myself but to write about it. (Frankly, it's not a new idea to point out that AI hallucinates, so I plead guilty to writing a quick post that was a bit of a cliche.)

I have hundreds of stories I want to write for Game File readers. Not one of them is one I have any desire to ask a chatbot to help me with.

Full disclosure: I have used ChatGPT once before. It was last spring, when I received multiple suspiciously worded freelance pitches and was trying to figure out their origins. I wondered if they were created by AI, so I tried ChatGPT to see if I could get it to replicate the pitches.

Otherwise, I have not actively used generative AI for my reporting and don't plan to. Reporting about AI, as with reporting about anything else, is one thing. But using it? Nah.

I think, technically, some of the transcription and translation tools I use for some articles use AI or machine learning to help produce English language transcripts. But I check any translations with human sources. And I listen back to every transcribed interview and quote to make sure that they are accurate.

My experience with AI is impacted by watching it repeatedly steal work, generate falsehoods and produce bad art. Things like Grammarly recently casting me and my peers, without my permission, as AI-driven "experts" haven't helped, either.

My views are also impacted by the experiences I've seen people in gaming have with it, including the game studio that had to deal with Google AI providing false tips about its games: https://www.gamefile.news/p/google-ai-overview-false-tips-trash-goblin

Don's avatar

Thanks for the reply!

I saw the bit there, and when I shared this with some friends, I noted that this is exactly the type of thing AI evangelists say AI should be good at. It's, in theory at least, a defined info source plus counting, and the way it failed so badly at very basic tasks was surprising even to someone as skeptical of AI as me. While "AI hallucinates" is a bit of a cliche, this illustrated it in a very clear way!

Grammarly has been crazy to watch over the last couple of years as it went from a useful tool to being AI-driven and then to this whole new "AI editor expert" thing that is just insane.

Joel Bartley's avatar

I find that you can't ask LLMs about specific web pages unless you've printed that page to PDF and sent it as part of the query. Otherwise it will either not be able to access the page at all or will possibly provide you dated info or info it is "inferring" from old knowledge.

Clayton Grey's avatar

I came because I was curious which model you used; it looks like you've noted the free version. Unless you're doing something very subjective and conversational, the free model doesn't do very well. I can appreciate not wanting to pay them, but the extended thinking mode on a paid account is notably better than the free model. It's fine to be down on it; that is a really bad result. I would, however, consider this an important detail. I have a lot of negative opinions of the tech, people's relationship to it, and most especially the companies themselves, but as a tool these things are not all the same, and the way they do the work has a substantial bearing on the results.

Dominik Bošnjak's avatar

Individually, everything you asked for is quite simple but together, it's the kind of task that'll likely keep tripping up LLMs for a while yet.

The way to solve it programmatically may be with specialized models, while using one model to deploy and orchestrate these "agents" -- all way more work than something like this is probably worth 😅

ZR's avatar

I am not pro-AI in any way (I think it’s awful) but your methodology is slightly flawed. If you’re going to use a chatbot to do this, you need to instruct it to do something with function calling. Something like “use beautifulsoup to build an array of all the titles inside the doc with this ID” and then “use python to sort the array by string length.” Chatbots can’t count, but they can use tools that can.
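The approach ZR describes can be sketched in a few lines. This is a minimal illustration, not the actual page or code involved: the HTML snippet, the `game-list` id, and the titles are invented for the example. The point is that extracting and length-sorting is done by deterministic code rather than by the model's "counting."

```python
from bs4 import BeautifulSoup

# Stand-in for a real page listing game titles (contents are made up).
html = """
<ul id="game-list">
  <li>Super Mario Bros. Wonder</li>
  <li>Paper Mario: The Thousand-Year Door</li>
  <li>Mario Kart 8 Deluxe</li>
</ul>
"""

# Build an array of all the titles inside the element with this id.
soup = BeautifulSoup(html, "html.parser")
titles = [li.get_text(strip=True) for li in soup.select("#game-list li")]

# Sort by string length, longest first -- a program counts correctly.
titles.sort(key=len, reverse=True)
for title in titles:
    print(len(title), title)
```

A chatbot with tool use would generate and run something like this behind the scenes; run directly, it gives an answer you can verify instead of one you have to take on faith.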

Gavin McFarland's avatar

The problem is that your typical user isn't going to do this, or even know they should. These LLMs are great at producing results that *look* authoritative, and unless they already have knowledge (like Stephen in this example) or have another reason to double-check, they're likely to take the answer and run with it. It's not just that LLMs get things wrong - goodness knows people do, too - it's that they get them wrong so confidently.

Dominik Bošnjak's avatar

Spot on. You'll lose the average user somewhere between "wtf is a string" and "beautiful soup? I'm not hungry"

Juliano Zucareli [ozuka music]'s avatar

Not that much "intelligence" in it, it seems lol

John Murphy's avatar

Trust but verify. I was doing a similar thing with the number of Metroids in Metroid II. It had been a while since I played the game, and I couldn't remember how many Metroids Samus had to defeat. AI said 47, but something seemed wrong about that number. I checked a playthrough video, and the answer is actually 39. So close, AI, but no cigar.

Ryan K. Rigney's avatar

Haven’t used ChatGPT in a while but curious about the settings/model you had picked. Was it using DeepResearch mode/extended thinking etc

Stephen Totilo's avatar

Just the default free version.

Kyle Kukshtel's avatar

Doing it "correctly" instead of throwing it into whatever free model ChatGPT runs wouldn't lead to anti-AI bait posts, though...

Juliano Zucareli [ozuka music]'s avatar

Jeez, how refreshing (and amusing) to read a (human) writer's take on the issue... Thanks for putting this together--and sorry for your trouble lol

Montine Cliffson's avatar

The world is quietly sorting itself into two echo chambers.

On the one hand: "I tried giving an actual serious task to free tier ChatGPT with no reasoning and it failed. lol, lmao even"

On the other hand: "I am continuously having Claude Code with Opus 4.6 directly edit my pacemaker's firmware and my migraines have improved by 20%, this will clearly end well"

Lots to dunk on at both camps, but seriously, you haven't used what the industry means today by the term AI. You used a toy Sam Altman made to sell ads to the masses, and you gave it a task almost *designed* for it to fail miserably.

I'm willing to gift you 3 months of Claude Pro so that you can experiment and see the difference for yourself. Message me if interested.

Hayley's avatar

I'm highly critical of people using chatbots for basically anything, but this article was shockingly thin.

Using chatbots for recent news is a known blind spot, as these models are trained on necessarily old data and can only look up specific new info, such as a new game's title, if they know to. Additionally, we understand very well that chatbots cannot count letters in a word because they receive *tokens*, not letters.
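The token point can be shown with a toy example. The split below is invented for illustration, not output from any real tokenizer; the idea is just that a model receiving word chunks never "sees" individual letters, whereas a program with character-level access counts trivially.

```python
# Hypothetical token split -- real tokenizers vary, this is only illustrative.
toy_tokens = ["straw", "berry"]

# The model works with chunk IDs like these, not letters,
# which is why "how many r's?" questions trip it up.
word = "".join(toy_tokens)

# A program, by contrast, has character-level access and can just count.
print(word.count("r"))  # prints 3
```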

Anyway, I obviously am not interested in reading an article by ChatGPT, nor do I think you're interested in writing anything with ChatGPT, but this was a very flimsy article and would do nothing to dissuade anyone bought into the hype. I think this would've greatly benefited from even a short discussion with an expert.

steven owens's avatar

Are you saying that you fucked up on something and so you decided to make an article on Substack to try and get views from people because you're trying to generate some sort of money somewhere from someone?