Intro:
If the title sounds like hyperbole, I assure you it is, if anything, understated.
Yesterday I pushed a blog post covering Carvana from 5/12/2025 - 5/18/2025. You can find a link here.
Not a single piece of text in that report was written by me! Absolutely 0% of the content was human generated, 100% Gemini.
That may be obvious for the parts that reference shitty Zacks articles, but that report is the worst the AI will ever be, and it's already better at extracting good info than I was when I started my job! My goal is to automate this kind of coverage, keep improving it with more data sources, and expand to additional stocks.
My end vision is automated weekly reports covering basically every “check box” type of weekly information you’d want for any stock you can think of: SEC filings, news articles, job postings, upcoming catalysts and the market consensus on them, changes in sell-side estimates, the CEO liking a partner’s LinkedIn post ahead of an announcement, etc. You want alt data coverage? Let’s make it cover alt data.
Cherry on top, I want to let you chat with it too! Essentially a fully automated junior analyst (minus the modeling capabilities, but Excel plug-ins can happen down the line). That’s part of the goal with Clarity,
which is where I’ll likely be posting articles related to this type of coverage in the future. It will naturally be a work in progress as I expand: I need to build out the data collection side, scale up the automated coverage side, wrangle the robot overlord to my will, etc. It’s very likely some of this ends up being paywalled. If the goal is to basically create a junior analyst that can cover hundreds of stocks, I hope a tiny fraction of that value is a worthy payment. The goal will be to make it feel like a total steal.
The whole journey will hopefully be quite informative to you, the reader. It has certainly been informative for me, the person messing with models 8-12 hours a day. As it happens, if you walk into Gemini and ask it for a report on how many cars CVNA sold last week, it will produce total junk: the formatting won’t be consistent and the numbers may be fake.
So below we’ll explore some of my model learnings. Feel free to subscribe here or on the Clarity blog for future updates. Cheers.
The Basics Of Models:
The key to getting good model output is understanding how models work.
Models can be broken down into a few key parts, roughly as follows:
Model capability
Data sources
Prompt
Below we are going to take a look at how each of these work, and how you can interact with them to produce viable output.
Model Capability:
This is the one variable you cannot control (besides picking different models in ChatGPT/Gemini). Roughly speaking, model capability has improved to an absurd degree. Probably the most visceral example is Will Smith eating spaghetti: compare the viral early examples from 2023 with more recent capabilities, which as of now aren’t even state of the art anymore!
Reddit users have started making full music videos of surprising quality. Some short clips are starting to look basically indistinguishable from reality. It’s phenomenal really, I heavily encourage keeping an eye on it.
Benchmarking text output is of course less intuitive, but I think we’re safely at the point where the logic skills of AI are better than those of almost all fresh college graduates. Google has already made models with math skills on par with the top 0.01%, so like, good luck to us humans I guess?
Of course logic isn’t everything. Some fields like programming and math tend to have objective truths, lots of publicly available training data, and not a lot of missing context. There isn’t some big “Encyclopedia of High Level Math” locked in the vault of American Math Inc. or something. Whereas with, say, sales, companies like Salesforce basically operate as big repositories of somewhat proprietary lead sources. Everyone isn’t nicely uploading their work phone, email, SaaS requirements, etc. onto a clean searchable profile. It’s a bit messy.
As a result it doesn’t really matter how good the model is. If you were born to sell B2B SaaS and ask it to find 100 sales leads, it will likely hallucinate quite heavily. If you ask it to model CVNA’s future earnings, it will either do a linear regression on past earnings, quote a Zacks article, or just make something up. The amount of soft inputs that go into something like that is extremely complex, so raw logic doesn’t quite matter. The model needs data sources, it needs context on those data sources, and it needs to be directed.
So let’s explore those.
Data Sources:
As you will notice, default Gemini will, as of now, never give you good info on how many cars Carvana sold last week. Models aren’t great at making 10 leaps of logic, spinning up some code, spinning up a database, running the code, doing the math, and spitting out an answer in 30 seconds. Some stuff still requires humans to go gather, because it’s not direct; it’s messy and it has nuance.
To avoid the model hallucinating here, we can simply input the data ourselves! That’s what we’re doing now with Clarity: we take our website’s data sources and feed them into a model to produce commentary. Like I said, models are extremely good at logic. They’re quite capable of taking well-labeled info from a database or spreadsheet and spitting out a version in human-readable text. It took a bit of wrangling variable names and such for consistency, but I was able to get the model to pull the data quite consistently.
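To make that concrete, here's a minimal sketch of the pattern: pull well-labeled rows out of a database and paste them into the prompt alongside a sample report. Every field name and number below is invented for illustration; this is the shape of the approach, not Clarity's actual pipeline.

```python
# Hypothetical sketch: turning well-labeled metric rows into a grounded prompt.
# The metrics and values are made up for illustration.

def build_weekly_prompt(rows, sample_report):
    """Format labeled metric rows into a prompt the model can narrate."""
    lines = [f"{r['metric']}: {r['value']} (prior week: {r['prior']})" for r in rows]
    data_block = "\n".join(lines)
    return (
        "You are writing weekly coverage for CVNA.\n"
        "Use ONLY the data below; do not invent numbers.\n\n"
        f"DATA:\n{data_block}\n\n"
        f"Match the format of this sample report:\n{sample_report}"
    )

rows = [
    {"metric": "Vehicles sold (weekly est.)", "value": 8200, "prior": 7900},
    {"metric": "Average listing price", "value": "$23,400", "prior": "$23,150"},
]
prompt = build_weekly_prompt(rows, "## Weekly Sales\n...")
print(prompt)
```

The key design choice is that the model never has to go find anything: the data is in front of it, and the sample report pins down the format.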
Going forward, my goal is to hook up more and more data sources. Some of these may be job postings for example, or prediction market odds for macro event catalysts. In the future once we get enough money, I’d like to hook up credit card data, FactSet/Bloomberg, options chains, factor data, LinkedIn posts, social media statistics, expert networks, etc.
Say you were interested in how Wayfair was handling tariffs. It’d be pretty cool to see something like:
Week of 6/1/2025:
Credit Card Data Sales: +3% YoY vs +1% YoY for the week of 5/25/2025
Credit Card Data Sales QTD: -1% YoY
Twitter Followers: 111,111 (+113, 0.1% WoW) (+5,843, 0.5% YoY)
Instagram Followers: 2,193,129 (+10,000, 0.5% WoW) (+193,129, 10% YoY)
Analyst Consensus Q2 2025 EPS: $1.15 (+$0.05 WoW)
New “Buy” note from JPM talking about blah blah blah
1 LinkedIn post by CEO Niraj Shah discussing tariff impacts on home decor
Link
5 Expert Network Calls
Former Wayfair Logistics Manager
Discusses potential tariff impacts on Gross Margin: Link
etc
SEC Filings:
Form 4: Insider Buy by CFO Kate Gulliver for 10,000 shares at $25.00 for a total of $250,000
Total ownership: $1,000,000
Form 4: Insider Buy by CEO Niraj Shah for 10,000 shares at $25.00 for a total of $250,000
Total ownership: $1,000,000
8K: Wayfair Announces New Factory Opening in India
Link
You get the point.
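The WoW/YoY bookkeeping in a report like that is trivial to generate. A quick sketch: only the 111,111 follower count comes from the mock line above; the week-ago and year-ago counts are invented inputs, so the YoY percentage won't match the mock exactly.

```python
def delta_line(name, current, week_ago, year_ago):
    """Format a metric with week-over-week and year-over-year deltas."""
    wow, yoy = current - week_ago, current - year_ago
    return (f"{name}: {current:,} "
            f"({wow:+,}, {wow / week_ago:+.1%} WoW) "
            f"({yoy:+,}, {yoy / year_ago:+.1%} YoY)")

# Made-up follower counts for illustration.
line = delta_line("Twitter Followers", 111_111, 110_998, 105_268)
print(line)  # Twitter Followers: 111,111 (+113, +0.1% WoW) (+5,843, +5.6% YoY)
```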
In theory something like that is totally possible, dare I say easily doable, when hooked up to the right data sources. It’s also effectively a large part of what an analyst may work on during a week, which frees up time for the work with a bit more flair. Another added benefit: maybe I don’t have a great pipeline for Wayfair because I’m new to it. Instead of building everything out from scratch, I can start from a repository of analysis and news and build from there.
Overall, I’m very, very bullish about the potential here for AI. That Wayfair example? With pre-collected data, the actual report could be generated for fractions of a cent. Of course, bundling Tegus/Bloomberg/Yipit costs hundreds of thousands of dollars per year, or with raw data sources potentially millions, but the added cost of the output itself is basically a rounding error compared to a human. It’s absurd!
A primary concern here, of course, is hallucinations; you may still be doubting me on the efficacy. After all, you may see a viral tweet like the one below and properly note that the output is totally nonsensical, so why am I so excited?
Essentially, you should absolutely never do what is happening in that prompt. Models are still very bad at taking extremely complex asks that don’t have pre-loaded data sources and spitting out something good. There’s a gigantic difference between ripping off a “what was Mag7 capex last year” and a proper prompt.
Here’s Gemini providing a ton of totally wrong answers to that prompt: https://g.co/gemini/share/06eaa869784b
The reason here is data and the prompt. The way these models work is they basically follow a chain of reasoning, using probabilities to turn your prompt into an output. The longer the chain of reasoning, or the wider the probabilities, the worse the results. If the model has a 50% chance to get lost at each step, and you ask it a 10-step problem, you’ll get a correct answer maybe once every thousand prompts! When interacting with a model, you want to reduce both the probability of getting lost and the number of opportunities to get lost.
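The arithmetic behind "once every thousand prompts" is just compounding, assuming each step succeeds independently. The 95% line is my own added illustration of why tightening per-step reliability pays off so fast.

```python
# Probability a multi-step chain comes out right, assuming each step
# succeeds independently. The 50% / 10-step numbers are from the text.
p_step, steps = 0.5, 10
p_correct = p_step ** steps
print(p_correct)             # 0.0009765625, i.e. 1 in 1024
print(round(1 / p_correct))  # 1024

# Tighten each step to 95% and the same 10-step chain succeeds ~60% of the time.
print(round(0.95 ** steps, 3))  # 0.599
```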
The META CapEx above was wrong, so let’s try it again, but this time simply copy-paste their Cash Flow Statement from Yahoo Finance:
https://g.co/gemini/share/aac5ea638d8f
We get it right! I didn’t need a fancy solution or anything, just a raw copy pasted cash flow statement. Like I said, the model is very very good at logic, but it has no idea if “capex” means it should go find the actual SEC filing or just Bob’s Trade Blog as a source.
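That's the whole trick: once the statement text is in the context, extracting the figure is mechanical. Here's the same extraction in plain code; the labels mimic a Yahoo-Finance-style layout, but the dollar amounts are made up for the example, not META's actual figures.

```python
import re

# Illustrative cash flow statement snippet; amounts are invented.
pasted = """
Operating Cash Flow        91,328
Capital Expenditure       -37,256
Free Cash Flow             54,072
"""

# Once the text is in front of you (or a model), pulling capex out is trivial.
match = re.search(r"Capital Expenditure\s+(-?[\d,]+)", pasted)
capex = int(match.group(1).replace(",", ""))
print(capex)  # -37256
```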
If we really wanted to chart CapEx over the last 5 years plus next year, it would be trivial: load the prompt with the relevant filings, load in the sell-side estimates, and basically just leave the model to parse and graph. It’s really, really good at well-defined math!
The downside here is that it’s not sexy to go grab a bunch of filings and plug them in versus just having the model one-shot it. Having to go find sources for your own questions? Might as well do it yourself.
I agree that LLMs are not quite good enough at contextual understanding to do random bespoke work. Thankfully, I’d say the majority of finance work is not bespoke! It’s parsing well-defined news flows and data, and using that to do the bespoke work! This makes your data sources and integration imperative to finding value from LLMs. My goal is to basically read the market’s mind for every stock and aggregate data coverage for everything relevant. Hopefully as I do that, I can then use all of that work as a grounding “database” for the bespoke stuff. Work in progress.
On to prompting.
Prompt:
There’s a lot that can be said about prompts, but we’ll focus on stuff primarily useful to the financial research use case.
The first thing to know is that prompts depend on the model. Deep Research, for example, is very, very good at finding info from a ton of different links. It uses an absurd amount of compute to aggregate an absurd amount of information. If you prompt “what color is the sky,” it will look through dozens of links and casually write you an 18-page report. Naturally there are a ton of steps here, and as we discussed, each step has variance, so getting a consistent output is very hard. If I run the same prompt 10 times, it may cite NASA all 10 times, but what it cites, how it says it, how it’s formatted, all of that will change.
Thus if you walk into Deep Research and ask it to write a research note in a certain format, it will fail basically 100% of the time (believe me, I’ve tried).
What’s very helpful is to break up anything you’d like an LLM to do into parts. You can see this in the blog: there are numerous “sections,” and each was generated differently.
For example, if I want analysis on how many vehicles Carvana sold last week, and I have a database with that information, I really don’t need Gemini to do a Google search. I don’t need it to find 500 sources. I need it to run some basic logic on a provided dataset and spit out text. So I simply provided it with a format from prior work and asked it to generate the section for that week. All in all, it cost about $0.0001.
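For the curious, that "$0.0001" is just token math. A back-of-envelope sketch; the per-token prices and token counts below are placeholder assumptions, not Gemini's actual rate card.

```python
# Assumed prices in $ per 1M tokens (placeholders, not real rates).
PRICE_IN, PRICE_OUT = 0.10, 0.40

def call_cost(input_tokens, output_tokens):
    """Cost of one model call at the assumed rates."""
    return input_tokens / 1e6 * PRICE_IN + output_tokens / 1e6 * PRICE_OUT

# A short data-grounded section: small prompt, small output.
cost = call_cost(input_tokens=600, output_tokens=150)
print(f"${cost:.6f}")  # $0.000120
```

The point is the order of magnitude: a data-grounded section is thousands of tokens at most, which rounds to nothing next to human time.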
Now if I want it to find and summarize general news? That requires a bit more discretion. In that case, having Gemini run through 500 sources and try to establish relevance to CVNA can make sense. The prompt likely needs to contain some idea of what you are looking for, some whitelisting or blacklisting of sources (the blog, for example, contains a bunch of Zacks slop, oops), a time frame, etc. It’s also very helpful to load sample reports into the prompt as context for what kind of information is useful. I hand-wrote a report for a different week, which I fed into the model as a guide for highlighting SEC filings, interest rates, press releases, general automotive industry news, etc.
The output of that report will of course be almost totally random in terms of format and, to some extent, content. What’s cool is that because this is vastly cheaper than a human going through hundreds of sources, we can do the neat trick of running it a few times! Given the non-deterministic nature of the models, this can also dig up new information that was missed on a prior run. For example, one run nicely flagged that CVNA’s ABS trusts release reports on the 15th of May. Combining it with another run meant the final blog had that information, whereas a singular run missed it!
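The merge step itself is simple set logic. A sketch with the model call stubbed out, since the point is the combining, not the research; the item names are invented, aside from the ABS trust example from the text.

```python
import random

def research_run(seed):
    """Stand-in for one non-deterministic Deep Research pass: each run
    surfaces only some of the facts that are actually out there."""
    facts = [
        "ABS trusts release reports on May 15",
        "New 8-K filed this week",
        "Used-car price index ticked down",
        "Zacks article (probably skip)",
    ]
    return set(random.Random(seed).sample(facts, k=3))

# Union several runs so a fact missed by one pass still reaches the report.
combined = set().union(*(research_run(s) for s in range(4)))
print(f"{len(combined)} unique items after 4 runs")
```

In practice the combining would itself be a model call ("merge these drafts, dedupe, keep the best sourcing"), but the economics are the same: extra runs are nearly free.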
On the format side, again, the models are great at logic. What we can do is simply provide a sample report to the model and ask it to condense the output files of Deep Research into the format of the sample! This will likely need some fine-tuning in terms of what makes the cut, as again the model doesn’t know whether Zacks or the WSJ is better to include in news, but for a v0 it’s quite good. It did not take much effort to get the formatting side nailed down, and going forward each completed article can be fed in as context to create new articles, reinforcing the “quality.”
So to summarize:
Understand the model you’re working with and its capabilities/tendencies
Feed the model examples
Attempt to tailor context and provide rules/weighting on info sources
Magic
Fin:
As always, hopefully this article was helpful. I’d like to expand coverage as much as possible while also ensuring the quality is something I’d actually be interested in. I’m not trying to have the model draw lines on a chart and pretend it’s the work of Einstein.
That said, irrespective of how well I figure this out, I do think very large portions of the analyst job, in terms of time spent, will become redundant much faster than imagined. I’d rather delegate a task to Gemini at this point than hire the majority of college grads. Additionally, these models are about 3 years old. It takes us humans about 22 years to be considered vaguely employable, probably another 3-5 to be value-generating, and we aren’t always awake and available. We have families, hobbies; maybe we leave jobs. Give it an extra ~50 years on average and we stop existing altogether. Tough competition!
The utilization of mass-scale intelligent compute is in its infancy, and it’s already a multi-trillion dollar endeavor. We’re going to start seeing mass-market AI music videos, AI cartoons, AI video games, AI tutors, AI teachers, AI analysts, AI consultants, AI influencers, AI porn, etc.
The vast majority of tasks on this earth are able to be defined. Software was already eating the world, AI is an even further step change.
So anyways, I bought some Google a few weeks ago. In a decade hopefully Gemini looks upon me favorably for that.
Cheers.
Very interesting, thanks for sharing. Inspired me to try out some more automation in my own work. One quibble though: I'm not sure what you shared here makes me more bullish on AI. Am I wrong, or is the only new ability here really that of summarizing the sources into bullets? You are using older methods of computing to collect the data and put it in front of the LLM (even searching the web and finding relevant articles is something you could have done prior to LLMs; the LLMs are sitting on top of older tools). You are hand-holding it through the structuring of the report and designing that structure yourself. You have to explicitly tell it what a good source and a bad source is, and what is irrelevant vs relevant. The only thing new is that it can convert each of those SEC filings into a bullet point. No doubt that it's magic and useful and unlocks stuff like this, but it is well appreciated at this point. To put it another way, you could have produced this report prior to LLMs, except instead of a bullet-point summary of each underlying report/source you would just have the link.
Great article as usual. Any opinion on ChatGPT vs Gemini? Google definitely has the firepower to stay in the race for quite a while, but the landscape is evolving too fast imo, at least