I don't trust these AI-only companies to be overnight experts in properly handling medical, financial and insurance data. They have no business providing these tools, unless they want to take all the risk too.
I would recommend not using these if you are not willing to absorb the risk.
Luckily there is still a significant market for the services.
> We’re releasing ten ready-to-run agent templates for the most time-consuming work in financial services
The templates being: pitch builder, meeting preparer, earnings reviewer, model builder, market researcher, valuation reviewer, general ledger reconciler, month-end closer, statement auditor, KYC (Know Your Customer) screener.
Seems pretty scattershot. Reminds me of GPT Store.
I'll be honest, I thought the first few items on your list of time-consuming work were sarcasm.
The details are key here. There is plenty of automatable financial work, sure, but when it comes to reporting finances/costs (formally or informally) and having a real human being be accountable for them, you REALLY need to trust that nothing is hallucinated.
Any idea how they ensure this doesn't happen? As in, how can a user verify that the model did not touch any of the numbers and that it only built pipelines for them?
What I've been telling my CFO, who wants to get AI involved in things, is that for a lot of accounting and finance work "Trust but verify" doesn't work, because verifying is often the same process as doing the work.
To be honest, I am having a hard time remembering the last time an LLM hallucinated in our pipelines. It makes mistakes, sure, but it doesn't make things up. For a daily recon process this is a solved problem imo.
> Any idea how they ensure this doesn't happen?
Build a deterministic query set and automate it for monthly or daily reporting reconciliation.
Leave AI out of it.
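Something like the minimal sketch below: a deterministic daily recon with no model in the loop, assuming a ledger export and a bank feed aggregated to the same grain (the file names and columns are illustrative placeholders, not a real schema).

```python
# Minimal sketch of a deterministic daily recon: no model in the loop,
# just a join and a tolerance check. File/column names are placeholders.
import pandas as pd

ledger = pd.read_csv("ledger.csv", parse_dates=["date"])   # columns: date, account, amount
bank = pd.read_csv("bank_feed.csv", parse_dates=["date"])  # columns: date, account, amount

# Aggregate both sides to the same grain, then compare.
l = ledger.groupby(["date", "account"], as_index=False)["amount"].sum()
b = bank.groupby(["date", "account"], as_index=False)["amount"].sum()

recon = l.merge(b, on=["date", "account"], how="outer",
                suffixes=("_ledger", "_bank")).fillna(0.0)
recon["diff"] = recon["amount_ledger"] - recon["amount_bank"]

# Anything outside tolerance goes to a human, not a model.
breaks = recon[recon["diff"].abs() > 0.01]
breaks.to_csv("recon_breaks.csv", index=False)
print(f"{len(breaks)} breaks flagged for review")
```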
Reads differently to me. These are some examples to run with and build your own. It covers cases from the investment side and then the obvious ones from an accounting perspective. It would be highly surprising for any of these to be used in production without modification. I am sure it will happen, but the intent, to me, is to take this and adapt it to your own process.
I find all of these .md files released by the labs to be AI-generated slop. The only exception is maybe the /simplify command.
For those in the finance space, are you actually seeing any real AI tools being used? Like for actual operational tasks?
I've really only seen it used for research / exploration thus far. Either for economic research slide decks or for exploring trading hypotheses.
On the spend management side of things, I've found pretty remarkable success in letting LLMs check "does this receipt match this reimbursement request, and based on all the information about the user, the request, and our policy, is it allocated to the appropriate GL, Location, Department, and Project codes?" If the verification step fails, it kicks the request back and the user can either override it (which flags it for AP review) or fix it. It does substantially better than the naive Bayes classifier I was using before.
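A minimal sketch of that kind of check, assuming the Anthropic Python SDK; the prompt wording, policy text, and model id below are illustrative placeholders, not the exact production setup described above.

```python
# Sketch of an LLM receipt-vs-request check that fails closed to AP review.
# Prompt wording and model id are illustrative placeholders.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def verify_expense(receipt_text: str, request: dict, policy: str) -> dict:
    prompt = (
        "You are reviewing an expense reimbursement.\n\n"
        f"Company policy:\n{policy}\n\n"
        f"Receipt (OCR text):\n{receipt_text}\n\n"
        f"Request as submitted:\n{json.dumps(request, indent=2)}\n\n"
        "Does the receipt match the request, and are the GL, Location, "
        "Department, and Project codes appropriate under the policy? "
        'Reply with JSON only: {"pass": true|false, "reasons": ["..."]}'
    )
    msg = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder model id
        max_tokens=500,
        messages=[{"role": "user", "content": prompt}],
    )
    try:
        return json.loads(msg.content[0].text)
    except (json.JSONDecodeError, AttributeError, IndexError):
        # Unparseable output: fail closed and route to AP review.
        return {"pass": False, "reasons": ["unparseable model output"]}
```

Failed checks get kicked back to the submitter; an override routes the item to AP review, same as the manual path.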
I’m not saying your implementation is bad or anything but my visceral reaction to this was “I’m glad I’m not on the other side of that”
Why? It sounds exactly like the design I would hope for. It automates what I'm going to do already without needing to wait. And it allows you to bypass it entirely and just revert to the manual process (along with waiting).
In many businesses, the employee is responsible for inputting most of that. If an LLM can get to 95% accuracy and flag exceptions, the employees (and the AP team) would actually have less work and bureaucracy.
Though we've had a few incidents where employees have submitted AI-generated receipts for reimbursement, which is another issue...
What is your point? This is pretty normal expense management in any company setting. I don't know what is so bad about being on the other side of that. I hope I am not being too inflammatory by asking, but genuinely, you pointed it out like it's some archaic process flow when it's part of almost every expense system.
Yes. On the accounting side, agents can handle a lot of the low-value work like recons and other ledger activity pretty well. On the investment side, like you pointed out, I think it's going to be a lot of research: industry, company, macro, etc. There is value in letting it run on top of the data you have and put together ideas at a quicker pace than a human can. There is still a human in the loop, but it can do a nice job of lining up thoughts you might have otherwise missed.
Seen it used in some of the fraud models (I work in insurance). So that's both from the perspective of people trying to claim fraudulently and of suppliers overcharging. I can't say how much of a lift we actually get vs existing ML models.
Nope. If anything, firms are pulling back (I know someone closely who works at BlackRock).
I don’t just know someone who works in finance, I am someone who works in finance and I say you’re wrong.
Pulling back as in setting more realistic token budgets, or something more drastic? I'm curious.
Stopped using them altogether in the context of productivity - in essence they’re useless.
I can believe that. Gambler’s Ruin gets costly when you’ve actually got money on the line.
Will the big labs leave anything for external competition?
This probably killed a thousand startups in this space.
In the early internet you wouldn't see Google creating their own news site or Facebook building their own animal farm. What happened to the platformication of everything?
Building a startup on an LLM is like building a house on a foundation of quicksand. As the LLM gets better it naturally erodes your moat. It's a completely different dynamic compared to the internet. It's why I'm watching this from the sidelines.
I have a close friend who is trying to build a company entirely on top of Claude. He doesn't know how to program. He can't do basic arithmetic. Yet, the company he's building is a "Data Science AI for the Government" because, according to him, all of the data scientists at NOAA don't know what they're doing.
I have given up on trying to get through to him how bad of an idea this is. He's unemployed and has been working on this for over a year.
Building a business on top of any SaaS platform is building on quicksand. I know that from experience.
> Will the big labs leave anything for external competition?
Is this a serious question?
Without the big labs with deep pockets investing to change the consumer mindset, do you think a small company with no funding has any chance of even existing?
I remember when paying $1.99 for a mobile game on iOS was considered too expensive, and now it seems most consumers are primed to spend more than that on in-app purchases every week. That mindset shift did not happen overnight.
It was not that long ago that $200 for a ChatGPT subscription was considered extravagant, but now even wrappers can charge this price without hesitation - and some of them do.
What Anthropic is doing is priming a market of which they will potentially be one of the main beneficiaries, as long as they can continue existing. But I don't think anyone will go to Anthropic directly to source their financial services agent. They will go to financial service companies that use Anthropic to build the capabilities.
> Will the big labs leave anything for external competition?
No, why would they if they have the choice?
> what happened to platformication of everything?
Business happened. The web works differently from how it used to. The users are different. LLM inference and AI tools are a different core product from search and ads. That, and we have the benefit of hindsight now. Maybe a Google newsroom would've actually been a good idea in 2006, who knows.
Also realistically you could say the same thing about Google Maps and Street View. That probably also killed some startups. Google isn't running a charity for startups.
This was their play all along with their unethical data collection practices: let others use the APIs to discover the applications, then use the data against them to offer integrated solutions in every vertical of interest. Cursor, once Anthropic’s biggest customer, was one of the early ones they screwed.
They are also fighting for their lives because these insane valuations simply aren’t justified by being dumb pipes. Fortunately, open weights models are widely available and have crossed a threshold of usefulness that cements their place as good substitutes.
Amazon Basics for Knowledge Work™
I guess the argument is that a tool built by a company with actual insight into and focus on financial services, with Anthropic as the inference provider, would lead to more adoption and more use of Anthropic models. That is something Anthropic could achieve either by just leaving things alone and having the best models, or alternatively by starting some kind of incubator. AWS might be a good model.
The issue with that is obviously that most of the generated value would be captured by that company in the middle, while Anthropic would stay in the cost-conscious inference market.
Why would Anthropic prefer this approach at all when that middleman can switch and cost-arbitrage between countless other model providers?
We're not talking about what is best for the consumer (e.g. more competition to force iteration and improvement), but what Anthropic thinks is best for Anthropic.
Make up for the lower margins with larger volume, because you get much better market penetration. But you are right that this only works if you know the middlemen don't go to other model providers. That's where some kind of incubation program that provides capital or credits or whatever in return for long-term commitments might work.
But I doubt staying a pure model provider is a winning move. It's a market nobody will win long-term. Almost all of the value to be captured isn't in inference APIs but in how to use them to generate business value. Claude Code was already the right approach; they "just" need to show they can repeat this for other kinds of tasks.
> Almost all of the value to be captured isn't in inference APIs but in how to use them to generate business value.
If the business value can be generated with a few thousand words in a SKILL.md on top of a commoditized model, it doesn't sound like that's a market anyone can win long-term either, and the business value is ultimately going to accrue elsewhere (the customer, the inference hardware provider, etc.).
I'm confused because I remember using Google News in 2006?
There has been a product called Google News since 2002, but it was only aggregating information from news outlets.
History suggests otherwise: railroads, telecoms, and search all consolidated. The natural equilibrium for transformative infrastructure is winner-take-all. AGI/ASI won't be different, except it will span nearly every vertical, and governments will legislate too little, too late.
Local models are going to win, and therefore so will the hardware providers, Apple and Nvidia.
There isn't going to be any moat for the hosted providers besides hardware scale. They can run your request on shared 1TB memory hardware, or whatever.
But local hardware is going to catch up, the hosted providers are going to become commoditized, and the costs are just going to be compute, whether it's your hardware or theirs.
And your laptop is going to be powerful enough to be good enough for most cases.
Local hardware catching up doesn't matter if the thing worth having never leaves the building. Enterprise services are hard; the moat is in distribution and know-how.
I'm not sure if this was tongue-in-cheek or not, but Yahoo created its own news site in 1996: https://en.wikipedia.org/wiki/Yahoo_News and FB had Zynga's Farmville as well.
> in the early internet you wouldn't see google creating their own news site
Google News was definitely a thing (and actually still exists).
Just looked it up; it is still a thing - learn something new every day!
But Google did move into a lot of spaces: maps, mail, docs, etc.
This is premature caution/fear.
I am not sure if people are using Claude design, the security review stuff, and the other tools they have built so far.
Building is the easy part. There is a lot of service-level stuff that I am sure Anthropic will not be able to provide; therefore they are trying to partner with other orgs in that realm.
I am very skeptical about their stuff now.
If you are a builder, I believe you should avoid Anthropic. It can default to monopolistic behavior. I am not saying they are doing it, but they could: they see what you are building, and if you have traction, they position a product in that realm. Just saying.
> Will the big labs leave anything for external competition?
Unfortunately no.
The TAM for Anthropic and OpenAI is anything that runs software or has a screen.
Any software or technology business with high margins that Anthropic and OpenAI are not already in will be a target.
After both of their IPOs, Wall Street's mandates will push them toward more growth by competing in other technology business areas, or they will get punished in the markets.
It is ROI or bust.
You’re advocating for less competition? AI startup valuations are out of control. People are raising $20m seed rounds.
If you can’t prove PMF and differentiation with $10m, I’m sorry but you’re not a serious enterprise.
And if what you’re building is “pitch deck AI”, I mean, come on.
> tfw you've been huffing your own copium so much that you forgot you're selling shovels
lol, these agents are missing the point re: what people actually do in these jobs.
This is an attempt to inflate token generation to fool people into increasing anthropic’s valuation.
Next couple weeks - financial and insurance services announce layoffs!
Patagonia is gonna lose some clientele.
Everything is going to be slop and you're going to like it.
Is the plan to have an LLM do everything? And do it worse?
"Oh yeah my Claude didn't agree with the pitch from their Claude"
The goal of current tech is to make humanity a gerbil running on a Claude wheel
At that point what even is the point of doing anything at all? Like, it’s less than useless.
AI and finance --- what could possibly go wrong?
Better Call Saul when (not if) it does.
Making the most convoluted and idiotic insurance process on earth and then delegating that process to an AI that requires huge buzzing data centers... Is there an option to respawn in the non-clown-world universe? It was funny at first, but it gets tiring eventually.