How portable are your agent definitions? If I build one for insurance documents, how much work is needed to adapt it to a completely different domain like legal contracts or healthcare?
In practice we find that each domain (and even each organisation) ends up having highly customized definitions.
At first, fairly generic templated definitions sort of work, but what we've seen is that over time data comes up that is out of distribution, and there was no explicit instruction on how to deal with it. In such cases we tend to flag this and offer suggestions to the users on how they can improve the specificity of agents.
Another structure we have seen play out is having a manager review ratings and feedback comments from their team and updating the definitions accordingly over time (where we offer them the capability to see results of before and after side -by-side for all existing data as well, so they are more confident in the change before committing).
The amount of work is dependent on how good the initial definitions are and how complex the use case is (and how much it evolves - new data sources etc). A bit of an unsatisfying answer but it can be anywhere between a few hours one off or a couple of minutes per day on an ongoing basis.
I worked recently on an internal tool to achieve this kind of things, mostly plugging mistral OCR to gemini to extract structured data from documents. We then perform automated diffs too.
There seems to be an insane amount of competition in the "Intelligent Document Processing" market, like for instance parseur, whose founder is often on HN himself.
What do you think sets you apart from competition like :
1) Mistral document AI : depending on the model, it looks way cheaper than yours, OCR model pricing ranges from 0.001 to 0.004 EUR / page and they have structured output wired in the OCR API if needed (things then get fed to one of their LLMs) + EU-based and GDPR ready
2) parseur / rossum / docsumo / nanonets (which is YC 2017) ?
I understand what they are trying to do, but to me it feels like the moment when MongoDB entered the database space, with semi-structured, "flexible" storage format. It has its uses, for prototyping mostly.
But in high-volume, production workloads, giving a structure to the data you extract (what Parseur does through defining the Fields in your Mailbox, basically giving your output data a schema) adds a ton of value, and the larger the dataset, the truer it is.
Usually, you start by defining where you want your data to go, and which structure it should have, before working backwards from here and starting to extract the data. This is the key to automating your document workflow.
1. We are working with the assumption that OCR is (or soon will be) solved at super low prices.
So if we have the extracted data, what can we do with it?
Where we see Parsewise making a difference is for use cases that span across documents.
I.e. if you are extracting the same 5 fields from every invoice, there are lots of solutions as you listed (+ reducto etc). However, once you have a set of documents (e.g. an entire mortgage application package) and you are trying to get a structured response out, then your option is either an LLM API (if things fit into context and you are okay with limited citations), or building a pipeline with LLMs. I posted it in another comment but an example of trawling through 90k pages is here: https://www.parsewise.ai/officeqa-sota
2. While we rely on LLMs, the outcomes will be non-deterministic, so the bottleneck is and will remain the human verification (that is for somewhat complex use cases). The architecture that we have built is optimizing for the human reviewer to provide as granular values and citations as possible. This is either through our platform, or API clients.
I say this with a lot of love: The vibecoded applications in your demo reek of AI slop design.
This isn't a critique of your product. It's just that the a beige-orange theme, the pill components, and the left-border highlight give me that visceral reaction as reading a paragraph littered with em dashes and "not X but Y." It makes me take you less seriously.
Haha no appreciate it! That's on me for not calling it out explicitly (was trying to make the video as short as possible), but the demo UIs were literally vibe coded to show the ease of integration https://youtu.be/F1cSuZal03s?si=1H4zTcO-8cosLbVr&t=70
I learnt a lot at Palantir, though always worked in commercial so no ties to security state (for the better or worse).
(Also side-note, we are working towards enabling frontier performance with smaller open models that allows our customers to protect their data. https://www.parsewise.ai/officeqa-sota )
And I do get genuine joy from helping our users, so love it is:)
Planning to serve good things for sure, and appreciate your note.
Ofc I didn't agree with everything Palantir was doing (also to the extent that we even knew about them at the time). I was working on vaccine distribution and cancer research as well, so definitely felt like helping.
How portable are your agent definitions? If I build one for insurance documents, how much work is needed to adapt it to a completely different domain like legal contracts or healthcare?
In practice we find that each domain (and even each organisation) ends up having highly customized definitions.
At first, fairly generic templated definitions sort of work, but what we've seen is that over time data comes up that is out of distribution, and there was no explicit instruction on how to deal with it. In such cases we tend to flag this and offer suggestions to the users on how they can improve the specificity of agents.
Another structure we have seen play out is having a manager review ratings and feedback comments from their team and updating the definitions accordingly over time (where we offer them the capability to see results of before and after side -by-side for all existing data as well, so they are more confident in the change before committing).
The amount of work is dependent on how good the initial definitions are and how complex the use case is (and how much it evolves - new data sources etc). A bit of an unsatisfying answer but it can be anywhere between a few hours one off or a couple of minutes per day on an ongoing basis.
I worked recently on an internal tool to achieve this kind of things, mostly plugging mistral OCR to gemini to extract structured data from documents. We then perform automated diffs too.
There seems to be an insane amount of competition in the "Intelligent Document Processing" market, like for instance parseur, whose founder is often on HN himself.
What do you think sets you apart from competition like : 1) Mistral document AI : depending on the model, it looks way cheaper than yours, OCR model pricing ranges from 0.001 to 0.004 EUR / page and they have structured output wired in the OCR API if needed (things then get fed to one of their LLMs) + EU-based and GDPR ready 2) parseur / rossum / docsumo / nanonets (which is YC 2017) ?
Hi, Parseur founder here :D
I understand what they are trying to do, but to me it feels like the moment when MongoDB entered the database space, with semi-structured, "flexible" storage format. It has its uses, for prototyping mostly.
But in high-volume, production workloads, giving a structure to the data you extract (what Parseur does through defining the Fields in your Mailbox, basically giving your output data a schema) adds a ton of value, and the larger the dataset, the truer it is.
Usually, you start by defining where you want your data to go, and which structure it should have, before working backwards from here and starting to extract the data. This is the key to automating your document workflow.
Great question!
1. We are working with the assumption that OCR is (or soon will be) solved at super low prices.
So if we have the extracted data, what can we do with it? Where we see Parsewise making a difference is for use cases that span across documents. I.e. if you are extracting the same 5 fields from every invoice, there are lots of solutions as you listed (+ reducto etc). However, once you have a set of documents (e.g. an entire mortgage application package) and you are trying to get a structured response out, then your option is either an LLM API (if things fit into context and you are okay with limited citations), or building a pipeline with LLMs. I posted it in another comment but an example of trawling through 90k pages is here: https://www.parsewise.ai/officeqa-sota
2. While we rely on LLMs, the outcomes will be non-deterministic, so the bottleneck is and will remain the human verification (that is for somewhat complex use cases). The architecture that we have built is optimizing for the human reviewer to provide as granular values and citations as possible. This is either through our platform, or API clients.
I say this with a lot of love: The vibecoded applications in your demo reek of AI slop design.
This isn't a critique of your product. It's just that the a beige-orange theme, the pill components, and the left-border highlight give me that visceral reaction as reading a paragraph littered with em dashes and "not X but Y." It makes me take you less seriously.
Cool demo otherwise.
Haha no appreciate it! That's on me for not calling it out explicitly (was trying to make the video as short as possible), but the demo UIs were literally vibe coded to show the ease of integration https://youtu.be/F1cSuZal03s?si=1H4zTcO-8cosLbVr&t=70
llamaparse also do it, what is different here?
Mostly cross-doc reasoning at scale (e.g., 90k-page corpora) as opposed to doc-to-markdown conversions.
Ah probably should add a link to our website: https://www.parsewise.ai/api
Added above :)
"retaining lineage"
"That is a great catch!"
[flagged]
A launch post is not a place to attack other users personally. Neither is any other HN thread for that matter, so please don't do it here.
https://news.ycombinator.com/newsguidelines.html
I learnt a lot at Palantir, though always worked in commercial so no ties to security state (for the better or worse). (Also side-note, we are working towards enabling frontier performance with smaller open models that allows our customers to protect their data. https://www.parsewise.ai/officeqa-sota )
And I do get genuine joy from helping our users, so love it is:)
[flagged]
A launch post is not a place to attack other users personally. Neither is any other HN thread for that matter, so please don't do it here.
https://news.ycombinator.com/newsguidelines.html
Planning to serve good things for sure, and appreciate your note. Ofc I didn't agree with everything Palantir was doing (also to the extent that we even knew about them at the time). I was working on vaccine distribution and cancer research as well, so definitely felt like helping.