This might stem from the domain I work in (banking), but I have the opposite take. Soft delete pros to me:
* It's obvious from the schema: if there's a `deleted_at` column, I know how to query the table correctly (versus assuming rows are never DELETEd, or having to know which other table to look in)
* One way to do things: analytics queries, admin pages, and so on can all look at the same set of data, instead of needing separate handling for historical data.
* DELETEs are likely fairly rare by volume for many use cases
* I haven't found soft-deleted rows to be a big performance issue. Intuitively this should be true, since indexed lookups should be O(log N).
* Undoing is really easy, because all the relationships stay in place, versus data having already been moved elsewhere (in practice, I haven't found much need for this kind of undo).
In most cases, I've really enjoyed going even further and making rows fully immutable, using a new row to handle updates. This makes it really easy to reference historical data.
If I were doing the logging approach described in the article, I'd use database triggers that keep a copy of every INSERTed/UPDATEd/DELETEd row in a duplicate table. This way it all stays in the same database, easy to query and replicate elsewhere.
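A minimal sketch of that trigger idea, using sqlite3 so it runs standalone; the `accounts` / `accounts_history` names are invented for illustration, and a real Postgres setup would use a trigger function (e.g. writing `row_to_json(OLD)`) instead:

```python
# Hypothetical sketch: triggers that copy every INSERTed/UPDATEd/DELETEd row
# into a duplicate history table, together with the operation and a timestamp.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance INTEGER);

CREATE TABLE accounts_history (
    id INTEGER, name TEXT, balance INTEGER,
    op TEXT, changed_at TEXT DEFAULT (datetime('now'))
);

CREATE TRIGGER accounts_ai AFTER INSERT ON accounts BEGIN
    INSERT INTO accounts_history (id, name, balance, op)
    VALUES (NEW.id, NEW.name, NEW.balance, 'INSERT');
END;

CREATE TRIGGER accounts_au AFTER UPDATE ON accounts BEGIN
    INSERT INTO accounts_history (id, name, balance, op)
    VALUES (NEW.id, NEW.name, NEW.balance, 'UPDATE');
END;

CREATE TRIGGER accounts_ad AFTER DELETE ON accounts BEGIN
    INSERT INTO accounts_history (id, name, balance, op)
    VALUES (OLD.id, OLD.name, OLD.balance, 'DELETE');
END;
""")

db.execute("INSERT INTO accounts VALUES (1, 'alice', 100)")
db.execute("UPDATE accounts SET balance = 90 WHERE id = 1")
db.execute("DELETE FROM accounts WHERE id = 1")
print(db.execute("SELECT op, balance FROM accounts_history").fetchall())
# [('INSERT', 100), ('UPDATE', 90), ('DELETE', 90)]
```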
> DELETEs are likely fairly rare by volume for many use cases
All your other points make sense, given this assumption.
I've seen tables where 50-70% of the rows were soft-deleted, and it did affect performance noticeably.
> Undoing is really easy
Depends on whether undoing even happens, and whether the acts of deletion and undeletion require audit records anyway.
In short, there are cases when soft-deletion works well, and is a good approach. In other cases it does not, and is not. Analysis is needed before adopting it.
I like having archive/history tables. I often do something similar with job queues when persisting to a database; that way the pending table can stay small and avoid full scans just to skip over deleted records...
As an aside, another idea that I've kicked forward for event-driven databases is to just use a database like sqlite and copy/wipe the whole thing as necessary once an event, or the work related to that database, is done. For example, all validation/chain-of-custody info for ballot signatures... there's not much point in having it all online or active, or even mixed in with other ballot initiatives, and the schema can change with the app as needed for new events. Just copy that file, and you have your archive. You can even compress the file and have it hard-archived and backed up if needed.
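A rough illustration of that copy-and-archive step, assuming the file names below and that no writer has the database open at the time (otherwise SQLite's backup API would be safer than a raw file copy):

```python
# Hypothetical sketch: archive a per-event SQLite file, then start fresh.
import gzip
import shutil
from pathlib import Path

def archive_event_db(db_path: str, archive_dir: str) -> Path:
    src = Path(db_path)
    dest = Path(archive_dir) / (src.name + ".gz")
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Compress the whole database file into the archive directory.
    with src.open("rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    src.unlink()  # "wipe": the next event starts with a brand-new file
    return dest

# e.g. archive_event_db("ballots_2024_general.sqlite", "archives/")
```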
Databases store facts. Creating a record = new fact. "Deleting" a record = new fact. But destroying rows from tables = disappeared fact. That is not great for most cases. In rare cases the volume of records may be a technical hurdle; in which case, move the facts to another database. The number of times I've wanted to destroy a large volume of facts is approximately zero.
I've worked at companies where soft delete was implemented everywhere, even in irrelevant internal systems... I think it's a cultural thing! I still remember a college professor scolding me on an extension project because I hadn't implemented soft delete... in his words, "In the business world, data is never deleted!!"
No comment from the professor on modifications though?
At Firezone we started with soft-deletes, thinking they might be useful as an audit / compliance log, and quickly ran into each of the problems described in this article. The real issue for us was migrations: having to maintain the structure of deleted data alongside live data just didn't make sense, and undermined the point of an immutable audit trail.
We've switched to CDC using Postgres, which emits into another (non-replicated) write-optimized table. The replication connection maintains a 'subject' variable to provide audit context for each INSERT/UPDATE/DELETE. So far, CDC has worked very well for us in this manner (Elixir / Postgrex).
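Not their Elixir / Postgrex setup, but a rough Python analogue of consuming Postgres logical decoding; the slot name, plugin, and connection string are assumptions (it presumes something like `SELECT pg_create_logical_replication_slot('audit_slot', 'wal2json')` was run beforehand):

```python
# Hypothetical sketch: stream CDC events from a Postgres logical replication slot.
import psycopg2
import psycopg2.extras

conn = psycopg2.connect(
    "dbname=app",  # placeholder connection string
    connection_factory=psycopg2.extras.LogicalReplicationConnection,
)
cur = conn.cursor()
cur.start_replication(slot_name="audit_slot", decode=True)

def on_change(msg):
    # msg.payload is the decoded INSERT/UPDATE/DELETE (JSON when using wal2json);
    # a consumer could append it to a write-optimized audit table here.
    print(msg.payload)
    msg.cursor.send_feedback(flush_lsn=msg.data_start)  # acknowledge progress

cur.consume_stream(on_change)  # blocks, streaming changes as they commit
```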
I do think soft-deletes have their place in this world, maybe for user-facing "restore deleted" features. I don't think compliance or audit trails are the right place for them however.
Soft deletes are an example of where engineers unintentionally lead product instead of product leading engineering. “Soft delete” isn’t language used by users, so it should not be used by engineers when making product-facing decisions.
“Delete”, “archive”, and “hide” are the types of actions a user typically wants, each with its own semantics specific to the product. A flag on the row, a separate table, deleting a row: these are all implementation options that should be led by the product.
Why would implementation details be led by product? “Undo” is an action that the user may want, which would be led by product. Not the implementation in the db.
We have soft delete, with hard delete running on deletions over 45 days old. Sometimes people delete things by accident and this is the only way to practically recover that.
The % of records that are deleted is a huge factor.
If you keep 99% and soft-delete 1%, use some sort of deleted flag. (While I have not tried it, whalesalad's suggestion of a view sounds excellent.) If you delete 99% and keep 1%, move it!
How do you handle schema drift?
The data archive serializes the deleted object according to the schema at that point in time.
But fast-forward through a few schema changes: now your system has to migrate the archived objects to the current schema?
In my experience, archived objects are almost never accessed, and if they are, it's within a few hours or days of deletion, which leaves a fairly small chance that schema changes will have a significant impact on restoring any archived object. If you pair that with "best-effort" tooling that restores objects by calling standard "create" APIs, perhaps it's fairly safe to _not_ deal with schema changes.
Of course, as always, it depends on the system and how the archive is used. That's just my experience. I can imagine that if there are more tools or features built around the archive, the situation might be different.
I think maintaining schema changes and migrations on archived objects can be tricky in its own ways, even when they're kept in the live tables with an 'archived_at' column, especially when objects span multiple tables with relationships. I've worked on migrations where really old archived objects just didn't make sense anymore in the new data model, and figuring out a safe migration became a difficult, error-prone project.
Privacy regulations make soft delete unviable in many of the cases where it's useful.
Soft deletion and privacy deletion serve different purposes.
If you leave a comment on a forum, and then delete it, it may be marked as soft-deleted so that it doesn't appear publicly in the thread anymore, but admins can still read what you wrote for moderation/auditing purposes.
On the other hand, if you send a privacy deletion request to the forum, they would be required to actually fully delete or anonymize your data, so even admins can no longer tie comments that you wrote back to you.
Most social media sites probably have to implement both of these processes/systems.
Imo there should be some retention period for moderation but then hard deletion after that. Why would a moderator need to look up a deleted post a year after it was deleted?
The opposite is true in countries where there are data retention laws. Soft-delete is mandatory in those cases.
The trigger architecture is actually quite interesting, especially because cleanup is relatively cheap. As far as compliance goes, it's also simple to declare "after 45 days, deletions are permanent" as a catch-all, and then you get to keep restores. For example, I think (IANAL) the CCPA gives you a 45-day buffer for right-to-erasure requests.
Now, instead of chasing down different systems and backups, you can simply ensure your archival process runs regularly, and you should be good.
We deal with soft delete in a Mongo app with hundreds of millions of records by simply moving the objects to a separate collection (table), away from the “not deleted” data.
This works well especially in cases where you don’t want to waste CPU/memory scanning soft deleted records every time you do a lookup.
And avoids situations where app/backend logic forgets to apply the “deleted: false” filter.
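For what it's worth, a bare-bones pymongo sketch of that move-to-another-collection pattern; the collection names are made up, and the two steps aren't atomic (a session/transaction would be needed for strict guarantees):

```python
# Hypothetical sketch: "delete" by moving the document into an archive collection.
from datetime import datetime, timezone
from pymongo import MongoClient

client = MongoClient()  # placeholder connection
db = client["app"]

def soft_delete(doc_id) -> bool:
    doc = db.items.find_one_and_delete({"_id": doc_id})
    if doc is None:
        return False
    doc["deleted_at"] = datetime.now(timezone.utc)
    # Live collection stays small; lookups never scan deleted documents.
    db.items_deleted.insert_one(doc)
    return True
```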
I guess that works well with NoSQL. In a relational database it gets harder to move records out if they have relationships with other tables.
Eh you could implement this pretty simply with postgres table partitions
Ah, that's an interesting idea! I had never considered using partitions. I might write a followup post with these new ideas.
There are a bunch of caveats around primary keys and uniqueness but I suspect it could be made to work depending on your data model.
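A rough sketch of how that might look; the DDL is Postgres-specific (list partitioning on the flag, PG 11+ so an UPDATE can move a row between partitions), and the table/column names are invented. As the caveat above notes, the partition key has to be part of the primary key:

```python
# Hypothetical sketch: partition on the soft-delete flag so "live" queries
# only ever touch the live partition.
import psycopg2

DDL = """
CREATE TABLE items (
    id         bigint GENERATED ALWAYS AS IDENTITY,
    body       text NOT NULL,
    is_deleted boolean NOT NULL DEFAULT false,
    PRIMARY KEY (id, is_deleted)           -- partition key must be in the PK
) PARTITION BY LIST (is_deleted);

CREATE TABLE items_live    PARTITION OF items FOR VALUES IN (false);
CREATE TABLE items_deleted PARTITION OF items FOR VALUES IN (true);
"""

with psycopg2.connect("dbname=app") as conn:  # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(DDL)
        # Soft-deleting a row moves it from items_live to items_deleted.
        cur.execute("UPDATE items SET is_deleted = true WHERE id = %s", (42,))
```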
I've given up on soft delete -- the nail in the coffin for me was my customers' legal requirements that data is fully deleted, not archived. It never worked that well anyways. I never had a successful restore from a large set of soft-deleted rows.
> customers' legal requirements that data is fully deleted
Strange. I've only ever heard of legal requirements preventing deletion of things you'd expect could be fully deleted (in case they're needed as evidence at trial or something).
While not common, regulations requiring a hard delete do exist in some fields even in the US. The ones I'm familiar with are effectively "anti-retention" laws that mandate data must be removed from the system after some specified period of time, e.g. all data in the system is deleted no more than 90 days after insertion. This allows compliance to be automated.
The data subject to the regulation had a high potential for abuse. Automated anti-retention limits the risk and potential damage.
Many privacy regulations enforce full deletion of data, including GDPR: https://gdpr-info.eu/.
A good solution here can be to utilize a view. The underlying table has the soft-delete field, and the view hides rows that have been soft deleted. Then the application doesn't need to worry about this concern all over the place.
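A tiny runnable illustration of the view approach, with invented names (sqlite3 here; the same shape works in Postgres, where a simple view like this is also updatable):

```python
# Hypothetical sketch: the app reads from a view that filters soft-deleted rows.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    body TEXT NOT NULL,
    deleted_at TEXT              -- NULL means "not deleted"
);

CREATE VIEW visible_comments AS
    SELECT id, body FROM comments WHERE deleted_at IS NULL;
""")

db.execute("INSERT INTO comments (id, body) VALUES (1, 'hello'), (2, 'bye')")
db.execute("UPDATE comments SET deleted_at = datetime('now') WHERE id = 2")
print(db.execute("SELECT * FROM visible_comments").fetchall())  # [(1, 'hello')]
```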
postgres with rls to hide soft-deleted records means that most of the app code doesn't need to know or care about them; it still issues reads, writes, and deletes to the same source table, and as far as the app knows it's working
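Something along these lines, with assumed table/role names; note this only covers the hiding part (RLS skips the table owner unless it's FORCEd, and turning app-issued DELETEs into soft deletes would additionally need something like a BEFORE DELETE trigger):

```python
# Hypothetical sketch: an RLS policy so the app role never sees soft-deleted rows.
import psycopg2

SQL = """
ALTER TABLE comments ENABLE ROW LEVEL SECURITY;
ALTER TABLE comments FORCE ROW LEVEL SECURITY;   -- apply to the owner as well

-- The application role can only see and touch non-deleted rows.
CREATE POLICY hide_soft_deleted ON comments
    FOR ALL
    TO app_user
    USING (deleted_at IS NULL);
"""

with psycopg2.connect("dbname=app") as conn:  # placeholder connection string
    conn.cursor().execute(SQL)
```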
Tried implementing this crap once. Never again
One thing that often gets forgotten in the discussions about whether to soft delete and how to do it is: what about analysis of your data? Even if you don't have a data science team, or even a dedicated business analyst, there's a good chance that somebody at some point will want to analyze something in the data. And there's a good chance that the analysis will either be explicitly "intertemporal" in that it looks at and compares data from various points in time, or implicitly in that the data spans a long time range and you need to know the states of various entities "as of" a particular time in history. If you didn't keep snapshots and you don't have soft edits/deletes you're kinda SoL. Don't forget the data people down the line... which might include you, trying to make a product decision or diagnose a slippery production bug.
Soft deletes + GC for the win!
We have an offline-first infrastructure that replicates the state to possibly offline clients. Hard deletes were causing a lot of fun issues with conflicts, where a client could "resurrect" a deleted object. Or deletion might succeed locally but fail later because somebody added a dependent object. There are ways around that, of course, but why bother?
Soft deletes can be handled just like any regular update. Then we just periodically run a garbage collector to hard-delete objects after some time.
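A minimal sketch of that garbage-collection pass, with an invented schema and an assumed 30-day retention window:

```python
# Hypothetical sketch: periodically hard-delete rows that were soft-deleted
# more than RETENTION_DAYS ago.
import sqlite3

RETENTION_DAYS = 30  # assumed retention window

def gc_soft_deleted(db: sqlite3.Connection) -> int:
    cur = db.execute(
        "DELETE FROM objects "
        "WHERE deleted_at IS NOT NULL "
        "  AND deleted_at < datetime('now', ?)",
        (f"-{RETENTION_DAYS} days",),
    )
    db.commit()
    return cur.rowcount  # rows permanently removed in this pass

# e.g. run from a scheduled job:
# removed = gc_soft_deleted(sqlite3.connect("app.db"))
```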