I recently started digging into databases for the first time since college, and from a novice's perspective, postgres is absolutely magical. You can throw in 10M+ rows across twenty columns, spread over five tables, add some indices, and get sub-100ms queries for virtually anything you want. If something doesn't work, you just ask it for an analysis and immediately know what index to add or how to fix your query. It blows my mind. Modern databases are miracles.
I don't mean this as a knock on you, but your comment is a bit funny to me because it has very little to do with "modern" databases.
What you're describing would probably have been equally possible with Postgres from 20 years ago, running on an average desktop PC from 20 years ago. (Or maybe even with SQLite from 20 years ago, for that matter.)
Don't get me wrong, Postgres has gotten a lot better since 2006. But most of the improvements have been in terms of more advanced query functionality, or optimizations for those advanced queries, or administration/operational features (e.g. replication, backups, security).
The article actually points out a number of things only added after 2006, such as full-text search, JSONB, etc. Twenty years ago your full-text search option was just LIKE '%keyword%', and it would be both slower and less effective than real full-text search. It clearly wasn’t “sub-100ms queries for virtually anything you want” like GP said.
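To make that concrete, here's a rough sketch of the difference (table and column names invented):

```sql
-- Leading-wildcard LIKE can't use a plain btree index, so this typically scans the table:
SELECT id FROM articles WHERE body LIKE '%keyword%';

-- Built-in full-text search: language-aware matching backed by a GIN index.
CREATE INDEX articles_fts_idx ON articles USING gin (to_tsvector('english', body));

SELECT id
FROM articles
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'keyword');
```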
And 20 years ago people were making the exact same kinds of comments and everyone had the same reaction: yeah, MySQL has been putting numbers up like that for a decade.
I am a DBA for Oracle databases, and XE can be used for free. It has the reference SQL/PSM implementation in PL/SQL. I know how to set up a physical standby, and otherwise I know how to run it.
That being said, Oracle Database SE2 is $17,500 per core pair on x86, and Enterprise is $47,500 per core pair. XE has hard limits on size and limits on active CPUs. XE also does not get patches; if there is a critical vulnerability, it might be years before an upgrade is released.
Nobody would deploy Oracle Database for new systems. You only use this for sunk costs.
Postgres itself has a manual that is 1,500 pages. There is a LOT to learn to run it well, comparable to Oracle.
For simple things, SQLite is fine. I use it as my secrets manager.
Postgres requires a lot of reading to do the fancy things.
Postgres has a large manual not because it's overly complex to do simple things, but because it is one of the best documented and most well-written tools around, period. Every time I've had occasion to browse the manual in the last 20 years it's impressed me.
I read Jason Couchman's book for Oracle 8i certification, and passed the five exams.
They left much out; so many important things I only learned later, as I watched harmful things happening.
The very biggest thing is "nologging," the ability to commit certain transactions that are omitted from the recovery archived logs.
"You are destroying my standby database! Kyte is explicit that 'nologging' must never be used without the cooperation of the DBA! Why are you destroying the standby?"
It was SSIS, and they could never get it under control. ALTER SYSTEM FORCE LOGGING undid their ignorant presumption.
My perspective might be equally naive as I've rarely had contact with databases in my professional life, but 100ms sounds like an absolutely mental timeframe (in a bad way)
I’m a huge Postgres fan. That said, I don’t agree with the blanket advice of “just use Postgres.” That stance often comes from folks who haven’t been exposed enough to (newer) purpose-built technologies and the tremendous value they can create
The argument, as in this blog, is that a single Postgres stack is simpler and reduces complexity. What’s often overlooked is the CAPEX and OPEX required to make Postgres work well for workloads it wasn’t designed for, at even reasonable scale. At Citus Data, we saw many customers with solid-sized teams of Postgres experts whose primary job was constant tuning, operating, and essentially babysitting the system to keep it performing at scale.
Side note, we’re seeing purpose-built technologies show up much earlier in a company’s lifecycle, likely accelerated by AI-driven use cases. At ClickHouse, many customers using Postgres replication are seed-stage companies that have grown extremely quickly. We pulled together some data on these trends here:
https://clickhouse.com/blog/postgres-cdc-year-in-review-2025...
A better approach would be to embrace the integration of purpose-built technologies with Postgres, making it easier for users to get the best of both worlds, rather than making overgeneralized claims like “Postgres for everything” or “Just use Postgres.”
I personally see a difference between “just use Postgres” and “make Postgres your default choice.” The latter leaves room to evaluate alternatives when the workload calls for it, while the former does not. When that nuance gets lost, it can become misleading for teams that are hitting, or even close to hitting, the limits of Postgres, who may keep tuning Postgres, spending not only time but also significant $$. IMO a better world is one where developers can have a mindset of using best-in-class where needed. This is where embracing integrations with Postgres will be helpful!
I think that the key point being made by this crowd, of which I'm one, is somewhere in the middle. The way I mean it is "Make Postgres your default choice. Also *you* probably aren't doing anything special enough to warrant using something different".
In other words, there are people and situations where it makes sense to use something else. But most people believing they're in that category are wrong.
> Also you probably aren't doing anything special enough to warrant using something different".
I always get frustrated by this because it is never made clear where the transition occurs to where you are doing something special enough. It is always dismissed as, "well whatever it is you are doing, I am sure you don't need it"
Why is this assumption always made, especially on sites like HackerNews? There are a lot of us here that DO work with scales and workloads that require specialized things, and we want to be able to talk about our challenges and experiences, too. I don't think we need to isolate all the people who work at large scales to a completely separate forum; for one thing, a lot of us work on a variety of workloads, where some are big enough and particular enough to need a different technology, and some that should be in Postgres. I would love to be able to talk about how to make that decision, but it is always just "nope, you aren't big enough to need anything else"
I was not some super engineer who already knew everything when I started working on large enough data pipelines that I needed specialized software, with horizontal scaling requirements. Why can't we also talk about that here?
The point is really that you can only evaluate which of the alternatives is better once you have a working product with data big enough - otherwise it's just basically following trends and hoping your barely informed decision won't be wrong.
Postgres is widely used enough with enough engineering company blog posts that the vast majority of NotPostgres requests already have a blog post that either demonstrates that pg falls over at the scale that’s being planned for or it doesn’t.
If they don’t, the trade off for NotPostgres is such that it’s justifiable to force the engineer to run their own benchmarks before they are allowed to use NotPostgres
Agree to disagree here. I see a world where developers need to think about (reasonable) scale from day one, or at least very early. We’ve been seeing this play out at ClickHouse - the time before teams need purpose-built OLAP is shrinking from years to months. Also, integration with ClickHouse is a few weeks of effort for potentially significantly faster analytics performance.
Here's my opinion: just use Postgres. If you're experienced enough to know when that advice doesn't apply, go for it, the advice isn't for you. If you aren't, I'm probably saving you from yourself. "Reasonable scale" to these people could mean dozens of inserts per second, which is why people talking in vagaries around scale is maddening to me. If you aren't going to actually say what that means, you will lead people who don't know better down the wrong path.
I see a world where developers need to think about REASONABLE scale from day one, with all caps and no parentheses.
I've sat in on meetings about adding auth rate limiting, using Redis, to an on-premise electron client/Node.js server where the largest installation had 20 concurrent users and the largest foreseeable installation had a few thousand, in which every existing installation had an average server CPU utilisation of less than a percent.
Redis should not even be a possibility under those circumstances. It's a ridiculous suggestion based purely on rote whiteboard interview cramming. Stick a token_bucket table in Postgres.
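A sketch of what that could look like (column names and the refill math are just illustrative):

```sql
CREATE TABLE IF NOT EXISTS token_bucket (
    key            text PRIMARY KEY,          -- e.g. user id or client IP
    tokens         double precision NOT NULL,
    capacity       double precision NOT NULL,
    refill_per_sec double precision NOT NULL,
    updated_at     timestamptz NOT NULL DEFAULT now()
);

-- One atomic UPDATE per request: refill based on elapsed time, then try to
-- spend a token. A row comes back if and only if the request is allowed.
UPDATE token_bucket
SET tokens = LEAST(capacity, tokens + refill_per_sec * EXTRACT(EPOCH FROM now() - updated_at)) - 1,
    updated_at = now()
WHERE key = $1
  AND LEAST(capacity, tokens + refill_per_sec * EXTRACT(EPOCH FROM now() - updated_at)) >= 1
RETURNING tokens;
```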
I'm also not convinced that thinking about reasonable scale would lead to a different implementation for most other greenfield projects. The nice thing about shoving everything into Postgres is that you nearly always have a clear upgrade path, whereas using Redis right from the start might actually make the system less future-proof by complicating any eventual migration.
Ack, agreed. But there’s a better way to communicate than making blanket statements like “just use Postgres.” For example, you could say “Postgres is the default database,” etc.
Don’t get me wrong—I’m a huge huge fan of Postgres. I’ve worked at Postgres companies for a decade, started a Postgres company, and now lead a Postgres service within a company! But I prefer being real rather than doing marketing hype and blanket love.
This is my philosophy. When the engineer comes to me and says that they want to use NotPostgres, they have to justify why, with data and benchmarks, Postgres is not good enough. And that’s how it should be
> I don’t agree with the blanket advice of “just use Postgres.”
I take it as meaning use Postgres until there's a reason not to, i.e. build for the scale / growth rate you have, not "how will this handle the 100 million users I dream of." A simpler tech stack will be simpler to iterate on.
Postgres on modern hardware can likely service 100 million users unless you are doing something data intensive with them.
You can get a few hundred TB of flash in one box these days. You need to average over 1 MB of database data per user to get over 100 TB with only 100 million users. Even then, you can mostly just shard your DB.
You can do about 100k commits per second, but this also partly depends on the CPU you attach to it. It also varies with how complicated the queries are.
With 100 million DAU, you're often going to have problems with this rate unless you batch your commits. With 100 million user accounts (or MAU), you may be fine.
Yes. That's a good framing. PostgreSQL is a good default for online LOB-y things. There are all sorts of reasons to use something other than PostgreSQL, but raw performance at scale becomes such a reason later than you think.
Cloud providers will rent you enormous beasts of machines that, while expensive, will remain cheaper than rewriting for a migration for a long time.
Postgres is infinitely extensible, more than MariaDB. But it's very painful to write or configure extensions and you might as well use something different instead of reaching for an extension mechanism.
> At Citus Data, we saw many customers with solid-sized teams of Postgres experts whose primary job was constant tuning, operating, and essentially babysitting the system to keep it performing at scale.
Oh no, not a company hiring a team of specialists in a core technology you need! What next, paying them a good wage? C'mon, it's so much better to get a bunch of random, excuse me, "specialized" SaaS tools that will _surely_ not lead to requiring five teams of specialists in random technologies that will eventually be discontinued once Google acquires the company running them.
OK but seriously, yeah sometimes "specialized" is good, though much more rarely than people pretend it to be. Having specialists ain't bad, and I'd say it's better than telling a random developer to become a specialist in some cloud tech and pretending you didn't just turn a - hopefully decent - developer into a poor DBA. Not to mention that a small team of Postgres specialists can maintain a truly stupendous amount of Postgres.
At my company I saw a team of devs pay for a special purpose "query optimized" database with "exabyte capability" to handle... their totally ordinary HR data.
I queried said database... it was slow.
I looked to see what indexes they had set up... there were none.
That team should have just used postgres and spent all the time and money they poured into this fancy database tech on finding someone who knew even a little bit about database design to help them.
I hate how developers are often very skeptical but all the skepticism goes out the window if the tech is sufficiently hyped up.
And TBH, developers are pretty dumb not to realize that the tech tools monoculture is a way for business folks to make us easily replaceable... If all companies use the same tech, it turns us into exchangeable commodities which can easily be replaced and sourced across different organizations.
Look at the typical React dev. They have zero leverage and can be replaced by vibe coding kiddies straight out of school or sourced from literally any company on earth. And there are some real negatives to using silver bullet tools. They're not even the best tools for a lot of cases! The React dev is a commodity and they let it happen to them. Outsmarted by dumb business folks who dropped out of college. They probably didn't even come up with the idea; the devs did. Be smarter people. They're going to be harvesting you like Cavendish.
Sure, but the world is vast. I would love to be able to test every UI framework and figure out which is the best, but who’s got time for that? You have to rely on heuristics for some things, and popularity is often a decent indicator.
Popularity’s flip side is that it can fuel commodification.
I argue popularity is insufficient signal. React as tech is fine, but the market of devs who it is aimed at may not be the most discerning when it comes to quality.
I do agree, I don’t know why more people don’t just use Postgres. If I’m doing data exploration with lots of data (e.g., GIS, nD vectors), I’ll just spin up a Postgres.app on my macOS laptop, install what little I need, and it just works and is plenty fast for my needs. It’s a really great choice for a lot of domains.
That being said, while I think Postgres is “the right tool for the job” in many cases, sometimes you just want (relative) simplicity, both in terms of complexity and deployment, and should use something like SQLite. I think it’s unwise to understate simplicity, and I use it to run a few medium-traffic servers (at least, medium traffic for the hardware I run it on).
> in many cases, sometimes you just want (relative) simplicity, both in terms of complexity and deployment, and should use something like SQLite.
So many times when trying to just go for simplicity with SQLite it takes me like one working day until I run up against enough annoyances to where resolving those is more work than setting up the "set up and forget" postgres instance.
Granted, this is for personal stuff... but "Postgres packaged for low maintenance" is present in a lot of OS package managers! Even for smaller data analysis work SQLite perf leaves _loads_ to be desired (once had QGIS struggling with a sqlite DB... pg made everything mostly instant. Indices etc... but stuff I _couldn't easily get with sqlite_)
If SQLite works for you that's great, I do think it's worth it for people to _try_ to do simple pg setups to see just how painful it really is to use pg (for me: not very)
Oh, yes, that's one of my points! I think Postgres is a great way to deal with tons of data, and it's really the only thing I use to do any sort of analysis or informatics (that and Parquet + Dask).
I am also a fan of SQLite. One of the best parts during development is how easy it is to spin up and spin down databases for full integration tests without containers or anything. It also simplifies backups, and is probably good enough.
These days I would recommend PGlite for testing purposes when you use Postgres in production. That way you don't need any specific SQLite vs Postgres behavior switches.
Where I have used SQLite most successfully is really two use cases. First, I use it for data processing. Say I need to retrieve lots of data and transform it to a different setup. I could do that in something like Python but SQL is just more expressive for that and I can also create a new database, populate it with data I have, fetch new data, combine it together, export the update to a permanent data store (usually Postgres).
Second, when I need a local save file. Sometimes small local apps are better served by a save file, and the save file might as well have an extensible format that I can update as I go. This is more rare but still can be useful.
The first use case is very powerful. A temporary SQL database that can be blown away with zero trace of it is great. And the ability to run complex queries on it can really help.
But 99% of the time I just use Postgres. It works, it has sane defaults, it is crazy extensible, and it has never not met my needs, unlike Oracle or MySQL.
i think the topic of "what data backend" gets super conflated into many different variations of what the hell people need it for. discussions here go so many different directions. some people are building simple webapps, some are building complex webapps that need to scale for a gazillion users, some are building local apps, some are just tinkering, some are thinking broadly about how their backend needs to sync with a datalake->some data warehouse at an org, yadda yadda ya.
i personally like postgres myself for just about all use cases that must be shared with others (app with more than one client that might be providing CRUD updates or anything really that demands a central data store). ive used sqlite a couple times with WAL to try and make a small app shared between 2-3 people who all would contribute updates thru it but it wasnt ideal. postgres has so many features/extensions, its concurrent writes are fast as hell, and if you just want to one-shot a solution then you cant go wrong, but it's ofc not the same as a sqlite setup.
i think a lot of the pain with postgres is just learning to effectively be a knowledgeable db admin of sorts. its somewhere between being a competent devops guy and a dbadmin expert all in one. if you're actually doing some kind of production deployment it is kinda scary hoping you've got everything set up right. even supabase which makes this whole process trivial to get going requires an understanding of not-always-understood security premises that just make things spooky.
lot of words to say i dont get much out of these discussions tbh. theres just too many use cases and variables in everyones working/hobby lives im not sure that there is a proverbial bottom to any of it. some will use sqlite and some will use postgres and some will use some weird thing no ones heard of because they're afraid to rawdog sql and just want immediate graphql capability to be the main mode of data retrieval. some will show up here and talk about why you need redis in the middle.
its too much noise so i just keep using postgres because its free and boring and fast. end of the day i just want to make stuff people can use. it's a hard endeavor to do well alone, if you dont have a team of other experts who can help you put all the puzzle pieces together on how to deploy things the right way and also add pieces like redis or whatever... it's just a lot. it's hard to find where to get started. sqlite is the only solution that really by nature of what it is seems to champion the lonely developer, but the tradeoffs are big if you're trying to make something that should get used by many people.
A bit off topic but the one thing I've never been able to figure out with Postgres easily & reliably is what magic incantations allow a user account full access to a specific database but not to others, particularly in cases of managed postgres offered by cloud providers. `GRANT ALL PRIVILEGES` never seems to work.
Having to look up and spend time fixing permissions every time itself makes using Postgres for simple uses difficult for me but if you're using it ad hoc, any tips?
I ran into this once... I think there's something about the grant not working on new objects or being one level too low? I tended to solve those problems by granting ownership of the db itself.
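If you'd rather not hand over ownership, the incantations that have generally worked for me look roughly like this (role and database names are made up; the catch is that GRANT ... ON ALL TABLES only covers existing objects, while ALTER DEFAULT PRIVILEGES covers future ones created by the role that runs it):

```sql
REVOKE CONNECT ON DATABASE otherdb FROM PUBLIC;   -- keep them out of the other databases
GRANT CONNECT ON DATABASE appdb TO app_user;

-- Inside appdb:
GRANT USAGE ON SCHEMA public TO app_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public TO app_user;
GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA public TO app_user;

-- Objects created later by the owning role:
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO app_user;
ALTER DEFAULT PRIVILEGES IN SCHEMA public
    GRANT USAGE, SELECT ON SEQUENCES TO app_user;
```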
99% of the time I've used Postgres it has been one user and one database. The one time I needed to create and configure a separate user with different permissions I remember it being thoroughly confusing and I think the DBA ended up doing it.
I wish PostgreSQL had a native vector implementation instead of using extensions. They're kind of a pain in the ass to maintain, especially with migrations.
Interestingly almost all of postgres is an extension including the stuff you expect to be built in. All data types, all index types, all operators, and the implementation of ordinary tables I think
For me the showstopper missing feature is a standard and native implementation of temporal tables. Once you use those effectively in an application, it becomes something you can't do without.
Standardizing on one single tiny little project is always a bad idea. Why? Some examples (which are admittedly not related to postgres, because I don't know their structure):
1) A single person, doing a ton of heavy lifting, leaves, or worse, turns it over, or silently sells out to a nefarious person.
2) A severe security vulnerability is found. If everyone is using postgres, everyone is vulnerable. Bonus points if the vulnerability is either not publicly disclosed or it is hard to fix.
3) Commercial/Government interests heavily influence and push the project into places that could make it vulnerable in any given way. This is absolutely a thing.
4) AI. No clarification here. Just use your imagination, with recent news regarding FFMPEG and other projects in mind.
OP calling the de facto database solution (pg) in the world “tiny” is pretty laughable. It’s one of the most popular solutions for databases in general and RDBMS specifically. SQLite is also massive in terms of its adoption and use
No, seriously, people need to be punished for submitting LLM-generated garbage without specifying that it's LLM-generated garbage. 400+ points, oh my god, people, what's wrong with you...
We buried the post for seeming obviously-LLM-generated. But please email us about these (hn@ycombinator.com) rather than posting public accusations.
There are two reasons why emailing us is better:
First, the negative consequences of a false allegation outweigh the benefits of a valid accusation.
Second and more important: we'll likely see an email sooner than we'll see a comment, so we can nip it in the bud quickly, rather than leaving it sitting on the front page for hours.
You buried a popular post because of the public accusation or just your "hunch"?
Why not let your audience decide what it wants to read?
I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.
It's my job to read HN posts and comments all day, every day, and these days that means spending a lot of time evaluating whether a post seems LLM-generated. In this case the post seems LLM-generated or heavily LLM-edited.
We have been asking the community not to publicly call out posts for being LLM-generated, for the reasons I explained in the latest edit of the comment you replied to. But if we're going to ask the community that, we also need to ask submitters to not post obviously-LLM-influenced articles. We've been asking that ever since LLMs became commonplace.
> I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.
We've recently added this line to the guidelines: Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.
HN has become grumpier, and we don't like that. But a lot of it is in reaction to the HN audience being disappointed at a lot of what modern tech companies are serving up, both in terms of products and content, and it doesn't work for us to tell them they're wrong to feel that way. We can try, but we can't force anyone to feel differently. It's just as much up to product creators and content creators to keep working to raise the standards of what they offer the audience.
Thanks Tom, I appreciate the openness. You are seemingly overriding the wishes of the community, but it is your community and you have the right to do so. I still think it's a shame, but that's my problem.
> You are seemingly overriding the wishes of the community
That's false. The overwhelming sentiment of the community is that HN should be free of LLM-generated content or content that has obvious AI fingerprints. Sometimes people don't immediately realize that an article or comment has a heavy LLM influence, but once they realize it does, they expect us to act (this is especially true if they didn't realize it initially, as they feel deceived). This is clear from the comments and emails we get about this topic.
If you can publish a new version of the post that is human-authored, we'd happily re-up it.
I’d be grumpy over wasting my time on an HN post that’s LLM generated which doesn’t state that it is. If I wanted this, I could be prompting N number of chat models available to me instead of meandering over here.
I just pasted the first paragraph in an "AI detector" app and it indeed came back as 100% AI. But I heard those things are unreliable. How did you determine this was LLM-generated? The same way?
Apart from the style of the prose, which is my subjective evaluation: This blog post is "a view from nowhere." Tiger Data is a company that sells postgres in some way (don't know, but it doesn't matter for the following): they could speak as themselves, and compare themselves to companies that sell other open source databases. Or they could showcase benchmarks _they ran_.
Them saying: "What you get: pgvectorscale uses the DiskANN algorithm (from Microsoft Research), achieving 28x lower p95 latency and 16x higher throughput than Pinecone at 99% recall" is marketing unless they give how you'd replicate those numbers.
Point being: this could have been written by an LLM, because it doesn't represent any work-done by Tiger Data.
For what it's worth, TigerData is the company that develops TimescaleDB, a very popular and performant time series database provided as a Postgres extension. I'm surprised that the fact that TigerData is behind it is not mentioned anywhere in the blog post. (Though, TimescaleDB is mentioned 14 times on the page).
Just using LLMs enough I've developed a sense for the flavor of writing. Surely it could be hidden with enough work, but most of the time it's pretty blatant.
It’s got that LLM flow to it. Also liberal use of formatting. It’s like it cannot possibly emphasize enough. Tries to make every word hit as hard as possible. There’s no filler, nothing slightly tangential or off topic to add color. Just many vapid points rapid fire, as if they’re the hardest hitting truth of the decade lol
ChatGPT has a pretty obvious writing style at the moment. It's not a question of some nebulous "AI detector" gotcha, it's more just basic human pattern matching. The abundant bullet points, copious bold text, pithy one line summarizing assertions ("In the AI era, simplicity isn’t just elegant. It’s essential."). There are so many more in just how it structures its writing (eg "Let’s address this head-on."). Hard to enumerate everything, frankly.
I know everybody just wants to talk about Postgres but it’s still sad to see any sort of engagement with slop. Even though the actual article is essentially irrelevant lol
This kind of thing gets posted every couple of months. Databases like Pinecone and Redis are more cost-effective and capable for their special use case, often dramatically so. In some circumstances the situation favours solving the problem in Postgres rather than adding a database. But that should be evaluated on a case-by-case basis. For example, if you run something at scale and have an ops team the penalty of adding a second database is much smaller.
(I run a medium-sized Postgres deployment and like it, but I don't feel like it's a cost-effective solution to every database problem.)
Once you have an app, a lot of data, and actual problems, it's far easier to pick the right alternative.
PostgreSQL is good enough to get to medium sized with nearly every use case. Once you are there, you have the use case and the test data to test any alternative for it well, rather than trying to guess beforehand what you actually need.
The advice is basically "PostgreSQL is probably good enough for whatever you're building now, and you should only look for other solution once you are big enough that it stops being that"
Could make the same argument for SQLite; the threshold is lower, but similarly you can get pretty far with it. Then decide what's next, once you're outgrowing it.
MySQL is definitely easier to use if you don’t want to ever have to think about DB maintenance; on the other hand, you’re giving up a TON of features that could make your queries enormously performant if your schema is designed around them - like BRIN indices, partial indices, way better partition management, etc.
OTOH, if and only if you design your schema to exploit MySQL’s clustering index (like for 1:M, make the PK of the child table something like (FK, some_id)), your range scans will become incredibly fast. But practically no one does that.
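A sketch of the idea (table and column names invented): InnoDB clusters rows on the primary key, so leading with the parent's key keeps each parent's child rows physically adjacent.

```sql
CREATE TABLE order_item (
    order_id BIGINT NOT NULL,        -- FK to the parent table
    item_id  BIGINT NOT NULL,        -- per-order sequence number
    sku      VARCHAR(64) NOT NULL,
    qty      INT NOT NULL,
    PRIMARY KEY (order_id, item_id)  -- clustered on (FK, some_id)
) ENGINE=InnoDB;

-- "all items for one order" becomes a single contiguous range scan:
SELECT * FROM order_item WHERE order_id = 12345;
```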
As someone who learned to think in MySQL, this is really true; at the time Postgres was a viable alternative too, only the MySQL tooling to get started reached me a little more easily and quickly.
The major thing I advocate for is don't pick a NoSQL database to avoid relational dbs, only to try and do a bunch of relational work in NoSQL that would have been trivial in an RDBMS. Postgres can even power graph query results which is great.
The major reason for us to replace PostgreSQL was replication and HA across geographies - I think on the Postgres side Greenplum and CockroachDB are/were an option.
With MySQl variants like percona xtradb setup can go from 1 instance to cluster to geo replicating cluster with minimal effort.
While vanilla Postgres for an equivalent setup is basically pulling teeth.
I have a lot of respect for Postgres' massive feature set, and how easy it is to immediately put to use, but I don't care for the care and feeding of it, especially dealing with upgrades, vacuuming, and maintaining replication chains.
Once upon a time, logical replication wasn't a thing, and upgrading major versions was a nightmare, as all databases in the chain had to be on the same major version. Upgrading big databases took days because you had to dump and restore. The MVCC bloat and VACUUM problem was such a pain in the ass, whereas with MySQL I rarely had any problems with InnoDB purge threads not able to keep up with garbage collecting historical row versions.
Lots of these problems are mitigated now, but the scars still sometimes itch.
I agree that SQLite requires less maintenance, but you still need to vacuum to keep the database file from growing indefinitely (for apps, I run VACUUM at startup).
SQLite vacuum is only needed to shrink the database after you remove a lot of data. It's not needed as part of routine operation the way it is in Postgres. Postgres has autovacuum usually on by default, so I'm not understanding the complaint much
It’s kind of remarkable how little operational maintenance mysql requires. I’ve been a Postgres fan for a long time, but after working with a giant mysql cluster recently I am impressed. Postgres requires constant babysitting, vacuums, reindexing, various sorcery. MySQL just… works.
There's a fascinating gap between PostgreSQL theory and practice here. Elsewhere in this thread, I complained that PostgreSQL extensions can't do everything yet. One thing they can do, however, or ought to be able to do, is provide alternative storage engines. That's the central thing they're supposed to be especially good at providing.
I really wish I could but it's hard to embed in local-first apps and packages without forcing users to set up docker.
PGlite would be perfect if only it allowed multiple writer connections. SQLite is ok but I want PG extensions and I want true parallel multi-writer support!
Caching is mentioned in the article: What do you guys feel about using PostgreSQL for caching instead of Redis?
Redis is many times faster, so much that it doesn't seem comparable to me.
A lot of data you can get away with just caching in-mem on each node, but when you have many nodes there are valid cases where you really want that distributed cache.
Run benchmarks that show that, for your application under your expected best-case loads, using Redis for caching instead of PostgreSQL provides a meaningful improvement.
If it doesn't provide a meaningful improvement, stick with PostgreSQL.
This is the proper approach when deciding whether to use any type of tool or technology. Is the increased amount of cognitive overhead for someone with minimal exposure to your system (who will have to maintain it when you’ve moved on) worth the increased performance on a dollars-per-hour basis? If so, it may be a good option. If not, it doesn’t matter how much better the relative performance is.
Just use memcache for query cache if you have to. And only if you have to, because invalidation is hard. It's cheap, reliable, mature, fast, scalable, requires little understanding, has decent quality clients in most languages, is not stateful and available off the shelf in most cloud providers and works in-cluster in kubernetes if you want to do it that way.
I can't find a use case for Redis that postgres or postgres+memcache isn't a simpler and/or superior solution.
Just to give you an idea how good memcache is, I think we had 9 billion requests across half a dozen nodes over a few years without a single process restart.
memcached is multithreaded, so it scales up better per node.
memcached clients also frequently use ketama consistent hashing, so it is much easier to do load distribution/clustering, it being much simpler than redis clustering (sentinel, etc).
Mcrouter[1] is also great for scaling memcached.
dragonfly, garnet, and pogocache are other alternatives too.
If you want to compare Redis and PostgreSQL as a cache, be sure to measure an unlogged table, as suggested in the article. Much of the slowness of PostgreSQL is to ensure durability and consistency after a crash. If that isn't a concern, disable it. Unlogged tables are automatically truncated after a crash.
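A minimal sketch of such a cache table (names invented):

```sql
-- UNLOGGED skips WAL: writes are much cheaper, and the table is truncated after a crash.
CREATE UNLOGGED TABLE IF NOT EXISTS kv_cache (
    key        text PRIMARY KEY,
    value      jsonb NOT NULL,
    expires_at timestamptz NOT NULL
);

-- Upsert with a TTL.
INSERT INTO kv_cache (key, value, expires_at)
VALUES ($1, $2, now() + interval '5 minutes')
ON CONFLICT (key) DO UPDATE
    SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at;

-- Read, ignoring expired rows (a periodic DELETE cleans them up).
SELECT value FROM kv_cache WHERE key = $1 AND expires_at > now();
```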
Depends on your app cache needs. If it's moderate, I'd start with postgres... i.e. not having to operate another piece of infra and the extra code. If you are doing the shared-nothing app server approach (rails, django) where the app server remembers nothing after each request, Redis can be a handy choice. I often go with having a fat long-lived server process (jvm) where it also acts for my live caching needs. #tradeoffs
I say do it, if it simplifies the architecture. For example if you are using firestore with a redis cache layer, that's 2 dbs. If you can replace 2 dbs with 1 db (postgres), I think it's worth it. But if you are suggesting using a postgres cache layer in front of firestore instead of redis... to me that's not as clear cut.
Depends how much you have to cache and how much speed you really need from it.
I like Redis a lot, but for things in the start I'm not sure the juice is always worth the squeeze to get it set up and manage another item in the stack.
Luckily, search is something that has been thought about and worked on for a while and there's lots of ways to slice it initially.
I'm probably a bit biased though from past experiences from seeing so many different search engines shimmed beside or into a database that there's often an easier way in the start than adding more to the stack.
Skeptical about replacing Redis with a table serialized to disk. The point of Redis is that it is in memory and you can smash it with hot path queries while taking a lot of load off the backing DB. Also that design requires a cron which means the table could fill disk between key purges.
I think the article is wrong. UNLOGGED means it isn't written to WAL, which means crash-recovery guarantees won't work, since the transaction can finish before the page can be synchronized on disk. The table loses integrity as a trade-off for faster writes.
It really depends on your use case, doesn't it? I'd say, just use Postgres... until you have a reason not to. We finally switched to Elasticsearch to power user search queries of our vehicle listings a few years ago, and found its speed, capabilities, and simplicity all significant improvements compared to the MariaDB-based search we'd been using previously. (Postgres's search features are likely better than MariaDB's, but I expect the comparison holds.) But that's the core of our product, and while not giant, our scale is significant. If you're just doing some basic search, you don't need it. (We managed for many years just fine without.)
I've never really regretted waiting to move to a new tool, if we already had something that works. Usually by doing so you can wait for the fads to die down and for something to become the de facto standard, which tends to save a lot of time and effort. But sometimes you can in fact get value out of a specialized tool, and then you might as well use it.
Huh, apparently this is controversial, based on the score ping-ponging up and down! I'm not really sure why though. Is it because of the reference to MariaDB?
Now we only need easy self-hosted Postgres clustering for HA. Postgres seems to need additional tooling. There is Patroni, which doesn't provide container images. There is Spilo, which provides Postgres images with Patroni, but they are not really maintained. There is a timescaledb-ha image with Patroni, but no documentation how to use it. It seems the only easy way for hosting a Postgres cluster is to use CloudNativePG, but that requires k8s.
It would be awesome to have easy clustering directly built-in. Similar to MongoDB, where you tell the primary instance to use a replica set, then simply connect two secondaries to primary, done.
Postgres is not a CP database, and even with synchronous replication, it can lose writes during network partitions. It would not pass the Jepsen test suite.
This is very hard to fix and requires significant architectural changes (like Yugabyte or Neon have done).
But it's perfect HN bait, really. The title is spicy enough that folks will comment without reading the article (more so than usual), and so it survives a bit longer before being flagged as slop.
Is it in the HN guidelines to flag AI content? I am unsure of how flagging for this is supposed to work on HN and have only ever used the flag feature for obvious spam or scams.
It might be wrong, but I have started flagging this shit daily. Garbage articles that waste my time as a person who comes on here to find good articles.
I understand that reading the title and probably skimming the article makes it a good jumping off point for a comment thread. I do like the HN comments but I don't want it to be just some forum of curious tech folks, I want it to be a place I find interesting content too.
I agree. It seems this is kind of a Schelling point right now on HN and there isn't a clear guideline yet. I think your usage of flagging makes sense. Thanks
You're absolutely right! Let's delve into why blog posts like this highlight the conflict between the speed and convenience of AI and authentic human expression. Because it's not just about fears of hallucination—it's about ensuring the author's voice gets heard. Clearly. Completely. Unmistakably.
Granted it's often easy to tell on your own, but when I'm uncertain I use GPTZero's Chrome extension for this. Eventually I'll stop doing that and assume most of what I read outside of select trusted forums is genAI.
I'll take it one step further and say you should always ask yourself if the application or project even needs a beefy database like Postgres or if you can get by with using SQLite. For example, I've found a few self-hosted services that just overcomplicated their setup and deployment because they picked Postgres or MariaDB over SQLite, despite it being a much better self-contained solution.
I find that if I want to use JSON storage I'm somewhat stuck choosing my DB stack. If I want to use JSON, and change my database from SQLite to Postgres I have to substantially change my interface to the DB. If I use only SQLite, or only Postgres it's not so bad, but the transition cost to "efficient" JSON use in Postgres from a small demo in SQLite is kind of high compared to just starting with an extra docker run (for a Postgres server) / docker compose / k8s yaml / ... that has my code + a Postgres database.
I really like having some JSON storage because I don't know my schema up front all the time, and just shoving every possible piece of potentially useful metadata in there has (generally) not bitten me, but not having that critical piece of metadata has been annoying (that field that should be NOT NULL is NULL because I can't populate it after the fact).
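Roughly, the interface difference I mean looks like this (illustrative table/column names; newer SQLite versions narrow the gap somewhat):

```sql
-- SQLite: JSON lives in a TEXT column and goes through the json1 functions.
SELECT json_extract(data, '$.name') FROM items;

-- Postgres: a jsonb column with its own operators, plus indexing you can't get in SQLite.
SELECT data->>'name' FROM items;
CREATE INDEX items_data_idx ON items USING gin (data jsonb_path_ops);
SELECT * FROM items WHERE data @> '{"status": "active"}';
```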
SQLite is great until you try to do any kind of multi-writer stuff. There's no SELECT FOR UPDATE locking and no parallel write support; if any of your writes take more than a few ms you end up having to manage queueing at the application layer, which means you end up having to build your own concurrent-safe multi-writer queue anyway.
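For comparison, the Postgres side of that queue is small enough to sketch here (invented table and column names):

```sql
CREATE TABLE IF NOT EXISTS jobs (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL,
    status  text NOT NULL DEFAULT 'pending'
);

-- Each worker claims one job; SKIP LOCKED means concurrent workers never
-- grab the same row and never block waiting on each other.
WITH next_job AS (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE jobs SET status = 'running'
FROM next_job
WHERE jobs.id = next_job.id
RETURNING jobs.id, jobs.payload;
```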
I've found that Postgres consumes (by default) more disk than, for example, MySQL. And the difference is quite significant. That means more money that I have to pay every month. But, sure, Postgres seems like a system that integrates a lot of subsystems, and that adds a lot of complexity too. I'm just marking the bad points because you mention the good points in the post. You're also trying to sell your service, which is good too.
The problem is that Postgres has something like 24 bytes of overhead per row. That is not an issue with small tables, but when you have a few billion rows around, each byte starts to add up fast. Then you need link tables that explode that number even more, etc... It really eats a ton of space.
At some point you end up with binary columns and custom encoded values, to save space by reducing row count. Kind of doing away with the benefits of a DB.
Yeah postgres and mariadb have some different design choices. I'd say use either one until it doesn't work for you. One of the differences is the large row header in postgres.
On the flip side, restoring from a plain PostgreSQL dump is much, much faster than from a plain MySQL dump. There are alternative strategies for MySQL, but that's extra work.
I am curious if you know anyone using Btrfs for this too. I like ZFS, but if Btrfs can do this it would be easier to use with some distros, etc. as it's supported in-kernel.
The big problem for me with running a DB on Btrfs is that when I delete large dirs or files (100GB+), it locks up the disk subsystem, and the DB basically stops responding to any queries.
I am very surprised that a FS which is considered prod grade has this issue.
I haven't used XFS in almost two decades, does it have compression support in the same way? Also, does it do JBOD stuff? I know it's a bit of a different thing, but I really enjoy the pool many disks together part of Btrfs, although it has its limitations.
XFS doesn't have inline compression, nor does it have volume management functionality. It's a nice filesystem (and it's very fast) but it's just a filesystem.
The real problem is, I'm so danged familiar with the MySQL toolset.
I've fixed absolutely terrifying replication issues, including a monster split brain where we had to hand pick off transactions and replay them against the new master. We've written a binlog parser as an event source to clear application caches. I can talk to you about how locking works, when it doesn't (phantom locks anyone?), how events work (and will fail) and many other things I never set out to learn but just sort of had to.
While I'd love to "just use Postgres" I feel the tool you know is perhaps the better choice. From the fandom online, it's overall probably the better DBMS, but I would just be useless in a Postgres world right now. Sorta strapped my saddle to the wrong start unfortunately.
Start learning on the side. Know one well, the time to learn another is much shorter. Bet you could be well on your way in just a few weeks. Not to mention getting away from the Oracle stink.
I am looking for a db that runs on top of existing json/yaml/csv files, saves data back to those files in a directory, which I can sync using Dropbox or whatever shared storage. Then I can run this db wherever I am & run the application. Postgres feels like a bit much for my needs.
Why? Why would separate json/yaml/csv files be better than just... syncing using postgres itself? You point `psql` to the host you need, because clearly you have internet access even on the go, and done: you don't need to sync anything, you already have remote access to the database?
Love the sentiment! And I'm a user - but what about aggregations? Elasticsearch offers a ton of aggregates out of the box for "free" completely configurable by query string.
Tiger Data offers continuous aggs via hypertable but they need to be configured quite granularly and they're not super flexible. How are you all thinking about that when it comes to postgres and aggregations?
I love postgres and it really is a supertool. But to get exactly what you need can require digging deep and really having control over the lowest levels. My experience after using timescale/tigerdata for the last couple years is that I really just wish RDS supported the timescale extension; TigerData's layers on top of that have caused as many problems as they've solved.
Gad, they sure like to say "BM25" over and over again. That's a near worthless approach to result ranking. Doing any halfway ok job requires much more tuned and/or more powerful approaches.
I have two fundamental problems with Postgres - an excellent piece of technology, no questions about that.
First, to use Postgres for all those cases you have to learn various aspects of Postgres. Postgres isn't a unified tool which can do everything - instead it's a set of tools under the same umbrella. As a result, you don't save much compared with learning those different systems separately and using Postgres only as an RDBMS. And if something isn't implemented in Postgres better than in a 3rd party system, it could be easier to replace that 3rd party system - just one part of the system - rather than switching from Postgres-only to Postgres-and-then-some. In other words, Postgres has little benefit when many technologies are needed, compared with a collection of separate tools. The article notwithstanding.
Second, Postgres is written for HDDs - hard disk drives, with their patterns of data access and times. Today we usually work with SSDs, and we'd benefit from having SSD-native RDBMSes. They exist, and Postgres may lose to them - both in simplicity and performance - significantly enough.
Well, take a look at the dates of when Postgres was created and when SSDs became available. Better, find articles about its internal algorithms, B-trees, times of operations like seeks, etc. Postgres was initially written with disk operation timings in mind, and the point is that that's changing - and I haven't heard of the Postgres architecture changing with it.
This really isn't true. You should use different parameters (specifically, you can reduce the random_page_cost to a little over 1) on a SSD but there isn't a really compelling reason to use a completely different DBMS for SSDs.
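Concretely, the SSD adjustment is mostly a couple of settings rather than a different engine (the values here are common starting points, not recommendations):

```sql
ALTER SYSTEM SET random_page_cost = 1.1;          -- the default of 4.0 assumes spinning disks
ALTER SYSTEM SET effective_io_concurrency = 200;  -- allow more concurrent prefetching on SSD/NVMe
SELECT pg_reload_conf();                          -- both take effect without a restart
```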
SSD-native RDBMS sounds good in theory! What does it mean in practice? What relational databases are simpler and more performant? Point me in their direction!
This post is discussing more specialized databases, but why would people choose Oracle/Microsoft DB instead of Postgres? Your own experience is welcome.
I'd pick MSSQL if I was being compensated based upon the accuracy, performance, durability and availability of the system. Also in any cases where the customer can pay and won't complain even once about how much this costs.
I'd never advocate for a new oracle install. But, I'd likely defend an existing one. I've seen how effective pl/sql can be in complex environments. Rewriting all that sql just because "oracle man bad" (or whatever) is a huge fucking ask of any rational business owner.
Easy answer here - nearly every LOB app we have uses MSSQL.
I've had engineers want to talk about syncing it to MySQL using some custom plumbing so that they can build a reporting infra around their MySQL stack, but it's just another layer of complexity over having code just use Microsoft's reporting services.
I'll add, finance people with Excel really like being able to pull data directly from MSSQL; they do not like hearing about a technician's Python app.
Elixir + Postgres is the microservices killer... last time I saw a VP try to convince a company with this stack to go microservices, he was out in less than 6 months.
Postgres can definitely handle a lot of use cases; background job scheduling always had me tempted to reach for something like rabbitmq but so far happy enough with riverqueue[0] for Go projects.
I made the switch from MySQL to Postgres a few years ago. I didn't really understand what everyone was excited about before I made the switch. I haven't used MySQL since, and I think Postgres provides everything I need. The only thing that I ever snarl at is how many dials and knobs and options there are, and that's not a bad thing!
> the only thing that I ever snarl at is how many dials and knobs and options there are, and that's not a bad thing!
yea this is me. postgres is actually insane for how much is offered at checks notes free.99.
_however_ we are probably due for, like, I don't know, a happy configurator type tool that has reasonable presets and a nice user friendly config flow that helps people get going without sidequesting for a year on devops/dbadmin expertise. even that isn't a favored outcome imo, you just get pretty lukewarm postgres-deployers who are probably missing a bunch of important settings/flags.

my team mates would probably shit themselves in the face of postgres configs currently. they are absolute rats in the code, but good and proper deployment of postgres is just a whole other career-arc they haven't journeyed, and a _lot_ of organizations don't always have a devops/dbadmin type guy readily available any time you want to scrap together an app, just waiting for your signal to deploy for you. or said devops/dbadmin guy is just.. one guy and he's supporting 500 other things.

not saying the absence of, or failure to scale, teams with such personnel is right. it's just the reality, and being up against workplace politics and making the case to convince orgs to hire a bigger team of devops/dbadmin guys involves a lot of shitty meetings and political prowess that is typically beyond an engineer's set of capabilities, at least below the senior level. any engineer can figure out how to deploy postgres to something, but are they doing it in a way that makes an org's security/infra guys happy? probably not. are they prepared to handle weird storage scenarios (log or temp space filling up and grinding the server to a halt) and understand the weird and robust ways to manage a deployment? probably not.
It irks me that these "just use Postgres" posts only talk about feature sets with no discussion about operations, reliability, real scaling, or even just guard rails and opinions to deter you from making bad design decisions. The author writes about how three nine's is multiplied over several dependencies, but that's not how this shakes out in practice. Your relational database is typically far more vulnerable than distributed alternatives. "Just use Postgres" is fine advice but gets used as a crutch by companies who wind up building everything in-house for no good reason.
I have a colleague who (inexplicably) doesn't trust Postgres for "high performance" applications. He needed a database of shared state for a variable number of running containers to manage a queue, so he decided to implement his own bespoke file-based database, using shared disk. Lo and behold, during the first big (well-anticipated) high-demand event, that system absolutely crawled. It ran, but it was a total bottleneck during two days of high demand. I, who have made a New Year's resolution to no longer spend political capital on things that I can't change, looked on with a keen degree of schadenfreude.
Pinecone allows hybrid search, merging dense and sparse vector embeddings, which Postgres can't do AFAIK. Going without that results in ~10% worse retrieval scores, which might be the difference between making it as a business or not.
The lesson for me isn't "don't use Pinecone", but more like "did you already max out Postgres?"
In many cases, it is going to save you time by having less infra and one less risk while you're getting started. And if you find yourself outgrowing the capabilities of Pg then you look for an alternative.
With BM25, which has far worse, non-generalizable performance compared to the sparse embeddings Pinecone supports. Moreover, you take a latency hit from RRF that makes it challenging to use for e.g. real-time multimodal chat agents.
Can anyone comment on whether postgres can replace a full columnar DB? I see "full text search" but it feels like this is falling a little short of the full power of elastic -- but I would be happy to be wrong (one less tech to remember).
Not great. It works but the performance isn't ideal for this case. Still the advice is sound. You can build a new analytics system when you're doing enough analytical queries on enough data to bog down the primary system.
I don't disagree, but I think big enterprises expect support, roadmaps, and the ability to ask for deliverables depending on the sale or context of the service.
I agree that managing lots of databases can be a pain in the ass, but trying to make Postgres do everything seems like a problem as well. A lot of these things are different things and trying to make Postgres do all of them seems like it will lead to similar if not worse outcomes than having separate dedicated services.
I understand that people were too overeager to jump on the MongoDB web scale nosql crap, but at this point I think there might have been an overcorrection. The problem with the nosql hype wasn't that they weren't using SQL, it's that they were shoehorning it everywhere, even in places where it wasn't a good fit for the job. Now this blog post is telling us to shoehorn Postgres everywhere, even if it isn't a good fit for the job...
Ok, I'm a little skeptical of that claim but let's grant it. I still don't think Postgres is going to do everything Kafka and Redis can do as well as Kafka or Redis.
The point of Redis is data structures and algorithmic complexity of operations. If you use Redis well, you can't replace it with PostgreSQL. But I bet you can't replace memcached either for serious use cases.
As someone who is a huge fan of both Redis and Postgres, I whole heartedly agree with the "if you are using Redis well, you can't replace it with PostgreSQL" statement.
What I like about the "just use PostgreSQL" idea is that, unfortunately, most people don't use Redis well. They are just using it as a cache, which IMHO, isn't even equivalent to scratching the surface of all the amazing things Redis can do.
As we all know, it's all about tradeoffs. If you are only using Redis as a cache, then does the performance improvement you get by using it outweigh the complexity of another system dependency? Maybe? Depends...
Side note: If you are using Redis for caching and queue management, those are two separate considerations. Your cache and queues should never live on the same Redis instance because they should have different max-memory policies! </Side note>
The newest versions of Rails have really got me thinking about the simplicity of a PostgreSQL only deployment, then migrating to other data stores as needed down the line. I'd put the need to migrate squarely into the "good problems" to have because it indicates that your service is growing and expanding past the first few stages of growth.
All that being said, man, I think Redis is sooooo cool. It's the hammer I'm always looking for a nail to use on.
“well” is doing a lot of heavy lifting in your comment. Across a number of companies using Redis, I’ve never seen it used correctly. Adding it to the tech stack is always justified with hand waving about scalability.
Well, Redis is a bit of a junk bin of random, barely related tools. It's just very likely that any project of non-trivial complexity will need at least some of them, and I wouldn't necessarily advocate trying to jerry-rig most of them into PostgreSQL like the author of the article does. For example, why would anyone want to waste their SQL DB server's performance on KV lookups?
They may be its point, but I frankly didn't see much of that use in the wild. You might argue that those systems didn't need Redis in the first place, and I'd agree; but then note that that is exactly the point TigerData makes.
edit: it's not about serious uses, it's about typical uses, which are sad (and same with Kafka, Elastic, etc, etc)
All the time here in HN, I'm proud of it -- happy to have opinions not necessarily aligned with what users want to listen to. Also: never trust the establishment without thinking! ;D
I was one of the downvoters, and at the time I downvoted it, it was a very different comment. This is the original (copied from another tab that I hadn't refreshed yet):
> Tell me you don't understand Redis point is data structures without telling me you don't understand Redis point is data structures.
Regardless of the author, I think slop of that sort belongs on Reddit, not HN.
Something TFA doesn’t mention, but which I think is actually the most important distinction of all to be making here:
If you follow this advice naively, you might try to implement two or more of these other-kind-of-DB simulacra data models within the same Postgres instance.
And it’ll work, at first. Might even stay working if only one of the workloads ends up growing to a nontrivial size.
But at scale, these different-model workloads will likely contend with one-another, starving one-another of memory or disk-cache pages; or you’ll see an “always some little thing happening” workload causing a sibling “big once-in-a-while” workload to never be able to acquire table/index locks to do its job (or vice versa — the big workloads stalling the hot workloads); etc.
And even worse, you’ll be stuck when it comes to fixing this with instance-level tuning. You can only truly tune a given Postgres instance to behave well for one type-of-[scaled-]workload at a time. One workload-type might use fewer DB connections and depend for efficiency on them having a higher `work_mem` and `max_parallel_workers` each; while another workload-type might use many thousands of short-lived connections and depend on them having small `work_mem` so they’ll all fit.
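To put rough numbers on it, a minimal sketch of the kind of conflicting settings involved (values are illustrative only, not recommendations):

```
-- Instance A: a handful of long-lived analytical connections
ALTER SYSTEM SET work_mem = '256MB';
ALTER SYSTEM SET max_parallel_workers = 8;
SELECT pg_reload_conf();

-- Instance B: thousands of short-lived OLTP connections
ALTER SYSTEM SET work_mem = '4MB';
ALTER SYSTEM SET max_parallel_workers = 2;
SELECT pg_reload_conf();
```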
But! The conclusion you should draw from being in this situation shouldn’t be “oh, so Postgres can’t handle these types of workloads.”
No; Postgres can handle each of these workloads just fine. It’s rather that your single monolithic do-everything Postgres instance, maybe won’t be able to handle this heterogeneous mix of workloads with very different resource and tuning requirements.
But that just means that you need more Postgres.
I.e., rather than adding a different type-of-component to your stack, you can just add another Postgres instance, tuned specifically to do that type of work.
Why do that, rather than adding a component explicitly for caching/key-values/documents/search/graphs/vectors/whatever?
Well, for all the reasons TFA outlines. This “Postgres tuned for X” instance will still be Postgres, and so you’ll still get all the advantages of being able to rely on a single query language, a single set of client libraries and tooling, a single coherent backup strategy, etc.
Where TFA’s “just use Postgres” in the sense of reusing your Postgres instance only scales if your DB is doing a bare minimum of that type of work, interpreting “just use Postgres” in the sense of adding a purpose-defined Postgres instance to your stack will scale nigh-on indefinitely. (To the point that, if you ever do end up needing what a purpose-built-for-that-workload datastore can give you, you’ll likely be swapping it out for an entire purpose-defined PG cluster by that point. And the effort will mostly serve the purpose of OpEx savings, rather than getting you anything cool.)
And, as a (really big) bonus of this approach, you only need to split PG this way where it matters, i.e. in production, at scale, at the point that the new workload-type is starting to cause problems/conflicts. Which means that, if you make your codebase(s) blind to where exactly these workloads live (e.g. by making them into separate DB connection pools configured by separate env-vars), then:
- in dev (and in CI, staging, etc), everything can default to happening on the one local PG instance. Which means bootstrapping a dev-env is just `brew install postgres`.
- and in prod, you don’t need to pre-build with new components just to serve your new need. No new Redis instance VM just to serve your so-far-tiny KV-storage needs. You start with your new workload-type sharing your “miscellaneous business layer” PG instance; and then, if and when it becomes a problem, you migrate it out.
HA is not about exceeding the limits of a server. It's about still serving traffic when that best server I bought goes offline (or has a failed memory chip, or a failed disk, or...).
Postgres replication, even in synchronous mode, does not maintain its consistency guarantees during network partitions. It's not a CP system - I don't think it would actually pass a Jepsen test suite in a multi-node setup[1]. No amount of tooling can fix this without a consensus mechanism for transactions.
Same with MySQL and many other "traditional" databases. It tends to work out because these failures are rare and you can get pretty close with external leader election and fencing, but Postgres is NOT easy (likely impossible) to operate as a CP system according to the CAP theorem.
There are various attempts at fixing this (Yugabyte, Neon, Cockroach, TiDB, ...) which all come with various downsides.
Meh, it's 2026! Unless you're Google, you should probably just pipe all your data to /dev/null (way faster than postgres!) and then have a LLM infer the results of any get requests
Can I just say, I'm getting really sick of these LLM-generated posts clogging up this site?
GPTZero gives this a 95% chance of being entirely AI-generated. (5% human-AI mix, and 0% completely original.)
But I could tell you that just by using my eyes, the tells are so obvious. "The myth / The reality, etc."
If I wanted to know what ChatGPT had to say about something, I would ask ChatGPT. That's not what I come here for, and I think the same applies to most others.
Here's an idea for all you entrepreneur types: devise a privacy-preserving, local-running browser extension for scanning all content that a user encounters in their browser - and changing the browser extension icon to warn of an AI generated article or social media post or whatever. So that I do not have to waste a further second interacting with it. I would genuinely pay a hefty subscription fee for such a service at this point, provided it worked well.
"juSt use PoStGreS" is spoken like a true C-level with no experience on the ground with postgres itself or its spin offs.
Yes, pg is awesome and it's my go-to for relational databases. But the reason Mongo or InfluxDB exist is that they excel in those areas.
I would use pg for time series for small use cases and testing. But scaling pg for production time-series workloads is not worth it. You end up fighting the technology to get it to work just because some lame person wanted to simplify ops.
I have probably pulled out Postgres 10 or more times for various projects at work. Each time I had to fight for it, each time I won, and it did absolutely everything I needed it to do and did it well.
This is just AI slop. The best tell is how much AI loves tables. Look at "The Hidden Costs Add Up", where it literally just repeats "1" in the second column and "7" in the third column. No human would ever write a table like that.
I like PostgreSQL. If I am storing relational data I use it.
But for non relational data, I prefer something simpler depending on what the requirements are.
Commenters here are talking about "modern tools" and complex systems. But I am thinking of common, simpler cases where I have seen so many people reach for a relational database out of habit.
For large data sets there are plenty of key/value stores to choose from; for small data (less than a megabyte), a CSV file will often work best. Scanning is quicker than indexing for surprisingly large data sets.
Only if DuckDB is an acceptable value of PostgreSQL. I agree that PostgreSQL has eaten many DB use-cases, but the breathless hype is becoming exhausting.
Look. In a PostgreSQL extension, I can't:
1. extend the SQL language with ergonomic syntax for my use-case,
2. teach the query planner to understand execution strategies that can't be made to look like PostgreSQL's tuple-and-index execution model, or
3. extend the type system to plumb new kinds of metadata through the whole query and storage system via some extensible IR.
(Plus, embedded PostgreSQL still isn't a first class thing.)
Until PostgreSQL's extension mechanism is powerful enough for me to literally implement DuckDB as an extension, PostgreSQL is not a panacea. It's a good system, but nowhere near universal.
Now, once I can do DuckDB (including its language extensions) in PostgreSQL, and once I can use the thing as a library, let's talk.
(pg_duckdb doesn't count. It's a switch, not a unified engine.)
This is the future of all devtools in the AI era. There's no reason for tool innovation because we'll just use whatever AIs know best which will always be the most common thing in their training data. It's a self-reinforcing loop. The most common languages, tools, libraries of today are what we will be stuck with for the foreseeable future.
Lots of familiar things here except for this UNLOGGED table as a cache thing. That's totally new to me. Has someone benched this approach against memcached and Redis? I'm extremely skeptical PG's query/protocol overheads are going to be competitive with memcached, but I'm making this up and have nothing to back it up.
They don't compare exactly. The author is mistaken in thinking that UNLOGGED means "in memory". It means "no WAL", so there's considerable speedup there, but traded for more volatility. To be a viable alternative to Redis or memcached, though, the savings you get from the latter two must really be superfluous to your use case. Which could be true for many (most?).
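For anyone who wants to actually measure it, a minimal sketch of the unlogged-cache idea (table, key, and TTL are made up):

```
-- UNLOGGED skips the WAL, so writes are faster, but the table is still
-- disk-backed and gets truncated after a crash.
CREATE UNLOGGED TABLE cache (
  key        text PRIMARY KEY,
  value      jsonb NOT NULL,
  expires_at timestamptz NOT NULL
);

-- Upsert an entry, then read it back, ignoring expired rows.
INSERT INTO cache (key, value, expires_at)
VALUES ('user:42', '{"name": "Ada"}', now() + interval '5 minutes')
ON CONFLICT (key) DO UPDATE
  SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at;

SELECT value FROM cache WHERE key = 'user:42' AND expires_at > now();
```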
It's not only about performance; Redis data structures offer even more advanced caching and data processing. I even use Redis as a cache for ClickHouse.
Nice! How do you "preinstall the extensions" so that you can have eg timescaledb and others available to install in your Postgres? Do you need to install some binaries first?
Good stuff, I turned my gist into an info site and searchable directory (and referenced this article as well, which seems to pay homage to my gist, which in turn inspired the site)
It's 20xx just use sqlite. Almost no-one needs all that power; they sure do think they do, but really don't. And will never. SQLite + Duck is all you need even with a million visitors; when you need failover and scaling you need more, but that is a tiny fraction of all companies.
Having built production apps on SQLite I can say it's not all sunshine and roses, the complexity explodes the moment you need multiple workers that can all write.
You better hope you don't have any big indexes or your writes will queue and you start getting "database is locked" errors. Good luck queueing writes properly at the application layer without rebuilding a full distributed queue system / network interface / query retry system on top of SQLite.
I really really really wish SQLite or PGLite supported multiple writers, especially if the queries are not even touching the same tables.
Oh I didn't say that, but nor is postgres. Use the right tool for the job I agree with, just people are making their lives difficult (expensive as well) by just blindly picking postgres.
It's not about "power" (what does that even mean?), it's about totally different design choices that are intended for different purposes. It's an architecture question, not a "power" or "visitors" question.
We run plenty of money-making SaaS on SQLite without issues, and have been for over a decade. By power I meant all the complex features Postgres has. But yes, it's an architecture question; my point being, most people pick many bulldozers when they need one shovel.
I recently started digging into databases for the first time since college, and from a novice's perspective, postgres is absolutely magical. You can throw in 10M+ rows across twenty columns, spread over five tables, add some indices, and get sub-100ms queries for virtually anything you want. If something doesn't work, you just ask it for an analysis and immediately know what index to add or how to fix your query. It blows my mind. Modern databases are miracles.
I don't mean this as a knock on you, but your comment is a bit funny to me because it has very little to do with "modern" databases.
What you're describing would probably have been equally possible with Postgres from 20 years ago, running on an average desktop PC from 20 years ago. (Or maybe even with SQLite from 20 years ago, for that matter.)
Don't get me wrong, Postgres has gotten a lot better since 2006. But most of the improvements have been in terms of more advanced query functionality, or optimizations for those advanced queries, or administration/operational features (e.g. replication, backups, security).
The article actually points out a number of things only added after 2006, such as full-text search, JSONB, etc. Twenty years ago your full-text search option was just LIKE '%keyword%'. And it would be both slower and less effective than real full-text search. It clearly wasn’t “sub-100ms queries for virtually anything you want” like GP said.
And 20 years ago people were making the exact same kinds of comments and everyone had the same reaction: yeah, MySQL has been putting numbers up like that for a decade.
20 years ago was 2006? Oh no...
Don't get me started on when the 90's were.
Twenty years ago? You mean 1982, right? Right!?
"The peak of your human society" in the words of The Matrix
> Don't get me wrong, Postgres has gotten a lot better since 2006.
And hardware has gotten a lot better too. As TFA writes: it's 2026.
I am a DBA for Oracle databases, and XE can be used for free. It has the reference SQL/PSM implementation in PL/SQL. I know how to set up a physical standby, and otherwise I know how to run it.
That being said, Oracle Database SE2 is $17,500 per core pair on x86, and Enterprise is $47,500 per core pair. XE has hard limits on size and limits on active CPUs. XE also does not get patches; if there is a critical vulnerability, it might be years before an upgrade is released.
Nobody would deploy Oracle Database for new systems. You only use this for sunk costs.
Postgres itself has a manual that is 1,500 pages. There is a LOT to learn to run it well, comparable to Oracle.
For simple things, SQLite is fine. I use it as my secrecy manager.
Postgres requires a lot of reading to do the fancy things.
Postgres has a large manual not because it's overly complex to do simple things, but because it is one of the best documented and most well-written tools around, period. Every time I've had occasion to browse the manual in the last 20 years it's impressed me.
I read Jason Couchman's book for Oracle 8i certification, and passed the five exams.
They left much out, so many important things that I learned later, as I saw harmful things happening.
The very biggest thing is "nologging," the ability to commit certain transactions that are omitted from the recovery archived logs.
"You are destroying my standby database! Kyte is explicit that 'nologging' must never be used without the cooperation of the DBA! Why are you destroying the standby?"
It was SSIS, and they could never get it under control. ALTER SYSTEM FORCE LOGGING undid their ignorant presumption.
Everything you describe, relational databases have been doing for decades. It's not unique to Postgres.
My perspective might be equally naive as I've rarely had contact with databases in my professional life, but 100ms sounds like an absolutely mental timeframe (in a bad way)
what are you comparing this to btw?
"sub-100ms queries" is not a high bar to clear. Milliseconds isn't even the right measurement.
In a typical CRUD web app, any query that takes milliseconds instead of microseconds should be viewed with suspicion.
In a more charitable interpretation, maybe the parent is talking about sub-100ms total round trip time for an API call over the public internet.
But OP never said it's a CRUD app. Maybe OP did some experimentation with OLAP use cases.
I’m a huge Postgres fan. That said, I don’t agree with the blanket advice of “just use Postgres.” That stance often comes from folks who haven’t been exposed enough to (newer) purpose-built technologies and the tremendous value they can create
The argument, as in this blog, is that a single Postgres stack is simpler and reduces complexity. What’s often overlooked is the CAPEX and OPEX required to make Postgres work well for workloads it wasn’t designed for, at even reasonable scale. At Citus Data, we saw many customers with solid-sized teams of Postgres experts whose primary job was constant tuning, operating, and essentially babysitting the system to keep it performing at scale.
Side note, we’re seeing purpose-built technologies show up much earlier in a company’s lifecycle, likely accelerated by AI-driven use cases. At ClickHouse, many customers using Postgres replication are seed-stage companies that have grown extremely quickly. We pulled together some data on these trends here: https://clickhouse.com/blog/postgres-cdc-year-in-review-2025...
A better approach would be to embrace the integration of purpose-built technologies with Postgres, making it easier for users to get the best of both worlds, rather than making overgeneralized claims like “Postgres for everything” or “Just use Postgres.”
I took it to mean “make Postgres your default choice”, not “always use Postgres no matter what”
I personally see a difference between “just use Postgres” and “make Postgres your default choice.” The latter leaves room to evaluate alternatives when the workload calls for it, while the former does not. When that nuance gets lost, it can become misleading for teams that are hitting, or even close to hitting, the limits of Postgres, who may continue tuning Postgres and spending not only time but also significant $$. IMO a better world is one where developers can have a mindset of using best-in-class where needed. This is where embracing integrations with Postgres will be helpful!
I think that the key point being made by this crowd, of which I'm one, is somewhere in the middle. The way I mean it is "Make Postgres your default choice. Also *you* probably aren't doing anything special enough to warrant using something different".
In other words, there are people and situations where it makes sense to use something else. But most people believing they're in that category are wrong.
> Also you probably aren't doing anything special enough to warrant using something different".
I always get frustrated by this because it is never made clear where the transition occurs to where you are doing something special enough. It is always dismissed as, "well whatever it is you are doing, I am sure you don't need it"
Why is this assumption always made, especially on sites like HackerNews? There are a lot of us here that DO work with scales and workloads that require specialized things, and we want to be able to talk about our challenges and experiences, too. I don't think we need to isolate all the people who work at large scales to a completely separate forum; for one thing, a lot of us work on a variety of workloads, where some are big enough and particular enough to need a different technology, and some that should be in Postgres. I would love to be able to talk about how to make that decision, but it is always just "nope, you aren't big enough to need anything else"
I was not some super engineer who already knew everything when I started working on large enough data pipelines that I needed specialized software, with horizontal scaling requirements. Why can't we also talk about that here?
And another related one: “you’ll know when you need it.”
No I don’t. I’ve never used the thing so I don’t know when it’ll come in useful.
The point is really that you can only evaluate which of the alternatives is better once you have a working product with enough data; otherwise it's just basically following trends and hoping your barely informed decision won't be wrong.
Postgres is widely used enough with enough engineering company blog posts that the vast majority of NotPostgres requests already have a blog post that either demonstrates that pg falls over at the scale that’s being planned for or it doesn’t.
If they don’t, the trade off for NotPostgres is such that it’s justifiable to force the engineer to run their own benchmarks before they are allowed to use NotPostgres
Agree to disagree here. I see a world where developers need to think about (reasonable) scale from day one, or at least very early. We’ve been seeing this play out at ClickHouse: the time before purpose-built OLAP is needed is shrinking from years to months. Also, integration with ClickHouse is a few weeks of effort for potentially significantly faster analytics performance.
Reasonable scale means... what exactly?
Here's my opinion: just use Postgres. If you're experienced enough to know when to ignore that advice, go for it; it isn't for you. If you aren't, I'm probably saving you from yourself. "Reasonable scale" to these people could mean dozens of inserts per second, which is why people talking in vagaries around scale is maddening to me. If you aren't going to actually say what that means, you will lead people who don't know better down the wrong path.
I see a world where developers need to think about REASONABLE scale from day one, with all caps and no parentheses.
I've sat in on meetings about adding auth rate limiting, using Redis, to an on-premise electron client/Node.js server where the largest installation had 20 concurrent users and the largest foreseeable installation had a few thousand, in which every existing installation had an average server CPU utilisation of less than a percent.
Redis should not even be a possibility under those circumstances. It's a ridiculous suggestion based purely on rote whiteboard interview cramming. Stick a token_bucket table in Postgres.
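A minimal sketch of what I mean by that (table, rate, and IDs are made up; refill at 1 token/second up to a burst of 10):

```
CREATE TABLE token_bucket (
  user_id     bigint PRIMARY KEY,
  tokens      numeric NOT NULL,
  refilled_at timestamptz NOT NULL
);

-- Seed a bucket for a user the first time we see them.
INSERT INTO token_bucket VALUES (42, 10, now()) ON CONFLICT DO NOTHING;

-- Refill based on elapsed time, then try to spend one token.
-- A returned row means the request is allowed; no row means rate-limited.
UPDATE token_bucket
SET tokens = LEAST(10, tokens + EXTRACT(EPOCH FROM (now() - refilled_at))) - 1,
    refilled_at = now()
WHERE user_id = 42
  AND LEAST(10, tokens + EXTRACT(EPOCH FROM (now() - refilled_at))) >= 1
RETURNING tokens;
```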
I'm also not convinced that thinking about reasonable scale would lead to a different implementation for most other greenfield projects. The nice thing about shoving everything into Postgres is that you nearly always have a clear upgrade path, whereas using Redis right from the start might actually make the system less future-proof by complicating any eventual migration.
Ack, agreed. But there’s a better way to communicate than making blanket statements like “just use Postgres.” For example, you could say “Postgres is the default database,” etc.
Don’t get me wrong—I’m a huge huge fan of Postgres. I’ve worked at Postgres companies for a decade, started a Postgres company, and now lead a Postgres service within a company! But I prefer being real rather than doing marketing hype and blanket love.
This is my philosophy. When the engineer comes to me and says that they want to use NotPostgres, they have to justify why, with data and benchmarks, Postgres is not good enough. And that’s how it should be
> I don’t agree with the blanket advice of “just use Postgres.”
I take it as meaning use Postgres until there's a reason not to, i.e. build for the scale / growth rate you have, not "how will this handle the 100 million users I dream of." A simpler tech stack will be simpler to iterate on.
Postgres on modern hardware can likely service 100 million users unless you are doing something data intensive with them.
You can get a few hundred TB of flash in one box these days. You need to average over 1 MB of database data per user to get over 100 TB with only 100 million users. Even then, you can mostly just shard your DB.
What about throughput? How many times can postgres commit per second on NVMe flash?
You can do about 100k commits per second, but this also partly depends on the CPU you attach to it. It also varies with how complicated the queries are.
With 100 million DAU, you're often going to have problems with this rate unless you batch your commits. With 100 million user accounts (or MAU), you may be fine.
That should be enough for most apps.
Yes. That's a good framing. PostgreSQL is a good default for online LOB-y things. There are all sorts of reasons to use something other than PostgreSQL, but raw performance at scale becomes such a reason later than you think.
Cloud providers will rent you enormous beasts of machines that, while expensive, will remain cheaper than rewriting for a migration for a long time.
In my experience the functionality of “purpose built systems” is found in Postgres but you have to read the manual.
I personally think reading manuals and tuning is a comparably low risk form of software development.
Postgres is infinitely extensible, more than MariaDB. But it's very painful to write or configure extensions and you might as well use something different instead of reaching for an extension mechanism.
Exactly. Use cases differ. https://www.geeksforgeeks.org/mysql/difference-between-mysql...
That article was clearly written by AI, based on data from 20 years ago.
> At Citus Data, we saw many customers with solid-sized teams of Postgres experts whose primary job was constant tuning, operating, and essentially babysitting the system to keep it performing at scale.
Oh no, not a company hiring a team of specialists in a core technology you need! What next, paying them a good wage? C'mon, it's so much better to get a bunch of random, excuse me, "specialized" SaaS tools that will _surely_ not lead to requiring five teams of specialists in random technologies that will eventually be discontinued once Google acquires the company running them.
OK, but seriously: yeah, sometimes "specialized" is good, though much more rarely than people pretend. Having specialists ain't bad, and I'd say it's better than telling a random developer to become a specialist in some cloud tech and pretending you didn't just turn a (hopefully decent) developer into a poor DBA. Not to mention that a small team of Postgres specialists can maintain a truly stupendous amount of Postgres.
At my company I saw a team of devs pay for a special purpose "query optimized" database with "exabyte capability" to handle... their totally ordinary HR data.
I queried said database... it was slow.
I looked to see what indexes they had set up... there were none.
That team should have just used postgres and spent all the time and money they poured into this fancy database tech on finding someone who knew even a little bit about database design to help them.
I hate how developers are often very skeptical but all the skepticism goes out the window if the tech is sufficiently hyped up.
And TBH, developers are pretty dumb not to realize that the tech tools monoculture is a way for business folks to make us easily replaceable... If all companies use the same tech, it turns us into exchangeable commodities which can easily be replaced and sourced across different organizations.
Look at the typical React dev. They have zero leverage and can be replaced by vibe coding kiddies straight out of school or sourced from literally any company on earth. And there are some real negatives to using silver bullet tools. They're not even the best tools for a lot of cases! The React dev is a commodity and they let it happen to them. Outsmarted by dumb business folks who dropped out of college. They probably didn't even come up with the idea; the devs did. Be smarter people. They're going to be harvesting you like Cavendish.
Sure, but the world is vast. I would love to be able to test every UI framework and figure out which is the best, but who’s got time for that? You have to rely on heuristics for some things, and popularity is often a decent indicator.
Popularity’s flip side is that it can fuel commodification.
I argue popularity is insufficient signal. React as tech is fine, but the market of devs who it is aimed at may not be the most discerning when it comes to quality.
I do agree, I don’t know why more people don’t just use Postgres. If I’m doing data exploration with lots of data (e.g., GIS, nD vectors), I’ll just spin up a Postgres.app on my macOS laptop, install what little I need, and it just works and is plenty fast for my needs. It’s a really great choice for a lot of domains.
That being said, while I think Postgres is “the right tool for the job” in many cases, sometimes you just want (relative) simplicity, both in terms of complexity and deployment, and should use something like SQLite. I think it’s unwise to understate simplicity, and I use it to run a few medium-traffic servers (at least, medium traffic for the hardware I run it on).
> in many cases, sometimes you just want (relative) simplicity, both in terms of complexity and deployment, and should use something like SQLite.
So many times when trying to just go for simplicity with SQLite it takes me like one working day until I run up against enough annoyances to where resolving those is more work than setting up the "set up and forget" postgres instance.
Granted, this is for personal stuff... but "Postgres packaged for low maintenance" is present in a lot of OS package managers! Even for smaller data analysis work, SQLite perf leaves _loads_ to be desired (once had QGIS struggling with a SQLite DB... pg made everything mostly instant. Indices etc... but stuff I _couldn't easily get with SQLite_)
If SQLite works for you that's great; I do think it's worth it for people to _try_ simple pg setups to understand just how painful it is to use pg (for me: not very).
Oh, yes, that's one of my points! I think Postgres is a great way to deal with tons of data, and it's really the only thing I use to do any sort of analysis or informatics (that and Parquet + Dask).
I am also a fan of SQLite. One of the best parts during development is how easy it is to spin up and spin down databases for full integration tests without containers or anything. It also simplifies backups, and is probably good enough.
These days I would recommend PGlite for testing purposes when you use Postgres in production. That way you don't need any SQLite-vs-Postgres behavior switches.
Where I have used SQLite most successfully is really two use cases. First, I use it for data processing. Say I need to retrieve lots of data and transform it to a different setup. I could do that in something like Python but SQL is just more expressive for that and I can also create a new database, populate it with data I have, fetch new data, combine it together, export the update to a permanent data store (usually Postgres).
Second, when I need a local save file. Sometimes small local apps are better served by a save file, and the save file might as well have an extensible format that I can update as I go. This is more rare but still can be useful.
The first use case is very powerful. A temporary SQL database that can be blown away with zero trace of it is great. And the ability to run complex queries on it can really help.
But 99% of the time I just use Postgres. It works, it has sane defaults, it is crazy extensible, and it has never not met my needs, unlike Oracle or MySQL.
DuckDB via Python is my go-to for that first use case. It’s easier than ever to use Python and SQL together with Marimo notebooks.
```
uv run --with marimo marimo run --sandbox
```
and you’re ready to go.
i think the topic of "what data backend" gets super conflated into many different variations of what the hell people need it for. discussions here go so many different directions. some people are building simple webapps, some are building complex webapps that need to scale for a gazillion users, some are building local apps, some are just tinkering, some are thinking broadly about how their backend needs to sync with a datalake->some data warehouse at an org, yadda yadda ya.
i personally like postgres myself for just about all use cases that must be shared with others (app with more than one client that might be providing CRUD updates or anything really that demands a central data store). ive used sqlite a couple times with WAL to try and make a small app shared between 2-3 people who all would contribute updates thru it but it wasnt ideal. for postgres so many features/extensions its concurrent writes are fast as hell and if you just want to one-shot a solution then you cant go wrong, but it's ofc not the same as sqlite setup.
i think a lot of the pain with postgres is just learning to effectively be a knowledgeable db admin of sorts. its somewhere between being a competent devops guy and a dbadmin expert all in one. if you're actually doing some kind of production deployment it is kinda scary hoping you've got everything set up right. even supabase which makes this whole process trivial to get going requires an understanding of not-always-understood security premises that just make things spooky.
lot of words to say i dont get much out of these discussions tbh. theres just too many use cases and variables in everyones working/hobby lives im not sure that there is a proverbial bottom to any of it. some will use sqlite and some will use postgres and some will use some weird thing no ones heard of because they're afraid to rawdog sql and just want immediate graphql capability to be the main mode of data retrieval. some will show up here and talk about why you need redis in the middle.
its too much noise so i just keep using postgres because its free and boring and fast. end of the day i just want to make stuff people can use. it's a hard endeavor to do well alone, if you dont have a team of other experts who can help you put all the puzzle pieces together on how to deploy things the right way and also add pieces like redis or whatever... it's just a lot. it's hard to find where to get started. sqlite is the only solution that really by nature of what it is seems to champion the lonely developer, but the tradeoffs are big if you're trying to make something that should get used by many people.
A bit off topic but the one thing I've never been able to figure out with Postgres easily & reliably is what magic incantations allow a user account full access to a specific database but not to others, particularly in cases of managed postgres offered by cloud providers. `GRANT ALL PRIVILEGES` never seems to work.
Having to look up and spend time fixing permissions every time makes using Postgres even for simple uses difficult for me. But if you're using it ad hoc, any tips?
Isn't it something like GRANT ALL ON DATABASE foo TO USER bar
Grant operates on objects that already exist. They probably want ALTER DEFAULT PRIVILEGES or maybe just a superuser. The Postgres docs are actually really really good. https://www.postgresql.org/docs/current/sql-alterdefaultpriv... https://www.postgresql.org/docs/current/role-attributes.html
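For what it's worth, a sketch of the usual incantation (database, schema, and role names are hypothetical; "migrator" stands in for whatever role runs your migrations):

```
GRANT CONNECT ON DATABASE appdb TO app_user;
GRANT USAGE ON SCHEMA public TO app_user;

-- Covers objects that already exist...
GRANT ALL ON ALL TABLES IN SCHEMA public TO app_user;
GRANT ALL ON ALL SEQUENCES IN SCHEMA public TO app_user;

-- ...and objects the migration role creates later.
ALTER DEFAULT PRIVILEGES FOR ROLE migrator IN SCHEMA public
  GRANT ALL ON TABLES TO app_user;
ALTER DEFAULT PRIVILEGES FOR ROLE migrator IN SCHEMA public
  GRANT ALL ON SEQUENCES TO app_user;
```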
I ran into this once... I think there's something about the grant not working on new objects or being one level too low? I tended to solve those problems by granting ownership of the db itself.
99% of the time I've used Postgres it has been one user and one database. The one time I needed to create and configure a separate user with different permissions I remember it being thoroughly confusing and I think the DBA ended up doing it.
One of the best features of Postgres is the documentation. I recommend starting there.
Just let Claude fuck it up for you and learn from its mistakes.
I wish PostgreSQL had a native vector implementation instead of using extensions. They're kind of a pain in the ass to maintain, especially with migrations.
Interestingly, almost all of Postgres is an extension, including the stuff you expect to be built in: all data types, all index types, all operators, and, I think, the implementation of ordinary tables.
For me the showstopper missing feature is a standard and native implementation of temporal tables. Once you use those effectively in an application, they become something you can't do without.
Because it's not web scale, mongoDB is web scale
You turn it on, and it scales right up.
Who cares what we store so long as we do it quickly?
I hear there's a great tool for this, DB null or something like that.
MySQL actually has a BLACKHOLE storage engine designed specifically for universe-scale data storage for those who don't care about persistence.
It's an older meme, but it checks out.
WiredTiger would like to have a word with you. It was made default in 2015 and fixed a broad class of issues.
See also, PGLite: https://pglite.dev/
And Turso: https://turso.tech/
No multi-writer support.
Standardizing on one single tiny little project is always a bad idea. Why? Some examples (which are admittedly not related to postgres, because I don't know their structure):
1) A single person, doing a ton of heavy lifting, leaves, or worse, turns it over, or silently sells out to a nefarious person.
2) A severe security vulnerability is found. If everyone is using postgres, everyone is vulnerable. Bonus points if the vulnerability is either not publicly disclosed or it is hard to fix.
3) Commercial/Government interests heavily influence and push the project into places that could make it vulnerable in any given way. This is absolutely a thing.
4) AI. No clarification here. Just use your imagination, with recent news regarding FFMPEG and other projects in mind.
I'm not sure I would call either PostgreSQL or SQLite "tiny."
OP calling the de facto database solution (pg) in the world "tiny" is pretty laughable. It's one of the most popular solutions for databases in general and RDBMSes specifically. SQLite is also massive in terms of its adoption and use.
No, seriously, people need to be punished for submitting LLM-generated garbage without specifying that it's LLM-generated garbage. 400+ points, oh my god, people, what's wrong with you...
We buried the post for seeming obviously-LLM-generated. But please email us about these (hn@ycombinator.com) rather than posting public accusations.
There are two reasons why emailing us is better:
First, the negative consequences of a false allegation outweigh the benefits of a valid accusation.
Second and more important: we'll likely see an email sooner than we'll see a comment, so we can nip it in the bud quickly, rather than leaving it sitting on the front page for hours.
Not sure if I should email this question, but is there any chance those 400+ votes are artificially inflated?
There’s no evidence of this, but a title that’s easy to agree with can often attract upvotes from people who don’t read the article.
You buried a popular post because of the public accusation or just your "hunch"?
Why not let your audience decide what it wants to read?
I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.
You're welcome to email us about this.
It's my job to read HN posts and comments all day, every day, and these days that means spending a lot of time evaluating whether a post seems LLM-generated. In this case the post seems LLM-generated or heavily LLM-edited.
We have been asking the community not to publicly call out posts for being LLM-generated, for the reasons I explained in the latest edit of the comment you replied to. But if we're going to ask the community that, we also need to ask submitters to not post obviously-LLM-influenced articles. We've been asking that ever since LLMs became commonplace.
> I say this as a long time HN reader, who feels like the community has become grumpier over the years. Which I feel like is a shame. But maybe that's just me.
We've recently added this line to the guidelines: Don't be curmudgeonly. Thoughtful criticism is fine, but please don't be rigidly or generically negative.
HN has become grumpier, and we don't like that. But a lot of it is in reaction to the HN audience being disappointed at a lot of what modern tech companies are serving up, both in terms of products and content, and it doesn't work for us to tell them they're wrong to feel that way. We can try, but we can't force anyone to feel differently. It's just as much up to product creators and content creators to keep working to raise the standards of what they offer the audience.
Thanks Tom, I appreciate the openness. You are seemingly overriding the wishes of the community, but it's your community and you have the right to do so. I still think it's a shame, but that's my problem.
> You are seemingly overriding the wishes of the community
That's false. The overwhelming sentiment of the community is that HN should be free of LLM-generated content or content that has obvious AI fingerprints. Sometimes people don't immediately realize that an article or comment has a heavy LLM influence, but once they realize it does, they expect us to act (this is especially true if they didn't realize it initially, as they feel deceived). This is clear from the comments and emails we get about this topic.
If you can publish a new version of the post that is human-authored, we'd happily re-up it.
I'm just sharing my thoughts as a long-time reader. Again, it's your show. You don't have to defend your actions. Thanks for all that you do.
I’d be grumpy over wasting my time on an HN post that’s LLM generated which doesn’t state that it is. If I wanted this, I could be prompting N number of chat models available to me instead of meandering over here.
There are also 200+ comments on here and a good discussion IMO which is now unfortunately buried.
Feels like a net negative for the HN community.
They’re upvoting because they agree with the sentiment in the title.
That’s largely how these voting sites work.
Exposing the age-old truth of "commenters and voters don't read articles" I see
I just pasted the first paragraph in an "AI detector" app and it indeed came back as 100% AI. But I heard those things are unreliable. How did you determine this was LLM-generated? The same way?
Apart from the style of the prose, which is my subjective evaluation: This blog post is "a view from nowhere." Tiger Data is a company that sells postgres in some way (don't know, but it doesn't matter for the following): they could speak as themselves, and compare themselves to companies that sell other open source databases. Or they could showcase benchmarks _they ran_.
Them saying: "What you get: pgvectorscale uses the DiskANN algorithm (from Microsoft Research), achieving 28x lower p95 latency and 16x higher throughput than Pinecone at 99% recall" is marketing unless they give how you'd replicate those numbers.
Point being: this could have been written by an LLM, because it doesn't represent any work-done by Tiger Data.
For what it's worth, TigerData is the company that develops TimescaleDB, a very popular and performant time series database provided as a Postgres extension. I'm surprised that the fact that TigerData is behind it is not mentioned anywhere in the blog post. (Though, TimescaleDB is mentioned 14 times on the page).
I don't understand your example: pgvectorscale was built and is maintained by Tiger Data
Just using LLMs enough I've developed a sense for the flavor of writing. Surely it could be hidden with enough work, but most of the time it's pretty blatant.
Sometimes I get an "uncanny valley" vibe when reading AI-generated text. It can be pretty unnerving.
It’s got that LLM flow to it. Also liberal use of formatting. It’s like it cannot possibly emphasize enough. Tries to make every word hit as hard as possible. There’s no filler, nothing slightly tangential or off-topic to add color. Just many vapid points rapid-fire, as if they’re the hardest-hitting truth of the decade lol
ChatGPT has a pretty obvious writing style at the moment. It's not a question of some nebulous "AI detector" gotcha, it's more just basic human pattern matching. The abundant bullet points, copious bold text, pithy one line summarizing assertions ("In the AI era, simplicity isn’t just elegant. It’s essential."). There are so many more in just how it structures its writing (eg "Let’s address this head-on."). Hard to enumerate everything, frankly.
Was it leading with a bad analogy that gave it away?
I know everybody just wants to talk about Postgres but it’s still sad to see any sort of engagement with slop. Even though the actual article is essentially irrelevant lol
and it’s up voted 400+ on hn. This place has truly lost its way.
I mean to be fair, "Just use Postgres" will get 400 votes here without people even clicking TFA.
This kind of thing gets posted every couple of months. Databases like Pinecone and Redis are more cost-effective and capable for their special use case, often dramatically so. In some circumstances the situation favours solving the problem in Postgres rather than adding a database. But that should be evaluated on a case-by-case basis. For example, if you run something at scale and have an ops team the penalty of adding a second database is much smaller.
(I run a medium-sized Postgres deployment and like it, but I don't feel like it's a cost-effective solution to every database problem.)
Once you have an app, a lot of data, and actual problems, it's far easier to pick the right alternative.
PostgreSQL is good enough to get to medium sized with nearly every use case. Once you are there, you have the use case and the test data to test any alternative for it well, rather than trying to guess beforehand what you actually need.
The advice is basically "PostgreSQL is probably good enough for whatever you're building now, and you should only look for other solution once you are big enough that it stops being that"
Could make the same argument for SQLite; the threshold is lower, but similarly you can get pretty far with it. Then decide what's next, once you're outgrowing it.
I've actually started moving away from Postgres to MySQL and SQLite. I don't want to have to deal with the vacuums/maintenance/footguns.
MySQL is definitely easier to use if you don’t want to ever have to think about DB maintenance; on the other hand, you’re giving up a TON of features that could make your queries enormously performant if your schema is designed around them - like BRIN indices, partial indices, way better partition management, etc.
OTOH, if and only if you design your schema to exploit MySQL’s clustering index (like for 1:M, make the PK of the child table something like (FK, some_id)), your range scans will become incredibly fast. But practically no one does that.
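A sketch of what that looks like, with hypothetical parent/child tables:

```
CREATE TABLE parent (
  id BIGINT NOT NULL PRIMARY KEY
) ENGINE=InnoDB;

CREATE TABLE child (
  parent_id BIGINT NOT NULL,
  id        BIGINT NOT NULL,
  payload   VARCHAR(255),
  -- InnoDB clusters rows on the primary key, so all of one parent's
  -- children end up physically adjacent.
  PRIMARY KEY (parent_id, id),
  FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

-- This range scan now reads contiguous pages:
SELECT * FROM child WHERE parent_id = 42;
```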
As someone who learned to think in MySQL, this is really true. At the time Postgres was a viable alternative too; it's just that the tooling to get started with MySQL reached me a little easier and quicker.
The major thing I advocate for is: don't pick a NoSQL database to avoid relational DBs, only to try to do a bunch of relational work in NoSQL that would have been trivial in an RDBMS. Postgres can even power graph query results, which is great.
The major change that made us replace PostgreSQL was replication and HA across geographies; I think on the Postgres side Greenplum and CockroachDB are/were an option.
With MySQL variants like Percona XtraDB, the setup can go from one instance, to a cluster, to a geo-replicating cluster with minimal effort.
Vanilla Postgres, for an equivalent setup, is basically pulling teeth.
I have a lot of respect for Postgres' massive feature set, and how easy it is to immediately put to use, but I don't care for the care and feeding of it, especially dealing with upgrades, vacuuming, and maintaining replication chains.
Once upon a time, logical replication wasn't a thing, and upgrading major versions was a nightmare, as all databases in the chain had to be on the same major version. Upgrading big databases took days because you had to dump and restore. The MVCC bloat and VACUUM problem was such a pain in the ass, whereas with MySQL I rarely had any problems with InnoDB purge threads not being able to keep up with garbage-collecting historical row versions.
Lots of these problems are mitigated now, but the scars still sometimes itch.
I’ve thought of doing this myself. Upgrades also seem easier in MySQL. That said, it does seem to be stagnating relative to Postgres.
I agree that SQLite requires less maintenance, but you still need to vacuum to keep the database file from growing indefinitely (for apps, I run VACUUM at startup).
SQLite vacuum is only needed to shrink the database after you remove a lot of data. It's not needed as a routine operation the way Postgres's is. Postgres has autovacuum, usually on by default, so I'm not understanding the complaint much.
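If it helps, the one-off shrink is just VACUUM; incremental auto-vacuum is also an option, though that pragma only takes effect on a fresh database or after a full VACUUM:

```
-- Rewrite the file to reclaim space freed by large deletes.
VACUUM;

-- Optional: enable incremental auto-vacuum for the future.
PRAGMA auto_vacuum = INCREMENTAL;
VACUUM;
PRAGMA incremental_vacuum;
```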
SQLite is awesome, no services to run, ports to open or anything like that, just a nice little database to use as you need.
It’s kind of remarkable how little operational maintenance mysql requires. I’ve been a Postgres fan for a long time, but after working with a giant mysql cluster recently I am impressed. Postgres requires constant babysitting, vacuums, reindexing, various sorcery. MySQL just… works.
There's a fascinating gap between PostgreSQL theory and practice here. Elsewhere in this thread, I complained that PostgreSQL extensions can't do everything yet. One thing they can do, however, or ought to be able to do, is provide alternative storage engines. That's the central thing they're supposed to be especially good at providing.
So why did the VACUUM-free, undo-based MVCC storage engine project stall? https://wiki.postgresql.org/wiki/Zheap
Why is there no InnoDB for PostgreSQL?
(Maybe OrioleDB will avoid a similar fate.)
MySQL is Oracle, you want MariaDB
I'm in the same boat, the idiosyncrasies of postgres are real; mysql / sqlite are far more predictable.
If you think Postgres has idiosyncrasies…
https://sqlite.org/quirks.html
Which one is an issue?
Loose typing, foreign keys not enforced by default, double-quoted string literals
I really wish I could but it's hard to embed in local-first apps and packages without forcing users to set up docker.
PGlite would be perfect if only it allowed multiple writer connections. SQLite is ok but I want PG extensions and I want true parallel multi-writer support!
Caching is mentioned in the article: What do you guys feel about using PostgreSQL for caching instead of Redis?
Redis is many times faster, so much that it doesn't seem comparable to me.
A lot of data you can get away with just caching in-mem on each node, but when you have many nodes there are valid cases where you really want that distributed cache.
Prove that you need the extra speed.
Run benchmarks that show that, for your application under your expected best-case loads, using Redis for caching instead of PostgreSQL provides a meaningful improvement.
If it doesn't provide a meaningful improvement, stick with PostgreSQL.
This is the proper approach when deciding whether to use any type of tool or technology. Is the increased amount of cognitive overhead for someone with minimal exposure to your system (who will have to maintain it when you’ve moved on) worth the increased performance on a dollars-per-hour basis? If so, it may be a good option. If not, it doesn’t matter how much better the relative performance is.
Yep. And often, if you do find a need for a cache, you can get away with an in-app cache before moving to something like Redis.
Neither.
Just use memcache for a query cache if you have to. And only if you have to, because invalidation is hard. It's cheap, reliable, mature, fast, scalable, requires little understanding, has decent-quality clients in most languages, is not stateful, is available off the shelf in most cloud providers, and works in-cluster in Kubernetes if you want to do it that way.
I can't find a use case for Redis for which postgres or postgres+memcache isn't a simpler and/or superior solution.
Just to give you an idea how good memcache is, I think we had 9 billion requests across half a dozen nodes over a few years without a single process restart.
Is there anything memcache gives you that a Redis instance configured with an eviction policy of allkeys-lru doesn't give you?
memcached is multithreaded, so it scales up better per node.
memcached clients also frequently use ketama consistent hashing, so it is much easier to do load balancing/clustering, and it's much simpler than Redis clustering (Sentinel, etc.).
Mcrouter[1] is also great for scaling memcached.
dragonfly, garnet, and pogocache are other alternatives too.
[1]: https://github.com/facebook/mcrouter
Both Redis (finally) and Valkey addressed the multithreading scalability issues, see https://oneuptime.com/blog/post/2026-01-21-redis-vs-memcache... and/or https://news.ycombinator.com/item?id=43860273...
I imagine the answer here is: less complexity.
If you want to compare Redis and PostgreSQL as a cache, be sure to measure an unlogged table, as suggested in the article. Much of the slowness of PostgreSQL is to ensure durability and consistency after a crash. If that isn't a concern, disable it. Unlogged tables are automatically truncated after a crash.
Depends on your app's cache needs. If it's moderate, I'd start with Postgres, i.e. not having to operate another piece of infra and the extra code. If you are doing the shared-nothing app server approach (Rails, Django), where the app server remembers nothing after each request, Redis can be a handy choice. I often go with having a fat, long-lived server process (JVM) that also serves my live caching needs. #tradeoffs
I say do it, if it simplifies the architecture. For example if you are using firestore with a redis cache layer, that's 2 dbs. If you can replace 2 dbs with 1 db (postgres), I think it's worth it. But if you are suggesting using a postgres cache layer in front of firestore instead of redis... to me that's not as clear cut.
Materialized views work pretty well in Postgres. But yes at some level of load it’s just helpful to have the traffic shared elsewhere.
But as soon as you go outside Postgres you cannot guarantee consistent reads within a transaction.
That’s usually ok, but it’s a good enough reason to keep it in until you absolutely need to.
Materialised views don't work well in Postgres. They don't update incrementally and have to be manually triggered to rebuild from scratch.
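A quick sketch of what that means in practice (hypothetical orders rollup):

```
CREATE MATERIALIZED VIEW daily_sales AS
  SELECT order_date, sum(total) AS revenue
  FROM orders
  GROUP BY order_date;

-- Needed so the view can be refreshed CONCURRENTLY.
CREATE UNIQUE INDEX ON daily_sales (order_date);

-- The view does not track changes to orders; something (cron, pg_cron,
-- app code) has to rerun this, and it rebuilds the whole result set.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;
```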
Sure, I guess it depends on the use case.
Depends how much you have to cache and how much speed you really need from it.
I like Redis a lot, but for things at the start I'm not sure the juice is always worth the squeeze to get it set up and manage another item in the stack.
Luckily, search is something that has been thought about and worked on for a while and there's lots of ways to slice it initially.
I'm probably a bit biased though from past experiences from seeing so many different search engines shimmed beside or into a database that there's often an easier way in the start than adding more to the stack.
Skeptical about replacing Redis with a table serialized to disk. The point of Redis is that it is in memory and you can smash it with hot-path queries while taking a lot of load off the backing DB. Also, that design requires a cron job, which means the table could fill the disk between key purges.
From the article, using UNLOGGED tables puts them in memory, not on disk
I think the article is wrong. UNLOGGED means it isn't written to the WAL, which means recovery and rollback guarantees won't work, since the transaction can finish before the page is synchronized on disk. The table loses integrity as a trade-off for faster writes.
https://www.postgresql.org/docs/current/sql-createtable.html...
It really depends on your use case, doesn't it? I'd say, just use Postgres... until you have a reason not to. We finally switched to Elasticsearch to power user search queries of our vehicle listings a few years ago, and found its speed, capabilities, and simplicity all significant improvements compared to the MariaDB-based search we'd been using previously. (Postgres's search features are likely better than MariaDB's, but I expect the comparison holds.) But that's the core of our product, and while not giant, our scale is significant. If you're just doing some basic search, you don't need it. (We managed for many years just fine without.)
I've never really regretted waiting to move to a new tool, if we already had something that works. Usually by doing so you can wait for the fads to die down and for something to become the de facto standard, which tends to save a lot of time and effort. But sometimes you can in fact get value out of a specialized tool, and then you might as well use it.
Huh, apparently this is controversial, based on the score ping-ponging up and down! I'm not really sure why though. Is it because of the reference to MariaDB?
Now we only need easy self-hosted Postgres clustering for HA. Postgres seems to need additional tooling. There is Patroni, which doesn't provide container images. There is Spilo, which provides Postgres images with Patroni, but they are not really maintained. There is a timescaledb-ha image with Patroni, but no documentation on how to use it. It seems the only easy way to host a Postgres cluster is to use CloudNativePG, but that requires k8s.
It would be awesome to have easy clustering directly built-in. Similar to MongoDB, where you tell the primary instance to use a replica set, then simply connect two secondaries to primary, done.
Postgres is not a CP database, and even with synchronous replication, it can lose writes during network partitions. It would not pass the Jepsen test suite.
This is very hard to fix and requires significant architectural changes (like Yugabyte or Neon have done).
Blog posts, like academic papers, should have to divulge how AI has been used to write them.
Even blog post is generous. This is an ad.
Yes this is clearly verbatim output from an LLM.
But it's perfect HN bait, really. The title is spicy enough that folks will comment without reading the article (more so than usual), and so it survives a bit longer before being flagged as slop.
Do the HN guidelines say to flag AI content? I am unsure of how flagging for this is supposed to work on HN and have only ever used the flag feature for obvious spam or scams.
It might be wrong, but I have started flagging this shit daily. Garbage articles that waste my time as a person who comes on here to find good articles.
I understand that reading the title and probably skimming the article makes it a good jumping off point for a comment thread. I do like the HN comments but I don't want it to be just some forum of curious tech folks, I want it to be a place I find interesting content too.
I agree. It seems this is kind of a Schelling point right now on HN and there isn't a clear guideline yet. I think your usage of flagging makes sense. Thanks
You're absolutely right! Let's delve into why blog posts like this highlight the conflict between the speed and convenience of AI and authentic human expression. Because it's not just about fears of hallucination—it's about ensuring the author's voice gets heard. Clearly. Completely. Unmistakably.
If nothing else, I sure got amusement from this.
Granted it's often easy to tell on your own, but when I'm uncertain I use GPTZero's Chrome extension for this. Eventually I'll stop doing that and assume most of what I read outside of select trusted forums is genAI.
People used to write Medium/Linkedin slop by hand and they didn't have to disclose it. Slopping is its own punishment.
I'll take it one step further and say you should always ask yourself if the application or project even needs a beefy database like Postgres or if you can get by with using SQLite. For example, I've found a few self-hosted services that just overcomplicated their setup and deployment because they picked Postgres or MariaDB over SQLite, despite it being a much better self-contained solution.
I find that if I want to use JSON storage I'm somewhat stuck choosing my DB stack. If I want to use JSON, and change my database from SQLite to Postgres I have to substantially change my interface to the DB. If I use only SQLite, or only Postgres it's not so bad, but the transition cost to "efficient" JSON use in Postgres from a small demo in SQLite is kind of high compared to just starting with an extra docker run (for a Postgres server) / docker compose / k8s yaml / ... that has my code + a Postgres database.
I really like having some JSON storage because I don't always know my schema up front, and just shoving every possible piece of potentially useful metadata in there has (generally) not bitten me, but not having that critical piece of metadata has been annoying (that field that should be NOT NULL is NULL because I couldn't populate it after the fact).
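A small illustration of the interface gap being described, with a hypothetical events table: the same lookup written against SQLite's JSON1 functions and against Postgres jsonb operators, which is roughly the code you end up rewriting when you switch:

    -- SQLite (JSON1): JSON stored as text, queried with json_extract.
    SELECT json_extract(meta, '$.user_id') FROM events WHERE id = 1;

    -- Postgres (jsonb): operator syntax, plus GIN indexes for containment queries.
    SELECT meta->>'user_id' FROM events WHERE id = 1;
    CREATE INDEX ON events USING gin (meta jsonb_path_ops);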
SQLite is great until you try to do any kind of multi-writer stuff. There's no SELECT FOR UPDATE locking and no parallel write support; if any of your writes take more than a few ms you end up having to manage queueing at the application layer, which means building your own concurrency-safe multi-writer queue anyway.
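For anyone unfamiliar, this is the kind of row-claiming pattern being referred to, which Postgres supports and SQLite has no equivalent for; the jobs table and its columns are hypothetical:

    -- Each worker claims one pending job without blocking the others.
    BEGIN;
    SELECT id, payload
      FROM jobs
     WHERE status = 'pending'
     ORDER BY id
     LIMIT 1
     FOR UPDATE SKIP LOCKED;
    -- ...process the job in application code, then mark it done...
    UPDATE jobs SET status = 'done' WHERE id = 123;  -- the id returned above
    COMMIT;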
I've found that Postgres consumes (by default) more disk than, for example, MySQL. And the difference is quite significant. That means more money that I have to pay every month. But sure, Postgres seems like a system that integrates a lot of subsystems, and that adds a lot of complexity too. I'm just noting the bad points because you mention the good points in the post. You're also trying to sell your service, which is good too.
The problem is that Postgres has something like 24 bytes of overhead per row. That is not an issue with small tables, but when you have a few billion rows around, each byte starts to add up fast. Then you need link tables that explode that number even more, etc. It really eats a ton of space.
At some point you end up with binary columns and custom encoded values, to save space by reducing row count. Kind of doing away with the benefits of a DB.
Yeah postgres and mariadb have some different design choices. I'd say use either one until it doesn't work for you. One of the differences is the large row header in postgres.
On the flip side, restoring from a plain PostgreSQL dump is much, much faster than from a plain MySQL backup. There are alternative strategies for MySQL, but that's extra work.
Some people do Postgres on compressed ZFS volumes to great success.
On average I get around 4x compression on PostgreSQL data with zstd-1
I am curious if you know anyone using Btrfs for this too. I like ZFS, but if Btrfs can do this it would be easier to use with some distros, etc., as it's supported in-kernel.
I do it.
The big problem for me with running a DB on Btrfs is that when I delete large dirs or files (100GB+), it locks up the disk subsystem, and the DB basically stops responding to any queries.
I am very surprised that a FS which is considered prod grade has this issue.
Try XFS if you haven't yet.
Very solid and no such issues.
I haven't used XFS in almost two decades, does it have compression support in the same way? Also, does it do JBOD stuff? I know it's a bit of a different thing, but I really enjoy the pool many disks together part of Btrfs, although it has its limitations.
XFS doesn't have inline compression, nor does it have volume management functionality. It's a nice filesystem (and it's very fast) but it's just a filesystem.
No compression.
> Over 48,000 companies use PostgreSQL, including Netflix, Spotify, Uber, Reddit, Instagram, and Discord.
I hope all of them donate to the PostgreSQL community in the same amount they benefit from it
https://www.postgresql.org/about/donate/
The real problem is, I'm so danged familiar with the MySQL toolset.
I've fixed absolutely terrifying replication issues, including a monster split brain where we had to hand-pick transactions and replay them against the new master. We've written a binlog parser as an event source to clear application caches. I can talk to you about how locking works, when it doesn't (phantom locks anyone?), how events work (and will fail) and many other things I never set out to learn but just sort of had to.
While I'd love to "just use Postgres" I feel the tool you know is perhaps the better choice. From the fandom online, it's overall probably the better DBMS, but I would just be useless in a Postgres world right now. Sorta strapped my saddle to the wrong horse, unfortunately.
Start learning on the side. Once you know one well, the time to learn another is much shorter. Bet you could be well on your way in just a few weeks. Not to mention getting away from the Oracle stink.
It's the 5th of Feb 2026, and we already have our monthly "just use postgres" thread
btw, big fan of postgres :D
I am looking for a db that runs using existing json/yaml/csv files and saves data back to those files in a directory, which I can sync using Dropbox or whatever shared storage. Then I can run this db wherever I am and run the application. Postgres feels like a bit much for my needs.
It feels like you want SQLite?
Why? Why would separate json/yaml/csv files be better than just... syncing using postgres itself? You point `psql` to the host you need, because clearly you have internet access even on the go, and done: you don't need to sync anything, you already have remote access to the database?
Having internet doesn't mean I have access to the DB. For example, I don't want to open my DB to public access.
Love the sentiment! And I'm a user - but what about aggregations? Elasticsearch offers a ton of aggregates out of the box for "free" completely configurable by query string.
Tiger Data offers continuous aggs via hypertables, but they need to be configured quite granularly and they're not super flexible. How are you all thinking about that when it comes to postgres and aggregations?
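For readers who haven't seen them, a TimescaleDB continuous aggregate looks roughly like this; the readings hypertable and its columns are hypothetical, and the refresh-policy knobs are the kind of granular configuration being referred to:

    CREATE MATERIALIZED VIEW daily_metrics
    WITH (timescaledb.continuous) AS
    SELECT time_bucket('1 day', ts) AS bucket,
           device_id,
           avg(value) AS avg_value,
           count(*)   AS samples
    FROM readings
    GROUP BY time_bucket('1 day', ts), device_id;

    -- Keep the aggregate up to date on a schedule.
    SELECT add_continuous_aggregate_policy('daily_metrics',
        start_offset      => INTERVAL '3 days',
        end_offset        => INTERVAL '1 hour',
        schedule_interval => INTERVAL '1 hour');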
I love postgres and it really is a supertool. But to get exactly what you need can require digging deep and really having control over the lowest levels. My experience after using timescale/tigerdata for the last couple years is that I really just wish RDS supported the timescale extension; TigerData's layers on top of that have caused as many problems as they've solved.
I like "just use postgres" but postgres is getting a bit long in the tooth in some ways, so I'm pretty helpful that CedarDb sticks the landing.
https://cedardb.com/
I suspect it not being open source may prevent a certain level of proliferation unfortunately.
Gad, they sure like to say "BM25" over and over again. That's a near worthless approach to result ranking. Doing any halfway ok job requires much more tuned and/or more powerful approaches.
Can you please elaborate why?
It's common to do a hybrid of BM25 with other fuzzy search or pgvector.
BM25 is quite bad and needs to be retrained for each corpus anew. SPLADEv2 is much better and there are even better sparse embeddings these days.
the index bloat problem is still not solved.
Just use sqlite until you can’t.
Then use Postgres until you can’t.
I have two fundamental problems with Postgres - an excellent piece of technology, no questions about that.
First, to use Postgres for all those cases you have to learn various aspects of Postgres. Postgres isn't a unified tool which can do everything; instead it's a set of tools under the same umbrella. As a result, you don't save much compared with learning all those different systems separately and using Postgres only as an RDBMS. And if something isn't implemented better in Postgres than in a 3rd-party system, it could be easier to replace that 3rd-party system, just one part of the whole, rather than switching from Postgres-only to Postgres-and-then-some. In other words, Postgres has little benefit when many technologies are needed, compared with a collection of separate tools. The article notwithstanding.
Second, Postgres is written for HDDs - hard disk drives, with their patterns of data access and times. Today we usually work with SSDs, and we'd benefit from having SSD-native RDBMSes. They exist, and Postgres may lose to them - both in simplicity and performance - significantly enough.
Still, Postgres is pretty good, yes.
Never heard about postgresql being written for hdds. Could you provide a source?
Well, take a look at the dates of when Postgres was created and when SSDs became available. Better, find articles about the internal algorithms, B-trees, timings of operations like seeks, etc. Postgres was initially written with disk operation timings in mind, and the point is that's changing, and I haven't heard of Postgres's architecture changing with it.
Can you share examples of new database architectures and products using them that are built for SSDs?
I'm sure we have different capabilities and constraints, but I am unaware of any fundamentally different approaches to indexes.
This really isn't true. You should use different parameters (specifically, you can reduce random_page_cost to a little over 1) on an SSD, but there isn't a really compelling reason to use a completely different DBMS for SSDs.
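As a rough sketch of that tuning (the value 1.1 is a common suggestion, not a universal answer):

    -- The default random_page_cost of 4.0 reflects spinning-disk seek costs;
    -- on SSD/NVMe storage a value just above seq_page_cost is typical.
    ALTER SYSTEM SET random_page_cost = 1.1;
    SELECT pg_reload_conf();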
SSD-native RDBMS sounds good in theory! What does it mean in practice? What relational databases are simpler and more performant? Point me in their direction!
This post is discussing more specialized databases, but why would people choose Oracle/Microsoft DB instead of Postgres? Your own experience is welcome.
I'd pick MSSQL if I was being compensated based upon the accuracy, performance, durability and availability of the system. Also in any cases where the customer can pay and won't complain even once about how much this costs.
I'd never advocate for a new oracle install. But, I'd likely defend an existing one. I've seen how effective pl/sql can be in complex environments. Rewriting all that sql just because "oracle man bad" (or whatever) is a huge fucking ask of any rational business owner.
Easy answer here - nearly every LOB app we have uses MSSQL.
I've had engineers want to talk about syncing it to MySQL using some custom plumbing so that they can build a reporting infra around their MySQL stack, but it's just another layer of complexity over having code just use Microsoft's reporting services.
I'll add that finance people with Excel really like being able to pull data directly from MSSQL; they do not like hearing about a technician's Python app.
Elixir + Postgres is the microservices killer... the last time I saw a VP try to convince a company with this stack to go microservices, he was out in less than 6 months.
This is the killer combo. Working on something now that uses pgmq + Elixir for DAG workflows: https://github.com/agoodway/pgflow
I prefer bit more type-safety. JVM/Kotlin/Jdbi + PG for me.
Postgres can definitely handle a lot of use cases; background job scheduling always had me tempted to reach for something like rabbitmq but so far happy enough with riverqueue[0] for Go projects.
[0]: https://riverqueue.com/
I made the switch from MySQL to Postgres a few years ago. I didn't really understand what everyone was excited about before I made the switch. I haven't used MySQL since, and I think Postgres provides everything I need. The only thing that I ever snarl at is how many dials and knobs and options there are, and that's not a bad thing!
> the only thing that I ever snarl at is how many dials and knobs and options there are that's not a bad thing!
yea this is me. postgres is actually insane for how much is offered at checks notes free.99.
_however_ we are probably due for, I don't know, a happy configurator-type tool that has reasonable presets and a nice user-friendly config UI that helps people get going without sidequesting for a year on devops/dbadmin expertise. that isn't even a favored outcome imo; you just get pretty lukewarm postgres-deployers who are probably missing a bunch of important settings/flags. my team mates would probably shit themselves in the face of postgres configs currently; they are absolute rats in the code, but good and proper deployment of postgres is just a whole other career-arc they haven't journeyed, and a _lot_ of organizations don't always have a devops/dbadmin type guy readily available any time you want to scrape together an app, who's just going to wait for your signal to deploy for you. or said devops/dbadmin guy is just.. one guy and he's supporting 500 other things. not saying the absence of (or failure to scale teams with) such personnel is right; it's just the reality, and being up against workplace politics and making the case to convince orgs to hire a bigger team of devops/dbadmin guys involves a lot of shitty meetings and political prowess that is typically beyond an engineer's set of capabilities, at least below the senior level. any engineer can figure out how to deploy postgres to something, but are they doing it in a way that makes an org's security/infra guys happy? probably not. are they prepared to handle weird storage scenarios (log or temp space filling up and grinding the server to a halt) and understand the weird and robust ways to manage a deployment? probably not.
Oh wow, the "Postgres for Developers, Devices, and Agents" company wants us to use Postgres?
Postgres has its merits without the company who's also enjoying using it.
probably not many Firebase users here but I love Firebase's Firestore
It's very bad to search stuff, or have structured database, also it costs a lot
its dirt cheap and you can do basic search
It irks me that these "just use Postgres" posts only talk about feature sets with no discussion about operations, reliability, real scaling, or even just guard rails and opinions to deter you from making bad design decisions. The author writes about how three nine's is multiplied over several dependencies, but that's not how this shakes out in practice. Your relational database is typically far more vulnerable than distributed alternatives. "Just use Postgres" is fine advice but gets used as a crutch by companies who wind up building everything in-house for no good reason.
https://github.com/Olshansk/postgres_for_everything/
I have a colleague who (inexplicably) doesn't trust Postgres for "high performance" applications. He needed a database of shared state for a variable number of running containers to manage a queue, so he decided to implement his own bespoke file-based database using shared disk. Lo and behold, during the first big (well-anticipated) high-demand event, that system absolutely crawled. It ran, but it was a total bottleneck during two days of high demand. I, having made a New Year's resolution to no longer spend political capital on things that I can't change, looked on with a keen degree of schadenfreude.
Just. Use. Postgres.
Pinecone allows hybrid search, merging dense and sparse vector embeddings, which Postgres can't do AFAIK. Missing that results in ~10% worse retrieval scores, which might be the difference between making it as a business or not.
The lesson for me isn't "don't use Pinecone", but more like "did you already max out Postgres?"
In many cases, it is going to save you time by having less infra and one less risk while you're getting started. And if you find yourself outgrowing the capabilities of Pg then you look for an alternative.
The article shows an example of hybrid search using RRF.
With BM25, which has far worse and less generalizable performance than the sparse embeddings Pinecone supports. Moreover, you get a latency hit from RRF that makes it challenging to use for e.g. real-time multimodal chat agents.
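For readers who want to see what this kind of hybrid query looks like in Postgres, here is a rough RRF sketch, not the article's exact query, assuming a hypothetical docs(id, tsv tsvector, embedding vector) table with pgvector installed and the query embedding supplied as a bind parameter:

    -- Reciprocal Rank Fusion (k = 60) over full-text and vector rankings.
    WITH text_hits AS (
        SELECT id, row_number() OVER (ORDER BY ts_rank(tsv, q) DESC) AS rnk
        FROM docs, to_tsquery('english', 'postgres & search') AS q
        WHERE tsv @@ q
        ORDER BY ts_rank(tsv, q) DESC
        LIMIT 50
    ),
    vector_hits AS (
        SELECT id, row_number() OVER (ORDER BY embedding <=> $1) AS rnk
        FROM docs
        ORDER BY embedding <=> $1      -- $1: the query embedding parameter
        LIMIT 50
    )
    SELECT id,
           coalesce(1.0 / (60 + t.rnk), 0)
         + coalesce(1.0 / (60 + v.rnk), 0) AS rrf_score
    FROM text_hits t
    FULL OUTER JOIN vector_hits v USING (id)
    ORDER BY rrf_score DESC
    LIMIT 10;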
Can anyone comment on whether Postgres can replace a full columnar DB? I see "full text search" but it feels like this falls a little short of the full power of Elastic, though I'd be happy to be wrong (one less tech to remember).
There are several different plugins for postgres that do columnar tables, including the company TigerData which wrote this blog
According to the LLM google search result, yes.
Have you looked into it?
How does Postgres stack up against columnar databases like Vertica and DuckDB for analytical queries?
Not great. It works but the performance isn't ideal for this case. Still the advice is sound. You can build a new analytics system when you're doing enough analytical queries on enough data to bog down the primary system.
See my comment on why hybrid DBs like TigerDB (TigerData) are good:
- https://news.ycombinator.com/item?id=46876037
I think it's disingenuous of the author to publish this article heavily edited by AI and not disclose it.
I don't disagree, but I think big enterprises expect support, roadmaps, and the ability to ask for deliverables depending on the sale or context of the service.
The only thing postgres lacks is a distributed option.
Postgres is a king of its own; other solutions can be incorporated into it eventually by someone or some organization. That's it.
Meh.
I agree that managing lots of databases can be a pain in the ass, but trying to make Postgres do everything seems like a problem as well. A lot of these things are different things and trying to make Postgres do all of them seems like it will lead to similar if not worse outcomes than having separate dedicated services.
I understand that people were too overeager to jump on the MongoDB web scale nosql crap, but at this point I think there might have been an overcorrection. The problem with the nosql hype wasn't that they weren't using SQL, it's that they were shoehorning it everywhere, even in places where it wasn't a good fit for the job. Now this blog post is telling us to shoehorn Postgres everywhere, even if it isn't a good fit for the job...
to be fair, Postgres can basically do everything Mongo can, and just as well.
Ok, I'm a little skeptical of that claim but let's grant it. I still don't think Postgres is going to do everything Kafka and Redis can do as well as Kafka or Redis.
pgmq gets very close for a lot of Kafka use cases
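For a sense of how small the surface is, pgmq is just SQL functions; a minimal sketch, with a made-up queue name and payload (exact signatures may vary slightly by pgmq version):

    CREATE EXTENSION IF NOT EXISTS pgmq;

    SELECT pgmq.create('orders');                            -- create a queue
    SELECT * FROM pgmq.send('orders', '{"order_id": 42}');   -- enqueue a message

    -- Read one message, hidden from other consumers for 30 seconds.
    SELECT * FROM pgmq.read('orders', 30, 1);

    -- Acknowledge by archiving (or pgmq.delete) once processed.
    SELECT pgmq.archive('orders', 1);                        -- 1 = msg_id from read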
I really wonder how "It's year X" could establish itself as an argument this popular.
fair points made but I use sqlite for many things because sometimes you just need a tent
The point of Redis is data structures and algorithmic complexity of operations. If you use Redis well, you can't replace it with PostgreSQL. But I bet you can't replace memcached either for serious use cases.
As someone who is a huge fan of both Redis and Postgres, I whole heartedly agree with the "if you are using Redis well, you can't replace it with PostgreSQL" statement.
What I like about the "just use PostgreSQL" idea is that, unfortunately, most people don't use Redis well. They are just using it as a cache, which IMHO, isn't even equivalent to scratching the surface of all the amazing things Redis can do.
As we all know, it's all about tradeoffs. If you are only using Redis as a cache, then does the performance improvement you get by using it out weight the complexity of another system dependency? Maybe? Depends...
Side note: If you are using Redis for caching and queue management, those are two separate considerations. Your cache and queues should never live on the same Redis instance because they should have different max-memory policies! </Side note>
The newest versions of Rails have really got me thinking about the simplicity of a PostgreSQL only deployment, then migrating to other data stores as needed down the line. I'd put the need to migrate squarely into the "good problems" to have because it indicates that your service is growing and expanding past the first few stages of growth.
All that being said, man, I think Redis is sooooo cool. It's the hammer I am always looking for a nail to use on.
“well” is doing a lot of heavy lifting in your comment. Across a number of companies using Redis, I’ve never seen it used correctly. Adding it to the tech stack is always justified with hand waving about scalability.
well, redis is a bit of a junk bin of random, barely related tools. It's just very likely that any project of non-trivial complexity will need at least some of them, and I wouldn't necessarily advocate jerry-rigging most of them into postgresql like the author of the article does. for example, why would anyone want to waste their SQL DB server's performance on KV lookups?
There are data structures in Redis?
They may be its point, but I frankly didn't see much use in the wild. You might argue that then those systems didn't need Redis in the first place and I'd agree, but then note that that is the point tigerdata makes.
edit: it's not about serious uses, it's about typical uses, which are sad (and same with Kafka, Elastic, etc, etc)
Did someone really downvote the creator of Redis?
All the time here in HN, I'm proud of it -- happy to have opinions not necessarily aligned with what users want to listen to. Also: never trust the establishment without thinking! ;D
IIRC there was a pre-edit version with snark.
Yes, but the downvotes came later too, I edited it with the same exact content but without the asshole that is in me. Still downvotes received.
Delaying votes may be one of HN's anti-manipulation tactics.
I was one of the downvoters, and at the time I downvoted it, it was a very different comment. this is the original (copied from another tab that I hadn't refreshed yet):
> Tell me you don't understand Redis point is data structures without telling me you don't understand Redis point is data structures.
regardless of the author, I think slop of that sort belongs on reddit, not HN.
and if you think it doesn't fit your suitcase? just add an extension and you're good to go (ex: timescaledb)
Something TFA doesn’t mention, but which I think is actually the most important distinction of all to be making here:
If you follow this advice naively, you might try to implement two or more of these other-kind-of-DB simulacra data models within the same Postgres instance.
And it’ll work, at first. Might even stay working if only one of the workloads ends up growing to a nontrivial size.
But at scale, these different-model workloads will likely contend with one-another, starving one-another of memory or disk-cache pages; or you’ll see an “always some little thing happening” workload causing a sibling “big once-in-a-while” workload to never be able to acquire table/index locks to do its job (or vice versa — the big workloads stalling the hot workloads); etc.
And even worse, you’ll be stuck when it comes to fixing this with instance-level tuning. You can only truly tune a given Postgres instance to behave well for one type-of-[scaled-]workload at a time. One workload-type might use fewer DB connections and depend for efficiency on them having a higher `work_mem` and `max_parallel_workers` each; while another workload-type might use many thousands of short-lived connections and depend on them having small `work_mem` so they’ll all fit.
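To make that tuning tension concrete, here is a sketch of the kind of settings that pull in opposite directions; the values are illustrative, not recommendations:

    -- Analytics-leaning instance: few connections, large sorts and scans.
    ALTER SYSTEM SET work_mem = '256MB';
    ALTER SYSTEM SET max_parallel_workers_per_gather = 4;

    -- OLTP/cache-leaning instance: thousands of short-lived connections.
    ALTER SYSTEM SET work_mem = '4MB';
    ALTER SYSTEM SET max_parallel_workers_per_gather = 0;

    SELECT pg_reload_conf();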
But! The conclusion you should draw from being in this situation shouldn’t be “oh, so Postgres can’t handle these types of workloads.”
No; Postgres can handle each of these workloads just fine. It’s rather that your single monolithic do-everything Postgres instance, maybe won’t be able to handle this heterogeneous mix of workloads with very different resource and tuning requirements.
But that just means that you need more Postgres.
I.e., rather than adding a different type-of-component to your stack, you can just add another Postgres instance, tuned specifically to do that type of work.
Why do that, rather than adding a component explicitly for caching/key-values/documents/search/graphs/vectors/whatever?
Well, for all the reasons TFA outlines. This “Postgres tuned for X” instance will still be Postgres, and so you’ll still get all the advantages of being able to rely on a single query language, a single set of client libraries and tooling, a single coherent backup strategy, etc.
Where TFA’s “just use Postgres” in the sense of reusing your Postgres instance only scales if your DB is doing a bare minimum of that type of work, interpreting “just use Postgres” in the sense of adding a purpose-defined Postgres instance to your stack will scale nigh-on indefinitely. (To the point that, if you ever do end up needing what a purpose-built-for-that-workload datastore can give you, you’ll likely be swapping it out for an entire purpose-defined PG cluster by that point. And the effort will mostly serve the purpose of OpEx savings, rather than getting you anything cool.)
And, as a (really big) bonus of this approach, you only need to split PG this way where it matters, i.e. in production, at scale, at the point that the new workload-type is starting to cause problems/conflicts. Which means that, if you make your codebase(s) blind to where exactly these workloads live (e.g. by making them into separate DB connection pools configured by separate env-vars), then:
- in dev (and in CI, staging, etc), everything can default to happening on the one local PG instance. Which means bootstrapping a dev-env is just `brew install postgres`.
- and in prod, you don’t need to pre-build with new components just to serve your new need. No new Redis instance VM just to serve your so-far-tiny KV-storage needs. You start with your new workload-type sharing your “miscellaneous business layer” PG instance; and then, if and when it becomes a problem, you migrate it out.
No thanks. In 2026 I want HA and replication out of the box without the insanity.
Came to say the same thing. Personally I'd only touch Postgres in a couple cases.
1. Downtime doesn't matter. 2. Paying someone else (eg. AWS) to manage redundancy and fail-over.
It just feels crazy to me that Postgres still doesn't have a native HA story; I last battled with this well over a decade ago.
Exactly my thoughts immediately after reading the word “just”. Also, PITR.
You exceeded the step of maxing out the best server you can buy?
HA is not about exceeding the limits of a server. It's about still serving traffic when that best server I bought goes offline (or has a failed memory chip, or a failed disk, or...).
Replication?
Postgres replication, even in synchronous mode, does not maintain its consistency guarantees during network partitions. It's not a CP system - I don't think it would actually pass a Jepsen test suite in a multi-node setup[1]. No amount of tooling can fix this without a consensus mechanism for transactions.
Same with MySQL and many other "traditional" databases. It tends to work out because these failures are rare and you can get pretty close with external leader election and fencing, but Postgres is NOT easy (likely impossible) to operate as a CP system according to the CAP theorem.
There are various attempts at fixing this (Yugabyte, Neon, Cockroach, TiDB, ...) which all come with various downsides.
[1]: Someone tried it with Patroni and failed miserably, https://www.binwang.me/2024-12-02-PostgreSQL-High-Availabili...
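For context, this is roughly what the "synchronous mode" above refers to (the standby names are hypothetical); it makes commits wait for a standby to confirm, but it does not add consensus or fencing, which is the commenter's point:

    -- Commits wait until at least one listed standby has applied the WAL.
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (replica_a, replica_b)';
    ALTER SYSTEM SET synchronous_commit = 'remote_apply';
    SELECT pg_reload_conf();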
It's 2026, just use Planetscale Postgres
Unless you're doing OLTP. Then, TigerBeetle ;)
Meh, it's 2026! Unless you're Google, you should probably just pipe all your data to /dev/null (way faster than postgres!) and then have a LLM infer the results of any get requests
Do tell about all your greenfield yet large scale persistence needs where this discussion even applies
Can I just say, I'm getting really sick of these LLM-generated posts clogging up this site?
GPTZero gives this a 95% chance of being entirely AI-generated. (5% human-AI mix, and 0% completely original.)
But I could tell you that just by using my eyes, the tells are so obvious. "The myth / The reality, etc."
If I wanted to know what ChatGPT had to say about something, I would ask ChatGPT. That's not what I come here for, and I think the same applies to most others.
Here's an idea for all you entrepreneur types: devise a privacy-preserving, local-running browser extension for scanning all content that a user encounters in their browser - and changing the browser extension icon to warn of an AI generated article or social media post or whatever. So that I do not have to waste a further second interacting with it. I would genuinely pay a hefty subscription fee for such a service at this point, provided it worked well.
"juSt use PoStGreS" is spoken like a true C-level with no experience on the ground with postgres itself or its spin offs.
yes pg is awesome and it's my go-to for relational databases. But the reason mongo or influxdb exist is that they excel in their areas.
I would use pg for timeseries for small use cases, testing. But scaling pg for production time series workloads is not worth it. You end up fighting the technology to get it to work just because some lame person wanted to simplify ops
postgres is great, but it's not great at sharding tables.
I have probably pulled out postgres 10 or more times for various projects at work. Each time I had to fight for it, each time I won, and it did absolutely everything I needed it to do and did it well.
This is just AI slop. The best tell is how much AI loves tables. Look at "The Hidden Costs Add Up", where it literally just repeats "1" in the second column and "7" in the third column. No human would ever write a table like that.
Agree. Very unprofessional and disrespectful to prospective customers to not even have a human edit your blog copy
I mean, it's a pain at times to keep elastic in sync with the main db, but saying elastic is just an algorithm for text search feels odd.
but mongo is webscale.
This reads like AI generated slop, though the point about simplicity is valid.
Timely!
I like PostgreSQL. If I am storing relational data I use it.
But for non relational data, I prefer something simpler depending on what the requirements are.
Commenters here are talking "modern tools" and complex systems. But I am thinking of common simpler cases where I have seen so many people reach for a relational database from habit.
For large data sets there are plenty of key/value stores to choose from; for small (less than a megabyte) data, a CSV file will often work best. Scanning is quicker than indexing for surprisingly large data sets.
And so much simpler
No more ORMs, not even query builders, ... When it comes to Postgres I want to write the sql myself! There is so much value.
Supabase helps when building a webapp. But Postgres is the powerhouse.
Only if DuckDB is an acceptable value of PostgreSQL. I agree that PostgreSQL has eaten many DB use-cases, but the breathless hype is becoming exhausting.
Look. In a PostgreSQL extension, I can't:
1. extend the SQL language with ergonomic syntax for my use-case,
2. teach the query planner to understand execution strategies that can't be made to look like PostgreSQL's tuple-and-index execution model, or
3. extend the type system to plumb new kinds of metadata through the whole query and storage system via some extensible IR.
(Plus, embedded PostgreSQL still isn't a first class thing.)
Until PostgreSQL's extension mechanism is powerful enough for me to literally implement DuckDB as an extension, PostgreSQL is not a panacea. It's a good system, but nowhere near universal.
Now, once I can do DuckDB (including its language extensions) in PostgreSQL, and once I can use the thing as a library, let's talk.
(pg_duckdb doesn't count. It's a switch, not a unified engine.)
I find myself needing to start something quickly with a bit of login and data/user management.
Postgres won as the starting point again thanks to Supabase.
This is the future of all devtools in the AI era. There's no reason for tool innovation because we'll just use whatever AIs know best which will always be the most common thing in their training data. It's a self-reinforcing loop. The most common languages, tools, libraries of today are what we will be stuck with for the foreseeable future.
Is that any different from "we'll just use whatever devs know best, which will always be the most common thing?"
eg Python, react... very little OCaml, Haskell, etc.
I feel like this is selling redis short on its features.
I'm also curious about benchmark results.
Lots of familiar things here except for this UNLOGGED table as a cache thing. That's totally new to me. Has someone benched this approach against memcached and redis? I'm extremely skeptical PG's query/protocol overheads are going to be competitive with memcached, but I'm making this up and have nothing to back it up.
They don't compare exactly. Author is mistaken in thinking that UNLOGGED means "in memory". It means "no WAL", so there's considerable speed up there, but traded in with also more volatility. To be a viable alternative to Redis or Memcached though, the savings you get from the latter two must really be superfluous to your use case. Which could be true for many (most?).
It's not only about performance; Redis data structures offer even more advanced caching and data processing. I even use Redis as a cache for ClickHouse.
Nice! How do you "preinstall the extensions" so that you can have eg timescaledb and others available to install in your Postgres? Do you need to install some binaries first?
Get AWS to actually support pgvectorscale and timescaledb for RDS or Aurora and then maybe... sigh....
KISS.
Good stuff, I turned my gist into an info site and searchable directory (and referenced this article as well, which seems to pay homage to my gist, which in turn inspired the site)
https://PostgresIsEnough.dev
This is a good summary, though I'd love to see a "High Availability and Automated Failover" entry in that table.
It's 20xx just use sqlite. Almost no-one needs all that power; they sure do think they do, but really don't. And will never. SQLite + Duck is all you need even with a million visitors; when you need failover and scaling you need more, but that is a tiny fraction of all companies.
Having built production apps on SQLite I can say it's not all sunshine and roses, the complexity explodes the moment you need multiple workers that can all write.
You better hope you don't have any big indexes or your writes will queue and you'll start getting "database is locked" errors. Good luck queueing writes properly at the application layer without rebuilding a full distributed queue system / network interface / query retry system on top of SQLite.
I really really really wish SQLite or PGLite supported multiple writers, especially if the queries are not even touching the same tables.
Oh, I didn't say that, but nor is postgres. I agree with using the right tool for the job; it's just that people are making their lives difficult (and expensive) by blindly picking postgres.
Huh?
SQLite is designed for application databases.
Postgres is designed for client-server.
It's not about "power" (what does that even mean?), it's about totally different design choices that are intended for different purposes. It's an architecture question, not a "power" or "visitors" question.
We run plenty of money-making SaaS on sqlite without issues, and have been for over a decade. By power I meant all the complex features postgres has. But yes, it's an architecture question; my point being, most people pick many bulldozers when they need one shovel.
I'm glad we agree that "just use postgres" is horrible advice.
Would you except "Just use postgres for client-server" as a reasonable change?