A lot of people here are commenting that if you have to care about specific latency numbers in Python you should just use another language.
I disagree. A lot of important and large codebases were grown and maintained in Python (Instagram, Dropbox, OpenAI), and it's damn useful to know how to reason your way out of a Python performance problem when you inevitably hit one, without dropping down into another language, which is going to be far more complex.
Python is a very useful tool, and knowing these numbers just makes you better at using the tool. The author is a Python Software Foundation Fellow. They're great at using the tool.
In the common case, a performance problem in Python is not the result of hitting the limit of the language but the result of sloppy un-performant code, for example unnecessarily calling a function O(10_000) times in a hot loop.
I wrote up a more focused "Python latency numbers you should know" as a quiz here https://thundergolfer.com/computers-are-fast
> A lot of important and large codebases were grown and maintained in Python
How does this happen? Is it just inertia that causes people to write large systems in an essentially type-free, interpreted scripting language?
It’s very natural. Python is fantastic for going from 0 to 1 because it’s easy and forgiving. So lots of projects start with it. Especially anything ML focused. And it’s much harder to change tools once a project is underway.
I think both points are fair. Python is slow - you should avoid it if speed is critical, but sometimes you can’t easily avoid it.
I think the list itself is super long-winded and not very informative. A lot of operations take about the same amount of time. Does it matter that adding two ints is very slightly slower than adding two floats? (If you even believe this is true, which I don't.) No. A better summary would say: "All of these things take about the same amount of time: simple math, function calls, etc. These things are much slower: I/O." And in that form the summary is pretty obvious.
Counterintuitively: program in python only if you can get away without knowing these numbers.
When this starts to matter, python stops being the right tool for the job.
Or keep your Python scaffolding, but push the performance-critical bits down into a C or Rust extension, like numpy, pandas, PyTorch and the rest all do.
But I agree with the spirit of what you wrote - these numbers are interesting but aren’t worth memorizing. Instead, instrument your code in production to see where it’s slow in the real world with real user data (premature optimization is the root of all evil etc), profile your code (with pyspy, it’s the best tool for this if you’re looking for cpu-hogging code), and if you find yourself worrying about how long it takes to add something to a list in Python you really shouldn’t be doing that operation in Python at all.
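For anyone who wants a starting point before reaching for py-spy: a minimal profiling sketch using the stdlib's cProfile, where slow_path is a made-up stand-in for a real hotspot. (py-spy's advantage is that it samples a running process without code changes.)

    import cProfile
    import pstats

    def slow_path():
        # stand-in for whatever your real hotspot is
        total = 0
        for i in range(1_000_000):
            total += i * i
        return total

    cProfile.run("slow_path()", "prof.out")
    stats = pstats.Stats("prof.out")
    stats.sort_stats("cumulative").print_stats(10)  # top 10 by cumulative time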
"if you're not measuring, you're not optimizing"
I agree. I've been living off Python for 20 years and have never needed to know any of these numbers, nor do I need them now for my work, contrary to the title. I also regularly use profiling for performance optimization and opt for Cython, SWIG, JIT libraries, or other tools as needed. None of these numbers would ever factor into my decision-making.
Why? I've built some massive analytic data flows in Python with turbodbc + pandas which are basically C++ fast. It uses more memory, which supports your point, but on the flip side we're talking $5-10 extra cost a year. It could frankly be $20k a year and still be cheaper than staffing more people like me to maintain these things, rather than having a couple of us and then letting the BI people use the tools we provide for them. Similarly, when we do embedded work, MicroPython is just so much easier for our engineering staff to deal with.
The interoperability between C and Python makes it great, and you need to know these numbers in Python to know when to actually build something in C. With Zig getting really great C interoperability, things are looking better than ever.
Not that you're wrong as such. I wouldn't use Python to run an airplane, but I really don't see why you wouldn't care about the resources just because you're working with an interpreted or GC language.
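To make the interop point concrete, here's a minimal sketch calling a C function from Python with the stdlib ctypes module. It assumes a system math library can be located via find_library; the same pattern works for your own compiled shared library (or one built with Zig's C ABI).

    from ctypes import CDLL, c_double
    from ctypes.util import find_library

    libm = CDLL(find_library("m"))   # locate the C math library (POSIX)
    libm.sqrt.argtypes = [c_double]  # declare the C signature
    libm.sqrt.restype = c_double

    print(libm.sqrt(2.0))            # 1.4142135623730951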
> you need to know these numbers on Python to know when to actually build something in C
People usually approach this the other way around: use something like pandas or numpy from the beginning if it solves your problem. Do not write matrix multiplications or joins in Python at all.
If there is no library that solves your problem, it's a great indication that you should avoid Python. Unless you are willing to spend 5 man-years writing a C or C++ library with good Python interop.
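As a rough illustration of why (sizes arbitrary; exact speedups vary by machine, but multiple orders of magnitude is typical):

    import time
    import numpy as np

    n = 200
    a = [[float(i * n + j) for j in range(n)] for i in range(n)]
    b = [[float(j * n + i) for j in range(n)] for i in range(n)]

    # matrix multiply as a pure-Python triple loop
    t0 = time.perf_counter()
    c = [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
         for i in range(n)]
    t1 = time.perf_counter()

    # same multiply in numpy
    an, bn = np.array(a), np.array(b)
    t2 = time.perf_counter()
    cn = an @ bn
    t3 = time.perf_counter()

    print(f"pure Python: {t1 - t0:.3f}s  numpy: {t3 - t2:.6f}s")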
Exactly. If you're working on an application where these numbers matter, Python is far too high-level a language to actually be able to optimize them.
I doubt there is much to gain from knowing how much memory an empty string takes. The article, or the listed numbers, have a weird fixation on memory usage numbers and concrete time measurements. What is way more important to "every programmer" is time and space complexity, in order to avoid designing unnecessarily slow or memory-hungry programs. Under the assumption of using Python, what is the use of knowing that your int takes 28 bytes? In the end you will have to determine whether the program you wrote meets your performance criteria, and if it does not, then you need a smarter algorithm or way of dealing with data. It helps very little to know that your 2d-array of 1000x1000 bools is so-and-so big. What helps is knowing whether it is too much, and that maybe you should switch to using a large integer and a bitboard approach. Or switch language.
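For illustration, a rough sketch of that bitboard trade-off (byte counts are from 64-bit CPython and shift a little between versions):

    import sys

    n = 1000
    grid = [[False] * n for _ in range(n)]
    # getsizeof is shallow, so sum the rows by hand; True/False are
    # singletons, so the bools themselves add nothing extra.
    list_bytes = sys.getsizeof(grid) + sum(sys.getsizeof(r) for r in grid)

    bitboard = (1 << (n * n)) - 1          # one int holding a million bits
    int_bytes = sys.getsizeof(bitboard)

    print(f"list of lists: ~{list_bytes / 1e6:.1f} MB")  # around 8 MB
    print(f"bitboard int:  ~{int_bytes / 1e3:.0f} KB")   # around 130 KB

    row, col = 3, 7
    bit = (bitboard >> (row * n + col)) & 1  # read bit (row, col)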
> Under the assumption of using Python, what is the use of knowing that your int takes 28 bytes?
Relevant if your problem demands instantiation of a large number of objects. This reminds me of a post where Eric Raymond discusses the problems he faced while trying to use Reposurgeon to migrate GCC. See http://esr.ibiblio.org/?p=8161
A meta-note on the title, since it looks like it's confusing a lot of commenters: the title is a play on Jeff Dean's famous "Latency Numbers Every Programmer Should Know" from 2012. It isn't meant to be interpreted literally. There's a common tradition in CS papers and writing of choosing titles that play on past papers. Another common example is the "_____ considered harmful" family of titles.
Going to write a real banger of a paper called "latency numbers considered harmful is all you need" and watch my academic cred go through the roof.
This title only works if the numbers are actually useful. Those are not, and there are far too many numbers for this to make sense.
The title wasn't meant to be taken literally, as in you're supposed to memorize all of these numbers. It was meant as an in-joke reference to the original writing, to signal that this document was going to contain timing values for different operations.
I completely understand why it's frustrating or confusing by itself, though.
Good callout on the paper reference, but this author gives every indication that he's dead serious in the first paragraph. I don't think commenters are confused.
Every Python programmer should be thinking about far more important things than low level performance minutiae. Great reference but practically irrelevant except in rare cases where optimization is warranted. If your workload grows to the point where this stuff actually matters, great! Until then it’s a distraction.
I agree - however, that has mostly been a feeling for me for years. Things feel fast enough and fine.
This page is a nice reminder of that, with numbers. For a while at least, I will Know, instead of just feeling, that I can ignore the low-level performance minutiae.
Having general knowledge about the tools you're working with is not a distraction, it's an intellectual enrichment in any case, and can be a valuable asset in specific cases.
Knowing that an empty string is 41 bytes or how many ns it takes to do arithmetic operations is not general knowledge.
Yeah, if you hit limits just look for a module that implements the thing in C (or write it). This is how it was always done in Python.
Sometimes it’s as simple as finding the hotspot with a profiler and making a simple change to an algorithm or data structure, just like you would do in any language. The amount of handwringing people do about building systems with Python is silly.
https://rushter.com/blog/python-strings-and-memory/
That's a long list of numbers that seem oddly specific. Apart from learning that f-strings are way faster than the alternatives, and certain other comparisons, I'm not sure what I would use this for day-to-day.
After skimming over all of them, it seems like most "simple" operations take on the order of 20ns. I will leave with that rule of thumb in mind.
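If you'd rather check that rule of thumb on your own hardware than trust the article's M4 Pro, a quick timeit sketch (my own choice of operations):

    import timeit

    snippets = {
        "int add":       "x + y",
        "float add":     "a + b",
        "dict lookup":   "d['k']",
        "function call": "f()",
    }
    setup = "x, y = 3, 4; a, b = 3.0, 4.0; d = {'k': 1}; f = lambda: None"

    for name, stmt in snippets.items():
        # best of 5 runs of a million executions, reported per operation
        secs = min(timeit.repeat(stmt, setup, number=1_000_000, repeat=5))
        print(f"{name:14s} {secs * 1e9 / 1_000_000:6.1f} ns")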
That number isn't very useful either, it really depends on the hardware. Most virtualized server CPUs where e.g. Django will run on in the end are nowhere near the author's M4 Pro.
Last time I benchmarked a VPS it was about the performance of an Ivy Bridge generation laptop.
> Last time I benchmarked a VPS it was about the performance of an Ivy Bridge generation laptop.
I have a number of Intel N95 systems around the house for various things. I've found them to be a pretty accurate analog for small-instance VPSes. The N95's cores are Intel E-cores, which are effectively Sandy Bridge/Ivy Bridge cores.
Stuff can fly on my MacBook but then drag on a small VPS instance, so validating against an N95 (which I already have) is helpful. YMMV.
+1 but I didn't see pack / unpack...
The titles are oddly worded. For example -

> Collection Access and Iteration
> How fast can you get data out of Python’s built-in collections? Here is a dramatic example of how much faster the correct data structure is. item in set or item in dict is 200x faster than item in list for just 1,000 items!

It seems to suggest an iteration for x in mylist is 200x slower than for x in myset. It’s the membership test that is much slower, not the iteration. (Also, for x in mydict is an iteration over keys, not values, and so isn’t what we think of as iteration over a dict’s ‘data’.)

Also, the overall title “Python Numbers Every Programmer Should Know” starts with 20 numbers that are merely interesting.
That all said, the formatting is nice and engaging.
Author here.
Thanks for the feedback, everyone. I appreciate @woodenchair posting it and @aurornis pointing out the intent of the article.
The idea of the article is NOT to suggest you should shave 0.5ns off by choosing some dramatically different algorithm or that you really need to optimize the heck out of everything.
In fact, I think a lot of what the numbers show is that overthinking the optimizations often isn't worth it (e.g. caching len(coll) into a variable rather than calling it over and over is less useful than it might seem conceptually).
Just write clean Python code. So much of it is way faster than you might have thought.
My goal was only to create a reference to what various operations cost to have a mental model.
Then you should have written that. Instead you have given more fodder for the premature optimization crowd.
This is a really weird thing to worry about in Python. But it is also misleading: Python ints are arbitrary precision; they can take up much more storage and arithmetic time depending on their value.
Python programmers don't need to know 85 different obscure performance numbers. Better to really understand ~7 general system performance numbers.
Nice numbers, and it's always worth knowing the order of magnitude of things. But these charts are far from what "every programmer should know".
I think we can safely steelman the claim to "every Python programmer should know", and even from there, every "serious" Python programmer, writing Python professionally for some "important" reason, not just everyone who picks up Python for some scripting task. Obviously there's not much reason for a C# programmer to go try to memorize all these numbers.
Though IMHO it suffices just to know that "Python is 40-50x slower than C and is bad at using multiple CPUs" is not just some sort of anti-Python propaganda from haters, but a fairly reasonable engineering estimate. If you know that you don't really need that chart. If your task can tolerate that sort of performance, you're fine; if not, figure out early how you are going to solve that problem, be it through the several ways of binding faster code to Python, using PyPy, or by not using Python in the first place, whatever is appropriate for your use case.
I doubt list and string concatenation operate in constant time; if they did, the cost would have to show up in another benchmark. E.g., you could concatenate two lists in the same time regardless of their size, but at the cost of slower access to the second one (or both).
More contentiously: don't fret too much over performance in Python. It's a slow language (except for some external libraries, but that's not the point of the OP).
String concatenation is mentioned twice on that page, with the same time given. The first time it has a parenthetical "(small)", the second time doesn't. I expect you were looking at the second one when you typed that, as I would agree you can't just label it constant time. But they do seem to have meant concatenating "small" strings, where the overhead of Python's object construction dominates the cost of building the combined string.
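A quick sketch that backs up the "(small)" reading: per-concat time sits near the fixed object overhead for short strings and then grows with the copy (sizes arbitrary):

    import timeit

    for size in (10, 1_000, 100_000):
        # average per-concat time for strings of the given length
        t = timeit.timeit("a + b",
                          setup=f"a = 'x' * {size}; b = 'y' * {size}",
                          number=100_000) / 100_000
        print(f"{size:>7} chars: {t * 1e9:8.1f} ns per concat")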
What would be the explanation for an int taking 28 bytes but a list of 1000 ints taking only 7.87KB?
That appears to be the size of the list itself, not including the objects it contains: 8 bytes per entry for the object pointer, and a kilo-to-kibi conversion. All Python values are "boxed", which is probably a more important thing for a Python programmer to know than most of these numbers.
The list of floats is larger, despite also being simply an array of 1000 8-byte pointers. I assume that it's because the int array is constructed from a range(), which has a __len__(), and therefore the list is allocated to exactly the required size; but the float array is constructed from a generator expression and is presumably dynamically grown as the generator runs and has a bit of free space at the end.
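That guess is easy to check; a minimal sketch (the exact byte counts assume 64-bit CPython and shift a little between versions):

    import sys

    exact = list(range(1000))             # length known up front
    grown = list(x for x in range(1000))  # grown as the generator runs

    print(sys.getsizeof(exact))  # 8056: 56-byte header + 1000 * 8-byte pointers
    print(sys.getsizeof(grown))  # larger: spare capacity left at the end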
It was. I updated the results to include the contained elements. I also updated the float list creation to match the int list creation.
It's important to know that these numbers will vary based on what you're measuring, your hardware architecture, and how your particular Python binary was built.
For example, my M4 Max running Python 3.14.2 from Homebrew (built, not poured) takes 19.73MB of RAM to launch the REPL (running `python3` at a prompt).
The same Python version launched on the same system with a single invocation for `time.sleep()`[1] takes 11.70MB.
My Intel Mac running Python 3.14.2 from Homebrew (poured) takes 37.22MB of RAM to launch the REPL and 9.48MB for `time.sleep`.
My number for "how much memory it's using" comes from running `ps auxw | grep python`, taking the value of the resident set size (RSS column), and dividing by 1,024.
1: python3 -c 'from time import sleep; sleep(100)'
Why? If those micro benchmarks mattered in your domain, you wouldn't be using python.
That's an "all or nothing" fallacy. Just because you use Python and are OK with some slowdown, doesn't mean you're OK with each and every slowdown when you can do better.
To use a trivial example, using a set instead of a list to check membership is a very basic replacement, and can dramatically improve your running time in Python. Just because you use Python doesn't mean anything goes regarding performance.
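Measured directly (sizes and key arbitrary; 999 is the list's worst case, since list membership is a linear scan while set membership is a hash lookup):

    import timeit

    setup = "items = list(range(1000)); s = set(items)"
    t_list = timeit.timeit("999 in items", setup, number=100_000)
    t_set = timeit.timeit("999 in s", setup, number=100_000)
    print(f"list: {t_list:.3f}s  set: {t_set:.4f}s  "
          f"ratio: {t_list / t_set:.0f}x")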
That's an example of an algorithmic improvement (log n vs n), not a micro benchmark, Mr. Fallacy.
> Numbers are surprisingly large in Python
Makes me wonder if the CPython devs have ever considered V8-like NaN-boxing or pointer stuffing.
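For context on why the numbers are large: every int is a full heap object, not a tagged machine word. Sizes below are from 64-bit CPython and can shift between versions.

    import sys

    print(sys.getsizeof(1))      # 28 bytes for a small nonzero int
    print(sys.getsizeof(2**30))  # 32 -- grows with magnitude
    print(sys.getsizeof(2**60))  # 36

    # CPython does at least cache the small ints -5..256:
    a, b = 256, 256
    print(a is b)                # True (same cached object)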
I have some questions and requests for clarification/suspicious behavior I noticed after reviewing the results and the benchmark code, specifically:
- If slotted attribute reads and regular attribute reads are the same latency, I suspect that the regular class may not have enough "bells on" (inheritance/metaprogramming/dunder overriding/etc.) to defeat simple optimizations that cache away attribute access, thus making it equivalent in speed to slotted classes. I know that over time slotting will become less of a performance boost, but--and this is just my intuition and I may well be wrong--I don't get the impression that we're there yet. (A minimal benchmark sketch follows this list.)
- Similarly "read from @property" seems suspiciously fast to me. Even with descriptor-protocol awareness in the class lookup cache, the overhead of calling a method seems surprisingly similar to the overhead of accessing a field. That might be explained away by the fact that property descriptors' "get" methods are guaranteed to be the simplest and easiest to optimize of all call forms (bound method, guaranteed to never be any parameters), and so the overhead of setting up the stack/frame/args may be substantially minimized...but that would only be true if the property's method body was "return 1" or something very fast. The properties tested for these benchmarks, though, are looking up other fields on the class, so I'd expect them to be a lot slower than field access, not just a little slower (https://github.com/mikeckennedy/python-numbers-everyone-shou...).
- On the topic of "access fields of objects" (properties/dataclasses/slots/MRO/etc.), benchmarks are really hard to interpret--not just these benchmarks, all of them I've seen. That's because there are fundamentally two operations involved: resolving a field to something that produces data for it, and then accessing the data. For example, a @property is in a class's method cache, so resolving "instance.propname" is done at the speed of the methcache. That might be faster than accessing "instance.attribute" (a field, not a @property or other descriptor), depending on the inheritance geometry in play, slots, __getattr[ibute]__ overrides, and so on. On the other hand, accessing the data at "instance.propname" is going to be a lot more expensive for most @properties (because they need to call a function, use an argument stack, and usually perform other attribute lookups/call other functions/manipulate locals, etc); accessing data at "instance.attribute" is going to be fast and constant-time--one or two pointer-chases away at most.
- Nitty: why's pickling under file I/O? Those benchmarks aren't timing pickle functions that perform IO, they're benchmarking the ser/de functionality and thus should be grouped with json/pydantic/friends above.
- Asyncio's no spring chicken, but I think a lot of the benchmarks listed tell a worse story than necessary, because they don't distinguish between coroutines, Tasks, and Futures. Coroutines are cheap to have and call, but Tasks and Futures have a little more overhead when they're used (even fast CFutures) and a lot more overhead to construct since they need a lot more data resources than just a generator function (which is kinda what a raw coroutine desugars to, but that's not as true as most people think it is...another story for another time). Now, "run_until_complete()" and "gather()" initially take their arguments and coerce them into Tasks/Futures--that detection, coercion, and construction takes time and consumes a lot of overhead. That's good to know (since many people are paying that coercion tax unknowingly), but it muddies the boundary between "overhead of waiting for an asyncio operation to complete" and "overhead of starting an asyncio operation". Either calling the lower-level functions that run_until_complete()/gather() use internally, or else separating out benchmarks into ones that pass Futures/Tasks/regular coroutines might be appropriate.
- Benchmarking "asyncio.sleep(0)" as a means of determining the bare-minimum await time of a Python event loop is a bad idea. sleep(0) is very special (more details here: https://news.ycombinator.com/item?id=46056895) and not representative. To benchmark "time it takes for the event loop to spin once and produce a result"/the python equivalent of process.nextTick, it'd be better to use low-level loop methods like "call_soon" or defer completion to a Task and await that.
Initially I thought about how efficient strings are... but then I understood how inefficient arithmetic is. Interesting comparison, but exact speed and I/O depend on a lot of things, and it's unlikely one uses a Mac mini in production, so these numbers definitely aren't representative.
Great reference overall, but some of these will diverge in practice: 141 bytes for a 100 char string won’t hold for non-ASCII strings for example, and will change if/when the object header overhead changes.
int is larger than float, but list of floats is larger than list of ints
Then again, if you're worried about any of the numbers in this article maybe you shouldn't be using Python at all. I joke, but please do at least use Numba or Numpy so you aren't paying huge overheads for making an object of every little datum.
TFA mentions running the benchmark on a multi-core platform, but doesn't mention whether the benchmark results used multithreading... a brief look at the code suggests not.
Yeah... no. I have 10+ years of Python under my belt, and I might have had need for this kind of micro-optimization maybe two times at most.
Sorry, you’re not allowed to discourage premature optimization or defend Python here.
This is AI slop.