Why didn't this author compare Llama 3 with GLM 5.2 (released 1 week ago), which is a more standard attention-based LLM? Comparing two separate families of LLMs and then pointing out that they are different is not a surprising result, and it detracts from the point the author is trying to make.
https://sebastianraschka.com/llm-architecture-gallery/?compa...
If you look at it, the diagrams are very similar. The main differences are that the feedforward block is replaced with an MoE (a router to multiple feedforwards) and that the model has a different attention implementation.
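To make the "router to multiple feedforwards" point concrete, here is a rough sketch in plain Python. It is not taken from either model: `dense_ffn`, `moe_ffn`, the scalar "expert" weights, and the top-2 routing are all assumptions made purely for illustration, and real implementations use learned matrices and batched tensors.

```python
import math

def dense_ffn(x, scale):
    # Hypothetical stand-in for a feedforward block: scale each value, then ReLU.
    return [max(0.0, scale * v) for v in x]

def softmax(scores):
    # Numerically stable softmax over a short list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_ffn(x, router_weights, expert_scales, top_k=2):
    # Router: one score per expert (dot product of the token with that expert's router row).
    scores = [sum(r * v for r, v in zip(row, x)) for row in router_weights]
    # Keep only the top_k highest-scoring experts for this token.
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    # Normalise the chosen scores into gate weights that sum to 1.
    gates = softmax([scores[i] for i in chosen])
    # Output is the gate-weighted sum of the chosen experts' feedforward outputs.
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = dense_ffn(x, expert_scales[i])
        out = [o + gate * v for o, v in zip(out, y)]
    return out, chosen

# One token, three hypothetical experts, two of them active.
token = [1.0, 2.0]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
out, chosen = moe_ffn(token, router, expert_scales=[1.0, 2.0, 3.0])
print(chosen, out)
```

In a dense model every token goes through the same single `dense_ffn`. In the MoE version only the `top_k` selected experts run for a given token, which is why the two diagrams look so alike apart from the router.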
It’s written by AI.
[[citation needed]]
I am a professional writer and have been for over 30 years. (I do not use any form of LLM ever.) This means I read a lot. This also means that I have 30+ years of experience of readers not understanding what I wrote, or not getting further than the title, or not getting the main message, or inverting it in their heads, or inserting their own message and then complaining when I diverge, and an endless list of Ways People Do Not Get It.
I am also a trained TESOL teacher. Ability to capture gist is a skill we test for and measure, and many, maybe the majority, of native speakers don't have it and don't know.
In recent years I constantly see people going "this is written by AI" and I have yet to see a single one of them able to coherently prove their point. It's all just feelings and hunches.
So I am calling you on this:
How do you know? Show your working. Demonstrate your case.