I haven't found an explanation yet that answers a couple of seemingly basic questions about LLMs:
What does the input side of the neural network look like? Is it just enough bits to represent N tokens, where N is the context size? And how does it handle inputs that are shorter than the context size?
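To make the "shorter than the context size" part of the question concrete, here's a toy sketch of one common arrangement I've seen described: the input is padded to a fixed length, and a mask marks which positions are real. (The token ids, pad id, and context size here are made up for illustration.)

```python
# Toy sketch: padding a short input up to a fixed context window.
# Real models pair the padded ids with a mask so padding is ignored.
CONTEXT_SIZE = 8
PAD_ID = 0  # hypothetical id reserved for the padding token

def pad(token_ids):
    """Pad a token-id list to CONTEXT_SIZE; return (ids, mask)."""
    n_real = len(token_ids)
    ids = token_ids + [PAD_ID] * (CONTEXT_SIZE - n_real)
    mask = [1] * n_real + [0] * (CONTEXT_SIZE - n_real)
    return ids, mask

ids, mask = pad([17, 42, 99])
# ids  -> [17, 42, 99, 0, 0, 0, 0, 0]
# mask -> [1, 1, 1, 0, 0, 0, 0, 0]
```

So the network's input width is fixed; shorter sequences are padded out and the padded positions are masked off rather than fed in as meaningful data.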
I think embedding is one of the more interesting concepts behind LLMs, but most pages treat it as a side note. How does embedding handle tokens that can have vastly different meanings in different contexts? If the word "bank" were a single token, for example, how does the embedding account for the fact that it can mean a river bank or a money bank? Do the elements of the vector point in both directions? And how exactly does embedding interact with the training and inference processes: does inference generate updated embeddings at any point, or are they fixed at training time?
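To pin down the "bank" part of the question: as I understand it, the input embedding is just a lookup table learned at training time, so a single "bank" token gets one fixed vector regardless of context. A toy sketch (vocabulary and vector values invented for illustration):

```python
# Toy sketch: input embeddings as a fixed lookup table.
# One learned row per token; the same row is used in every context.
embedding_table = {
    "bank":  [0.2, -0.5, 0.9],
    "river": [0.1,  0.8, -0.3],
    "money": [0.7, -0.1,  0.4],
}

def embed(tokens):
    """Map each token to its input vector by table lookup."""
    return [embedding_table[t] for t in tokens]

river_bank = embed(["river", "bank"])[1]
money_bank = embed(["money", "bank"])[1]
# Identical at the input layer; disambiguation between the two senses
# happens later, when the attention layers mix in surrounding context.
```

So the question really splits in two: the table itself is frozen at inference time, while the contextualized vectors produced deeper in the network are recomputed for every input.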
(Training time vs. inference time is another thing explanations are usually frustratingly vague on.)
I’ve had a look, and it’s very well explained! If you ever want to expand it, you could also add how embedded data is fed at the very final step for specific tasks, and how it can affect prediction results.
The page keeps annoyingly scroll-jumping a few pixels in iOS Safari.