Perhaps it was due to English not being my primary language, but it took me an embarrassing amount of time to learn that probability and likelihood are different concepts. Concretely, we talk about the probability of observing some data given that an underlying assumption (model) is true, while we talk about the likelihood of the model being true given that we have observed some data.
Yeah, it was a poor choice of nomenclature, since in common, nontechnical parlance "probable" and "likely" are very close semantically. Though I'm not sure which came first: the choice of "likelihood" for the mathematical concept, or the casual use of "likely" as more or less synonymous with "probable".
My guess was always that "probability" came first, and they needed a different word for "likelihood" when the latter concept became formalized.
But the article makes it crystal clear (I had never seen it explained so clearly!):
"For conditional probability, the hypothesis is treated as a given, and the data are free to vary. For likelihood, the data are treated as a given, and the hypothesis varies."
Nah, that’s not a non-native-English thing; I think native speakers without a maths background would make the same mistake.
I’m a native speaker and I thought they were the same. Still unsure of the difference. I guess I need to study this.
The likelihood function returns a probability. Specifically, it tells you, for some parametric model, how the joint probability of the data in your data set varies as a function of changing the parameters in the model.
If that sentence doesn't make sense, then it's helpful to just write out the likelihood function. You will notice that it is in fact just the joint probability density of your model.
The only thing that makes it a "likelihood function" is that you fix the data and vary the parameters, whereas normally probability is a function of the data.
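To make that concrete, here's a minimal sketch in Python (the Gaussian model, the data values, and the function name are all made up for illustration). It literally writes the likelihood out as the joint density of the data, holding the data fixed and varying the parameter:

```
import numpy as np
from scipy.stats import norm

# Hypothetical example: i.i.d. Gaussian data with unknown mean mu and known sigma = 1.
# The "likelihood function" below is literally the joint probability density of the data
# under the model; we just hold the data fixed and let mu vary.

data = np.array([1.8, 2.1, 2.4, 1.9, 2.2])  # fixed, observed data (made up)

def likelihood(mu, data, sigma=1.0):
    # joint density = product of per-observation densities (i.i.d. assumption)
    return np.prod(norm.pdf(data, loc=mu, scale=sigma))

for mu in [1.0, 2.0, 3.0]:
    print(mu, likelihood(mu, data))
# mu = 2.0, the value nearest the sample mean (about 2.08), gives the largest value;
# picking the mu that maximises this is exactly maximum-likelihood estimation.
```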
The way I read this (and this is my layman understanding): probability is about predicting future events given a known model or assumption, while likelihood is almost the mirror image: I observe the data/outcome and ask, given that data, how plausible is a certain model or parameter value?
Another way of looking at it:
probability: parameter/model is fixed; outcome is random/unknown.
likelihood: outcome/data is fixed; we vary the parameter/model to assess how well it explains the data.
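To put rough numbers on that contrast, here's a tiny sketch (Python; the coin-flip/binomial model and the counts are made-up stand-ins):

```
from scipy.stats import binom

# Probability direction: the parameter is fixed, the outcome is the random thing.
# If the coin's heads-probability is p = 0.5, how probable is seeing 7 heads in 10 flips?
print(binom.pmf(7, n=10, p=0.5))

# Likelihood direction: the outcome is fixed (we saw 7 heads in 10 flips),
# and we vary the parameter to see how well each candidate value explains it.
for p in [0.3, 0.5, 0.7, 0.9]:
    print(p, binom.pmf(7, n=10, p=p))
# Same formula both times; the only difference is what we hold fixed.
```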
It's actually almost exactly the other way around.
The probability of a model M given data X, or P(M|X), is the posterior probability. The likelihood of data X given model M, or P(X|M), is the probability (or probability density, depending on whether your data is discrete or continuous) of observing data X given model M. We are often given un-normalised likelihoods, which is what the linked paper talks about. These quantities are related via Bayes' Theorem.
Now, you may ask, isn't the probability of observing data X given model M still a probability? I mean, yeah, a properly normalised likelihood is indeed a probability. It's not the mirror image of probability; it is just an un-normalised probability (or a probability distribution) of data given a model or model parameters.
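Here's a rough grid-approximation sketch of that relationship (Python; the flat prior, the parameter grid, and the 7-heads-out-of-10 data are all invented for illustration). The likelihood evaluated across parameter values doesn't sum to 1 on its own; multiplying by a prior and normalising, per Bayes' theorem, is what turns it into a posterior:

```
import numpy as np
from scipy.stats import binom

# Invented example: we observed 7 heads in 10 flips and want a posterior over p.
p_grid = np.linspace(0.01, 0.99, 99)        # candidate parameter values
prior = np.ones_like(p_grid) / len(p_grid)  # flat prior over the grid
like = binom.pmf(7, n=10, p=p_grid)         # P(X | M) for each candidate p

print(like.sum())   # not 1: as a function of the parameter, the likelihood is un-normalised

# Bayes' theorem (up to the normalising constant): posterior is proportional to prior * likelihood
posterior = prior * like
posterior /= posterior.sum()
print(posterior.sum())                # 1.0 by construction
print(p_grid[np.argmax(posterior)])   # peaks near 0.7, the observed frequency
```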