BMS-Aned seminar
Department of Data Science Methods, Julius Center, University Medical Center Utrecht
2024-09-26
What is AI?
Artificial Intelligence is the branch of computer science that focuses on creating systems capable of performing tasks that typically require human intelligence. (Russell and Norvig 2020)
What is Machine Learning?
Machine Learning is a subset of artificial intelligence that involves the use of algorithms and statistical models to enable computers to perform tasks without explicit instructions. Instead, they rely on patterns and inference from data. (Samuel 1959)
What tasks can we perform with machine learning?
| i | length (cm) | weight (kg) | sex |
|---|---|---|---|
| 1 | 137 | 30 | boy |
| 2 | 122 | 24 | girl |
| 3 | 101 | 18 | girl |
| … | … | … | … |
We typically assume these data are independent and identically distributed (i.i.d.) samples from some unknown distribution \(p(l,w,s)\):
\[l_i,w_i,s_i \sim p(l,w,s)\]
use samples to learn a model for the joint distribution \(p\) \[ l_j,w_j,s_j \sim p_{\theta}(l,w,s) \]
task | |
---|---|
generation | \(l_j,w_j,s_j \sim p_{\theta}(l,w,s)\) |
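As a toy illustration of the generation task, here is a minimal Python sketch that fits a deliberately simple parametric model \(p_{\theta}\) to the three rows above (Bernoulli for sex, independent Gaussians per variable within each sex; this model choice is an assumption for illustration, not something the slides prescribe) and samples new rows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data from the table above: length (cm), weight (kg), sex.
length = np.array([137.0, 122.0, 101.0])
weight = np.array([30.0, 24.0, 18.0])
sex = np.array(["boy", "girl", "girl"])

# Hypothetical model p_theta(l, w, s): Bernoulli for sex,
# independent Gaussians for length and weight within each sex.
theta = {"p_boy": float(np.mean(sex == "boy"))}
for s in ("boy", "girl"):
    mask = sex == s
    theta[s] = {
        "mean": np.array([length[mask].mean(), weight[mask].mean()]),
        "sd": np.array([length[mask].std(), weight[mask].std()]) + 1.0,  # avoid sd = 0 on tiny data
    }

def generate(n):
    """Sample n new (length, weight, sex) rows from p_theta."""
    rows = []
    for _ in range(n):
        s = "boy" if rng.random() < theta["p_boy"] else "girl"
        l, w = rng.normal(theta[s]["mean"], theta[s]["sd"])
        rows.append((round(l), round(w), s))
    return rows

print(generate(3))
```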
use samples to learn a model for the conditional distribution \(p\) \[ l_j,w_j \sim p_{\theta}(l,w|s=\text{boy}) \]
task | |
---|---|
generation | \(l_j,w_j,s_j \sim p_{\theta}(l,w,s)\) |
conditional generation | \(l_j,w_j \sim p_{\theta}(l,w|s=\text{boy})\) |
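Conditional generation then only requires sampling from the fitted conditional \(p_{\theta}(l,w|s)\); continuing the sketch above (same hypothetical `theta` and `rng`):

```python
def generate_conditional(n, s="boy"):
    """Sample n new (length, weight) pairs from p_theta(l, w | s)."""
    return [tuple(rng.normal(theta[s]["mean"], theta[s]["sd"])) for _ in range(n)]

print(generate_conditional(3, s="boy"))
```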
use samples to learn a model for the conditional distribution \(p\) of one variable \[ s_j \sim p_{\theta}(s|l=l',w=w') \]
task | |
---|---|
generation | \(l_j,w_j,s_j \sim p_{\theta}(l,w,s)\) |
conditional generation | \(l_j,w_j \sim p_{\theta}(l,w|s=\text{boy})\) |
call this one variable the outcome, and either

- classify as the majority class among the generated samples
- or: use a model that outputs expected values (probabilities) and classify with a threshold

\[ \hat{s}_j = \text{boy} \quad \text{if} \quad p_{\theta}(s=\text{boy}|l=l',w=w') > 0.5 \]
task | |
---|---|
generation | \(l_j,w_j,s_j \sim p_{\theta}(l,w,s)\) |
conditional generation | \(l_j,w_j \sim p_{\theta}(l,w|s=\text{boy})\) |
discrimination | \(p_{\theta}(s|l=l_i,w=w_i) > 0.5\) |
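For discrimination we can model \(p_{\theta}(s|l,w)\) directly. A minimal sketch on the toy table, using logistic regression via scikit-learn (one possible model choice, assumed here) and the 0.5 threshold from above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data from the table above: columns are length (cm) and weight (kg).
X = np.array([[137.0, 30.0], [122.0, 24.0], [101.0, 18.0]])
y = np.array([1, 0, 0])  # 1 = boy, 0 = girl

# Model p_theta(s | l, w) directly with logistic regression.
model = LogisticRegression().fit(X, y)

# Discriminate: predict boy when p_theta(s = boy | l, w) > 0.5.
p_boy = model.predict_proba([[130.0, 28.0]])[0, 1]
print(p_boy, "boy" if p_boy > 0.5 else "girl")
```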
\[y = \sum_{i=0}^5 x_i \beta_i\]
\[\begin{align} a_i &= w_{0i} + w_{1i} x_1 + \ldots \\ h_i &= g(a_i) \\ y &= \sum_{i=1}^3 h_i w_i \end{align}\]
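A minimal numpy sketch of this one-hidden-layer network, with shapes chosen for illustration (2 inputs, 3 hidden units) and ReLU assumed for the unspecified activation \(g\):

```python
import numpy as np

def g(a):
    """Activation function; the slides leave g unspecified, ReLU assumed here."""
    return np.maximum(a, 0.0)

def forward(x, W, b, w):
    """One-hidden-layer network matching the equations above:
    a_i = w_0i + w_1i x_1 + ...; h_i = g(a_i); y = sum_i h_i w_i."""
    a = b + W @ x   # pre-activations, one per hidden unit
    h = g(a)        # nonlinear transformation
    return h @ w    # weighted sum over the 3 hidden units

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 2))  # 3 hidden units, 2 inputs (illustrative sizes)
b = rng.normal(size=3)
w = rng.normal(size=3)
print(forward(np.array([1.0, 2.0]), W, b, w))
```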
\[L(\theta) = \frac{1}{n} \sum_{i=1}^n \ell(y_i, f(x_i;\theta))\]
\[\nabla L(\theta) \approx \frac{1}{m} \sum_{i=1}^m \nabla \ell(y_i, f(x_i;\theta))\]
\[\theta_{t+1} = \theta_t - \alpha \nabla L(\theta_t)\]
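Putting the three equations together, a minimal minibatch SGD sketch, with a linear model standing in for \(f(x;\theta)\), squared error for \(\ell\), and synthetic data (all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative): y = x @ beta + noise.
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=200)

theta = np.zeros(2)   # parameters of the stand-in linear model f(x; theta)
alpha, m = 0.1, 16    # learning rate and minibatch size

for t in range(500):
    idx = rng.choice(len(X), size=m, replace=False)  # draw a minibatch
    resid = X[idx] @ theta - y[idx]                  # f(x_i; theta) - y_i
    grad = 2 * X[idx].T @ resid / m                  # minibatch gradient of the squared error
    theta = theta - alpha * grad                     # the update rule from the slide
print(theta)  # should end up close to [1.5, -2.0]
```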
Parameter counting is a bad proxy for model complexity in neural networks
Whereas in regression models model complexity is well captured by the number of parameters, this is not the case for neural networks.
\[\begin{align} \text{word}_1 &\sim p_{\text{chatGPT}}(\text{word}|\text{prompt}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{had} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{had} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and})\\ \color{orange}{a} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{had}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{had} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and})\\ \color{orange}{a} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{had})\\ \color{red}{drink} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{had} \ \color{orange}{a}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{had} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and})\\ \color{orange}{a} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{had})\\ \color{red}{drink} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{had} \ \color{orange}{a})\\ \text{STOP} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{had} \ \color{orange}{a} \ \color{red}{drink}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{met} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{met} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and})\\ \color{orange}{a} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{met}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{met} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and})\\ \color{orange}{a} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{met})\\ \color{red}{friend} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{met} \ \color{orange}{a}) \end{align}\]
Prompt=“Frank went to the bar and”
\[\begin{align} \color{green}{met} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and})\\ \color{orange}{a} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{met})\\ \color{red}{friend} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{met} \ \color{orange}{a})\\ \text{STOP} &\sim p_{\text{chatGPT}}(\text{word}|\text{Frank went to the bar and } \color{green}{met} \ \color{orange}{a} \ \color{red}{friend}) \end{align}\]
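The loop behind both continuations is the same: repeatedly draw the next word from \(p(\text{word}|\text{context})\) until STOP. A minimal sketch, with a hypothetical lookup table standing in for the model's next-word distribution (a real LLM computes these probabilities with a neural network):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for p_chatGPT(word | context): a tiny lookup table
# covering only the two continuations shown above.
NEXT = {
    "Frank went to the bar and": (["had", "met"], [0.5, 0.5]),
    "Frank went to the bar and had": (["a"], [1.0]),
    "Frank went to the bar and had a": (["drink"], [1.0]),
    "Frank went to the bar and had a drink": (["STOP"], [1.0]),
    "Frank went to the bar and met": (["a"], [1.0]),
    "Frank went to the bar and met a": (["friend"], [1.0]),
    "Frank went to the bar and met a friend": (["STOP"], [1.0]),
}

def generate(prompt):
    """Sample one word at a time from p(word | context) until STOP."""
    context = prompt
    while True:
        words, probs = NEXT[context]
        word = rng.choice(words, p=probs)
        if word == "STOP":
            return context
        context = context + " " + word

print(generate("Frank went to the bar and"))
```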
- more compute resources
- bigger data
- bigger models (enabled by data and compute)
©Wouter van Amsterdam — WvanAmsterdam — wvanamsterdam.com/talks