Language models generally try to compute the probability of a word $w_t$ given its $n-1$ previous words, i.e. $p(w_t \mid w_{t-1}, \cdots, w_{t-n+1})$.
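To make the conditional probability concrete, here is a minimal count-based (maximum-likelihood) sketch of estimating $p(w_t \mid w_{t-n+1}, \cdots, w_{t-1})$ from a toy corpus. The function names and the toy corpus are illustrative assumptions, not part of the models discussed in this post:

```python
from collections import Counter

def ngram_probabilities(tokens, n=3):
    """Estimate p(w_t | w_{t-n+1}, ..., w_{t-1}) by counting n-grams (no smoothing)."""
    context_counts = Counter()
    ngram_counts = Counter()
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        ngram_counts[context + (tokens[i],)] += 1
        context_counts[context] += 1

    def prob(word, context):
        context = tuple(context)
        if context_counts[context] == 0:
            return 0.0  # unseen context; real models would smooth or back off
        return ngram_counts[context + (word,)] / context_counts[context]

    return prob

# Toy usage: trigram probability of "mat" following "on the"
corpus = "the cat sat on the mat and the dog sat on the rug".split()
p = ngram_probabilities(corpus, n=3)
print(p("mat", ["on", "the"]))  # 0.5: "on the" occurs twice, once followed by "mat"
```

In practice, plain counts like these are brittle for unseen n-grams, which is why neural language models learn the distribution instead.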