{"id":7699,"date":"2026-04-19T14:08:38","date_gmt":"2026-04-19T13:08:38","guid":{"rendered":"https:\/\/sinatootoonian.com\/?post_type=notes&#038;p=7699"},"modified":"2026-04-19T14:24:17","modified_gmt":"2026-04-19T13:24:17","slug":"transformers","status":"publish","type":"notes","link":"https:\/\/sinatootoonian.com\/index.php\/notes\/transformers\/","title":{"rendered":"Transformers"},"content":{"rendered":"\n<p>We&#8217;ll follow the presentation in Chapter 12 of Bishop.<\/p>\n\n\n\n<p>A transformer is so called because it transforms an input set of <em>tokens<\/em> into an output set of the same size, $$ \\XX \\to \\wt \\XX,$$ where $\\XX$ and $\\wt{\\XX}$ have $N$ rows, one for each input and output $D$-dimensional token.<\/p>\n\n\n\n<p>The tokens are transformed using <em>queries<\/em> to access <em>values<\/em> by <em>key<\/em>.<\/p>\n\n\n\n<p>Queries, keys, and values are all linear transformations of the input tokens.<\/p>\n\n\n\n<p>The keys are $$ \\KK = \\XX \\WW_K.$$<\/p>\n\n\n\n<p>The queries are $$\\QQ = \\XX \\WW_Q.$$<\/p>\n\n\n\n<p>The values are $$\\VV = \\XX \\WW_V.$$ <\/p>\n\n\n\n<p>The queries are matched against the keys using dot products, and the scaled scores are passed through a row-wise softmax. The result is used to weight the values:<\/p>\n\n\n\n<p>\\begin{align*} \\wt{\\XX} &amp;= \\text{Softmax}\\left({\\QQ \\KK^T \\over \\sqrt{D}}\\right) \\VV\\\\ &amp;= \\text{Softmax}\\left({\\XX \\WW_Q \\WW_K^T \\XX^T\\over \\sqrt{D}}\\right)\\XX  \\WW_V.\\end{align*}<\/p>\n\n\n\n<p>The output above corresponds to a single attention head. 
Multi-head attention concatenates the outputs $\\HH_1, \\HH_2, \\dots$ of several such heads, each with its own weight matrices, and mixes them linearly: $$ \\YY = [\\HH_1, \\HH_2,\\dots] \\WW_o.$$ <\/p>\n","protected":false},"featured_media":0,"template":"","class_list":["post-7699","notes","type-notes","status-publish","hentry"],"acf":[],"_links":{"self":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/notes\/7699","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/notes"}],"about":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/types\/notes"}],"wp:attachment":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/media?parent=7699"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}