{"id":8049,"date":"2026-05-26T17:49:56","date_gmt":"2026-05-26T16:49:56","guid":{"rendered":"https:\/\/sinatootoonian.com\/?p=8049"},"modified":"2026-05-27T06:44:15","modified_gmt":"2026-05-27T05:44:15","slug":"crash-course-on-ito-calculus","status":"publish","type":"post","link":"https:\/\/sinatootoonian.com\/index.php\/2026\/05\/26\/crash-course-on-ito-calculus\/","title":{"rendered":"Crash Course on Ito Calculus"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><em>These are my notes on the Ito calculus as presented in Chapter 8 of Potters and Bouchaud&#8217;s &#8220;A First Course in Random Matrix Theory.&#8221; I follow that chapter pretty closely, filling in some of the gaps in the presentation as I go.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"717\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-14-1024x717.png\" alt=\"\" class=\"wp-image-8177\" style=\"aspect-ratio:1.4282281045078449;width:530px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-14-1024x717.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-14-300x210.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-14-768x538.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-14.png 1148w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Brownian Motion<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We start with a Brownian motion process $X_t$ with $X_0 = 0$, drift $\\mu$, and diffusion variance $\\sigma^2$, so that $$ X_t \\sim \\mathcal{N}(\\mu t, \\sigma^2 t).$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The Normal distribution is <em>infinitely divisible<\/em>. This means that, for any $N$, a normally distributed variable can be written as a sum of $N$ independent, identically distributed contributions, each of which is itself normal. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is intuitive for our variable above. We can think of its overall position $X_t$ as the sum of infinitesimal motions up to that point. In fact, that&#8217;s how <a href=\"https:\/\/en.wikipedia.org\/wiki\/Brownian_motion\">Brownian motion<\/a> was originally observed: as the jittering of, e.g., dust particles under a microscope, caused by countless random, independent perturbations.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To formalize this, we split the time interval $[0,t]$ into $N$ subintervals of length $\\delta t = t\/N$. We can now think of $X_t$ as the sum of perturbations in each subinterval: $$X_t = \\sum_{k=0}^{N-1} \\delta X_k.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Each perturbation contributes its own portion of drift and diffusion, $$ \\delta X_k \\sim \\mathcal{N}(\\mu \\delta t, \\sigma^2 \\delta t).$$ We can write this more suggestively as $$ \\delta X_k = \\mu \\delta t + \\sigma \\mathcal{N}(0, \\delta t).$$ This equation makes it clear that each contribution has a non-random part, the drift, and a random, diffusive part.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If we take the limit as $\\delta t \\to 0$, we get $$ dX_t = \\mu dt + \\sigma dB_t,$$ where $dB_t$ is a centered, infinitesimal Gaussian perturbation with variance $dt$.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One subtlety that&#8217;s crucial to Ito&#8217;s approach is exactly when the perturbations are supposed to arrive. Ito takes the perturbation $dB_t$ for the interval $[t, t+dt]$ to arrive just after time $t$, so it is independent of the value $X_t$ at the start of that interval. This is called <strong>Ito&#8217;s prescription.<\/strong><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Transforming by a function<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now what happens when we pass $X_t$ through a function $F$? 
Let&#8217;s apply a Taylor expansion to second order:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$dF = F(X + dX) - F(X) = F'(X) dX + {F^{\\prime\\prime}(X) \\over 2} dX^2.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Substituting $dX = \\mu\\, dt + \\sigma dB_t$, the first-order term is $$F'(X) dX = F'(X) (\\mu\\, dt + \\sigma dB_t),$$ and the second-order term involves<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$ dX^2 = \\mu^2 dt^2 + \\sigma^2 dB_t^2 + 2 \\mu \\sigma\\, dB_t\\, dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Adding and subtracting $\\sigma^2 dt$, we can rewrite this as<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$ dX^2 = \\mu^2 dt^2 + \\left(\\sigma^2 dB_t^2 - \\sigma^2 dt\\right) + \\sigma^2 dt + 2 \\mu \\sigma\\, dB_t\\, dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first term is $O(dt^2)$, and, since $dB_t$ is of order $\\sqrt{dt}$, the last term is $O(dt^{3\/2})$. The second term has mean 0, because $\\EE\\, dB_t^2 = dt$, but its fluctuations are of order $dt$, so we can&#8217;t ignore it in isolation. However, we&#8217;ll be integrating such terms: over $[0,t]$ we sum $t\/dt$ of them, they&#8217;re all uncorrelated, and each has variance of order $dt^2$, so their sum has variance of order $t\\, dt \\to 0$. This is why, as Ito showed, we can ignore them in the integration.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, as long as we agree that we&#8217;re going to be integrating things, we end up with $$ dX^2 \\to \\sigma^2 dt,$$ and our differential becomes $$ dF = F'(X) dX + \\underbrace{{\\sigma^2 \\over 2} F^{\\prime\\prime}(X) dt}_{\\text{Ito correction}}.$$ Notice the <em>Ito correction<\/em>, which has no counterpart in ordinary (non-stochastic) calculus.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Example: Variance of Brownian Noise<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">Suppose we have a driftless Brownian noise process, $$ dX_t = \\sigma dB_t.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We expect the variance of $X_t$ to be $\\sigma^2 t$. 
Let&#8217;s see if we can derive that.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since the process has no drift and starts at $X_0 = 0$, we have $\\EE X_t = 0$, so the variance of $X_t$ is $$\\var(X_t) = \\EE X_t^2 - (\\EE X_t)^2 = \\EE X_t^2.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So if we take $F(X) = X^2$, then $$ \\var(X_t) = \\EE F(X_t).$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Using Ito&#8217;s formula above, $$ dF_t = F'(X_t) dX_t + {\\sigma^2 \\over 2}F^{\\prime\\prime}(X_t) dt = 2X_t dX_t + \\sigma^2 dt = 2\\sigma X_t dB_t + \\sigma^2 dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Integrating from $0$ to $T$, and using $F_0 = X_0^2 = 0$, we get $$F_T = \\int_0^T 2\\sigma X_t\\, dB_t + \\int_0^T \\sigma^2\\, dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">By Ito&#8217;s prescription, each $dB_t$ is independent of $X_t$, so $\\EE[X_t\\, dB_t] = \\EE X_t\\, \\EE\\, dB_t = 0$, and the first integral has zero expectation, even though any single realization of it fluctuates.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The second integral is simply $\\sigma^2 T.$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So, $ \\EE F_T = \\sigma^2 T,$ as expected.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Langevin Equation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To motivate the Langevin equation, let&#8217;s start with a process whose increments are $$ dX_t = dB_t + F(X_t) dt.$$ This combines Brownian motion with a deterministic force $F$. On its own, Brownian motion makes the variance grow with time; the force lets us sculpt this variability to produce various effects. 
<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We might be interested in:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The stationary distribution of $X_t$, if one exists.<\/li>\n\n\n\n<li>Setting $F$ to achieve a desired stationary distribution.<\/li>\n\n\n\n<li>Sampling $X_t$ as a means of sampling from some target distribution.<\/li>\n\n\n\n<li>Computing averages against the target distribution empirically using the samples.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">It will be useful to parameterize $F(X_t)$ as the negative gradient of some potential $V$, so $$ F(X_t) = -{1 \\over 2} V'(X_t).$$ The factor of $1\/2$ is chosen so that the stationary distribution will turn out to be simply $e^{-V}$, up to normalization.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Now let&#8217;s see what happens if we track some function $f$ of the particle&#8217;s position. From Ito&#8217;s formula, with $\\sigma = 1$, we have<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">\\begin{align*} df(X_t) &amp;= f'(X) dX + {1 \\over 2}f^{\\prime\\prime}(X) dt\\\\ &amp;= f'(X) dB_t - {1 \\over 2} f'(X) V'(X) dt + {1 \\over 2} f^{\\prime\\prime}(X) dt \\end{align*}<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Taking expectations on both sides with respect to the assumed stationary distribution $p(x)$, $$ \\EE df(X_t) = \\EE f'(X) dB_t - {1 \\over 2} \\EE f' V' dt + {1\\over 2} \\EE f^{\\prime\\prime} dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At stationarity, the distribution of $X_t$ doesn&#8217;t change, so $\\EE f(X_t)$ is constant and the left-hand side is zero. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The first term on the right-hand side is zero by Ito&#8217;s prescription, since $dB_t$ is independent of $X_t$ and has zero mean. 
So we get $$ \\boxed{\\EE (f'(X) V'(X)) = \\EE f^{\\prime\\prime}(X).}$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This is an interesting expression in itself: it says that we can compute the expectation of the second derivative of any $f$ by averaging the product of its first derivative with $V'(X)$.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Writing the expectations as integrals against the stationary density $p$, we get $$ \\int f'(X) V'(X) p(X) dX = \\int f^{\\prime\\prime}(X) p(X) dX.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">At this point, we want to re-express the right-hand side. Since $$ d (f'(x) p(x)) = f^{\\prime\\prime}(x) p(x)\\, dx + f'(x) p'(x)\\, dx,$$ if we integrate over the whole domain, we get $$ \\left. f'(x) p(x) \\right|_{-\\infty}^\\infty = \\int f^{\\prime\\prime}(x) p(x) dx + \\int f'(x) p'(x) dx.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">If $f$ and $p$ are well behaved, the boundary term vanishes: since $p(x)$ integrates to 1, it goes to 0 at either extreme (and we assume $f'$ doesn&#8217;t grow fast enough to compensate), so we get $$ \\int f^{\\prime\\prime}(x) p(x) dx = - \\int f'(x) p'(x) dx.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Substituting that into our expression above, we get $$ \\int f'(X) V'(X) p(X) dX = -\\int f'(X) p'(X) dX.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since this must be true for any $f(X)$, we get $$ V'(X) p(X) = - p'(X) \\implies V'(X) = - {p'(X) \\over p(X)} = - {d \\log p(X) \\over dX}.$$ So we arrive at $$ V(X) = - \\log p(X) + \\text{const},$$ or $$ p(X) = {1 \\over Z} e^{-V(X)}.$$ <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">So what this says is that if we have a stochastic process whose increments are $$\\boxed{ dX_t = dB_t - {1 \\over 2} V'(X_t) dt,}$$ i.e., noisy gradient descent on $V$, then its stationary distribution will be the one given above. This is the <em>Langevin equation.<\/em><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">An application to neural coding<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The dynamics above can be used to show how neurons can perform inference. 
If we set $V(X)$ to be the negative log posterior given some observations, then neurons performing gradient ascent on the log posterior while subject to Brownian noise would, asymptotically, generate samples from the posterior. For example, see <a href=\"https:\/\/proceedings.neurips.cc\/paper_files\/paper\/2014\/file\/604ca44b3dd6eaa7f3f74a71aaf0c596-Paper.pdf\">this paper<\/a> by Hennequin et al., where they start their analysis by considering dynamics of the form<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"657\" height=\"151\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-1.png\" alt=\"\" class=\"wp-image-8111\" style=\"aspect-ratio:4.351464435146443;width:356px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-1.png 657w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-1-300x69.png 300w\" sizes=\"auto, (max-width: 657px) 100vw, 657px\" \/><\/figure>\n\n\n\n<h3 class=\"wp-block-heading\">The Ornstein-Uhlenbeck Process<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">As an example, we can determine the process whose stationary distribution is the standard Gaussian. In that case, $V(X) = X^2\/2$, so $$ dX_t = dB_t - {1 \\over 2} V'(X_t)\\, dt = dB_t - {1 \\over 2} X_t\\, dt.$$ This is the <em>Ornstein-Uhlenbeck<\/em> (OU) process.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Another way to derive this process is to return to Brownian motion and notice that the variance there increases with time. My naive approach would have been to take the Brownian motion and divide it by $\\sqrt{t}$. 
This would have produced a process with constant variance, but it<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Requires knowing $t$.<\/li>\n\n\n\n<li>Blows up at $t = 0$.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Instead of applying such a <em>global<\/em> shrinkage, we can apply it locally, at each increment. In particular, we can set $$ X_{t+ dt} = {X_t \\over \\sqrt{1 + dt}} + dB_t.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To see that this does the right thing, we can track the variance. Since $dB_t$ is independent of $X_t$ (Ito&#8217;s prescription), $$ \\var(X_{t+dt}) = {\\var(X_t) \\over 1 + dt} + dt, $$ so $$ \\var(X_{t+dt})(1 + dt) = \\var(X_t) + dt + dt^2.$$ Dropping the $dt^2$ term and rearranging, $$ {\\var(X_{t+dt}) - \\var(X_t) \\over dt} = 1 - \\var(X_{t+dt}),$$ which, as $dt \\to 0$, becomes $$ {d\\var(X)\\over dt} = 1 - \\var(X).$$ So, whatever its initial value, the variance relaxes exponentially to unity, as desired.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example: Student&#8217;s t-Distribution<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The Student&#8217;s t-distribution with $\\mu$ degrees of freedom can be written as<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"155\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-3-1024x155.png\" alt=\"\" class=\"wp-image-8114\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-3-1024x155.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-3-300x45.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-3-768x116.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-3.png 1136w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The potential for this distribution is $$ V(x) = - \\log P + \\text{const} = {\\mu + 1 \\over 2} \\log \\left(1 + {x^2 \\over \\mu}\\right).$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To determine the process that produces the 
above as its stationary distribution, we take the derivative of the potential, $$ V'(x) = {\\mu + 1 \\over 2} {\\mu \\over \\mu + x^2} {2 x \\over \\mu} = {\\mu + 1 \\over \\mu + x^2} x.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">From this we get $$ dX_t = dB_t - {1 \\over 2} {\\mu + 1 \\over \\mu + X_t^2} X_t\\, dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When $X_t$ is small, this behaves like an OU process, $$ dX_t \\approx dB_t - {1 \\over 2}{\\mu +1 \\over \\mu}X_t\\, dt.$$ In fact, since $(\\mu + 1)\/\\mu &gt; 1$, the restoring force is stronger than in the standard OU process, so it pulls values towards zero more strongly.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">However, when $X_t$ is large, the restoring drift is much weaker than for the OU process: $$ dX_t \\approx dB_t - {1 \\over 2}{\\mu+1 \\over X_t^2} X_t\\, dt = dB_t - {1 \\over 2}{\\mu+1 \\over X_t}\\, dt.$$ In fact, the larger $X_t$ gets, the weaker the drift and the more freely the process diffuses.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Note also that as $\\mu \\to \\infty$, the process approaches the standard OU process, which reflects how the corresponding Student&#8217;s t-distribution approaches the Gaussian.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Simulations<\/h3>\n\n\n\n<h4 class=\"wp-block-heading\">OU Process<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">First, let&#8217;s generate an OU process. 
Below I specify two functions for doing this: one that uses the increments form of the OU process, and one that uses the local-shrinkage form:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"200\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-4-1024x200.png\" alt=\"\" class=\"wp-image-8121\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-4-1024x200.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-4-300x59.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-4-768x150.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-4.png 1095w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">We can sample a million points from each, dropping the first 1000 for burn-in:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"740\" height=\"95\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-5.png\" alt=\"\" class=\"wp-image-8122\" style=\"aspect-ratio:7.790262172284645;width:465px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-5.png 740w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-5-300x39.png 300w\" sizes=\"auto, (max-width: 740px) 100vw, 740px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Overlaying the standard normal density on the histogram of samples shows good agreement:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-large is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"554\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-6-1024x554.png\" alt=\"\" class=\"wp-image-8123\" style=\"aspect-ratio:1.8484781159742278;width:490px;height:auto\" 
srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-6-1024x554.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-6-300x162.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-6-768x415.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-6.png 1058w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<h4 class=\"wp-block-heading\">Student&#8217;s t-Distribution<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">First we&#8217;ll set up the target distribution and its PDF:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"419\" height=\"270\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-7.png\" alt=\"\" class=\"wp-image-8125\" style=\"aspect-ratio:1.5517939814814814;width:281px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-7.png 419w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-7-300x193.png 300w\" sizes=\"auto, (max-width: 419px) 100vw, 419px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Next we&#8217;ll set up the increments:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"78\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-8-1024x78.png\" alt=\"\" class=\"wp-image-8126\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-8-1024x78.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-8-300x23.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-8-768x59.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-8.png 1048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Next we&#8217;ll generate the samples. 
We don&#8217;t sample in a single run, since one long trajectory can get stuck at large values for long stretches. This follows from our discussion of the drift above: when $|X_t|$ is large, the restoring drift becomes weak, so the process wanders almost freely and only slowly returns towards zero. <\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Instead, we&#8217;ll do a bunch of runs from random starting positions, and pool the results:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"55\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-9-1024x55.png\" alt=\"\" class=\"wp-image-8127\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-9-1024x55.png 1024w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-9-300x16.png 300w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-9-768x41.png 768w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-9.png 1078w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Plotting the target density over the histogram shows good agreement:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"756\" height=\"574\" src=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-11.png\" alt=\"\" class=\"wp-image-8129\" style=\"aspect-ratio:1.3171225937183384;width:439px;height:auto\" srcset=\"https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-11.png 756w, https:\/\/sinatootoonian.com\/wp-content\/uploads\/2026\/05\/image-11-300x228.png 300w\" sizes=\"auto, (max-width: 756px) 100vw, 756px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">Fokker-Planck Equation<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We can also follow how the probability distribution $P(X,t)$ itself evolves with time.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We start with our process $$dX_t = dB_t + F(X_t) dt,$$<\/p>\n\n\n\n<p 
 class=\"wp-block-paragraph\">and take some arbitrary well-behaved function $f$ and apply Ito&#8217;s formula to study its increments: $$ df(X_t) = f'(X_t) dX_t + {1 \\over 2}f^{\\prime\\prime}(X_t) dt = f'(X_t) dB_t + f'(X_t) F(X_t) dt + {1 \\over 2} f^{\\prime\\prime}(X_t) dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Taking expectations of this, $$ \\EE df(X_t) = d\\EE f(X_t) = \\EE f'(X_t) dB_t + \\EE f'(X_t) F(X_t) dt + {1 \\over 2} \\EE f^{\\prime\\prime}(X_t) dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">By Ito&#8217;s prescription the first term on the right-hand side is zero, so we get $$ d\\EE f(X_t) = \\EE f'(X_t) F(X_t) dt + {1 \\over 2} \\EE f^{\\prime\\prime}(X_t) dt.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Dividing by $dt$ and writing the expectations as integrals against the time-dependent density $P(X,t)$, we get \\begin{align*} {d\\EE f(X_t) \\over dt} &amp;= {d \\over dt} \\int f(X) P(X, t) dX\\\\ &amp;= \\int f(X) {\\partial P(X,t) \\over \\partial t} dX = \\int f'(X) F(X) P(X,t) dX + {1 \\over 2}\\int f^{\\prime\\prime}(X) P(X,t) dX.\\end{align*}<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Applying integration by parts once to the first term and twice to the second, and using the boundary conditions on $P$ whereby it and its first derivative go to zero at infinity, we get $$ \\int f(X) {\\partial P(X,t) \\over \\partial t} dX = -\\int f(X) {\\partial [F(X) P(X,t)] \\over \\partial X} dX + {1 \\over 2}\\int f(X){\\partial^2 P(X,t) \\over \\partial X^2} dX.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Since this must be true for any $f(X)$, we get the <em>Fokker-Planck<\/em> equation<br>$$ \\boxed{ {\\partial P(X,t) \\over \\partial{t}} = - {\\partial [F(X) P(X,t)] \\over \\partial X} + {1 \\over 2}{\\partial^2 P(X,t) \\over \\partial X^2}.}$$<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Example: OU Process<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Let&#8217;s check this equation for the OU process. For that process, $F(X) = -X\/2$. 
At stationarity, $\\partial P \/ \\partial t = 0$, so, multiplying through by 2, we should get from the above that $$ -{\\partial [X P(X,t=\\infty)] \\over \\partial X} = {\\partial^2 P(X,t=\\infty) \\over \\partial X^2}.$$<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For the standard Gaussian, $ P'(X) = -X P(X),$ and differentiating once more gives $P^{\\prime\\prime}(X) = -{d [X P(X)] \\over dX}$, which is exactly the stationarity condition above, so the equation holds.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">$$\\blacksquare$$<\/p>\n","protected":false},"excerpt":{"rendered":"<p>These are my notes on the Ito calculus as presented in Chapter 8 of Potters and Bouchaud&#8217;s &#8220;A First Course in Random Matrix Theory.&#8221;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,30],"tags":[],"class_list":["post-8049","post","type-post","status-publish","format-standard","hentry","category-blog","category-notes-blog"],"acf":[],"_links":{"self":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/8049","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/comments?post=8049"}],"version-history":[{"count":117,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/8049\/revisions"}],"predecessor-version":[{"id":8180,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/posts\/8049\/revisions\/8180"}],"wp:attachment":[{"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/media?parent=8049"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/categories?post=8049"},{"taxonomy":"p
ost_tag","embeddable":true,"href":"https:\/\/sinatootoonian.com\/index.php\/wp-json\/wp\/v2\/tags?post=8049"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}