{"id":23,"date":"2024-11-07T19:40:09","date_gmt":"2024-11-07T19:40:09","guid":{"rendered":"https:\/\/shaheenahmedc.com\/?page_id=23"},"modified":"2026-04-29T14:50:12","modified_gmt":"2026-04-29T14:50:12","slug":"high-dimensional-bayesian-optimisation-on-expensive-simulators","status":"publish","type":"page","link":"https:\/\/shaheenahmedc.com\/?page_id=23","title":{"rendered":"High-dimensional Bayesian Optimisation on Expensive Simulators"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">MSc Thesis<\/h2>\n\n\n\n<p>If you build a computationally expensive simulation (of entire economies, climate systems, brains, disease spread, galaxies, text), and it has a number of hyperparameters that you need to tune (interest rates, number of synaptic connections per neuron, vaccination rates, the Hubble constant \\(H_0\\), number of attention heads), how do you do this? <\/p>\n\n\n\n<p>The volume of your parameter space grows exponentially with every hyperparameter. Say you want to test at least 10 values for each hyperparameter, across 10 parameters. If you want to explore every combination, that&#8217;s immediately \\( 10^{10} \\) runs (10 billion)!<\/p>\n\n\n\n<p>On top of this, if one run of your simulation takes 100 seconds (which seems modest), the total runtime will be \\( 10^{10} \\times 100 \\) seconds; in other words, 31,710 years! <\/p>\n\n\n\n<p>Of course, we might be able to distribute these runs across compute nodes: access to \\( 10^6 \\) nodes would cut the runtime to around 12 days. But this would, in most situations, be prohibitively expensive. <\/p>\n\n\n\n<p>Can we be smarter about how we explore this parameter space? Do we need all \\( 10^{10} \\) samples, or can we reach an acceptable loss value with far fewer? <\/p>\n\n\n\n<p>This is the game Bayesian Optimisation plays, and this is what I chose to write my MSc thesis on, in the context of expensive, high-dimensional macro-economic simulation models. 
In particular, I focused on a flavour of simulation known as agent-based modelling (ABM). ABMs are typically described in this agent-environment lingo, but I like to just think of them as little computer people\/animals\/bugs running around a computer world, bumping into things and each other, buying stuff, eating stuff, running away from and towards each other (in sometimes <a href=\"https:\/\/www.youtube.com\/watch?v=xPaEXsFDah4\">beautiful fashion<\/a>). <\/p>\n\n\n\n<p>I was particularly interested in how we could use Bayesian Optimisation to efficiently calibrate expensive macro-economic ABMs, which often have dozens of hyperparameters. <\/p>\n\n\n\n<p>Within the thesis, I:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Set up my notational mathematical world<\/li>\n\n\n\n<li>Filled this with a few toy macro-economic ABMs<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"805\" height=\"489\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-1.png\" alt=\"\" class=\"wp-image-35\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-1.png 805w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-1-300x182.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-1-768x467.png 768w\" sizes=\"auto, (max-width: 805px) 100vw, 805px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Pinned down some different SOTA calibration schemes <\/li>\n\n\n\n<li>Used them on the toy models in question<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"815\" height=\"709\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-2.png\" alt=\"\" class=\"wp-image-36\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-2.png 815w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-2-300x261.png 300w, 
https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-2-768x668.png 768w\" sizes=\"auto, (max-width: 815px) 100vw, 815px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"815\" height=\"499\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-3.png\" alt=\"\" class=\"wp-image-37\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-3.png 815w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-3-300x184.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-3-768x470.png 768w\" sizes=\"auto, (max-width: 815px) 100vw, 815px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Then, starting from the humble Gaussian, moving through multivariate Normal distributions and into covariance matrices, I built up an explanation of Gaussian Processes.<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"804\" height=\"530\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-4.png\" alt=\"\" class=\"wp-image-38\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-4.png 804w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-4-300x198.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-4-768x506.png 768w\" sizes=\"auto, (max-width: 804px) 100vw, 804px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"821\" height=\"489\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-5.png\" alt=\"\" class=\"wp-image-39\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-5.png 821w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-5-300x179.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-5-768x457.png 768w\" sizes=\"auto, (max-width: 
821px) 100vw, 821px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"826\" height=\"596\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-6.png\" alt=\"\" class=\"wp-image-40\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-6.png 826w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-6-300x216.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-6-768x554.png 768w\" sizes=\"auto, (max-width: 826px) 100vw, 826px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Layered on the magic of conditioning and acquisition functions, before applying the now fully-cooked Bayesian Optimisation calibration schemes to our toy ABMs. <\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"859\" height=\"660\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-7.png\" alt=\"\" class=\"wp-image-41\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-7.png 859w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-7-300x231.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-7-768x590.png 768w\" sizes=\"auto, (max-width: 859px) 100vw, 859px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"822\" height=\"571\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-8.png\" alt=\"\" class=\"wp-image-42\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-8.png 822w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-8-300x208.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-8-768x533.png 768w\" sizes=\"auto, (max-width: 822px) 100vw, 822px\" \/><\/figure>\n\n\n\n<ul class=\"wp-block-list\">\n<li>I then discussed my main (attempted) innovation: 
in the most complex macro-economic ABMs, a subset of the hyperparameters is typically responsible for most of the variance. <\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"830\" height=\"603\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-9.png\" alt=\"\" class=\"wp-image-43\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-9.png 830w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-9-300x218.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-9-768x558.png 768w\" sizes=\"auto, (max-width: 830px) 100vw, 830px\" \/><\/figure>\n\n\n\n<p>This is a sub-field of high-dimensional Bayesian Optimisation, in which various schemes are cooked up to project the entire space down to one of lower dimensionality, conduct the Bayesian Optimisation there, and then project back up. I adapted some SOTA literature on macro-economic ABM calibration to use one of these methods, and tried to combine it with some SOTA <a href=\"https:\/\/arxiv.org\/abs\/2001.11659\">NeurIPS literature<\/a> on high-dimensional Bayesian Optimisation (<a href=\"https:\/\/github.com\/shaheenahmedc\/ALEBO_BOTorch\/blob\/main\/ALEBO%20re-implementation%20-%20BOTorch%20.ipynb\">re-implementing it<\/a> in <a href=\"https:\/\/github.com\/pytorch\/botorch\">BoTorch<\/a>), with mixed results:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"819\" height=\"704\" src=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-10.png\" alt=\"\" class=\"wp-image-44\" srcset=\"https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-10.png 819w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-10-300x258.png 300w, https:\/\/shaheenahmedc.com\/wp-content\/uploads\/2024\/11\/image-10-768x660.png 768w\" sizes=\"auto, (max-width: 819px) 100vw, 819px\" \/><\/figure>\n\n\n\n<p>All in all, a project that I really 
enjoyed, which delivered a <a href=\"https:\/\/github.com\/shaheenahmedc\/Emulation_ABMs\/blob\/main\/Shaheen_Thesis_Draft%20-%206%20Oct%202021.pdf\">document<\/a> that I sometimes look back on with great fondness.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>MSc Thesis If you build a computationally expensive simulation (of entire economies, climate systems, brains, disease spread, galaxies, text), and it has a number of hyperparameters which you need to tune (interest rates, number of synaptic connections per neuron, vaccination rates, the Hubble constant \\(H_0\\), number of attention heads), how do you do this? The [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-23","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=\/wp\/v2\/pages\/23","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=23"}],"version-history":[{"count":5,"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=\/wp\/v2\/pages\/23\/revisions"}],"predecessor-version":[{"id":53,"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=\/wp\/v2\/pages\/23\/revisions\/53"}],"wp:attachment":[{"href":"https:\/\/shaheenahmedc.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=23"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}