{ "cells": [ { "cell_type": "markdown", "id": "0", "metadata": {}, "source": [ "# Example of model comparison with Harmonic: TOI-544\n", "\n", "In this notebook, we demonstrate how to use the [harmonic](https://astro-informatics.github.io/harmonic/) package to perform Bayesian model comparison for exoplanet radial velocity models. We compare a 1-planet model against a 2-planet model for the TOI-544 system. This system was first characterised in RV by [Osborne et al. (2024)](https://doi.org/10.1093/mnras/stad3837), who knew TOI-544~b was transiting, but suspected the presence of an outer non-transiting planet TOI-544~c.\n", "\n", "In Bayesian inference, the Bayesian evidence (marginal likelihood) $\\mathcal{Z} = \\int \\mathcal{L}(\\theta)\\, \\pi(\\theta)\\, d\\theta$ quantifies how well a model explains the data, integrating over the full prior volume. Unlike information criteria (AIC, BIC), it naturally penalises unnecessary model complexity through the prior -- rather than just through the likelihood and the dimensionality -- this is the Bayesian Occam's razor. In RV we tend to have quite informative priors (e.g. tight priors on $P$ from transits, or on quasiperiodic GP kernel hyperparameters through photometric studies), so it seems a waste to not use that prior information in our model comparisons by doing AIC or BIC instead.\n", "\n", "The problem though is that computing $\\mathcal{Z}$ directly is computationally intractable for high-dimensional problems. In MCMC, we only sample the likelihood and the prior functions, not the evidence (giving us the *un-normalised* posterior, which is fine for parameter inference, but need it to be normalised for model comparison). Therefore, we will use the `harmonic` package implementation of the Learned Harmonic Mean Estimator (LHME) which uses normalizing flows to estimate $\\mathcal{Z}$ from our existing posterior MCMC samples. See [McEwen et al. (2021)](https://doi.org/10.48550/arXiv.2111.12720) for the theoretical background, but the quick sell is that this is basically free - we already have the MCMC samples from the fitting/parameter inference anyway! We can use any sampling technique we like (such as `ravest`) and get evidences in five minutes.\n", "\n", "This tutorial assumes familiarity with fitting RV models in `ravest` so we'll skip over the fine details -- see the [K2-24 case-study](K2-24.ipynb) and [GP fitting tutorial](example_GP.ipynb) for more detailed walkthroughs." ] }, { "cell_type": "code", "execution_count": null, "id": "1", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "import harmonic as hm\n", "import jax\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "\n", "from ravest.fit import GPFitter\n", "from ravest.gp import GPKernel\n", "from ravest.param import Parameter, Parameterisation\n", "from ravest.prior import HalfNormal, Normal, Uniform\n", "\n", "jax.config.update(\"jax_enable_x64\", True)" ] }, { "cell_type": "markdown", "id": "2", "metadata": {}, "source": [ "## Data\n", "\n", "TOI-544 is an M dwarf observed with HARPS and HARPS-N. The data below have already been prepared: radial velocities are in m/s and times are in BTJD (BJD - 2457000). This data is based on RV data originally from [Osborne et al. (2024) | MNRAS 527, 11138–11157](https://doi.org/10.1093/mnras/stad3837)." ] }, { "cell_type": "code", "execution_count": null, "id": "3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Instruments in the dataset: ['HARPS' 'HARPS-N']\n" ] }, { "data": { "text/html": [ "
| \n", " | RV | \n", "e_RV | \n", "Instrument | \n", "BTJD | \n", "
|---|---|---|---|---|
| 0 | \n", "19.4 | \n", "1.7 | \n", "HARPS | \n", "2205.63053 | \n", "
| 1 | \n", "30.4 | \n", "1.4 | \n", "HARPS | \n", "2206.63247 | \n", "
| 2 | \n", "21.5 | \n", "1.9 | \n", "HARPS | \n", "2207.57568 | \n", "
| 3 | \n", "22.7 | \n", "1.3 | \n", "HARPS | \n", "2207.66046 | \n", "
| 4 | \n", "-2.1 | \n", "1.4 | \n", "HARPS | \n", "2214.71815 | \n", "