Bifurcated Neural Network

The neural network method offers a deep learning-based approach to constructing house price indices. It automatically learns complex, nonlinear relationships between property features and sale prices, enabling it to separate individual property effects from broader market trends. While it requires large volumes of sales and property data, it is particularly powerful at scale and can produce accurate, fine-grained, and timely indices without relying on repeat sales.

Note

Background on model construction for the neural network method can be found in:

Krause and Johnson (2024), “A Multi-Criteria Evaluation of House Price Indexes”. andykrause/hpi_research.

Overview

The neural network method:

  1. Separates property-specific and market-level effects

  2. Learns nonlinear relationships automatically

  3. Can handle high-dimensional feature spaces

  4. Supports both local and global market patterns

The method supports two distinct approaches for extracting the house price index:

Residual approach (default in hpiPy):

  • Description: Extracts the index directly from the market pathway by isolating temporal market effects.

  • Steps: (1) train on the full feature set; (2) zero out non-market (i.e., property-specific) features; (3) extract residual market effects; (4) convert to index.

  • Benefits: explicit separation of effects; structural interpretation.

  • Drawbacks: less flexible than the attributional approach.

Attributional approach:

  • Description: Uses explainability techniques (e.g., SHAP) to attribute prediction components to market-level factors.

  • Steps: (1) train on the full feature set; (2) apply explainability techniques to assign feature attributions; (3) extract market-level attributions; (4) convert to index.

  • Benefits: granular effect decomposition; feature-level interpretability.

  • Drawbacks: relies on post-hoc attributions.

Residual Approach

The residual approach decomposes the logarithm of a property’s value into a market-level price index and a property-specific component. This reflects the idea that housing value is jointly determined by market conditions and the characteristics of the property itself (Clapham et al.):

\[\log V_{it} = \log P_t + \log Q_i + \varepsilon_{it}\]

where:

  • \(V_{it}\) is the observed transaction price (or value) of property i at time t.

  • \(P_t\) is the market-level price index at time t, common to all properties.

  • \(Q_i\) is the time-invariant quality or quantity of property i (e.g., structural/locational attributes).

  • \(\varepsilon_{it}\) is a residual term capturing idiosyncratic noise or omitted effects.

This model is conceptually similar to hedonic or repeat-sales approaches, where market effects and property characteristics are disentangled. The index is derived by directly extracting the market-level estimates for each time period from the model and normalizing them to a base period.
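As a minimal sketch of the final normalization step, suppose the model has already produced one log market-level estimate \(\log P_t\) per period. The index is then the exponentiated difference from the base period, scaled to a base value. The function name `log_effects_to_index` and the numbers below are hypothetical and are not hpiPy output.

```python
import math


def log_effects_to_index(log_market_effects, base_period=0, base_value=100.0):
    """Convert per-period log market effects (log P_t) into an index.

    Each value is expressed relative to the base period, so the base
    period itself maps to `base_value`.
    """
    base = log_market_effects[base_period]
    return [base_value * math.exp(v - base) for v in log_market_effects]


# Illustrative log P_t estimates for four periods (made-up numbers).
log_p = [12.00, 12.02, 12.05, 12.03]
index = log_effects_to_index(log_p)
```

Because the decomposition is additive in logs, subtracting the base-period estimate before exponentiating removes any constant level shared across periods, so only relative market movement remains in the index.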

Attributional Approach

The attributional approach models the house price as a black-box prediction that integrates market and property factors, and then uses explainability methods to decompose this prediction into attributions. Specifically, DeepLIFT attributes the model output to individual features relative to a reference (baseline) input (Shrikumar et al.):

\[\hat{V}_i = f(x_i)\]
\[\Delta \hat{V}_i = \hat{V}_i - \hat{V}_i^{\text{ref}} = \sum_{j} C_j\]
\[C_j = m_j \cdot \Delta x_{ij}\]

where:

  • \(\hat{V}_i = f(x_i)\) is the model’s predicted value for property i.

  • \(x_i\) is the feature vector describing property i (e.g., square footage, year built, etc.).

  • \(\hat{V}_i^{\text{ref}} = f(x_i^{\text{ref}})\) is the prediction for a baseline (e.g., average, median, or zeroed) property.

  • \(\Delta \hat{V}_i\) is the total difference in predicted value from the baseline.

  • \(C_j\) is the contribution of feature \(j\), computed as the product of the feature’s difference from baseline, \(\Delta x_{ij} = x_{ij} - x_{j}^{\text{ref}}\), and its multiplier \(m_j\), which represents the sensitivity of the output to that feature.

This approach allows for interpretability of complex nonlinear models by expressing the prediction in terms of feature-level contributions. The index is derived by estimating the attributions for the temporal features at each time period and normalizing them to a base period.
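The summation property above can be checked on a toy case. The sketch below assumes a purely linear model, where the multiplier \(m_j\) is simply the feature's weight; for a real neural network, DeepLIFT obtains the multipliers by propagating differences through the nonlinear layers, which this example does not attempt. All names, weights, and feature values are hypothetical.

```python
def linear_model(x, weights, bias):
    """Toy stand-in for f(x): a linear prediction of (log) value."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def attributions(x, x_ref, weights):
    """C_j = m_j * (x_j - x_ref_j); in a linear model m_j is the weight."""
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, x_ref)]


# Hypothetical features: [log square footage, bedrooms, time period].
weights = [0.5, 0.25, 0.1]
bias = 10.0
x = [7.5, 3.0, 6.0]      # property being explained
x_ref = [7.0, 3.0, 0.0]  # baseline property in the base period

contribs = attributions(x, x_ref, weights)
delta = linear_model(x, weights, bias) - linear_model(x_ref, weights, bias)

# The last entry is the contribution of the temporal feature, which is the
# part an attributional index would aggregate per period.
market_contribution = contribs[2]
```

Here the contributions sum to the difference between the prediction and the baseline prediction, and the bedrooms feature contributes nothing because it equals its baseline value.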

Data Preparation

Required data structure:

  • A date column (e.g., “sale_date”)

  • A price column (e.g., “sale_price”)

  • Property characteristics

  • A transaction identifier

Example setup:

>>> from hpipy.datasets import load_ex_sales
>>> from hpipy.period_table import PeriodTable
>>> from hpipy.trans_data import HedonicTransactionData

# Load sales data.
>>> df = load_ex_sales()

# Create period table.
>>> sales_hdata = PeriodTable(df).create_period_table(
...     "sale_date",
...     periodicity="monthly",
... )

# Prepare hedonic data.
>>> trans_data = HedonicTransactionData(sales_hdata).create_transactions(
...     prop_id="pinx",
...     trans_id="sale_id",
...     price="sale_price",
... )

Creating the Index

Create a neural network-based index using either approach:

>>> from hpipy.extensions import NeuralNetworkIndex

>>> kwargs = {
...     "prop_id": "pinx",
...     "trans_id": "sale_id",
...     "price": "sale_price",
...     "date": "sale_date",
...     "dep_var": "price",
...     "ind_var": ["tot_sf", "beds", "baths"],
...     "feature_dict": {
...         "numerics": [],
...         "log_numerics": ["tot_sf"],
...         "categoricals": [],
...         "ordinals": ["beds", "baths"],
...         "hpi": ["sale_date"],
...     },
...     "preprocess_geo": False,
...     "random_seed": 0,
... }

# Create index using residual approach (default).
>>> hpi_residual = NeuralNetworkIndex.create_index(
...     trans_data=trans_data,
...     estimator="residual",  # default
...     **kwargs,
... )

# Create index using attributional approach.
>>> hpi_attributional = NeuralNetworkIndex.create_index(
...     trans_data=trans_data,
...     estimator="attributional",
...     **kwargs,
... )

Parameters

The main parameters for neural network index creation are:


dep_var : str

Dependent variable to model.

ind_var : list

Independent variables to use in the model.

estimator : str

Estimator type. Choose between:

  • “residual”: Extracts index from market pathway (default)

  • “attributional”: Derives index through explainability analysis

feature_dict : dict

Feature dictionary specifying how different variables should be processed:

  • numerics: Standard numeric features

  • log_numerics: Features to be log-transformed

  • categoricals: Categorical features for embedding

  • ordinals: Ordinal features

  • hpi: Temporal features for index generation

num_models : int

Number of models to train in ensemble.

num_epochs : int

Number of training epochs.

batch_size : int

Batch size for training.

hidden_dims : list

List of integers specifying the number of neurons in each hidden layer.

emb_size : int

Embedding size for categorical features.

dropout_rate : float

Dropout rate for regularization (0 to 1).

learning_rate : float

Learning rate for optimization.
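As an illustrative sketch, the training parameters listed above could be collected in a dictionary and passed alongside the other keyword arguments. The values are placeholders chosen for the example, not hpiPy's defaults, and passing them to `create_index` in exactly this way is an assumption based on the parameter list rather than confirmed behavior.

```python
# Hypothetical hyperparameter settings; values are illustrative, not defaults.
train_kwargs = {
    "num_models": 5,          # size of the ensemble
    "num_epochs": 20,         # passes over the training data
    "batch_size": 1024,       # observations per gradient step
    "hidden_dims": [128, 32], # neurons in each hidden layer
    "emb_size": 5,            # embedding width for categoricals
    "dropout_rate": 0.1,      # must lie between 0 and 1
    "learning_rate": 1e-3,
}

# Basic sanity checks before training.
assert 0.0 <= train_kwargs["dropout_rate"] <= 1.0
assert all(isinstance(n, int) and n > 0 for n in train_kwargs["hidden_dims"])

# Assumed usage (requires hpiPy plus `trans_data` and `kwargs` from the
# index creation example):
# hpi = NeuralNetworkIndex.create_index(
#     trans_data=trans_data, **kwargs, **train_kwargs
# )
```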

Evaluating the Index

Compute the volatility of each index, then plot the two indices together for comparison:

>>> import altair as alt
>>> from hpipy.utils.metrics import volatility
>>> from hpipy.utils.plotting import plot_index

# Calculate metrics.
>>> vol_residual = volatility(hpi_residual)
>>> vol_attributional = volatility(hpi_attributional)

# Visualize the index.
>>> alt.layer(
...     (
...         plot_index(hpi_residual)
...         .transform_calculate(method="'Residual'")
...         .encode(color=alt.Color("method:N", title="Method"))
...     ),
...     (
...         plot_index(hpi_attributional)
...         .transform_calculate(method="'Attributional'")
...         .encode(color=alt.Color("method:N", title="Method"))
...     ),
... ).properties(title="Neural Network Index")
alt.LayerChart(...)
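For intuition about what a volatility metric measures, the sketch below uses one common definition: the standard deviation of period-over-period index changes within a rolling window. This is an assumption for illustration and is not necessarily what `hpipy.utils.metrics.volatility` computes; the function name `rolling_volatility`, the window length, and the index values are hypothetical.

```python
import statistics


def rolling_volatility(index_values, window=3):
    """Std. dev. of period-over-period changes within each rolling window."""
    changes = [b / a - 1.0 for a, b in zip(index_values, index_values[1:])]
    return [
        statistics.stdev(changes[i : i + window])
        for i in range(len(changes) - window + 1)
    ]


# Made-up index values for six periods.
index_values = [100.0, 101.0, 103.0, 102.0, 104.0, 105.0]
vols = rolling_volatility(index_values)
mean_vol = statistics.mean(vols)
```

Under this definition, an index that grows at a constant rate has volatility near zero, while one that alternates between gains and losses scores higher, so lower values indicate a smoother series.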