Bifurcated Neural Network
The neural network method offers a deep learning-based approach to constructing house price indices. It automatically learns complex, nonlinear relationships between property features and sale prices, enabling it to separate individual property effects from broader market trends. While it requires large volumes of sales and property data, it is particularly powerful at scale and can produce accurate, fine-grained, and timely indices without relying on repeat sales.
Note
Background on model construction for the neural network method can be found in:
Krause and Johnson (2024), “A Multi-Criteria Evaluation of House Price Indexes”. andykrause/hpi_research.
Overview
The neural network method:
- Separates property-specific and market-level effects
- Learns nonlinear relationships automatically
- Can handle high-dimensional feature spaces
- Supports both local and global market patterns
The method supports two distinct approaches for extracting the house price index:
| | Residual Approach | Attributional Approach |
|---|---|---|
| Description | Extracts the index directly from the market pathway by isolating temporal market effects. Default in hpiPy. | Uses explainability techniques (e.g., SHAP) to attribute prediction components to market-level factors. |
Residual Approach
The residual approach decomposes the logarithm of a property’s value into a market-level price index and a property-specific component. This reflects the idea that housing value is jointly determined by market conditions and the characteristics of the property itself (Clapham et al.):
\[\ln V_{it} = \ln P_t + \ln Q_i + \varepsilon_{it}\]
where:
\(V_{it}\) is the observed transaction price (or value) of property i at time t.
\(P_t\) is the market-level price index at time t, common to all properties.
\(Q_i\) is the time-invariant quality or quantity of property i (e.g., structural/locational attributes).
\(\varepsilon_{it}\) is a residual term capturing idiosyncratic noise or omitted effects.
This model is conceptually similar to hedonic or repeat-sales approaches, where market effects and property characteristics are disentangled. The index is derived by directly extracting the market-level estimates for each time period from the model and normalizing them to a base period.
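The normalization step can be sketched as follows. This is a minimal illustration rather than hpiPy's implementation: it assumes the market-level estimates are on the log scale, that the first period is the base period, and that the base value is 100. The function name `index_from_log_market_effects` and the sample values are hypothetical.

```python
import math


def index_from_log_market_effects(log_effects, base_period=0, base_value=100.0):
    """Convert per-period log market effects into an index normalized to a base period."""
    base = log_effects[base_period]
    # Differences in logs become price ratios after exponentiation.
    return [base_value * math.exp(p - base) for p in log_effects]


# Hypothetical log market effects for four consecutive periods.
log_effects = [12.00, 12.01, 12.03, 12.02]
index = index_from_log_market_effects(log_effects)
```

Under these assumptions the base period always maps to the base value, and every other period is expressed as a ratio to it.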
Attributional Approach
The attributional approach models the house price as a black-box prediction that integrates market and property factors, and then uses explainability methods to decompose this prediction into attributions. Specifically, DeepLIFT attributes the model output to individual features relative to a reference (baseline) input (Shrikumar et al.):
\[\Delta \hat{V}_i = \hat{V}_i - \hat{V}_i^{\text{ref}} = \sum_j C_j, \qquad C_j = m_j \, \Delta x_{ij}\]
where:
\(\hat{V}_i = f(x_i)\) is the model’s predicted value for property i.
\(x_i\) is the feature vector describing property i (e.g., square footage, year built, etc.).
\(\hat{V}_i^{\text{ref}} = f(x_i^{\text{ref}})\) is the prediction for a baseline (e.g., average, median, or zeroed) property.
\(\Delta \hat{V}_i\) is the total difference in predicted value from the baseline.
\(C_j\) is the contribution of feature \(j\), computed as the product of the feature’s difference from baseline, \(\Delta x_{ij} = x_{ij} - x_{j}^{\text{ref}}\), and its multiplier \(m_j\), which represents the sensitivity of the output to that feature.
This approach allows for interpretability of complex nonlinear models by expressing the prediction in terms of feature-level contributions. The index is derived by estimating the attributions for the temporal features at each time period and normalizing them to a base period.
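The additive decomposition can be checked on a toy model. The sketch below uses a linear model, for which the DeepLIFT multipliers reduce to the model weights. The weights, baseline, and feature values are hypothetical and are not taken from hpiPy, whose networks are nonlinear.

```python
def predict(weights, bias, x):
    """Toy linear model f(x) = w . x + b."""
    return bias + sum(w * v for w, v in zip(weights, x))


def contributions(weights, x, x_ref):
    """Per-feature contributions C_j = m_j * (x_j - x_j_ref), with m_j = w_j."""
    return [w * (v - r) for w, v, r in zip(weights, x, x_ref)]


# Hypothetical features: log square footage, bedrooms, months since base period.
weights = [0.8, 0.05, 0.004]
bias = 6.0
x_ref = [7.5, 3.0, 0.0]  # baseline property, observed in the base period
x = [7.7, 4.0, 12.0]     # property sold 12 months later

c = contributions(weights, x, x_ref)
delta = predict(weights, bias, x) - predict(weights, bias, x_ref)
```

Here the contributions sum to the difference between the prediction and the baseline prediction, and the last entry of `c` is the temporal attribution. Collecting that term for each period and normalizing to the base period gives the index in this simplified setting.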
Data Preparation
Required data structure:
- A date column (e.g., “sale_date”)
- A price column (e.g., “sale_price”)
- Property characteristics
- A transaction identifier
Example setup:
>>> from hpipy.datasets import load_ex_sales
>>> from hpipy.period_table import PeriodTable
>>> from hpipy.trans_data import HedonicTransactionData
# Load sales data.
>>> df = load_ex_sales()
# Create period table.
>>> sales_hdata = PeriodTable(df).create_period_table(
... "sale_date",
... periodicity="monthly",
... )
# Prepare hedonic data.
>>> trans_data = HedonicTransactionData(sales_hdata).create_transactions(
... prop_id="pinx",
... trans_id="sale_id",
... price="sale_price",
... )
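Conceptually, a monthly period table assigns each sale to a consecutive period number. The sketch below illustrates that idea with the standard library only. It is not hpiPy's `PeriodTable` logic, and the helper name `monthly_period` and the 1-based numbering are assumptions made for illustration.

```python
from datetime import date


def monthly_period(sale_date, start_date):
    """Return a 1-based monthly period number relative to the month of start_date."""
    return (
        (sale_date.year - start_date.year) * 12
        + (sale_date.month - start_date.month)
        + 1
    )


# Hypothetical sale dates.
sale_dates = [date(2010, 1, 15), date(2010, 3, 2), date(2011, 1, 20)]
start = min(sale_dates)
periods = [monthly_period(d, start) for d in sale_dates]
```

Months without sales still occupy a period number, which is why the second sale falls in period 3 rather than period 2.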
Creating the Index
Create a neural network-based index using either approach:
>>> from hpipy.extensions import NeuralNetworkIndex
>>> kwargs = {
... "prop_id": "pinx",
... "trans_id": "sale_id",
... "price": "sale_price",
... "date": "sale_date",
... "dep_var": "price",
... "ind_var": ["tot_sf", "beds", "baths"],
... "feature_dict": {
... "numerics": [],
... "log_numerics": ["tot_sf"],
... "categoricals": [],
... "ordinals": ["beds", "baths"],
... "hpi": ["sale_date"],
... },
... "preprocess_geo": False,
... "random_seed": 0,
... }
# Create index using residual approach (default).
>>> hpi_residual = NeuralNetworkIndex.create_index(
... trans_data=trans_data,
... estimator="residual", # default
... **kwargs,
... )
# Create index using attributional approach.
>>> hpi_attributional = NeuralNetworkIndex.create_index(
... trans_data=trans_data,
... estimator="attributional",
... **kwargs,
... )
Parameters
The main parameters for neural network index creation are:
- dep_var : str
  Dependent variable to model.
- ind_var : list
  Independent variables to use in the model.
- estimator : str
  Estimator type. Choose between:
  - “residual”: Extracts index from market pathway (default)
  - “attributional”: Derives index through explainability analysis
- feature_dict : dict
  Feature dictionary specifying how different variables should be processed:
  - numerics: Standard numeric features
  - log_numerics: Features to be log-transformed
  - categoricals: Categorical features for embedding
  - ordinals: Ordinal features
  - hpi: Temporal features for index generation
- num_models : int
  Number of models to train in ensemble.
- num_epochs : int
  Number of training epochs.
- batch_size : int
  Batch size for training.
- hidden_dims : list
  List of integers specifying the number of neurons in each hidden layer.
- emb_size : int
  Embedding size for categorical features.
- dropout_rate : float
  Dropout rate for regularization (0 to 1).
- learning_rate : float
  Learning rate for optimization.
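Before passing training hyperparameters along, it can be useful to sanity-check them. The sketch below is a hypothetical helper, not part of hpiPy: the name `check_nn_params`, the example values, and every constraint other than the documented 0 to 1 range for `dropout_rate` are assumptions.

```python
def check_nn_params(params):
    """Return a list of problems found in a hyperparameter dict (empty if none)."""
    problems = []
    # Assumed constraint: count-like parameters are positive integers.
    for name in ("num_models", "num_epochs", "batch_size", "emb_size"):
        if name in params and (not isinstance(params[name], int) or params[name] < 1):
            problems.append(f"{name} must be a positive integer")
    dims = params.get("hidden_dims")
    if dims is not None and not all(isinstance(d, int) and d > 0 for d in dims):
        problems.append("hidden_dims must contain positive integers")
    # Documented range: dropout rate lies between 0 and 1.
    rate = params.get("dropout_rate")
    if rate is not None and not 0.0 <= rate <= 1.0:
        problems.append("dropout_rate must be between 0 and 1")
    lr = params.get("learning_rate")
    if lr is not None and lr <= 0:
        problems.append("learning_rate must be positive")
    return problems


# Hypothetical values, chosen only for illustration.
params = {
    "num_models": 5,
    "num_epochs": 20,
    "batch_size": 1024,
    "hidden_dims": [128, 32],
    "emb_size": 5,
    "dropout_rate": 0.1,
    "learning_rate": 1e-3,
}
problems = check_nn_params(params)
bad = check_nn_params({"dropout_rate": 1.5, "hidden_dims": [64, 0]})
```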
Evaluating the Index
Evaluate the neural network index with a metric such as volatility, and compare the two approaches visually:
>>> import altair as alt
>>> from hpipy.utils.metrics import volatility
>>> from hpipy.utils.plotting import plot_index
# Calculate metrics.
>>> vol_residual = volatility(hpi_residual)
>>> vol_attributional = volatility(hpi_attributional)
# Visualize the index.
>>> alt.layer(
... (
... plot_index(hpi_residual)
... .transform_calculate(method="'Residual'")
... .encode(color=alt.Color("method:N", title="Method"))
... ),
... (
... plot_index(hpi_attributional)
... .transform_calculate(method="'Attributional'")
... .encode(color=alt.Color("method:N", title="Method"))
... ),
... ).properties(title="Neural Network Index")
alt.LayerChart(...)
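To make the volatility metric more concrete, the sketch below computes a rolling standard deviation of period-over-period index changes. This definition, the three-period window, and the function name `rolling_volatility` are assumptions for illustration; hpiPy's `volatility` function may use a different window or aggregation.

```python
import statistics


def rolling_volatility(index_values, window=3):
    """Rolling standard deviation of period-over-period index changes."""
    # Relative change between each pair of consecutive periods.
    changes = [(b - a) / a for a, b in zip(index_values, index_values[1:])]
    return [
        statistics.stdev(changes[i:i + window])
        for i in range(len(changes) - window + 1)
    ]


# Hypothetical index values for six periods.
index_values = [100.0, 101.0, 103.0, 102.0, 104.0, 105.0]
vol = rolling_volatility(index_values)
mean_vol = statistics.mean(vol)
```

Under this definition a smoother index yields a lower value, so comparing the mean volatility of the residual and attributional indices indicates which one is less noisy.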