TALON vs. GLM
To be meaningful, a Talon vs. GLM comparison is best made from multiple perspectives.
Goal: GLMs are designed to produce fitted relativities, while Talon exposes weaknesses in a given rating structure.
- GLMs are essentially designed to produce fitted betas and, provided that the "right" predictors are included in the model, they even produce fitted values for cells with no data. While some practitioners argue that this is desirable, if the risk distribution shifts significantly between the training data and the "unseen" data, the GLM predictions on the "unseen" risks may be completely inaccurate.
- The main problem in the GLM framework is finding the "optimal" predictors to include in the model. The approach is mainly based on inferential statistics, with very little attention paid to how well the model generalizes to new data.
- Talon can work from an existing class plan and produce complex compound variables that can later be used to refit, and thus significantly improve, an existing GLM.
- From this perspective, Talon is really a complement to the GLM, and not a competitor.
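The point about fitted values for empty cells can be illustrated with a minimal sketch. It assumes a multiplicative (log-link) rating structure with two hypothetical factors, `territory` and `age_band`. The base rate, relativities, and list of observed cells are made-up numbers for illustration, not output from any fitted model.

```python
# Hypothetical multiplicative structure: prediction = base * product of relativities.
# All numbers below are illustrative assumptions, not fitted values.
BASE_RATE = 500.0
RELATIVITIES = {
    "territory": {"urban": 1.30, "rural": 0.85},
    "age_band": {"young": 1.60, "adult": 1.00},
}

# Cells that appeared in the (hypothetical) training data.
OBSERVED_CELLS = {("urban", "young"), ("urban", "adult"), ("rural", "adult")}


def predict(territory: str, age_band: str) -> float:
    """Multiplicative prediction; defined for every cell, observed or not."""
    return (BASE_RATE
            * RELATIVITIES["territory"][territory]
            * RELATIVITIES["age_band"][age_band])


def is_extrapolated(territory: str, age_band: str) -> bool:
    """True when the cell had no training data behind it."""
    return (territory, age_band) not in OBSERVED_CELLS


if __name__ == "__main__":
    for terr in RELATIVITIES["territory"]:
        for age in RELATIVITIES["age_band"]:
            flag = "extrapolated" if is_extrapolated(terr, age) else "observed"
            print(f"{terr:5s} {age:5s} {predict(terr, age):8.2f} {flag}")
```

The rural/young cell receives a prediction even though no such risk was observed. Whether that prediction is trustworthy depends entirely on the multiplicative assumption holding in that cell.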
Speed: Due to their data quality requirements and model selection process, GLMs often take years to implement, while Talon provides initial results within 60 days of the data being received.
- GLMs have very strict data quality requirements: missing values and correlations between variables can lead to nonsensical results or prevent the GLM from converging altogether.
- Determining what predictors and interactions to include in the model is a tedious and time-consuming process.
Model Assumptions: GLMs are constrained by strong statistical assumptions which often are not a good fit for insurance data, while Talon has no such constraints since it uses model-free machine learning algorithms.
- GLMs make very strong, and arguably unrealistic, statistical assumptions:
  - Claim severity follows a gamma distribution. Observation suggests that severity distributions have a heavier tail than the gamma would imply and are often multi-modal, due to policy limits, clumping in reported amounts, etc.
  - Claim frequency is Poisson distributed. This assumption does not fit insurance data well because it does not match the variance of real data: the Poisson is a one-parameter distribution, so its variance is constrained to equal its mean. Practitioners have recognized this, hence the attempts to use over-dispersed or zero-inflated Poisson models.
- Fundamental question: Why would real insurance data follow a simple distribution like the gamma or the Poisson? Experience suggests otherwise.
- Talon uses a distribution-free approach, and its assumptions (termed Probably Approximately Correct (PAC) assumptions) are very mild compared to those underlying GLMs.
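The variance constraint on the Poisson can be checked with a small simulation. The sketch below is illustrative only: the mean frequency of 0.10, the gamma shape of 0.5, and the portfolio size are assumptions chosen for the example, and the Poisson sampler is a textbook method (Knuth's) written out so the block needs only the standard library. It compares a portfolio where every policy shares one frequency with a portfolio where the frequency varies from policy to policy.

```python
import math
import random
import statistics


def poisson_sample(rng: random.Random, lam: float) -> int:
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def dispersion_ratio(counts: list[int]) -> float:
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)


rng = random.Random(0)
n_policies = 20_000
mean_frequency = 0.10  # assumed average claim frequency

# Homogeneous portfolio: every policy has the same frequency.
homogeneous = [poisson_sample(rng, mean_frequency) for _ in range(n_policies)]

# Heterogeneous portfolio: each policy's frequency is itself random
# (gamma with mean 0.10), which yields negative binomial counts.
shape = 0.5  # assumed gamma shape; smaller means more heterogeneity
heterogeneous = [
    poisson_sample(rng, rng.gammavariate(shape, mean_frequency / shape))
    for _ in range(n_policies)
]

print(f"homogeneous   variance/mean: {dispersion_ratio(homogeneous):.3f}")
print(f"heterogeneous variance/mean: {dispersion_ratio(heterogeneous):.3f}")
```

In theory the first ratio is 1 and the second is 1 + mean_frequency / shape = 1.2. The simulated values will scatter around those figures, and a ratio clearly above 1 is the over-dispersion that a plain Poisson cannot represent.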
Model Limitations: GLMs use linear methods and typically only investigate a few potential compound variables. In sharp contrast, Talon uses non-linear methods and allows the data to interact naturally, letting Talon discover key interactions.
- GLMs are linear by definition, but their main limitation is that, given real-world resource and time constraints, only a very small subset of the solution space can be investigated. Further:
  - The order in which data are tested is sometimes critical.
  - The user's intuition or prior knowledge is crucial for testing higher-order interactions, since the number of potentially important interactions is too large for a human to process.
- Talon, on the other hand:
  - Does not require the user to specify the predictors and interactions to be included in the model, but tries to discover them, often providing "new knowledge".
  - Produces 5-, 6-, or 7-way interactions that are almost impossible to discover within the GLM framework.
- In GLMs, a two-way interaction crosses all levels of the two variables. The interaction either passes some statistical significance test, leading to a "parts explosion", or it is removed from the model. There is no middle ground, since there is no way to "partly accept" an interaction. In a very real sense, GLM-required interactions are "global" in that they need to work on the whole dataset. That requirement is too strict, especially for higher-order interactions.
- Talon is very good at discovering "local interactions", which only have predictive power on a smaller subset of the data.
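The size of the search space and of the "parts explosion" described above can be made concrete with a little counting. The sketch below assumes categorical predictors with reference-level coding. The level counts (50 territories, 30 vehicle symbols) and the figure of 30 candidate predictors are hypothetical.

```python
import math


def interaction_parameters(level_counts: list[int]) -> int:
    """Extra parameters added by one full interaction among categorical
    variables with the given numbers of levels (reference-level coding)."""
    return math.prod(k - 1 for k in level_counts)


def candidate_interactions(n_predictors: int, order: int) -> int:
    """Number of distinct order-way interactions among n_predictors variables."""
    return math.comb(n_predictors, order)


if __name__ == "__main__":
    # Hypothetical: territory (50 levels) crossed with vehicle symbol (30 levels).
    print(interaction_parameters([50, 30]))  # 49 * 29 = 1421
    # Hypothetical plan with 30 candidate predictors.
    for order in range(2, 8):
        print(order, candidate_interactions(30, order))
```

A single full two-way interaction already adds 1,421 parameters in this example, and the number of candidate interactions grows combinatorially with the order. This is why an exhaustive manual search is impractical.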
Action Level: GLM analysis is performed at the coverage level, so interactions between coverages are largely ignored. The Talon analysis may be performed at the coverage, risk, or policy level.
- The fact that a coverage is present or not on a risk (such as a vehicle or home) conveys information about the behavior of other coverages on the same risk.
- Some actuaries have recognized this, e.g., the liability-only discount in PPA (private passenger auto).
- One problem is that the user has to guess which information is relevant and then use the GLM framework to test it. Obviously there can be no assurance that all of the potentially important interactions between coverages can be discovered this way.
The Talon analysis may start at the coverage level, where all of the signal that can be derived at that level is loaded into the fitted pure premiums. Then an additional analysis is performed at the risk level, and the resulting signal can be implemented via risk-level discounts or surcharges.
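The two-stage idea can be sketched with a simple actual-to-expected calculation. This is not Talon's algorithm, which the text does not describe. It is a plain ratio, used here only to show how signal left over after a coverage-level fit could surface as a risk-level factor. The risk records, coverage names, and dollar amounts are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical risks: coverage-level fitted pure premiums and actual losses.
RISKS = [
    {"fitted": {"liability": 300.0}, "actual": {"liability": 240.0}},
    {"fitted": {"liability": 300.0}, "actual": {"liability": 210.0}},
    {"fitted": {"liability": 300.0, "collision": 200.0},
     "actual": {"liability": 330.0, "collision": 220.0}},
    {"fitted": {"liability": 300.0, "collision": 200.0},
     "actual": {"liability": 340.0, "collision": 210.0}},
]


def risk_level_factors(risks: list[dict]) -> dict[tuple, float]:
    """Actual-to-expected ratio per combination of coverages on the risk.

    A ratio below 1 suggests a risk-level discount, above 1 a surcharge.
    """
    expected = defaultdict(float)
    actual = defaultdict(float)
    for risk in risks:
        signature = tuple(sorted(risk["fitted"]))  # which coverages are present
        expected[signature] += sum(risk["fitted"].values())
        actual[signature] += sum(risk["actual"].values())
    return {sig: actual[sig] / expected[sig] for sig in expected}


if __name__ == "__main__":
    for signature, factor in risk_level_factors(RISKS).items():
        print(signature, round(factor, 3))
```

With these made-up numbers, liability-only risks run at 0.75 of their fitted pure premium and full-coverage risks at 1.10. This is the kind of pattern a liability-only discount is meant to capture.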