In my line of work (property & casualty insurance), one ratio we are constantly monitoring is the loss ratio, which is defined as the ratio of the loss amounts paid to the insured to the premium earned by the insurance company.

Ignoring expenses that are not claim-specific (e.g. general expenses, marketing, etc.) and investment returns, a loss ratio of 100% means that the average premium we charge covers exactly the average loss amount, i.e. we break even (the premium corresponding to a loss ratio of 100% is called the *pure premium*). Note that by definition, everything else being equal, a lower loss ratio translates into higher profits for the insurance company.

Sometimes, we want to look at this loss ratio conditionally on the values of a specific variable. That is, for a particular segmenting variable (age, gender, territory, etc.), we are interested in knowing the loss ratio for each category in this variable, along with the earned premium as a percentage of the total earned premium.

For example, such a report for the *Age* variable could look like this:

    age_group = c("A. [16 - 25]", "B. [25 - 39]", "C. [40 - 64]", "D. [65+ ] ")
    weights = c(0.25, 0.1, 0.35, 0.3)
    loss.ratios = c(0.6, 0.9, 0.55, 0.4)
    df = data.frame(age_group=age_group, weight=weights, loss.ratio=loss.ratios)
    df

    ##      age_group weight loss.ratio
    ## 1 A. [16 - 25]   0.25       0.60
    ## 2 B. [25 - 39]   0.10       0.90
    ## 3 C. [40 - 64]   0.35       0.55
    ## 4   D. [65+ ]    0.30       0.40

The global loss ratio for this portfolio is:

    sum(df$weight * df$loss.ratio)

    ## [1] 0.5525
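The weighted sum works because each weight is the category's share of total earned premium. As a short derivation, writing $P_i$ and $L_i$ for the earned premium and the losses of category $i$ (symbols introduced here for illustration only), so that $w_i = P_i / \sum_j P_j$ and $\text{LR}_i = L_i / P_i$:

$$
\text{LR} = \frac{\sum_i L_i}{\sum_i P_i} = \sum_i \frac{P_i}{\sum_j P_j} \cdot \frac{L_i}{P_i} = \sum_i w_i \, \text{LR}_i
$$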

Here’s the question I’ve been asking myself lately: if I could select only one category of this variable (i.e. one age group) and take business measures to improve its profitability, which one should it be? In other words, which of these categories has the biggest negative impact on our overall loss ratio?

One possible answer would be to find the category which, if we improved its loss ratio by x points, would improve the global loss ratio the most. But if x is fixed, the global loss ratio improves by the category’s weight times x, so this approach simply selects the category with the biggest weight.
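A quick check of that claim, as a minimal sketch. It assumes the improvement is a fixed number of loss ratio points taken off one group's loss ratio; the value `x = 0.05` and the shortened labels are arbitrary choices made for this illustration.

```r
# Same portfolio as above (labels shortened for display only)
age_group   = c("A", "B", "C", "D")
weights     = c(0.25, 0.1, 0.35, 0.3)
loss.ratios = c(0.6, 0.9, 0.55, 0.4)

x = 0.05  # hypothetical improvement: 5 points off one group's loss ratio

overall.lr = sum(weights * loss.ratios)

# Global loss ratio after lowering group i's loss ratio by x, for each i
improved.lr = sapply(seq_along(weights), function(i) {
  lrs = loss.ratios
  lrs[i] = lrs[i] - x
  sum(weights * lrs)
})

# The gain equals weights * x, so the ranking follows the weights alone
gain = overall.lr - improved.lr
names(gain) = age_group
gain
age_group[which.max(gain)]
```

With these numbers the gains are 1.25, 0.5, 1.75 and 1.5 points, so group C is selected simply because it has the largest weight (0.35), regardless of how its loss ratio compares with the others.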

A better solution would be to consider, for each age group, what the loss ratio of the portfolio would be if that age group were removed from consideration. For example, to calculate the impact of age group “A. [16 – 25]”, one can calculate the overall loss ratio of the portfolio consisting of age groups B. to D., and subtract that value from our original (entire portfolio, including group A.) loss ratio.

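    impacts = function(weights, loss.ratios){
      # Loss ratio of the full portfolio (weights are assumed to sum to 1)
      overall.lr = sum(weights * loss.ratios)
      v = numeric(length(weights))
      for(i in seq_along(weights)){
        # Re-normalize the remaining weights so that they sum to 1
        w.without = weights[-i]/sum(weights[-i])
        lrs.without = loss.ratios[-i]
        lr = sum(w.without * lrs.without)
        # Impact of group i: full-portfolio loss ratio minus the loss ratio without it
        v[i] = overall.lr - lr
      }
      # Format as percentages (the unit is loss ratio points)
      paste0(round(v*100, 1), "%")
    }

    df$lr.impact = impacts(weights, loss.ratios)
    df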

    ##      age_group weight loss.ratio lr.impact
    ## 1 A. [16 - 25]   0.25       0.60      1.6%
    ## 2 B. [25 - 39]   0.10       0.90      3.9%
    ## 3 C. [40 - 64]   0.35       0.55     -0.1%
    ## 4   D. [65+ ]    0.30       0.40     -6.5%

What this tells us is that the age group “B. [25 – 39]” has the biggest upward impact on our overall loss ratio: if we didn’t insure this group (or equivalently, if that group’s loss ratio were equal to the loss ratio of the rest of the portfolio), our loss ratio would be 3.9 points lower (about 51.4% instead of 55.25%).
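The leave-one-out calculation also has a closed form, which makes it clearer what drives the result. Assuming the weights sum to 1, the loss ratio without group $i$ is $(\text{LR} - w_i \text{LR}_i) / (1 - w_i)$, so the impact simplifies to $w_i (\text{LR}_i - \text{LR}) / (1 - w_i)$. The sketch below implements that formula; the name `impacts.closed` is made up for this illustration, and unlike the function above it returns raw numbers rather than formatted strings.

```r
weights     = c(0.25, 0.1, 0.35, 0.3)
loss.ratios = c(0.6, 0.9, 0.55, 0.4)

# Vectorized leave-one-out impact; assumes the weights sum to 1
impacts.closed = function(weights, loss.ratios) {
  overall.lr = sum(weights * loss.ratios)
  weights * (loss.ratios - overall.lr) / (1 - weights)
}

# Impacts in loss ratio points, rounded to one decimal
round(impacts.closed(weights, loss.ratios) * 100, 1)
```

This gives 1.6, 3.9, -0.1 and -6.5, matching the lr.impact column. The formula shows that a group's impact grows with both its weight and the gap between its own loss ratio and the overall one, and that its sign depends only on whether the group's loss ratio is above or below the overall loss ratio.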