Simple Math Riddle

Here’s a deceptively simple riddle for you:

Using the numbers 1, 3, 4, 6 exactly once each, and any combination of +, -, *, /, plus parenthesis if you need to, try to arrive at the number: 24.

Give it a try. If you find the solution, I applaud you! (I couldn’t.)

If you don’t (or if you like to read recursive algorithms), you can find a solution in R here, where I use the superb data.tree package to do a brute-force random search.
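If you’d rather see the idea without the data.tree machinery, here’s a minimal exhaustive search in base R (my own sketch, not the linked solution): it tries every ordering of the four numbers, every triple of operators, and every one of the five ways to fully parenthesize four operands.

```r
nums = c(1, 3, 4, 6)
ops = c("+", "-", "*", "/")

# all 24 orderings of the four numbers
perms = expand.grid(a=1:4, b=1:4, c=1:4, d=1:4)
perms = perms[apply(perms, 1, function(p) length(unique(p)) == 4), ]

# the five ways to fully parenthesize four operands
patterns = c("((%s %s %s) %s %s) %s %s",
             "(%s %s (%s %s %s)) %s %s",
             "(%s %s %s) %s (%s %s %s)",
             "%s %s ((%s %s %s) %s %s)",
             "%s %s (%s %s (%s %s %s))")

solutions = character()
for(i in 1:nrow(perms)){
  n = nums[unlist(perms[i, ])]
  for(o1 in ops) for(o2 in ops) for(o3 in ops){
    for(p in patterns){
      expr = sprintf(p, n[1], o1, n[2], o2, n[3], o3, n[4])
      val = eval(parse(text=expr))
      # divisions by zero give Inf/NaN; is.finite filters them out
      if(is.finite(val) && abs(val - 24) < 1e-9)
        solutions = c(solutions, expr)
    }
  }
}

unique(solutions)
```

Spoiler for those who gave up: one of the expressions it prints divides 6 by the result of 1 − 3/4.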


Montreal: breaking & entering

I finally signed the lease for my future apartment. Now it’s time to see whether my new neighborhood is safe or not!


Check out the code to create the map above here.

The 4 red neighborhoods are:

  • 23: Hochelaga-Maisonneuve
  • 26: Côte-des-Neiges
  • 38: Le Plateau-Mont-Royal partie sud
  • 44: Rosemont et La Petite-Patrie

Oh and in case you’re wondering…my neighborhood is “Very low risk” 🙂

TSX Stock Scraper

Whenever I get the unfortunate idea to invest in the stock market, I start by playing with the Google Finance stock screener.

This time, instead of burning my money, I thought my time would be better spent building a stock screener of my own.

The step-by-step code is here:


The result is a data frame that looks like this:


For an example of visualization:


In the above plot, I’ve removed outliers and displayed some stock symbols in the north-west corner, i.e. stocks with low Price/Earnings ratio and high BookValue/Price ratio.

U.S. Trade in Goods


After hearing Donald Trump repeat ad nauseam that “We [the U.S.] are losing in trade to China, losing to Mexico…”, I decided to check for myself!



  • U.S. exports to China have been steadily increasing for the last 15 years, but the imports from China have sky-rocketed during the same period.
  • Trade balance with Mexico has actually been pretty stable for the last 10 years.
  • Among the top-8 commercial partners, Canada is the 1st importer of U.S. goods. Mexico is second, China is third.
  • Exchanges with Canada have dropped sharply in 2015.





Loss ratio by variable

In my line of work (property & casualty insurance), one ratio we are constantly monitoring is the loss ratio, which is defined as the ratio of the loss amounts paid to the insured to the premium earned by the insurance company.

Ignoring non-claim-specific expenses (e.g. general expenses, marketing, etc.) and investment returns, a loss ratio of 100% means that the average premium we charge the insured covers exactly the average loss amount, i.e. we break even (the premium corresponding to a loss ratio of 100% is called the pure premium). Note that, by definition, everything else being equal, a lower loss ratio translates into higher profits for the insurance company.
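To make the definition concrete, here’s a toy computation (the dollar figures are invented purely for illustration):

```r
losses.paid    = 5.5e6   # total claim amounts paid to insureds (made-up figure)
premium.earned = 1.0e7   # total premium earned over the same period (made-up figure)

loss.ratio = losses.paid / premium.earned
loss.ratio
## [1] 0.55
```

A loss ratio of 55% means that, before non-claim expenses, 55 cents of every premium dollar went back out as claim payments.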

Sometimes, we want to look at this loss ratio conditionally on the values of a specific variable. That is, for a particular segmenting variable (age, gender, territory, etc.), we are interested in knowing the loss ratio for each category in this variable, along with the earned premium as a percentage of the total earned premium.

For example, such a report for the Age variable could look like this:

age_group = c("A. [16 - 25]", "B. [25 - 39]", "C. [40 - 64]", "D. [65+ ]   ")
weights = c(0.25, 0.1, 0.35, 0.3)
loss.ratios = c(0.6, 0.9, 0.55, 0.4)

df = data.frame(age_group=age_group, weight=weights, loss.ratio=loss.ratios)
##      age_group weight loss.ratio
## 1 A. [16 - 25]   0.25       0.60
## 2 B. [25 - 39]   0.10       0.90
## 3 C. [40 - 64]   0.35       0.55
## 4 D. [65+ ]      0.30       0.40

The global loss ratio for this portfolio is:

sum(df$weight * df$loss.ratio)
## [1] 0.5525

Here’s the question I’ve been asking myself lately: if I could select only one category for this variable (i.e. one age group) and take business measures to improve its profitability, which one should it be? In other words, which of these categories has the biggest negative impact on our overall loss ratio?

One possible answer would be to find the category which, if we improved its loss ratio by x%, would improve the global loss ratio the most. But if x is fixed, this approach simply selects the category with the biggest weight…

A better solution is to consider, for each age group, what the loss ratio of the portfolio would be if that age group were removed from consideration. For example, to calculate the impact of age group “A. [16 – 25]”, one can calculate the overall loss ratio of the portfolio consisting of age groups B. to D., and subtract that value from our original (entire portfolio, including group A.) loss ratio.

impacts = function(weights, loss.ratios){

  overall.lr = sum(weights * loss.ratios)

  v = numeric()

  for(i in 1:length(weights)){
    # loss ratio of the portfolio with category i removed
    w.without = weights[-i]/sum(weights[-i])
    lrs.without = loss.ratios[-i]
    lr = sum(w.without * lrs.without)
    v = c(v, overall.lr - lr)
  }

  paste0(round(v*100, 1), "%")
}

df$lr.impact = impacts(weights, loss.ratios)
##      age_group weight loss.ratio lr.impact
## 1 A. [16 - 25]   0.25       0.60      1.6%
## 2 B. [25 - 39]   0.10       0.90      3.9%
## 3 C. [40 - 64]   0.35       0.55     -0.1%
## 4 D. [65+ ]      0.30       0.40     -6.5%

What this tells us is that the age group “B. [25 – 39]” has the biggest upward impact on our overall loss ratio: if we didn’t insure this group (or equivalently, if that group’s loss ratio was equal to the loss ratio of the rest of the portfolio), our loss ratio would be 3.9 points lower.
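Incidentally, the leave-one-out computation has a closed form. Removing group i leaves a loss ratio of (overall − wᵢ·lrᵢ)/(1 − wᵢ), and a bit of algebra shows the impact simplifies to wᵢ·(lrᵢ − overall)/(1 − wᵢ), which vectorizes in one line:

```r
weights = c(0.25, 0.1, 0.35, 0.3)
loss.ratios = c(0.6, 0.9, 0.55, 0.4)

overall.lr = sum(weights * loss.ratios)

# impact_i = w_i * (lr_i - overall) / (1 - w_i)
impact = weights * (loss.ratios - overall.lr) / (1 - weights)
round(impact * 100, 1)
# same values as the loop: 1.6, 3.9, -0.1, -6.5
```

This also makes the structure explicit: a category only matters if both its weight and its deviation from the overall loss ratio are material.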

Why smart people are loners

I was thinking about an article (in French) a friend of mine sent me. Basically, the article says that the correlation between happiness and social life depends on your IQ: the smarter you are, the less your happiness is correlated with how much time you spend with friends.

At the same time, somewhere on Quora, someone had suggested that people who are significantly smarter than average are less likely to meet like-minded people.

This led me to wonder about the average distance between two points inside a disk (notice the smooth transition). I’m sure this question has been asked before (see here for example), and I’m sure one could integrate their way to the answer, but hey I’m not about to pass an opportunity to rdoodle!

(Before looking at the result below, try to guess what the average distance, as a function of the radius, should look like – linear, polynomial…? To be honest, I had no clue.)

N = 1000

distances = function(rmax){

  r = runif(N, 0, rmax)
  theta = runif(N, 0, 2*pi)
  x = r * cos(theta)
  y = r * sin(theta)

  # N x N matrix of pairwise distances between the sampled points
  d = sqrt(outer(x, x, function(x1, x2) (x1-x2)^2) + outer(y, y, function(y1, y2) (y1 - y2)^2))

  #plot(x, y, xlim=c(-rmax, rmax), ylim=c(-rmax, rmax), col="blue", pch=19, cex=0.75)

  mean(d)
}


r = seq(1,10,0.25)

means = sapply(r, distances)

plot(r, means, col="blue", type="l")


Looks like the relationship is linear.
Let’s get the value of the coefficient:

# linear regression with 0 intercept
lm(means ~ -1 + r)

## Call:
## lm(formula = means ~ -1 + r)
##
## Coefficients:
##      r
## 0.7223
There you go. Average distance = 0.7223r

Which is the right answer to the wrong question, because what I really wanted to compute is the average distance between a person on the edge of the disk (i.e. a smart person) and all the others – how about now, still linear?

The adjustment to the code is actually very small (love you R!):

N = 1000

distances = function(rmax){

  r = runif(N, 0, rmax)
  theta = runif(N, 0, 2*pi)
  x = r * cos(theta)
  y = r * sin(theta)

  # the exact position of the person on the edge doesn't matter, by symmetry
  x0 = 0
  y0 = rmax

  # distances from the edge point to all N sampled points
  d = sqrt(outer(x0, x, function(x1, x2) (x1-x2)^2) + outer(y0, y, function(y1, y2) (y1 - y2)^2))

  mean(d)
}


r = seq(1,10,0.25)

means = sapply(r, distances)

lm(means ~ -1 + r)

## Call:
## lm(formula = means ~ -1 + r)
##
## Coefficients:
##     r
## 1.087

Average distance = 1.087r
See? I told you smart people were less gregarious…