Planes, boats and automobiles: a discussion of machine learning with telematics data

Ali Soliman

Data plays a central role in all technical aspects of insurance and actuarial work. However, utilisation is often still confined to aggregate premium and claims data. Not so in the case of telematics. Say the phrase ‘black box’ and most people will think of flight recorders fitted to aircraft. But motor insurers also use the millions of data points generated by black boxes, fitted to more than a million cars in the UK, to price risks. What’s more, marine insurers are getting in on the act. In this post we take an actuarial vantage point to explore the use of telematics data and consider whether insurers could be using this ‘gold mine’ of information even more widely.

A new insurance era

Advances in mathematical algorithms and computational power have enabled new applications that are transforming the way the insurance industry operates, in particular applications of machine learning (ML) and data science (DS). One notable area is the application of ML to insurance risk pricing, where these techniques are used to gain more insight into expected policyholder behaviour. Insurance companies rely on these insights to drive the assumptions for the pricing process.

One recent application that has transformed insurance practices was the introduction of telematics into motor insurance. At its core, a telematics system includes a tracking device installed in a vehicle that allows the sending, receiving and storing of telemetry data. It connects via the vehicle’s own on-board diagnostics (OBD-II) or CAN bus port, and uses a SIM card and an on-board modem to transfer data over a wireless network.

The data collected from the telematics system has provided insurers with more insight into the level of risk exposure. This, in turn, has enabled them to calculate more accurate prices that appropriately reflect the risks and, hence, supports their profitability — see Figure 1.

Traditional pricing techniques use information such as driving experience, vehicle rating and vehicle use. Telematics uses acceleration, braking and cornering g-forces, amongst others. Not only is the latter a better predictor of risk, but it is also captured in real time, allowing insurers to adjust premiums during the policy term. This means good drivers save money and insurers mitigate risks.
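To give a flavour of how raw g-force readings might become a rating input, here is a minimal Python sketch that counts ‘harsh’ driving events in a trip. The field names, the thresholds and the per-100 km summary are all illustrative assumptions, not a description of how any particular insurer or device scores drivers.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One telematics reading (hypothetical field names)."""
    t: float       # seconds since the start of the trip
    long_g: float  # longitudinal acceleration in g (negative means braking)
    lat_g: float   # lateral acceleration in g (cornering, either direction)


# Illustrative thresholds only; in practice these would be calibrated to claims data.
HARSH_ACCEL_G = 0.30
HARSH_BRAKE_G = -0.35
HARSH_CORNER_G = 0.40


def count_harsh_events(samples):
    """Count the readings that breach each (assumed) threshold."""
    counts = {"accel": 0, "brake": 0, "corner": 0}
    for s in samples:
        if s.long_g >= HARSH_ACCEL_G:
            counts["accel"] += 1
        if s.long_g <= HARSH_BRAKE_G:
            counts["brake"] += 1
        if abs(s.lat_g) >= HARSH_CORNER_G:
            counts["corner"] += 1
    return counts


def events_per_100km(counts, distance_km):
    """Normalise the total event count by distance driven."""
    return 100.0 * sum(counts.values()) / distance_km


# Made-up readings for a 12 km trip.
trip = [
    Sample(0.0, 0.10, 0.00),
    Sample(1.0, 0.35, 0.05),    # harsh acceleration
    Sample(2.0, -0.50, 0.10),   # harsh braking
    Sample(3.0, 0.00, -0.45),   # harsh cornering
]
trip_counts = count_harsh_events(trip)
print(trip_counts, events_per_100km(trip_counts, 12.0))
```

A summary like this could sit alongside the traditional rating factors, and because it is recomputed as new trips arrive it lends itself to mid-term premium adjustments.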

Figure 1: Telematics data flow into pricing

Change is coming…

The use of telematics data has been a big step forward for the insurance industry, but so far its use has been largely confined to motor insurance pricing. The industry could potentially extend the application into other areas. 

Recently, a few marine insurers have started using telematics data in pricing. Applying ML techniques to the dynamic data obtained from telematics devices has provided deeper analytical insights into accident severity and frequency for cargo ships. These insights have enabled marine insurers to analyse and price risks more accurately.

Beyond extending telematics to other classes of insurance business, there’s the potential to use it more widely within existing classes. Currently, insurers make little use of this data in their reserving or capital setting.

Reserves (or technical provisions) are funds set aside to pay expected future claims, and their level directly impacts the profit reported by an insurer. These are typically set using traditional actuarial techniques like the chain ladder (CL) or Bornhuetter-Ferguson (B-F) methods. In the case of the latter, the initial expected loss ratio (the prior estimate) will often be informed by pricing data. However, there’s a case for augmenting these methods further with techniques that use telematics data. A simple example of this is the use of telematics data from an accident to create a ‘crash storyline’, which could be used to set initial claim estimates.
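For readers less familiar with the two methods named above, the following Python sketch shows the basic mechanics on a tiny cumulative claims triangle. The figures, the premium and the prior loss ratio are made up for illustration, and the function names are our own; real reserving exercises involve many more judgements (tail factors, outliers, inflation and so on).

```python
def development_factors(triangle):
    """Volume-weighted age-to-age factors from a cumulative claims triangle.

    `triangle` is a list of rows, one per origin year, oldest first; each row
    holds cumulative claims by development period, so later rows are shorter.
    """
    n_periods = len(triangle[0])
    factors = []
    for j in range(n_periods - 1):
        rows = [r for r in triangle if len(r) > j + 1]
        factors.append(sum(r[j + 1] for r in rows) / sum(r[j] for r in rows))
    return factors


def chain_ladder_ultimates(triangle):
    """Project each origin year's latest cumulative claims to ultimate."""
    factors = development_factors(triangle)
    ultimates = []
    for row in triangle:
        ultimate = row[-1]
        for f in factors[len(row) - 1:]:
            ultimate *= f
        ultimates.append(ultimate)
    return ultimates


def bf_ultimate(latest, premium, prior_loss_ratio, cdf_to_ultimate):
    """Bornhuetter-Ferguson ultimate: claims to date plus the share of the
    prior expected loss that is expected still to emerge."""
    expected_unreported_share = 1.0 - 1.0 / cdf_to_ultimate
    return latest + premium * prior_loss_ratio * expected_unreported_share


# Illustrative (made-up) cumulative triangle for three origin years.
triangle = [
    [100.0, 150.0, 165.0],
    [110.0, 165.0],
    [120.0],
]
factors = development_factors(triangle)
ultimates = chain_ladder_ultimates(triangle)

# B-F for the latest origin year, assuming premium of 300 and a 70% prior loss ratio.
cdf_latest = factors[0] * factors[1]
bf_latest = bf_ultimate(120.0, 300.0, 0.70, cdf_latest)
print(factors, ultimates, bf_latest)
```

The `prior_loss_ratio` argument is the natural entry point for telematics: if pricing data suggest a different expected loss ratio for black box business, the B-F estimate moves with it.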

Furthermore, black box users are likely to have a different expected claims profile, in terms of both frequency and severity, from the one the insurer has experienced in its historical claims. This is particularly true where users get feedback and are able to modify their driving behaviour, and thus their risk profile. Hence, insurers could be holding inaccurate reserves that don’t represent the expected future claim experience.

Likewise capital, the cushion insurers hold against unexpected events, is also typically set using aggregate data. But again these figures could be refined and potentially made more accurate.

…but it’s not all plain sailing

The biggest challenge for marine insurers is that the claims experience from business written on telematics pricing has not yet accumulated in sufficient volume to feed the model calibration process adequately. This is important, as it is expected that the use of telematics in the marine sector will influence behaviour and affect risk, as it has in the motor sector. However, ML algorithms could potentially help to overcome this by using the insights gained from the dynamic data provided by telematics to replace the conventional historical data set with a synthetic data set that is more reflective of the expected claim profile.

ML algorithms could be used to produce the synthetic data set. One approach is a two-stage solution. In the first stage, a supervised ML algorithm is used to perform a data mining exercise and a classification exercise on the telematics data. The data mining exercise would be combined with a mapping between the conventional historical data and the newly emerged telematics data. Examples of ML algorithms that could be used are the C4.5 decision tree algorithm and the support vector machine. These algorithms would give insights into the claim distribution profile of both the historical and the telematics data, which would enable us to deduce an adjustment factor. Applying that factor to the historical claim settlement data would produce the synthetic data set, which would then be used in the chain ladder model to calculate the reserve.
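The adjustment-factor step can be sketched in a few lines of Python. This is only one possible reading of the idea: we assume the stage-one classifier has already tagged every claim with a risk class, and we take the factor for each class to be the ratio of average telematics claim cost to average historical claim cost. The class labels, the amounts and that choice of ratio are all hypothetical.

```python
from collections import defaultdict


def mean_by_class(claims):
    """Average claim amount per risk class.

    `claims` is an iterable of (risk_class, amount) pairs, where the risk
    class is assumed to come from the stage-one classifier.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for risk_class, amount in claims:
        totals[risk_class] += amount
        counts[risk_class] += 1
    return {c: totals[c] / counts[c] for c in totals}


def adjustment_factors(historical, telematics):
    """Per-class ratio of telematics to historical average claim cost.

    Classes with no telematics experience yet keep a factor of 1.0,
    meaning their historical claims are left unadjusted.
    """
    hist_mean = mean_by_class(historical)
    tele_mean = mean_by_class(telematics)
    return {
        c: (tele_mean[c] / hist_mean[c]) if c in tele_mean else 1.0
        for c in hist_mean
    }


def synthetic_claims(historical, factors):
    """Rescale each historical claim by its class factor."""
    return [(c, amount * factors.get(c, 1.0)) for c, amount in historical]


# Made-up claims, labelled with hypothetical risk classes.
historical = [("low", 1000.0), ("low", 3000.0), ("high", 8000.0)]
telematics = [("low", 1500.0), ("low", 1700.0)]

factors_by_class = adjustment_factors(historical, telematics)
synthetic = synthetic_claims(historical, factors_by_class)
print(factors_by_class, synthetic)
```

The rescaled claims would then be aggregated back into a development triangle before running the chain ladder. In practice one would want credibility weighting rather than a raw ratio, since the telematics sample is, by assumption, still small.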

Building on the synthetic data, we could advance to stage two, where an unsupervised ML algorithm is used to simulate the future expected claim pattern, i.e. simulated future loss ratios based on the synthetic data. The simulation would be performed using a clustering algorithm, which sorts the data into groups based on shared attributes without being told in advance what those groups are. The algorithm will be able to infer that there are, say, two different classes without knowing anything else about the data. One of the main clustering algorithms that we can use is K-means. The clustering algorithm will enhance the data analytics by segmenting datasets by shared attributes, detecting anomalies that do not fit into any group, and simplifying datasets by aggregating variables with similar attributes, as well as highlighting any intrinsic and hidden structure in the data.
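As an illustration of the clustering step, here is a bare-bones K-means (Lloyd's algorithm) in plain Python. The two features per record and the data points are invented, and seeding the centroids with the first k points is an assumption made purely to keep the example reproducible; libraries typically use random or k-means++ initialisation with several restarts.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster `points` (equal-length tuples of floats) into k groups.

    Returns (labels, centroids). Initial centroids are simply the first k
    points, an illustrative choice rather than a recommended one.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        converged = new_labels == labels and new_centroids == centroids
        labels, centroids = new_labels, new_centroids
        if converged:
            break
    return labels, centroids


# Hypothetical records: (harsh-event score, loss ratio), both scaled to 0-1.
records = [
    (0.10, 0.55), (0.90, 0.80), (0.12, 0.50),
    (0.85, 0.85), (0.11, 0.60), (0.95, 0.75),
]
labels, centroids = kmeans(records, k=2)
print(labels, centroids)
```

Records that sit far from every centroid are natural candidates for the anomaly check mentioned above, and the centroids themselves give a compact summary of each segment's typical loss ratio.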

Once the future pattern of the projected claims is complete, the final step would be to feed the revised projections into the model calibration, generating a capital estimate based on the claim profile of the telematics business.
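How simulated loss ratios turn into a capital number depends entirely on the capital model. As a hedged illustration, the Python sketch below assumes capital is measured as the 99.5th percentile loss in excess of the mean, in the spirit of a one-in-200 value-at-risk measure. The percentile method, the premium figure and the evenly spaced ‘simulations’ are all assumptions for the example.

```python
import math
import statistics


def capital_estimate(loss_ratios, premium, percentile=0.995):
    """Capital as the loss at the chosen percentile minus the mean loss.

    Uses the nearest-rank empirical percentile of the simulated loss ratios.
    Measuring capital as VaR in excess of the mean is an assumption here,
    not a statement of any firm's actual capital model.
    """
    ordered = sorted(loss_ratios)
    rank = math.ceil(percentile * len(ordered))       # 1-based nearest rank
    index = min(len(ordered) - 1, max(0, rank - 1))   # clamp to a valid index
    var_loss_ratio = ordered[index]
    mean_loss_ratio = statistics.fmean(ordered)
    return premium * (var_loss_ratio - mean_loss_ratio)


# Stand-in for stage-two output: 1,000 evenly spaced loss ratios (made up).
simulated = [i / 1000 for i in range(1, 1001)]
capital = capital_estimate(simulated, premium=1_000_000.0)
print(capital)
```

With real simulations the tail would be far less regular than in this toy input, which is exactly why the volume and quality of the underlying telematics experience matter so much for calibration.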

The proposal to use DS and ML applications in the capital model’s calibration for marine insurance was presented recently as part of a case study for the April 2020 intake of the Certificate of Data Science programme, which was sponsored by the Institute & Faculty of Actuaries and Southampton Institute of Data Science. The case study was presented mainly to initiate a discussion within the insurance industry and to highlight a potential ML and DS application that could be beneficial to marine insurers. It looked at applying ML algorithms to telematics data to produce a synthetic data set that could replace the historical data set used in the capital model’s calibration.

Of the other challenges, gaining buy-in is perhaps the most onerous. Whilst there is no formal requirement to approve individual premium rates, there are requirements for actuaries to ensure reserves and regulatory capital are appropriate. As such, those involved in all stages of the process (internal and external reviewing actuaries plus regulators, including the PRA) will need to be convinced that these new datasets and methods are sufficiently robust to ‘hang their hat’ on. Furthermore, whilst premiums need to be set at a policy level, reserves and capital are typically set at an aggregated (sub) class level. So telematics data could become a valuable tool in the armoury of the capital actuary.

Additionally, insurers have, to date, relied on third-party companies to collect, store and at least partially analyse these data, which brings GDPR into the equation. And due to the operational intensity of dealing with such large volumes of data, it’s expected that insurers will continue to use these third parties for the foreseeable future.

Back to the future

Clearly there are hurdles to overcome, in order to glean the full benefits of telematics data, but these are not insurmountable. Volumes of data will grow and ML can help to fill the gap in the meantime. Methods will be developed and will evolve. Actuaries and data scientists will get used to them, as they have done with CL and B-F. The first step, though, is to have a discussion around the use of DS & ML techniques in insurance. So, let the debate begin.   

Aly Soliman works in the Bank’s Life Insurance Groups area.

If you want to get in touch, please email us or leave a comment below.

Comments will only appear once approved by a moderator, and are only published where a full name is supplied. Bank Underground is a blog for Bank of England staff to share views that challenge — or support — prevailing policy orthodoxies. The views expressed here are those of the authors, and are not necessarily those of the Bank of England, or its policy committees.