Don’t buy a brand new Porsche 911 or Audi Q7!!

wp1

Introduction

Many people know that nasty feeling when buying a brand new car. The minute that you have left the dealer, your car has lost a substantial amount of value. Unfortunately this depreciation is inevitable, however, the amount depends heavily on the car make and model. A small analysis of data from (used) cars shows these differences.

Car Data

I have used Rvest to scrape data from www.gaspedaal.nl, a Dutch website that combines car for sales data from several other sites. The script to get the data is not that difficult, it can be found on my GitHub, together with my analysis script. There are around 435,000 cars. The data for each car consists of: make, model, price, fuel type, transmission and age. There are many different car makes and models, the most occurring cars in my data set are:

 

Car age vs. Kilometers

Obviously, there is a clear relation between the age of a car and the amount of kilometers driven. An interesting pattern to see is that this relation depends on on the car make (and model). The following figure shows a few car brands.

Large differences in amount of driving between car types start after 18 months. On average, Jaguars are not made for driving, after 60 months only around 83.000 KM are driven by its owners. While on the other hand, Mercedes-Benz owners have driven around 120.000 KM after 60 months.

A more extreme difference is between the Volvo V50 and the Hyundai i10. Between six and ten years, a Volvo V50 has driven on average 178K kilometers while a Hyundai i10 has driven only 75K kilometers.

Depreciation

A simple depreciation model is just linear depreciation. Per car brand, model, and transmission type, I can fit a straight line through price and kilometers driven. The slope of the line is the depreciation for every kilometer driven. An elegant way to obtain the depreciation per car type is by using the purrr and broom packages.

 

 

First, some outlying values are removed then only car types with enough data points are considered. Then I have grouped the data by brand, model and transmission type, so that for each group a simple linear regression model can be fitted:

Price = Intercept + depreciation * KM

The following table shows the results:

 

So, on average a new Porsche 911 costs 117,278.60 Euro, and every kilometer you drive will cost you around 49.75 cents in loss of value. The complete table with all car types can be found on RPubs. Although, simple and easy to interpret parameters, a straight line model is not a realistic model as can be seen in the following figure:

 

A better model to fit would be a non linear depreciation model. For example, exponential depreciation or if you don’t want to specify a specific function, some kind of smoothing spline. The R code only needs to be modified slightly, the code below fits a natural cubic splines per car type.

 

It is a better model (in terms of R-squared), it follows the non linear depreciation that we can see in the data. However, we do not have a single deprecation value. How much value a certain car will lose when driving 1 kilometer now depends on the amount of kilometers driven. It is the derivative of the fitted spline curve. For example, the spline curves fitted for a Renault Clio are given in the figure below. A Clio with automatic transmission hardly looses any value after 100,000 KM.

 

I have created a small shiny app so that you can see the curves of all the car types.

Conclusion

Despite my data science exercise and beautiful natural cubic smoothing splines models, buying a brand new car involves a lot of emotion. My wife wants a blue Citroen C4 Picasso, no matter what cubic spline model and R-squared I show to her!

So just ignore my analysis and buy the car that feels good to you!! Cheers, Longhow.

6 thoughts on “Don’t buy a brand new Porsche 911 or Audi Q7!!

  1. Depreciation is also time dependent. Does a 5 year old car driven only 10,000 km carry better resale over 1 year old car driven 100,000km? Also what happens to the resale value once a new model is launched?

    Like

  2. Hello Ziyad,

    That’s a valid point. In my R script you will see how easy it is to also include age in the model. So then you have a model to predict price given age AND KM.

    Your second point sounds valid too. But I have not looked at dates of new model launches, maybe there is a site where you can easily get this data.

    If you look at the scatter plot in my blog post, you see that even for one make/model combination (Renault Clio) there is a lot of variation in price. So , have ignored many factors… 🙂

    Cheers,
    Longhow

    Like

  3. Pingback: Don’t buy a brand new Porsche 911 or Audi Q7!! – Mubashir Qasim

  4. Thanks a lot for the nice post.

    True, there is definitely an effect of model generation (restyle). See, fro example, this plot

    Evidently, there are two clusters of points that, most likely, refer to different generations of the model.

    Like

  5. Pingback: Seen Elsewhere: 21st October 2016 – R for Journalists

Leave a comment