## Introduction

Many people know that **nasty feeling** when buying a brand new car. The minute that you have left the dealer, your car has lost a substantial amount of value. Unfortunately this depreciation is inevitable, however, the amount depends heavily on the car make and model. A small analysis of data from (used) cars shows these differences.

## Car Data

I have used Rvest to scrape data from www.gaspedaal.nl, a Dutch website that combines car for sales data from several other sites. The script to get the data is not that difficult, it can be found on my GitHub, together with my analysis script. There are around **435,000 **cars. The data for each car consists of: make, model, price, fuel type, transmission and age. There are many different car makes and models, the most occurring cars in my data set are:

**Car age vs. Kilometers**

Obviously, there is a clear relation between the age of a car and the amount of kilometers driven. An interesting pattern to see is that this relation depends on on the car make (and model). The following figure shows a few car brands.

Large differences in amount of driving between car types start after 18 months. On average, **Jaguars **are not made for driving, after 60 months only around **83.000 KM** are driven by its owners. While on the other hand, **Mercedes-Benz **owners have driven around **120.000 KM** after 60 months.

A more extreme difference is between the Volvo V50 and the Hyundai i10. Between six and ten years, a Volvo V50 has driven on average 178K kilometers while a Hyundai i10 has driven only 75K kilometers.

## Depreciation

A simple depreciation model is just linear depreciation. Per car brand, model, and transmission type, I can fit a straight line through price and kilometers driven. The slope of the line is the depreciation for every kilometer driven. An elegant way to obtain the depreciation per car type is by using the **purrr **and **broom **packages.

First, some outlying values are removed then only car types with *enough *data points are considered. Then I have grouped the data by brand, model and transmission type, so that for each group a simple linear regression model can be fitted:

Price = *Intercept* + *depreciation* * KM

The following table shows the results:

So, on average a new Porsche 911 costs **117,278.60 Euro**, and every kilometer you drive will cost you around **49.75 cents** in loss of value. The complete table with all car types can be found on RPubs. Although, simple and easy to interpret parameters, a straight line model is not a realistic model as can be seen in the following figure:

A better model to fit would be a non linear depreciation model. For example, exponential depreciation or if you don’t want to specify a specific function, some kind of **smoothing spline. **The R code only needs to be modified slightly, the code below fits a natural cubic splines per car type.

It is a better model (in terms of R-squared), it follows the non linear depreciation that we can see in the data. However, we do not have a single deprecation value. How much value a certain car will lose when driving 1 kilometer now depends on the amount of kilometers driven. It is the derivative of the fitted spline curve. For example, the spline curves fitted for a Renault Clio are given in the figure below. A Clio with automatic transmission hardly looses any value after 100,000 KM.

I have created a small shiny app so that you can see the curves of all the car types.

## Conclusion

Despite my data science exercise and beautiful natural cubic smoothing splines models, buying a brand new car involves a lot of **emotion**. My wife wants a blue Citroen C4 Picasso, no matter what cubic spline model and R-squared I show to her!

So just ignore my analysis and **buy the car that feels good to you!!** Cheers, Longhow.

Depreciation is also time dependent. Does a 5 year old car driven only 10,000 km carry better resale over 1 year old car driven 100,000km? Also what happens to the resale value once a new model is launched?

LikeLike

Hello Ziyad,

That’s a valid point. In my R script you will see how easy it is to also include age in the model. So then you have a model to predict price given age AND KM.

Your second point sounds valid too. But I have not looked at dates of new model launches, maybe there is a site where you can easily get this data.

If you look at the scatter plot in my blog post, you see that even for one make/model combination (Renault Clio) there is a lot of variation in price. So , have ignored many factors… 🙂

Cheers,

Longhow

LikeLike

Pingback: Don’t buy a brand new Porsche 911 or Audi Q7!! – Mubashir Qasim

Thanks a lot for the nice post.

True, there is definitely an effect of model generation (restyle). See, fro example, this plot

Evidently, there are two clusters of points that, most likely, refer to different generations of the model.

LikeLike

Pingback: Seen Elsewhere: 21st October 2016 – R for Journalists

I ran your scraping code but got this error message

Error in paste0(baseURL, i) : object ‘baseURL’ not found

Any suggestions?

Duncan

LikeLike