Interactive sunbuRst graphs in Power BI in 5 minutes!!

pbiviztool

Introduction

If I mention Power BI to fellow data scientists I often get strange looks. However, I quite like the tool, it is an easy and fast way to share results, KPI’s and graphs with others. With the latest release, Power BI now supports interactive R graphs, and they are easy to create as well.

Steps to follow

1. Install Node.JS from here and then you can install the power bi tools with:

>npm install -g powerbi-visuals-tools

2. Create a new custom R visual:

>pbiviz new sunburstRHTMLVisual -t rhtml

3. This will create a directory sunburstRHTMLVisual. In that directory, edit the R script file script.r. It’s a one-liner to create a sunburst graph with the sunburstR package.

 

Values is the name of the input data frame, the data that is received from within the Power BI desktop application.

4. Now create the custom R visual as package with the following command: (issue this command inside the directory sunburstRHTMLVisual)

>pbiviz package

5. Inside the sub folder dist you will now find the file sunburstRHTMLVisual.pbiviz, this can be used in Power BI. Open the Power BI desktop application and import a custom visual from file. Select the sunburstRHTMLVisual.pbiviz file

 

That’s it, you’re done!

The resulting graph in a dashboard

To use the visual you will need a data set in power bi with two columns, one with the sequences and one with the number of occurrences of the corresponding sequence.

 

Click on the icon of the custom R visual you’ve just imported and select the two columns to get the interactive sunburst graph. Once the graph is created, you can hover over the rings to get more info, and you can turn on/off the legend.

 

Cheers, Longhow.

Did you say SQL Server? Yes I did….

rsqlserver

Introduction

My last blog post in 2016 on SQL Server 2016….. Some years ago, I have heard predictions from ‘experts‘ that within a few years Hadoop / Spark systems would take over traditional RDBMS’s like SQL Server. I don’t think that has happened (yet). Moreover, what some people don’t realize is that at least half of the world still depends on good old SQL Server. If tomorrow all the Transact stored procedures would somehow magically fail to run anymore, I think our society as we know it would collapse…..

postapo

OK, I might be exaggerating a little bit. The point is, there are still a lot of companies and use cases out there that are running SQL Server without the need for something else. And now with the integrated R services in SQL Server 2016 that might not be necessary at all 🙂

Deploying Predictive models created in R

From a business standpoint, creating a good predictive model and spending time on this, is only useful if you can deploy such a model in a system where the business can make use of the predictions in their ‘day-to-day operations’. Otherwise creating a predictive model is just an academic exercise / experiment….

Many predictive models are created in R on a ‘stand-alone’ laptop /server. There are different ways to deploy such models. Among others:

  • Re-build the scoring logic ‘by hand’ in the operational system. I did this in the past, it can be a little bit cumbersome and it’s not what you really want to do. If you do not have much data prep steps and your model is a logistic regression or a single tree, this is doable 🙂
  • Make use of PMML scoring. The idea is to create a model (in R) transform that to pmml and import the pmml in the operational system where you need the predictions. Unfortunately, not all models are supported and not all systems support importing (the latest version of) PMML
  • Create API’s (automatically) with technology like for example Azure ML, DeployR, sense.io or openCPU, so that the application that needs the prediction can call the API.

SQL Server 2016 R services

If your company is running SQL Server (2016) there is an other nice alternative to deploy R models by using the SQL Server R services. At my work at RTL Nederland [Oh btw we are looking for data engineers and data scientists :-)] we are using this technology to deploy the predictive churn and response models created in R. The process is not difficult; the few steps that are needed are demonstrated below.

Create any model in R

I am using an extreme gradient boosting algorithm to fit a classification model on the titanic data set. Instead of calling xgboost directly I am using the mlr package to train the model. Mlr provides a unified interface to machine learning in R, it takes care of some of the frequently used steps in creating a predictive model regardless of the underlying machine learning algorithm. So your code can become very compact and uniform.

xgboostexample

Push the (xgboost) predictive model to SQL Server

Once you are satisfied with the predictive model (on your R laptop), you need to bring that model over to SQL Server so that you can use it there. This consists of the following steps:

SQL Code in SQL Server, write a stored procedure in SQL server that can accept a predictive R model, some meta data and saves that into a table in SQL Server.

sqlr_sp

This stored procedure can then be called from your R session.

Bring the model from R to SQL, to make it a little bit easier you can write a small helper function.

rhelper

So what is the result? In SQL Server I now have a table (dbo.R_Models) with predictive models. My xgboost model to predict the survival on the Titanic is now added as an extra row. Such a table becomes like a sort of model store in SQL server.

sqlmodels

Apply the predictive model in SQL Server.

Now that we have a model we can use it to calculate model scores on data in SQL Server. With the new R services in SQL Server 2016 there is a function called sp_exec_external_script. In this function you can call R to calculate model scores.

sqlserver_rmodel_call

The scores (and the inputs) are stored added in a table.

sqltabel

The code is very generic, instead of xgboost models it works for any model. The scoring can (and should be) be done inside a stored procedure so that scoring can be done at regular intervals or triggered by certain events.

Conclusion

Deploying predictive models (that are created in R) in SQL Server has become easy with the new SQL R services. It does not require new technology or specialized data engineers. If your company is already making use of SQL Server then integrated R services are definitely something to look at if you want to deploy predictive models!

Some more examples with code can be found on the Microsoft GitHub pages.

Cheers, Longhow