Integration of Tableau with Python (TabPy) – Part 2

Part 2 – Model Creation in Tableau


If you have successfully connected Tableau to the TabPy server, you can proceed with this post. If you haven’t done so yet, please follow Part 1 of this blog first. In this post, I will explain how to perform clustering in Tableau, using Tableau’s visualization power at the front end and Python’s ability to process complex calculations at the back end. For this demonstration, I am using the Sample – Superstore data set provided by Tableau. If you want to follow along, please download the Tableau packaged workbook here.

You can either follow the steps mentioned in this blog or follow the video below:


1. Create a Scatter Plot in Tableau:


In a new worksheet, drag the Profit measure to the Columns shelf and the Sales measure to the Rows shelf, as shown below.



You will only get a single point in the visualization. To convert it into a scatter plot, open the Analysis menu, and uncheck the Aggregate Measures option. You will get a visualization as below.
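Why a single point at first? With Aggregate Measures checked, Tableau sums each measure over all rows, so there is only one (Profit, Sales) pair to plot; unchecking it plots one mark per row. A minimal Python sketch of the difference, using made-up rows rather than the Superstore data:

```python
# Made-up (profit, sales) rows for illustration only.
rows = [(10.0, 100.0), (25.0, 300.0), (-5.0, 50.0)]

# Aggregated view: one point holding the sum of each measure.
aggregated = (sum(p for p, _ in rows), sum(s for _, s in rows))

# Disaggregated view: every row becomes its own point in the scatter plot.
points = list(rows)

print(aggregated)   # (30.0, 450.0)
print(len(points))  # 3
```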




2. Create Calculated Fields:


Now we will create a calculation that will run in Python. For this, we will create a calculated field in Tableau. Go to the Analysis menu and click Create Calculated Field.



Paste the code below and name the field Clustering.

Code:

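SCRIPT_INT("
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
# _arg1 and _arg2 are the lists Tableau passes in (Profit and Sales)
X = np.column_stack([_arg1, _arg2])
# Scale both columns to zero mean and unit variance
X = StandardScaler().fit_transform(X)
# Cluster the points; DBSCAN labels noise points as -1
db = DBSCAN(eps=1, min_samples=3).fit(X)
# Return one integer label per mark
return db.labels_.tolist()
",
SUM([Profit]), SUM([Sales])
)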
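To check the clustering logic before pasting it into Tableau, the body of the script can be run as an ordinary Python function. The sketch below assumes numpy and scikit-learn are installed locally; the function name cluster_labels and the sample values are made up for illustration and stand in for the lists that TabPy passes as _arg1 and _arg2.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def cluster_labels(profit, sales, eps=1.0, min_samples=3):
    """Mirror the Tableau script body: scale both columns, then run DBSCAN."""
    X = np.column_stack([profit, sales])
    X = StandardScaler().fit_transform(X)
    db = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    return db.labels_.tolist()


# Made-up values: four similar orders and one far-away outlier.
profit = [10.0, 12.0, 11.0, 13.0, 500.0]
sales = [100.0, 110.0, 105.0, 120.0, 9000.0]

labels = cluster_labels(profit, sales)
print(labels)  # [0, 0, 0, 0, -1]
```

The four nearby points form cluster 0 and the distant point is labelled -1, which is DBSCAN's marker for noise.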


DBSCAN labels noise points as -1, so the output of this calculation starts from -1. To start your cluster numbers from 0, create a new calculated field named Cluster Numbers that adds 1 to the result, for example [Clustering] + 1.



Since we will have discrete cluster numbers, right-click the Cluster Numbers pill in the Data area and click on Convert to Discrete.
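The two steps above (shifting the labels so they start at 0, then treating them as discrete categories rather than a continuous number) can be sketched in plain Python. The labels here are made up for illustration and are not output from the Superstore data.

```python
from collections import Counter

# Made-up DBSCAN output; -1 marks noise points.
raw_labels = [-1, 0, 0, 1, -1, 2, 1]

# Shift every label by 1 so numbering starts at 0 (noise becomes group 0).
cluster_numbers = [label + 1 for label in raw_labels]

# Treat the numbers as categories: count how many marks fall in each group.
sizes = Counter(cluster_numbers)

print(cluster_numbers)        # [0, 1, 1, 2, 0, 3, 2]
print(sorted(sizes.items()))  # [(0, 2), (1, 2), (2, 2), (3, 1)]
```

After the shift, group 0 holds the points DBSCAN treated as noise, so it is best read as "unclustered" rather than as a cluster in its own right.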



If you follow all the steps correctly, you should have two calculated fields as below:



3. Let’s Get Some Clusters:

Drag the calculated field Cluster Numbers to Color on the Marks card, as shown below:



You will get the final result with clusters as below.



Congratulations! You have finished creating a model by integrating Tableau and Python. However, if you want to perform clustering repeatedly across several workbooks, will you go through all this hassle every time? To learn how to publish and query a model in TabPy, continue to Part 3 of this blog.

