Tableau Database Tips and Tableau Best Practices
Tableau is an awesome database querying and visualization tool – however, like any tool, we want to have a good grasp of how it’s handling data for us behind the scenes; and in how it’s generating SQL and interacting with a database in particular. So here’s some tips and tricks for when you’re hitting a database with Tableau.
- On the data source screen, where you design how your universe of tables is joined, there’s a drop down attributed called “Assume Referential Integrity” under the Data menu – you’re going to want to toggle this on.
If this is unchecked, that means Tableau is going to join every table in your universe; even if it neither selects nor restricts on anything in these other tables. It will simply introduce massive amounts of time and inefficiency as it joins tables that your query doesn’t actually care about. Referential integrity should be insured by your data model design and the ETL to load the tables; I’m not sure under which scenarios I would actually have this marked off, and am a bit surprised that it’s the default option.
- Tableau is tempting, and a bit naughty, around dates. A best practice in a universe would have very strict control over dates. Using something like a mart and a time dimension, the ideal scenario would be using different fields for different cuts of time; ie. a day field and a separate year field, with the appropriate field and literal value to restrict on made for each. However once Tableau sees a day, it’s going to want to be helpful and nice for you, and produce any kind of date rollup you could ever wish for off that date. The problem arises when you send a restrict statement to the warehouse, and it looks something along the lines of:
where extract(year from datefield) = 2017which is going to be much slower than
where year = 2017for a number of reasons. The Tableau code is going to result in a full table scan; which is at least tolerable if you’ve got this field on a time dimension; but lawd help you if it’s plucked off a natural date from a fact – you’re going to be scanning far more records than are necessary in that case. Avoid exposing those on a fact to Tableau as they will be ripe for abuse and full table scans.
- Tableau Desktop uses the temp directory specified in %TEMP%, and Tableau Serer uses C:\ProgramData\Tableau\Tableau Server\data\tabsvc\temp – I’m not sure if you can change either within Tableau to a hardcoded place. You can manipulate %TEMP% on your local system however; My Computer -> Properties -> Advanced system settings -> Advanced tab -> Environment Variables – here you could specify a faster area of disk if you have one available, ie. SSD vs. spinning disk or something of the like. Note your other temp files will be placed here by other processes as well potentially while this is set.
- Are you extracting a large amount of data out at certain intervals to support your dashboard and visualizations? Try to schedule it outside of business hours if this is the case! Use incremental extracts/refreshes whenever you can, rather than full extracts each time data is needed.
- That being said, Tableau performs best using extracts which are contiguously placed on disk.
So if you’re using spinning disk rather than SSD technology, it’s good to get in an occasional full refresh as well. Work out a schedule that seems appropriate for the volume and burden of the extract.
- Tableau advises granting create temp table privileges to its users.
I am unclear as to when and in which situations this actually applies. I turned this on for a power Tableau Desktop user of mine, as well as the application id for a Tableau Server account. Fishing through my database’s query history, I did see that in the past, Tableau had attempted to create temp tables on connect to see if it had that ability. After my changes, those same SQLs completed successfully; however that’s all I actually saw in the database, that the test passed. There were no situations in which it actually used the privilege. I’d like to better understand this area, although it seems it may not apply very often, or at least not in the case of how my users are using Tableau.
I would also highly recommend checking out this book by Stephen Few, who has some fantastic books on dashboard design and visual presentation of data, with many examples of what not to do and what to do instead. My own dashboards became much better once I read his book, and it’s great reference to check back to as well.