In our role as solution engineers at Tableau, we get hands-on with customers every day, and get to see how companies - big and small - use Tableau Prep to clean and prepare their data. From working with customers over the years, we hear, anecdotally, that cleaning data for analysis is a cumbersome process. The Harvard Business Review even found that many analysts spend 80% of their time preparing their data, leaving only 20% for analysis. Tableau Prep shifts this paradigm by bringing self-service data preparation to everyone, rather than just those with specialized skills.

The Tableau Community is made up of smart, curious, data rock stars and we often get questions like, “Is Tableau Prep making a live connection to my data?” While that’s a relatively straightforward question to answer when we’re talking about Tableau Desktop, things work a little bit differently in Tableau Prep. Tableau Prep is equipped with three modes that work to make your flows as optimized and performant as possible, without bogging down your machine or underlying database. In this post, we'll get into the details of what is happening behind the scenes in Tableau Prep, so that you can navigate building and running flows with confidence.

Building your flow with Tableau Prep

When you are cleaning your data in any of the available step types (Clean, Union, Join, etc.), you are in what we call Interactive Mode in Tableau Prep. Interactive Mode delivers direct, interactive feedback as you clean, combine, and reshape your data. For example, if you join two tables together in a Join step, you will immediately see the final join result, down to the number of rows.

At times, Tableau Prep may also sample your data. Sampling ensures responsiveness to keep you in the flow of your task, even when you are working with large amounts of data in Tableau Prep. You can use the default sample amount or build a sample set by specifying a fixed number of rows.

How Tableau Prep caches data

When you connect to a data source in Tableau Prep, you will notice that you don’t have the option to select between a live or extract connection like in Tableau Desktop. Tableau Prep instead caches your data as Hyper extracts in our high performance, in-memory data engine. Tableau Prep does not cache everything, however, and the experience differs depending on whether you are building a flow from scratch for the first time or coming back to edit an existing flow already built by a colleague.

Building a flow from scratch

Tableau Prep will always cache the results from your Input step. In the Input step, Tableau Prep queries the input tables from the source database or files, ingesting data into a Hyper extract which serves as the cache used as you continue to build your flow. We create this cache so that you can validate your changes as they're happening without slowing down the underlying database or your machine. When you clean or reshape your data after the Input step, the changes are applied to the data in our cache. What is cached will be based on how you configure your Input step. (Tableau Prep doesn't query the source tables after the Input step when you’re in Interactive Mode.) If you choose to sample, remove columns, or change a data type, these changes will affect the result. Your data is cached again throughout the flow when you add computationally expensive steps like Join steps or Union steps.

Editing a flow

If you go back and edit a pre-built flow, caching will depend on which step you open and begin to explore. Tableau Prep is equipped with a special algorithm that decides which step is best to cache. Depending on where you are in the flow, and what data is necessary to deliver a performant, visual, and interactive experience, Tableau Prep may cache your Input, Union, or Join steps, among others.

What happens when you pause updates?

Sometimes when you are building a flow in Tableau Prep, you may not need direct, live updates. There are times when you just need to go in and add a quick cleaning operation or input data transformations in bulk. In these scenarios, Tableau Prep allows you to pause data updates. When you choose to pause data updates, you are in Metadata Mode. Pausing data updates allows you to quickly make changes to your flow before generating your results.

Running your flow with Tableau Prep

When you run your flow and generate your output for analysis, you are in Run Mode and Tableau Prep runs your flow against the entire data set. For example, if you set your sample size to five thousand rows, but your underlying data source has five million rows, the entire five-million-row data set is queried when you run your flow.
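
To make the difference between building on a sample and running on the full data set concrete, here is a minimal Python sketch. It is only an analogy, not Tableau Prep's implementation or API: the functions `source_rows`, `clean`, `interactive_preview`, and `run_flow` are hypothetical, the sample is assumed to be simply the first N rows, and the row counts are scaled down so the example runs quickly.

```python
from itertools import islice


def source_rows(total_rows):
    """Hypothetical data source: lazily yields one dict per row."""
    for i in range(total_rows):
        yield {"id": i, "amount": i % 100}


def clean(row):
    """Hypothetical cleaning step: add a derived field to a row."""
    return {**row, "is_large": row["amount"] >= 50}


def interactive_preview(total_rows, sample_size):
    """Interactive Mode analogy: cache a sample once, then apply steps to the cache."""
    # Assumption: the sample is the first `sample_size` rows of the source.
    cache = list(islice(source_rows(total_rows), sample_size))
    # Later steps read from the cache, not from the source.
    return [clean(row) for row in cache]


def run_flow(total_rows):
    """Run Mode analogy: apply the same steps to every row in the source."""
    return sum(1 for _ in map(clean, source_rows(total_rows)))


preview = interactive_preview(total_rows=50_000, sample_size=5_000)
processed = run_flow(total_rows=50_000)
print(len(preview), processed)
```

The preview only ever touches the cached sample, which is what keeps each edit responsive, while the run ignores the sample size and processes every row in the source.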