I recently had the opportunity to ask Brian Larson some questions regarding Business Intelligence. For those new to the field, Business Intelligence, or BI for short, is an rapidly expanding field within the database realm.
BI brings with it a different way of thinking! We all struggled to understand normalization and create efficient transactional data models. BI turns this thinking on its head. With BI the emphasis is placed on fast reporting. This mean pre-calculating values and denormalizing data.
BI tools can be as simple as using add-ins in Excel to as complex as stand-along database products.
I got a lot out of hearing what Brian had to say, and you share the feeling. So with out further delay, enjoy the interview.
1. When you denormalize data for BI, are there some tricks, such as incorporating “date intelligence” into your data that you typically use to make life easier?
When placing data into a denormalized data mart or data warehouse, the data should be grouped to the lowest level of granularity required for reporting. If reporting is going to be done on a daily or weekly level, then don’t save data on each individual transaction. Instead, group the data by day or by week. Saving the data at a level that is more granular than what is required for reporting just makes more work for the data model as it constantly aggregates the data to level requested by the user.
One of the common requirements for BI is to compare a set of data to a corresponding set of data from last year. While the Microsoft data modeling environments provide tools to create these type of year-over-year comparisons, these calculations can be expensive when done on the fly. It can be very beneficial to pre-calculate and store these values as part of the denormalization process.
For example, when calculating the aggregate value for sales in May, 2014, I can also calculate the aggregate value for sales in May, 2013 and store this value in another field in the same record. The previous year value can now be feed into the data model and it will not need to recalculated at the time the report is run. This pre-calculation at the time the data is denormalized will add time (perhaps a significant amount of time) to this process. However, in almost all cases the denormalization process is done by a scheduled process and no one is watching a screen waiting for it to complete; as opposed to a user who is impatiently waiting for his or her report to return results.
2. Do you have any advice for people that want to start a career as a BI developer or analyst? For instance, are certifications a must?
First, you have to have a real love for data and a desire to understand the business that generates that data. Database developers take joy in creating systems that gather and data. DBAs take their joy from maintaining a platform that is secure and piles up the nines in the up-time statistics. A BI person gets joy from presenting data to the business users in the most timely and most understandable format possible. If you can’t see yourself deriving job satisfaction in this way, then BI is not for you.
The Microsoft BI certification is a nice way to demonstrate your knowledge of the Microsoft BI tool set. Preparing to pass the certification tests forces you to cultivate a broad knowledge of the capabilities of these tools. The certification tests do require some fairly technical knowledge of a variety Microsoft products, so this can be somewhat daunting for the novice. The certification exams are easier once you have a working knowledge of some of the BI tools and can then supplement that with studies of the tools you haven’t worked with.
3. I’ve recently had a chance to work with Excel PowerPivot and PowerView. They are pretty easy to use and get started using. Are these good tools to use for BI or would you recommend another tool set that beginners can easily setup and use?
PowerPivot provides an environment to learn one of Microsoft’s two approaches to BI modeling. PowerPivot models can be simple to start with, but also support very advanced features. This allows someone new to BI to begin with the basics, and then use the model to explore increasingly complex concepts. PowerPivot models work for single-user and work group-level models. They also work for with Microsoft’s cloud-based Power BI. In addition, the concepts learned using the PowerPivot model translate directly to the Tabular model that lives on a SQL Server Analysis Services server. Indeed, PowerPivot models can be converted to Tabular models and look almost identical after the conversion is complete.
Power View can also serve as good introductory tool. However, you need to keep in mind the target audience for this tool. Power View is designed to allow a non-technical, subject matter expert to explore data that resides in a PowerPivot or Tabular model. It provides rich data visualization with only a few clicks and it does this very well. This simplicity comes at the expense of flexibility and control. If you are looking for tools to create reports and data visualizations to more exact specifications, you will need to look at Pivot Tables and Pivot Charts, as well as SQL Server Reporting Services.
4. Would you recommend that beginners learn traditional SQL and reporting before delving into BI? If so, can you explain what are some areas to pay attention to? If not, why do you feel the jump straight to BI is feasible?
I believe a solid understanding of traditional SQL is essential before delving into BI. A command of intermediate and perhaps advanced T-SQL is needed to effectively retrieve data from source systems utilizing SQL Server. Power Query and SQL Server Integration Services offer tools for manipulating data as it is moved from the source to the data mart or the data model. However, there are often time when it is most efficient to perform complex filtering, intricate joins, and grouping operations right in the T-SQL query that pulls the data from the source tables.
I suggest being comfortable with the following T-SQL concepts before delving into BI:
- INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, CROSS JOIN
- GROUP BY
- Common Table Expressions (CTE)
5. What are some good books or website resources you would recommend for beginners to review to learn more about BI?
Of course, I’m going to recommend my own books for getting started with BI. “Delivering Business Intelligence with Microsoft SQL Server 2012” provides a good overview of the Microsoft BI tool set with special focus on the server-based tools. “Visualizing Data with Power View” covers PowerPivot and the Tabular model in addition to the Power View data visualization tool. Both books are published by McGraw-Hill Professional.
In addition to these, you may want to look at “Competing on Analytics” by Thomas H. Davenport for case studies showing how to use BI to produce better business decision making.
6. What is the biggest mistake or pitfall to avoid when getting started in BI?
I think the biggest pitfall is being afraid of denormalization. Many of us have had it drilled into our heads that all denormalized data is bad. Data normalization goes a long way to making transactional data stores more efficient. It leads to databases that use disk space effectively and have higher write throughput.
In the BI world, however, we have different goals. To create an effective BI solution, we have to be willing to denormalize to get higher read throughput and more efficient execution of our SELECT statements. This can involve trading disk space for read performance.
Brian Larson is a Phi Beta Kappa graduate of Luther College in Decorah, Iowa, with degrees in physics and computer science. Brian has worked in the computer industry since 1985 with most of this time spent consulting and creating custom database applications. Currently Brian holds the position of Chief Technology Officer and Partner at Superior Consulting Services in Minneapolis, Minnesota.
Brian has several database certifications including a MCSE: Business Intelligence 2012 and a Microsoft Certified Database Administrator (MCDBA).
Brian has presented at national conferences and events such as the SQL Server Magazine Connections Conference, the PASS Community Summit, and the Microsoft Business Intelligence Conference. He is a former columnist for SQL Server Magazine.
Remember! I want to remind you all that if you have other questions you want answered, then post a comment or tweet me. I’m here to help you.