What is the difference between REX, the Data Warehouse, and the Cube? I have been asked this question several times so I have written a brief description of each in hopes that when you go to your next meeting and someone brings up their report of fancy numbers, it may be easier for you to understand where that data came from, because chances are, they will say one of the three things I mentioned above if asked (and they may not even know what they are!).
Let’s first start with REX. If you were able to attend the information sharing work group presentation that Bridget and Lindsey did on REX, you saw very good demo on what REX is and what it’s used for. REX stands for Report Exchange and it is a web site devoted to university reporting. It should be the portal you go to when you want to start seeing reports on everything from enrollment numbers to how many tickets you put into the helpdesk ticketing system last year. I won’t go into the various sections of REX here, but you can feel free to take a look around, there are several helpful links that explain what REX is, how to get access to the various reports on it and how to use it. The website is https://rex.umbc.edu . All of the reports on the REX website are created from data that is stored in something called the Data Warehouse which is our next topic.
The Data Warehouse is a storage place for data across campus and is comprised of data from several systems. It contains housing and event data from a system called Symplicity, most of the data that is in PeopleSoft, and links to other database systems across campus including our own in house Contracting System. Each night the data is pulled from the various database systems across campus and dumped into this data warehouse. This is great because all of the data that is used to make decisions across campus can be found in one place. The downside is that since this data is pulled in nightly (generally starts at 12 midnight and finishes around 8:30 am) the reports contains data that is yesterday’s data. For most decision making regarding strategic planning efforts, this is acceptable. Another system of data that is stored in the data warehouse is something called the data cube. This is the third topic.
The "Cube" is a subset of data and logic that sits in the data warehouse that makes reporting on trends and aggregate data much easier/quicker. It takes data from various locations and performs business logic on them and then stores those results in the warehouse. For example, if you wanted to know what percentage of students were out of state, the cube would calculate all that up for you and give you a simple answer rather than you having to do the calculations on your report. The main difference between a regular database and a cube database is that the regular database are usually designed with an IT system in mind, and the function is to just provide data to and from the system where as the cube is designed using business logic for advanced analytics and they are typically used to allow analysts to perform queries on millions of records at a time.
So in short, REX is a storage place for reports and reporting tools used to access data that people are using across campus, the data warehouse is where all the data that the reports in REX use is stored on a nightly basis, and the cube is a subset of data that is stored in tabled based on business logic.
Think of it as the postal service. The warehouse is the mail box. All of the mail is just pushed together in that mail box and sits there waiting for someone to do something with it. Then the mail person comes and takes it to a mail sorting facility. This is the cube. The facility sorts the mail based on the zip code and sorts it into different boxes. Once that is done, the mail person comes back, grabs that data and hands it over to the recipient which is REX. Of course we all know that it isn’t that simple and that the mail person an ETL (which is short for extract, transform and load process), but that is a topic for another article.