University of Cincinnati, Carl H. Lindner College of Business, 2925 Campus Green Drive, Cincinnati, OH 45221-0130, 513-556-7174, University of Cincinnati, College of Medicine, Cincinnati Children’s Hospital Medical Center, James M. Anderson Center for Health Systems Excellence
Craig M. Froehle
Michael J. Ward, Vanderbilt University, Department of Emergency Medicine, 1313 21st Avenue, 703 Oxford House, Nashville, TN 37232-4700, 615-936-8379 (phone), 615-936-3754 (fax)
Corresponding author.
The publisher's final edited version of this article is available at Bus Horiz.

The American healthcare system is at a crossroads, and analytics, as an organizational skill, figures to play a pivotal role in its future. As more healthcare systems capture information electronically and begin to collect novel forms of data, such as human DNA, how will we leverage these resources to improve human health at a manageable cost? In this article, we argue that analytics will play a fundamental role in the transformation of the American healthcare system. However, there are numerous challenges to the application and use of analytics, namely the lack of data standards, barriers to the collection of high-quality data, and a shortage of qualified personnel to conduct such analyses. There are also multiple managerial issues, such as how to get end users of electronic data to employ it consistently for improving healthcare delivery, and how to manage the public reporting and sharing of data. In this article, we explore applications of analytics in healthcare, barriers and facilitators to its widespread adoption, and how analytics can help us achieve the goals of the modern healthcare system: high-quality, responsive, affordable, and efficient care.
Keywords: healthcare, analytics, information technology

The American healthcare system has long suffered from constrained resources, increasing demand, and questionable value, yet the future looks more promising because of increasingly sophisticated and widespread uses of data and analytics. The past performance of the healthcare system shows why change was necessary. The Centers for Medicare & Medicaid Services (CMS) estimate that healthcare accounts for a staggering 17.9% of U.S. gross domestic product (GDP) and that the U.S. spent $2.7 trillion, or $8,680 per person, on healthcare in 2011 (CMS, 2013). According to the Organisation for Economic Co-operation and Development (OECD), which compares the performance of international healthcare systems, the U.S. ranked 27th in life expectancy at birth in 2009, despite spending the highest proportion of GDP on healthcare (OECD, 2011).
These figures raise the question of what value the U.S. healthcare system delivers. There are multiple reasons for this value deficit. First, the third-party-payer system decouples the payer from the individual receiving services, weakening some of the checks and balances on costs. Second, the existing fee-for-service system misaligns incentives, rewarding the consumption of resources and overuse rather than overall patient health and well-being. Third, healthcare has unique barriers to competition, not present in other industries, that stifle innovation. Fourth, for-profit insurers, fraud, and waste divert a portion of healthcare funds away from paying for care. Finally, despite information technology’s (IT) role in rapidly advancing productivity in many other industries (e.g., Rawley and Simcoe, 2012), IT adoption in healthcare has lagged well behind.
Although earlier reform efforts amounted to only small steps, the Affordable Care Act (ACA) and the Health Information Technology for Economic and Clinical Health (HITECH) Act, a component of the American Recovery and Reinvestment Act (ARRA) of 2009, have initiated tremendous change in healthcare. Fueled by the carrot-and-stick approach of the HITECH Act, hospital adoption of at least a basic electronic health record (EHR) nearly doubled from 2008 to 2012, when 44% of U.S. hospitals were using at least a basic EHR (DesRoches, 2013). Without an EHR, much healthcare data remain on paper. Widespread EHR adoption sets the stage for electronic data collection and subsequent analysis. The next phase is to transform these data into actionable information that can be used to improve the delivery of healthcare.
Now that the necessary data infrastructure is being put into place, analytics can, and must, play a pivotal role in transforming American healthcare into an efficient, value-driven system. With investment in healthcare information technology and a shift in focus from the quantity of treatment to overall value, the stage is set for the application of advanced analytics. As the ACA is implemented, incentives should become better aligned with patient health and well-being while achieving value for limited healthcare resources.
Although healthcare has been slower than other industries to adopt analytics, that adoption is now radically transforming the delivery of healthcare for the better. In this article, we discuss how healthcare is fundamentally changing in response to the application of analytics. We describe how data are collected, organized, and analyzed, as well as the challenges facing the widespread adoption of analytics in healthcare. We also address managerial issues and how analytics can produce meaningful output for organizations and individuals alike. Finally, we conclude with specific examples of analytics applied to healthcare delivery, drawn from data visualization in quality improvement, genetics, comparative effectiveness, chronic disease databases, disaster planning, and asset tracking. These examples demonstrate how analytics is improving the way healthcare is delivered and the unique analytical issues it raises.
Hospitals and healthcare systems tend to operate and manage a wide range of clinical and operational information systems. While the interoperability requirements of Meaningful Use (MU) (Blumenthal and Tavenner, 2010) are causing institutions to consolidate their clinical information systems into enterprise-wide EHRs (Marsolo and Spooner, 2013), most institutions still rely on a host of platforms. Examples of such platforms are described below and summarized in Table 1; the list is not meant to be exhaustive. They include:
Table 1. Example data sources within a healthcare delivery system.
Data Source | Data Generated |
---|---|
Electronic Health Record (EHR) | Clinical documentation, patient history, results reporting, and patient orders. |
Laboratory Information System (LIMS) | Laboratory results (the LIMS is typically interfaced with the EHR). |
Diagnostic or monitoring instruments | Range from images (e.g., magnetic resonance imaging) to numbers (e.g., vital signs) to text reports (result interpretation). May or may not be interfaced with the EHR. |
Insurance claims / billing | Information on what was done to the patient during a visit, the cost of those services, and the expected payment. The level of service is often determined from data in the EHR. |
Pharmacy | Information on the fulfillment of medication orders. Not typically part of the EHR. |
Human resources and supply chain | Lists of employees and their roles in the institution; location and utilization of medical supplies. Not typically interfaced with the EHR. |
Real-time locating systems | Positions and interactions of assets and people. |
Electronic Health Record (EHR): EHRs have become one of the largest sources of digital information on the health and well-being of patients. Spurred in part by the ARRA and MU, the rate of EHR adoption has grown dramatically (DesRoches, 2013; HealthIT Dashboard, 2013). EHRs are used to capture family, social, surgical, and medical history; allergies and immunizations; laboratory results; clinical findings; clinical orders; and other condition-specific information. Depending on how the EHR is configured, this information may exist in discrete fields or be captured as part of free-text notes (Marsolo and Spooner, 2013).
Laboratory Information System (LIMS): A LIMS is used to process laboratory samples and stores the interim and final results for each test. It typically contains sample metadata (collection date/time, container type, preservative, etc.) that are useful for quality assurance.
Diagnostic or monitoring instruments: These range from magnetic resonance imaging (MRI) and computed tomography (CT) scanners to echocardiograms, electrocardiograms, and vital sign monitors. The level of integration with these instruments varies with their importance and the sophistication of the underlying system. Some instruments simply generate a text report that is transmitted to the EHR. Others produce images or other raw data that can be used for analytical purposes; systems such as radiology picture archiving and communication systems (PACS) help manage these imaging databases.
Insurance claims / billing: These systems generate bills for the services provided during each clinic or hospital visit and track what was paid by patients, insurance providers, and other payers.
Pharmacy: Until recently, pharmacy information systems had rather limited uses, mostly involving inventory management. They are becoming increasingly sophisticated in addressing clinical problems such as medication non-adherence, a major reason for a lack of improvement in patient outcomes (Martin, 2005). In an outpatient setting, it is possible to determine whether an order was placed for a particular medication, but it is more challenging to determine whether the patient actually took the medication as prescribed. Electronic pill bottles are beginning to supplant pharmacy refill records as a better way of determining medication adherence (Aardex Group, 2012).
Human resources and supply chain: Many healthcare systems now use typical enterprise-level IT systems (e.g., PeopleSoft or SAP) to manage their human resources (HR) and supply chains. These are typically not connected to the other systems described in this section.
Real-time locating systems: Increasingly, hospitals and large healthcare organizations are investing in systems that provide the real-time location of assets (e.g., intravenous pumps) and/or people (e.g., staff and patients) to better manage operations (Froehle and Magazine, 2013). These systems locate the asset or person through some combination of wireless technologies, such as RFID, Wi-Fi, ultrasound, infrared, and GPS. Combined with management front-ends, these technologies can reduce loss and theft of assets and improve the situational awareness of staff who direct workflow.
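The refill-based view of adherence mentioned in the pharmacy entry is often summarized as a proportion of days covered (PDC). The sketch below is a simplified illustration, assuming fills arrive as hypothetical (fill date, days supplied) pairs; it counts each calendar day at most once and, unlike some published PDC specifications, does not shift overlapping supply forward.

```python
from datetime import date, timedelta


def proportion_of_days_covered(fills, start, end):
    """Share of days in [start, end] covered by at least one pharmacy fill.

    fills: iterable of (fill_date, days_supply) tuples (hypothetical format).
    Overlapping supply is counted once; it is not carried forward.
    """
    period_days = (end - start).days + 1
    covered = set()
    for fill_date, days_supply in fills:
        for offset in range(days_supply):
            day = fill_date + timedelta(days=offset)
            if start <= day <= end:
                covered.add(day)
    return len(covered) / period_days


# Two 30-day fills over a 90-day window (January through March 2013).
fills = [(date(2013, 1, 1), 30), (date(2013, 2, 15), 30)]
pdc = proportion_of_days_covered(fills, date(2013, 1, 1), date(2013, 3, 31))
print(f"PDC = {pdc:.2f}")  # 60 of 90 days covered
```

A cutoff of 0.8 is commonly used to label a patient adherent. Even so, as the pharmacy entry notes, refill records show only that medication was dispensed, not that it was taken.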
Most clinical information systems were not designed with analytics in mind and, as such, do not necessarily make it easy to “get the data out.” Systems typically support data transmission using Health Level Seven (HL7) messages (Health Level Seven International, 2013), but only a fraction of the information in the system may be accessible through such an interface (Garrido, 2013). Systems may also provide a back-end reporting database for research and analytics, but that database may be refreshed only periodically, so it lags behind the live system. Access to real-time data can be problematic. Typically, the only avenues available are (a) HL7, which limits the data that can be accessed and the types of questions that can be asked; and (b) web services, which provide a richer interface but whose details may be considered the vendor’s intellectual property and, therefore, made available only to its customers.
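To show why an HL7 feed exposes only part of a system’s data, here is a minimal sketch of reading results out of an HL7 version 2 message. It assumes the default delimiters (segments separated by carriage returns, fields by `|`, components by `^`) and extracts only observation (OBX) segments; the message is made up, and a production interface would use a full HL7 library that handles escape sequences, repetitions, and the special numbering of the MSH segment.

```python
def parse_obx(message: str) -> list[dict]:
    """Extract observations from OBX segments of an HL7 v2 message.

    Assumes default delimiters: segments end in '\r', fields are split
    by '|', components by '^'. Escape sequences and repetitions are ignored.
    """
    observations = []
    for segment in message.split("\r"):
        fields = segment.split("|")
        if fields[0] != "OBX" or len(fields) < 6:
            continue
        # OBX-3 is the observation identifier: code^text^coding system.
        code, name, system = (fields[3].split("^") + ["", "", ""])[:3]
        observations.append({
            "code": code,
            "name": name,
            "system": system,
            "value": fields[5],                             # OBX-5
            "units": fields[6] if len(fields) > 6 else "",  # OBX-6
        })
    return observations


# Made-up message: one glucose result coded to LOINC (coding system "LN").
SAMPLE = "\r".join([
    "MSH|^~\\&|LAB|HOSP|EHR|HOSP|201301011200||ORU^R01|MSG0001|P|2.5.1",
    "PID|1||12345^^^HOSP^MR||DOE^JANE",
    "OBX|1|NM|2345-7^Glucose^LN||95|mg/dL|70-99|N|||F",
])
print(parse_obx(SAMPLE))
```

Anything the sending system does not place in a segment like OBX never reaches the receiver, which is the limitation described above.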
While there is no shortage of data standards in the healthcare industry, the health IT community has been slow to adopt them. Within a given clinical information system, vendors are free to define their own data structures, and often do. The same element may be stored and coded in myriad ways by different vendors, and sometimes even within different systems from the same vendor. In one classic example, there were reportedly over 40 different ways of capturing blood pressure within a single EHR (Koppel, 2013), all of them valid within the clinical context in which the measurements were taken. The only standards that are consistently used are those tied to payment. That is why ICD-9 is used to code billing diagnoses (CMS, 2013): CMS requires it for payment from Medicare and Medicaid. It is also why the healthcare industry is preparing to move to ICD-10 in October 2014 (CMS, 2013).
Despite these challenges, efforts are underway to facilitate the sharing and exchange of data by standardizing data formats. The primary drivers are the MU regulations, which call for clinical findings to be coded to SNOMED-CT (International Health Technology Standards Development Organisation, 2013), laboratory results to Logical Observation Identifiers Names and Codes (LOINC) (LOINC, 2013), and medication orders to RxNorm (National Library of Medicine, 2013). Such mappings are already standard practice in the research community for inter-institutional analyses; making them standard in clinical systems will significantly reduce the burden of sharing data coherently.
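As an illustration of what such a mapping step looks like, the sketch below normalizes several local, vendor-specific blood pressure fields to LOINC while keeping the clinical context (position, site, method) that made each local variant valid. The local field names are hypothetical; 8480-6 and 8462-4 are the LOINC codes for systolic and diastolic blood pressure.

```python
# Hypothetical local field names -> (LOINC code, context kept from the local field).
LOCAL_TO_LOINC = {
    "BP_SYS_SITTING": ("8480-6", {"position": "sitting"}),
    "BP_DIA_SITTING": ("8462-4", {"position": "sitting"}),
    "NIBP_SYSTOLIC": ("8480-6", {"method": "non-invasive"}),
    "SBP_LEFT_ARM": ("8480-6", {"site": "left arm"}),
}


def normalize(observations):
    """Map (local_field, value) pairs to LOINC-coded records.

    Returns (mapped, unmapped); unmapped field names need terminology review.
    """
    mapped, unmapped = [], []
    for local_field, value in observations:
        entry = LOCAL_TO_LOINC.get(local_field)
        if entry is None:
            unmapped.append(local_field)
            continue
        loinc, context = entry
        mapped.append({"loinc": loinc, "value": value, **context,
                       "source_field": local_field})
    return mapped, unmapped


rows = [("BP_SYS_SITTING", 128), ("NIBP_SYSTOLIC", 131), ("BP_MEAN_ART", 92)]
mapped, unmapped = normalize(rows)
systolic = [r["value"] for r in mapped if r["loinc"] == "8480-6"]
print(systolic, unmapped)  # [128, 131] ['BP_MEAN_ART']
```

Once mapped, a single query on 8480-6 retrieves systolic readings regardless of how each system stored them, while unmapped fields surface explicitly instead of silently dropping out.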
In addition to calling for standard terminologies, MU seeks to increase the interoperability and exchange of EHR data (Blumenthal and Tavenner, 2010). Stage 2 includes explicit measures that enable patients to view, download, and transmit (VDT) their results. Modeled after the Blue Button initiative (http://www.healthit.gov/bluebutton), which several federal agencies started to let patients view and download their personal health data, this is an attempt to put patients more in control of their health and their health data. Health systems will also be required to demonstrate the ability to exchange patient records with other health systems in their region, moving closer to the vision of each patient having a single record that contains all of their health data. As a result, it will become much easier to perform population-level analytics on the “standard” data elements that can be exchanged via these mechanisms (e.g., allergies, medication orders, surgical history, vital signs, and diagnoses).
The application of analytics in healthcare requires transforming data into usable information that can be relayed back to end users. The adoption of EHRs and other electronic data mechanisms makes analytical tools more tractable by providing the basic electronic data on which they operate. This coincides with the rise of the “data scientist,” a term sometimes applied to those who use analytics and can serve as a one-stop shop for the management, analysis, and interpretation of electronic data. In healthcare, this role is particularly important for translating raw electronic bits into meaningful information.
These data scientists often need to draw from a dizzyingly broad spectrum of analytical methodologies. Well-established techniques, such as biostatistics and epidemiologic analysis, Monte Carlo and discrete-event simulation, and causal modeling, are being joined by methods previously uncommon in healthcare. These newer methods include data mining, Bayesian statistics, optimization modeling, social network analysis, and agent-based simulation, to name just a few.
Analysis depends on the context in which it is performed. Clinical care and performance improvement can require very different perspectives on the data and use them in different ways. Clinical analytics aims to improve the care of patients. Its data differ greatly from process-oriented data and may include genetic data as well as clinical records, which are often narrative and therefore harder to analyze at scale. Performance data, on the other hand, are subject to the issues described above, namely availability and quality. Because EHRs were not designed with system performance in mind, figuring out how to capture these data with high quality at low cost is a daunting, yet fundamentally important, task.
Traditionally, healthcare has used business data far less regularly and comprehensively than most other industries, and it has underinvested in advanced managerial technologies such as reporting systems and data visualization. This may be partly because some healthcare providers view investments in managerial and operational information systems as less important than investments in clinical information systems.
Whereas many organizations outside healthcare have developed or purchased real-time reporting systems that push targeted updates to specific end users, healthcare has typically relied on centrally produced, static report documents that give every recipient the same view of historical performance. Contemporary reporting systems often include interactive dashboards that provide customized, up-to-the-minute (or at least frequently updated) graphical displays of critical performance metrics, historical trends, and reference benchmarks or goals. These dashboards are designed to help end users focus on the data that are most informative about how their systems are performing. In healthcare, decision-support dashboards are increasingly common on the clinical side, especially in EHR environments, but far less so for managerial or operational decisions (Figure 1).
Figure 1. A sample dashboard of emergency department performance measures. Used with permission from Emergency Medicine Business Intelligence.
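Behind each dashboard tile is usually a small calculation that compares current performance with a goal. The sketch below is hypothetical: it assumes arrival and provider-evaluation timestamps are available for each visit and uses an invented 30-minute goal for median door-to-provider time.

```python
from datetime import datetime
from statistics import median

GOAL_MINUTES = 30  # hypothetical goal, not a recommended benchmark


def door_to_provider_tile(visits):
    """Summarize median door-to-provider time for one dashboard tile.

    visits: list of (arrival_time, provider_seen_time) datetime pairs.
    """
    minutes = [(seen - arrived).total_seconds() / 60 for arrived, seen in visits]
    value = median(minutes)
    return {
        "metric": "Median door-to-provider (min)",
        "value": value,
        "goal": GOAL_MINUTES,
        "status": "on target" if value <= GOAL_MINUTES else "above goal",
    }


t = datetime(2013, 6, 1, 14, 0)
visits = [
    (t, datetime(2013, 6, 1, 14, 12)),
    (t, datetime(2013, 6, 1, 14, 41)),
    (t, datetime(2013, 6, 1, 14, 25)),
]
print(door_to_provider_tile(visits))
```

A real dashboard would recompute this on each data refresh and plot the history as a trend line; the median is used here because waiting times are typically skewed by a few very long waits.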
One distinguishing feature of many contemporary dashboards and analytical systems is their use of sophisticated visualization techniques. A growing body of research has shown that users make better decisions, or are at least more confident in their decisions, when data are presented in graphs or tables that are easy to interpret (e.g., Tait et al., 2010). Advanced visualization techniques can produce more consistent, clean, and unambiguous charts that improve the speed and reliability of users’ decision-making. Combining these methods with real-time dashboards can put unprecedented power in the hands of end users to understand how key metrics are changing and what should be done to address problems.
A key objective for any analytics system in healthcare is to produce valuable output for those caring for patients, conducting research, or making other decisions about how the organization functions. For example, one of the core motivations for the recent massive investment in EHR infrastructure is the assumption that these systems will provide data that enable clinicians and researchers to develop better interventions, protocols, drugs, and policies that lead to improved patient outcomes.
In healthcare, management’s foremost concern is people: the key stakeholders, be they patients, physicians, nurses and other medical staff, referring providers, or representatives from the local community. Empowering these individuals and increasing the quality and transparency of decision-making are key goals for any business analytics initiative. Because these systems have such pervasive influence, the organization needs to establish business analytics as an organizational and cultural objective and as a component of its long-term strategy.
Such a culture would bring fundamental improvements to the organization. First, decision-making based primarily on data and information would become the expectation and the norm, which is essential to completing the transformation to evidence-based medicine. Second, because data are shared and updated frequently, routine decisions can be more easily automated or augmented with decision-support systems. Tools such as computerized provider order entry (CPOE) systems that verify and validate medication orders in real time are but one example of the power and promise of analytics.
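A real-time order check of the kind described above can be reduced to simple rules evaluated when an order is entered. The sketch below is hypothetical throughout: the drug names and limits are placeholders for illustration only and are not clinical guidance. Real CPOE systems draw on curated drug knowledge bases and many more rule types.

```python
from dataclasses import dataclass

# Placeholder knowledge base: drug -> maximum total daily dose in mg (illustrative only).
MAX_DAILY_DOSE_MG = {"drug_a": 4000, "drug_b": 40}


@dataclass
class Order:
    drug: str
    dose_mg: float
    doses_per_day: int


def check_order(order, patient_allergies):
    """Return alert messages; an empty list means the order passes these checks."""
    alerts = []
    if order.drug in patient_allergies:
        alerts.append(f"Allergy recorded for {order.drug}")
    limit = MAX_DAILY_DOSE_MG.get(order.drug)
    daily = order.dose_mg * order.doses_per_day
    if limit is None:
        alerts.append(f"No dose rule for {order.drug}; pharmacist review needed")
    elif daily > limit:
        alerts.append(f"Daily dose {daily:g} mg exceeds limit of {limit} mg")
    return alerts


print(check_order(Order("drug_a", 1000, 6), patient_allergies={"drug_b"}))
# ['Daily dose 6000 mg exceeds limit of 4000 mg']
```

Flagging an unknown drug, rather than silently passing it, is a deliberate choice in this sketch: an automated check should fail safe when its knowledge base has a gap.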
However, to realize these benefits, clinicians, support staff, and leadership all need to understand and appreciate the importance of business analytics, both as a set of tools and as a fundamental process within the organization. Otherwise, the organization will continue to underinvest, and staff will remain skeptical of the value of routinely recording data.
Another key managerial challenge is finding and retaining personnel capable of performing these often complex analytical and data management tasks. A key factor limiting the expansion of healthcare analytics has been a shortage of qualified individuals with the appropriate background and skills in computing and mathematics, combined with increasing demand for such individuals. McKinsey & Co. estimate that by 2018, the U.S. could face a shortage of 140,000 to 190,000 people with the appropriate analytical skills (McKinsey & Co., 2013).
Additional managerial issues involve measuring the outcomes of using electronic data and of applying those data to human health. Do they improve health? Do they save money? What metrics should be used to quantify their benefits or costs? Currently, there are no clear-cut answers (Menachemi, 2011; Mandl, 2012). Answering these questions will itself require analytics to weigh the benefits achieved by such technology against its costs.
While it is widely believed that clinical information systems, and EHRs in particular, can serve as a rich source of data for analytical and research purposes, not all data are created equal. Some data elements are captured more consistently, and for a greater percentage of the patient population, than others. Most EHRs allow the same piece of information to be captured in many different ways. A diagnosis, for instance, could be listed on the patient’s problem list, in the medical history, in billing records, as the reason for visit, in the clinical narrative, and so on. Institutions can implement best practices specifying where certain information should be documented in the patient’s chart and use quality improvement (QI) reports to ensure compliance (see below).
In most cases, however, population-level analytics requires looking in all possible locations where the data may exist. Otherwise, the analysis captures only the patients who have data in the locations being searched. That narrower approach introduces its own biases, but in some cases it may be sufficient (for example, ensuring that all patients with diabetes on their problem list are identified so they can have their hemoglobin A1c levels checked, as opposed to trying to identify every patient in the hospital who might be suspected of having diabetes). As a result, there is growing awareness of the role of data quality in EHR-based analytics and of the need to characterize the data’s “fitness for use” before using them for any secondary purpose (Weiskopf and Weng, 2013).
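The gap between searching one location and searching all of them is easy to quantify. In the hypothetical sketch below, each record notes where a diabetes diagnosis was documented; the source names and patient IDs are invented, and the ICD-9 prefix “250” (diabetes mellitus) stands in for a fuller, validated code list.

```python
# Hypothetical extract: (patient_id, documentation source, ICD-9 code)
records = [
    ("p1", "problem_list", "250.00"),
    ("p2", "billing", "250.02"),
    ("p3", "medical_history", "250.00"),
    ("p3", "problem_list", "250.00"),
    ("p4", "billing", "401.9"),  # hypertension, not diabetes
]


def diabetes_patients(rows, sources=None):
    """Patients with an ICD-9 250.x code, optionally restricted to given sources."""
    return {
        pid for pid, source, code in rows
        if code.startswith("250") and (sources is None or source in sources)
    }


problem_list_only = diabetes_patients(records, sources={"problem_list"})
all_locations = diabetes_patients(records)
print(sorted(problem_list_only), sorted(all_locations))
# ['p1', 'p3'] ['p1', 'p2', 'p3']
```

Patient p2 is documented only in billing data, so a problem-list query misses them. Whether that omission matters depends on the question being asked.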
Another major challenge facing the use of analytics is the availability and cost of acquiring electronic data. A project to automate metric reporting at the integrated health system Kaiser Permanente found that the necessary electronic data were frequently unavailable for public reporting of system metrics (Garrido, 2013), which resulted in nearly $7 million in administrative costs to obtain and report these data. After implementing automated data reporting, Kaiser estimated that it reduced abstraction time by over 50% and saved approximately $1 million in administrative costs. However, the fact that data are collected does not mean they can readily be mined. Some data may be recorded in clinical narratives, which require natural language processing algorithms to determine whether a particular action (e.g., smoking cessation) was performed. While discrete elements, such as check boxes, could be added to the user interface, this approach threatens to prolong the time providers spend on electronic tools (e.g., Poissant, 2005). Even when discrete fields are available, electronic data are often incomplete (Staroselsky, 2006).
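Real clinical natural language processing is beyond a short example. Still, a deliberately crude keyword-and-negation sketch (loosely in the spirit of rule-based negation detection; the patterns are our own assumptions) shows both the idea and why narrative text is hard to mine reliably.

```python
import re

# Illustrative patterns only; real systems need far richer vocabularies.
CESSATION = re.compile(
    r"\b(smoking cessation|tobacco cessation|quit smoking)\b", re.IGNORECASE)
# A negation cue within ~40 characters before the mention, in the same sentence.
NEGATION = re.compile(
    r"\b(no|not|declined|refused|denies)\b[^.]{0,40}$", re.IGNORECASE)


def documents_cessation(note: str) -> bool:
    """True if any sentence mentions cessation without a preceding negation cue."""
    for sentence in re.split(r"(?<=[.!?])\s+", note):
        match = CESSATION.search(sentence)
        if match and not NEGATION.search(sentence[:match.start()]):
            return True
    return False


notes = [
    "Patient counseled on smoking cessation. Nicotine patch offered.",
    "Patient declined smoking cessation counseling.",
    "Denies tobacco use.",
]
print([documents_cessation(n) for n in notes])  # [True, False, False]
```

Each misspelling, abbreviation, or new phrasing that these patterns miss becomes a silent false negative. This brittleness is why the discrete-field alternative is tempting despite its cost in provider time.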
The standardization of data also raises another challenge: the accuracy of electronic data. EHRs may not improve, and may even worsen, data quality (Tse, 2011). For example, in one emergency department, EHR implementation increased the number of systematic errors compared with the legacy system (Ward, 2013). Compromised data quality poses risks for interpretation and for any actions based on such data.
Data quality and the process of data collection are inextricably linked. Once data quality is compromised, it can be tremendously expensive to restore; therefore, it is critical to focus on high-quality data collection (Redman, HBR 2013). In healthcare, high-quality, useful data do not necessarily emerge as a byproduct of the system. In the vast majority of cases, someone needs to collect them. Therefore, workflows must be designed so that the important data elements are captured during a visit and so that these tasks minimally disrupt the work of expensive resources such as nurses and physicians. Even if the interaction is as trivial as a keypress, information processing theory tells us that the added burden can greatly undermine the consistency and quality of the data being collected (Payne, 1993).
Instead of collecting as much data as possible, institutions should take the opposite approach, ensuring that they collect only the minimal set of data elements that are required. It is far better to have a small set of high-quality elements with high completion rates than a large set with spotty coverage. Organizations can encourage employees to collect specific data elements in several ways, including publicizing the capture rates of individual employees within a clinic (anonymously or by name), tying a portion of salary to data-entry compliance, and providing a tangible benefit from collecting the data (e.g., using the captured data to automate a downstream process, saving time and effort).
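Publicizing capture rates first requires computing them. A minimal sketch, assuming a hypothetical visit-level extract that records the documenting clinician and whether a required element (here, smoking status) was captured:

```python
from collections import defaultdict

# Hypothetical visit-level extract: (clinician_id, smoking_status_recorded)
visits = [
    ("dr_a", True), ("dr_a", True), ("dr_a", False),
    ("dr_b", True), ("dr_b", True),
    ("dr_c", False), ("dr_c", True), ("dr_c", False), ("dr_c", False),
]


def capture_rates(rows):
    """Fraction of each clinician's visits in which the element was captured."""
    totals, captured = defaultdict(int), defaultdict(int)
    for clinician, recorded in rows:
        totals[clinician] += 1
        captured[clinician] += int(recorded)
    return {c: captured[c] / totals[c] for c in totals}


rates = capture_rates(visits)
for clinician, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{clinician}: {rate:.0%}")
```

For the anonymous variant, the clinician IDs would be replaced with neutral labels before the list is shared within the clinic.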
Only a limited amount of data can be collected in any single visit. Beyond a certain point, data entry lengthens the visit enough to affect patient flow, potentially harming patient satisfaction and revenues. As a result, an increasing number of organizations are shifting more of the data-entry burden to patients. When patients fill out forms on kiosks or tablets in the waiting room, or enter data at home through a patient portal, physicians simply need to review the responses instead of keying them in themselves (whether these patient-reported data are as complete or as high in quality as data provided by clinicians is an open question).
From an analytical perspective, this approach is limited by the quality of the data supplied by patients and is subject to recall bias. As one physician told us, “For example, when I ask a patient if they have any medical problems, I have had multiple patients respond ‘no,’ only to later see that they have human immunodeficiency virus (HIV) in their chart.” Another challenge posed by patient-entered data is that these responses are typically stored in the EHR’s reporting database separately from those entered by clinicians. Even when a large percentage of patients enter data, clinicians still need the ability to enter the same data elements through their EHR interfaces. Therefore, when using these data for analytical purposes, one must remember to merge the patient-entered data tables with the clinician-entered ones to obtain a comprehensive dataset.
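A minimal sketch of that merge, assuming two hypothetical extracts keyed by patient and data element. Each value is tagged with its source. When both sources report the same element for the same patient, this sketch keeps the clinician-entered value; that precedence rule is our assumption, and an institution might reasonably choose differently.

```python
# Hypothetical extracts keyed by (patient_id, data_element) -> value
patient_entered = {
    ("p1", "tobacco_use"): "never",
    ("p2", "tobacco_use"): "former",
}
clinician_entered = {
    ("p2", "tobacco_use"): "current",
    ("p3", "tobacco_use"): "never",
}


def merge_sources(patient_rows, clinician_rows):
    """Union of both sources; clinician-entered values win on conflict."""
    merged = {key: (value, "patient") for key, value in patient_rows.items()}
    merged.update({key: (value, "clinician") for key, value in clinician_rows.items()})
    return merged


combined = merge_sources(patient_entered, clinician_entered)
for (pid, element), (value, source) in sorted(combined.items()):
    print(pid, element, value, source)
```

Querying either table alone would miss p1 or p3 entirely. Keeping the source tag also lets analysts compare patient-reported and clinician-reported values where both exist.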
One of the most significant ongoing debates about analytics in healthcare involves the public reporting of results. The trend has been toward more transparency. CMS, for example, now provides a public report card of hospital quality measures (Hospital Compare, 2013), and there are numerous private initiatives to spur public reporting on quality and cost (James, 2012). Because of competitive concerns about sharing quality and outcome data, however, many institutions are reluctant to share data that federal or state regulations do not require.
An innovative approach to sharing health data has been the establishment of collaborative, multi-center quality-improvement networks. These networks, such as Solutions for Patient Safety (Ohio Children’s Hospitals, 2013) and ImproveCareNow (Crandall et al., 2011), set goals such as eliminating patient harm and improving the care and outcomes of children with inflammatory bowel disease (IBD). Another example of inter-institutional data sharing is the Emergency Department (ED) Benchmarking Alliance (http://www.edbenchmarking.org/), which allows member EDs to review and compare blinded operational data from similar types of facilities. These networks define a set of outcome measures and standardize the collection of data. Center-level outcomes are shared within the collaborative, and participants learn from the centers with the best outcomes. Metrics can be shared with the public while keeping individual facilities anonymous. Because many participating centers compete with one another, the networks are largely built on trust and a sense of duty to the public good. By ensuring that no center’s results are used against it in a disparaging way, they are able to improve outcomes for the population as a whole.
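A hypothetical sketch of blinded benchmarking: centers are reported under shuffled anonymous labels alongside the network median, and the mapping from label to name stays with the collaborative’s coordinator. The center names, the metric, and the labeling scheme are invented for illustration and do not describe how any of the networks above actually operate.

```python
import random
from statistics import median


def blinded_report(metric_by_center, seed=2013):
    """Replace center names with shuffled letter labels (up to 26 centers).

    Returns (label_map, rows, network_median); label_map stays private.
    """
    names = list(metric_by_center)
    random.Random(seed).shuffle(names)  # decouple labels from name order
    label_map = {name: f"Center {chr(ord('A') + i)}" for i, name in enumerate(names)}
    rows = sorted((label_map[n], v) for n, v in metric_by_center.items())
    return label_map, rows, median(metric_by_center.values())


# Hypothetical metric: median ED length of stay in minutes.
los = {"Hospital North": 212, "Hospital South": 185, "Hospital East": 240}
label_map, rows, network_median = blinded_report(los)
print(rows, "network median:", network_median)
```

Each center would receive its own label privately, so it can locate itself in the shared table without learning which label belongs to a competitor.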
With increased access to and use of electronic data, privacy concerns are also growing. Results involving genetic data, in particular, require special consideration. The federal government has a long-standing policy of making public as much as possible of the genetic findings obtained with public research funding (NIH, 2013). That policy has led to the establishment of public databases like dbGaP (Mailman, 2007; Zhang, 2008) and international initiatives like the 1000 Genomes Project (Genomes Project, 2010). By nature, genetic information cannot be de-identified, but it was widely believed that the size and complexity of the data would at least confer some degree of anonymity. Recent studies have shown that such expectations are not valid (Gymrek, 2013), and, given concerns that genetic information could be used to discriminate in employment or insurance (despite such discrimination being illegal) (Pulley, 2012), there may be a move to decrease transparency in sharing genetic results.
Healthcare faces another managerial challenge not present in other industries: the use of data for research is governed by different rules than its use for non-research purposes. Federal regulations like the Health Insurance Portability and Accountability Act (HIPAA) and the Common Rule govern what data can be used for research, who may access those data, and the type of patient consent that may be required before access is granted. This poses challenges for analytical staff and their IT systems. In many cases, an organization may want a common set of business rules applied regardless of whether the data are used for clinical care, internal performance improvement, or research (as in the public reporting example above). If the logic for those business rules is encoded in the analytical system, the organization will need either to assign role-level access to the data to control who can see it for research purposes or to implement a completely stand-alone research infrastructure, which poses its own set of costs and challenges.
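One way to keep a single set of business rules while controlling research access is to apply the shared logic once and then gate what each role and purpose may see. The sketch below is hypothetical throughout (the roles, purposes, consent flag, and 180-day rule are invented) and is not a statement of what HIPAA or the Common Rule actually require.

```python
from enum import Enum


class Purpose(Enum):
    CLINICAL_CARE = "clinical_care"
    QUALITY_IMPROVEMENT = "quality_improvement"
    RESEARCH = "research"


# Hypothetical role-to-purpose permissions.
ALLOWED_PURPOSES = {
    "clinician": {Purpose.CLINICAL_CARE, Purpose.QUALITY_IMPROVEMENT},
    "qi_analyst": {Purpose.QUALITY_IMPROVEMENT},
    "researcher": {Purpose.RESEARCH},
}


def flag_overdue_a1c(patient):
    """Shared business rule, identical for every purpose (hypothetical: >180 days)."""
    return patient["days_since_a1c"] > 180


def overdue_a1c_cohort(patients, role, purpose):
    if purpose not in ALLOWED_PURPOSES.get(role, set()):
        raise PermissionError(f"role {role!r} may not access data for {purpose.value}")
    rows = [p for p in patients if flag_overdue_a1c(p)]
    if purpose is Purpose.RESEARCH:
        # Hypothetical consent flag: research use limited to consented patients.
        rows = [p for p in rows if p["research_consent"]]
    return [p["id"] for p in rows]


patients = [
    {"id": "p1", "days_since_a1c": 200, "research_consent": True},
    {"id": "p2", "days_since_a1c": 400, "research_consent": False},
    {"id": "p3", "days_since_a1c": 30, "research_consent": True},
]
print(overdue_a1c_cohort(patients, "clinician", Purpose.QUALITY_IMPROVEMENT))  # ['p1', 'p2']
print(overdue_a1c_cohort(patients, "researcher", Purpose.RESEARCH))  # ['p1']
```

Because `flag_overdue_a1c` is written once, a change to the rule applies uniformly to care, QI, and research. Only the access layer differs, which is the role-level option described above rather than a separate research infrastructure.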
As we have discussed, analytics has begun to improve and inform healthcare in many ways. Some applications have seen dramatic adoption or present a potentially revolutionary approach to medical decision-making and the management of healthcare. These examples, and others, can be loosely grouped into two categories, discovery/efficacy and care delivery, as shown in Figure 2. We discuss several of these examples below.