Introduction to Pharmacoeconomics. Retrospective Database Analysis
1.1 Introduction to Pharmacoeconomics
1.2 Analytical Perspectives
1.3 Code of Ethics
1.4 Overview of Economic Evaluation Methods
1.5 Quality of Life and Patient Preferences
1.6 Decision Analysis and Modeling
1.7 Ranking Priorities: Developing a Formulary List
1.8 Incremental Analysis and Quadrants
1.9 Fourth Hurdle and Drug Approvals
1.10 From Board Room to Bedside
2 Outcome of use Pharmacoeconomics
5.1 Retrospective Database Analysis
5.2 Claims and Medication Databases
5.2.1 Description of Claims Database Files
5.3 Electronic Medical Records and Medical Charts
5.3.1 Medical Chart/Medical Record in General
5.3.2 Electronic Medical Records (EMRs) or Charts
5.3.2.1 Advantages/Disadvantages
5.3.2.2 Current Use of EMRs
5.4 Patient Reported Outcomes
5.4.1 Use of PRO Instruments in Pharmacoeconomic Studies: Focus on HPV Vaccine Studies
5.5 Alternative Population-Based Data Sources
5.6 Issues and Challenges
5.7 Statistical Issues
5.8 Non-U.S. Countries
5.9 The Future in Use of Retrospective Databases
1.1 Introduction to Pharmacoeconomics
Practitioners, patients, and health agencies face a multitude of conundrums as the development of new therapies seems boundless, while the money to purchase these cures is limited. How does one decide which are the best medicines to use within restricted budgets? The continuing impact of cost‑containment is causing administrators and policy makers in all health fields to examine closely the costs and benefits of both proposed and existing interventions. It is increasingly obvious that purchasers and public agencies are demanding that health treatments be evaluated in terms of clinical and humanistic outcomes against the costs incurred.
Pharmacoeconomics is the field of study that evaluates the behavior or welfare of individuals, firms, and markets relevant to the use of pharmaceutical products, services, and programs. The focus is frequently on the cost (inputs) and consequences (outcomes) of that use. Of necessity, it addresses the clinical, economic, and humanistic aspects of health care interventions (often diagrammed as the ECHO Model, Figure 1.1) in the prevention, diagnosis, treatment, and management of disease. Pharmacoeconomics is a collection of descriptive and analytic techniques for evaluating pharmaceutical interventions, spanning individual patients to the health care system as a whole. Pharmacoeconomic techniques include cost-minimization, cost-effectiveness, cost-utility, cost-benefit, cost of illness, cost-consequence, and any other economic analytic technique that provides valuable information to health care decision makers for the allocation of scarce resources. Pharmacoeconomics is often referred to as “health economics” or “health outcomes research,” especially when it includes comparison with non-pharmaceutical therapy or preventive strategies such as surgical interventions, medical devices, or screening techniques.

Figure 1.1 ECHO Model. (Kozma, CM et al. Economic, clinical, and humanistic outcomes: A planning model for pharmacoeconomic research. Clin Ther. 15: (1993): 1121–32.)
Pharmacoeconomic tools are vitally important in analyzing the potential value for individual patients and the public. These methods supplement the traditional marketplace value as measured by the prices that the patient or patron is willing to pay. With government agencies and third parties’ continuing concern about the higher expenditures for prescriptions, pharmaceutical manufacturers and pharmacy managers are highly cognizant that pharmaceutical interventions and services require comparative cost-justification and continual surveillance to assure cost-effective outcomes.
From pharmaceutical research, we have seen significant therapeutic advances and breakthroughs. From health care delivery entrepreneurs we have seen numerous expanding roles for pharmacists, nurses, and physician assistants, with services such as home intravenous therapy, drug-level monitoring, parenteral nutrition management, hospice care, self-care counseling, and genetic screening for customizing therapy, among other innovations. The use of valid economic evaluation methods to measure the value and impact of new interventions can increase acceptance and appropriate use of such programs by third-party payers, government agencies, and consumers.
There is increasing scrutiny over all aspects of health care as we attempt to balance limited finances and resources against optimal outcomes. Cost-effectiveness evaluations of pharmaceutical options are becoming mandatory for attaining adequate reimbursement and payment for services.10,11 Pharmacoeconomic methods help document the costs and benefits of therapies and pharmaceutical services, and establish priorities for those options to help in appropriately allocating resources in ever-changing health care landscapes.
1.2 Analytical Perspectives
Point of view is a vital consideration in pharmacoeconomics. If a medicine is providing a positive benefit in relation to cost in terms of value to society as a whole, the service may not be valued in the same way by separate segments of society. For example, a drug therapy that reduces the number of admissions or patient days in an acute care institution is positive from society’s point of view but not necessarily from that of the institution’s administrator, who depends on a high number of patient admissions to meet expenses. Thus, one must determine whose interests are being served when identifying outcome criteria for evaluation. When considering pharmacoeconomic perspectives, one must always consider who pays the costs and who receives the benefits. A favorable economic analysis that showed savings in clinic utilization from the employer perspective would probably not be viewed positively from the clinic’s budget perspective. More broadly, what is viewed as saving money for society may be viewed differently by private third‑party payers, administrators, health providers, governmental agencies, or even the individual patient. It is generally agreed among health economists that the societal perspective should always be discussed in an evaluative report, even though the focus of the report might deal with other segments such as hospitals or insurance agencies. In the United States, with many different health care delivery and payer approaches, this can be complicated, and analyses are often done from multiple perspectives to assist adjudication by multiple stakeholders.
1.3 Code of Ethics
The International Society for Pharmacoeconomics and Outcomes Research (ISPOR) has published a code of ethics that is vital to the honesty and transparency of the discipline. The code encourages pharmacoeconomists to maintain the highest ethical standards because the academy recognizes that activities of its members affect a number of constituencies. These include but are not limited to:
1) patients who are ultimately going to experience the greatest impact of the research;
2) practitioners who will be treating or not treating patients with therapies, medications, and procedures made available or not made available because of the research;
3) governments, employers, decision-makers, and payers who must decide what is covered so as to optimize the health of the patient and resource utilization;
4) professional outcomes researchers;
5) colleagues, where relationships in conducting research and related activities are particularly critical;
6) research employees concerned about how they are regarded, compensated, and treated by the researchers for whom they work;
7) students who work for researchers, where respect and lack of exploitation are important because they are the future of the discipline;
8) clients for whom the research is conducted, and the researchers’ relationships with them.
The ISPOR code of ethics lists many standards for researchers, but a sample section of the code related to “design and research practices” is as follows:
1. Maintain a current knowledge of research practices.
2. Adhere to the standards of practice for their respective fields of research and identify any official guidelines/standards used.
3. Research designs should be defined a priori, reported transparently, defended relative to alternatives, and planned to minimize all types of bias.
4. Respect the rights of research subjects in designing and conducting studies.
5. Respect the reputations and rights of colleagues when engaged in collaborative projects.
6. Maintain and protect the integrity of the data used in their studies.
7. Not draw conclusions beyond those which their data would support.
1.4 Overview of Economic Evaluation Methods
This section provides the reader with a brief overview of the methodologies based on the two core pharmacoeconomic approaches, namely cost-effectiveness analysis (CEA) and cost-utility analysis (CUA). Table 1.1 provides a basic comparison of these methods with cost-of-illness, cost-minimization, and cost-benefit analysis. One can differentiate between the various approaches according to the units used to measure the inputs and outcomes, as shown in the table. In general, the outputs in CEA are expressed in natural units of measure, such as lives saved, life-years added, disability-days prevented, blood pressure, lipid level, and so on. Cost-benefit analysis (CBA) uses monetary values (e.g., euros, dollars, pounds, yen) to measure both inputs and outputs of the respective interventions. Further discussion and examples of these techniques have been presented elsewhere. It is hoped that the evaluation mechanisms delineated further in this book will be helpful in managing pharmaceutical interventions toward improving societal value and in generating greater acceptance by health authorities, administrators, and the public. Using the human papillomavirus (HPV) vaccine as an example for case studies, other chapters in this book will further illustrate the various analytical methodologies related to CEA, CUA, CBA, etc.

1.5 Quality of Life and Patient Preferences
Significant components in pharmacoeconomics are patient outcomes and quality of life (QoL) with an expanding list of related factors to consider (Table 1.2). Although it is recognized that there are physical, mental, and social impairments associated with disease, there is not always consensus on how to accurately measure many of these factors. Consequently, the concept of satisfaction with care is often overlooked in cost-effectiveness studies and even during the approval process of the U.S. Food and Drug Administration (FDA). Generally, pharmacoeconomic and outcomes researchers consider QoL a vital factor in creating a full model of survival and service improvement. QoL is related to clinical outcomes as much as drugs, practitioners, settings, and types of disease. The question becomes how to select and utilize the most appropriate instruments for measuring QoL and satisfaction with care in a meaningful way.
Table 1.2
Outcomes and Quality of Life Measurement Approaches
I. Basic Outcomes List: Six D’s
A. Death
B. Disease
C. Disability
D. Discomfort
E. Dissatisfaction
F. Dollars (Euros, Pounds, Yen)
II. Major Quality of Life Domains
A. Physical status and functional abilities
B. Psychological status and well-being
C. Social interactions
D. Economic status and factors
III. Expanded Outcomes List
A. Clinical End Points
1. Symptoms and Signs
2. Laboratory Values
3. Death
B. General Well-being
1. Pain/Discomfort
2. Energy/Fatigue
3. Health Perceptions
4. Opportunity (future)
5. Life Satisfaction
C. Satisfaction with Care/Providers
1. Access
2. Convenience
3. Financial Coverage
4. Quality
5. General
The quality-adjusted life year (QALY) has become a major concept in pharmacoeconomics. It is a measure of health improvement used in CUA, which combines mortality and QoL gains and considers the outcome of a treatment measured as the number of years of life saved, adjusted for quality.
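The arithmetic behind a QALY can be illustrated with a minimal sketch. It assumes the common convention of weighting each period of life by a utility between 0 (death) and 1 (full health); the health-state durations and weights below are hypothetical, and discounting of future years is deliberately left out.

```python
def total_qalys(health_states):
    """Sum years lived in each state, weighted by that state's utility.

    health_states: iterable of (years, utility) pairs, where utility
    runs from 0.0 (death) to 1.0 (full health).
    """
    return sum(years * utility for years, utility in health_states)


# Hypothetical patient profiles (illustrative values only).
with_treatment = [(4.0, 0.75), (2.0, 0.5)]   # 4 good years, then 2 poorer years
without_treatment = [(3.0, 0.5)]             # 3 years in a poorer state

qalys_gained = total_qalys(with_treatment) - total_qalys(without_treatment)
print(f"QALYs gained: {qalys_gained}")
```

In this illustration the treatment adds three life-years but 2.5 QALYs, because the added time is not all lived in full health.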
One approach to conceptualizing QoL and outcomes data collected in clinical trials is to consider the source of the data. There are several potential sources of data to evaluate the safety and efficacy of a new drug. Potential sources and examples are listed below:
• Patient-reported outcomes (PROs): e.g., global impression, functional status, health-related QoL (HRQoL), symptoms
• Caregiver-reported outcomes: e.g., dependency, functional status
• Clinician-reported outcomes: e.g., global impressions, observations, tests of function
• Physiological outcomes: e.g., pulmonary function, blood glucose, tumor size.
1.6 Decision Analysis and Modeling
Decision analysis is defined as “… a systematic approach to decision making under conditions of uncertainty.” Decision analysis is an approach that is explicit, quantitative, and prescriptive.
It is explicit in that it forces the decision maker to separate the logical structure into its component parts so that they can be analyzed individually, then recombined systematically to suggest a decision. It is quantitative in that the decision maker is compelled to be precise about values placed on outcomes. Finally, it is prescriptive in that it aids in deciding what a person should do under a given set of circumstances. The basic steps in decision analysis include identifying and bounding the decision problem, structuring the decision problem over time, characterizing the information needed to fill in the structure, and then choosing the preferred course of action.
Pharmacoeconomic models can involve decision trees, spreadsheets, Markov analyses, discrete event simulation, basic forecasting, and many other approaches.
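As a rough illustration of how a decision tree is evaluated, the sketch below "rolls back" a small tree by averaging over chance branches and then picks the option with the lowest expected cost. The tree layout, probabilities, and costs are hypothetical, and a real analysis would weigh outcomes such as QALYs alongside cost.

```python
def expected_value(node):
    """Roll back a tree: a leaf returns its payoff, a chance node
    returns the probability-weighted average of its branches."""
    if "payoff" in node:
        return node["payoff"]
    return sum(prob * expected_value(child) for prob, child in node["branches"])


def best_option(options):
    """Return the name of the option with the lowest expected cost."""
    return min(options, key=lambda name: expected_value(options[name]))


# Hypothetical decision: two drugs, each either succeeds or fails.
# Payoffs are total costs, including the cost of treating a failure.
options = {
    "Drug A": {"branches": [(0.75, {"payoff": 500}), (0.25, {"payoff": 2500})]},
    "Drug B": {"branches": [(0.5, {"payoff": 300}), (0.5, {"payoff": 2300})]},
}

for name, tree in options.items():
    print(name, expected_value(tree))
print("Preferred on expected cost:", best_option(options))
```

Here the drug with the higher purchase cost is preferred because its higher success probability avoids more of the cost of treatment failure.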
In a simplified form, a decision tree can double as an educational tool for presenting available therapeutic options and probable consequences to patients and decision makers. Wennberg and others have explored ways to involve patients in a shared decision-making process. One of Wennberg’s projects involved an interactive computer program on prostate surgery education. The program explains to patients the probability of success, the degree of pain that might be encountered at each step, and what the procedure actually entails. After viewing this program with visual graphic depictions of the surgery, many of the patients changed their decisions, choosing watchful waiting rather than surgery. This reduction in a major procedure resulted from a greater focus on QoL and patient satisfaction. With further evaluation and perhaps modification of the computer program, it should also produce more cost-effective care. Wennberg’s work is an application of outcomes research that helped to weigh costs, utilities, and QoL for the patient.
1.7 Ranking Priorities: Developing a Formulary List

Table 1.3 illustrates how cost–utility ratios can be used to rank alternative therapies as one might do for a drug formulary. The numbers in the second column of the table list the total QALYs for all of a decision maker’s patient population that is expected to benefit from the treatment options in each row. The numbers in the third column detail the total cost of treatment for all of one’s targeted patient population for each treatment option in each row. For the next step in the selection process, rank the therapy options by their cost–utility ratios. Options have already been ranked appropriately in this table. For the final selection step, add each therapy option into one’s formulary, moving down each row until one’s allocated budget (using the cost column) is exhausted. In other words, with a budget of only $420,000, one would be able to fund therapies A, B, and C. These options have the best cost–utility for one’s population given one’s available budget. Cost-effectiveness and cost–utility ratios are sometimes presented in similar fashion and are called League Tables. Tengs et al. have published an extensive list of interventions and Neumann and colleagues maintain a website with a substantial list of cost–utility ratios based on health economic studies, with a sample in Table 1.4.
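The ranking and selection steps can be sketched in a few lines of code. The QALY and cost figures below are hypothetical placeholders, not the values of Table 1.3; they were chosen only so that a $420,000 budget funds therapies A, B, and C, as in the text.

```python
def build_formulary(therapies, budget):
    """Rank therapies by cost per QALY (lowest first), then fund them
    in order until the next one no longer fits within the budget."""
    ranked = sorted(therapies, key=lambda t: t["cost"] / t["qalys"])
    funded, spent = [], 0
    for therapy in ranked:
        if spent + therapy["cost"] > budget:
            break  # budget exhausted: stop moving down the table
        funded.append(therapy["name"])
        spent += therapy["cost"]
    return funded, spent


# Hypothetical population totals (illustrative values only).
therapies = [
    {"name": "D", "qalys": 25, "cost": 200_000},
    {"name": "B", "qalys": 50, "cost": 200_000},
    {"name": "E", "qalys": 10, "cost": 150_000},
    {"name": "A", "qalys": 50, "cost": 100_000},
    {"name": "C", "qalys": 20, "cost": 120_000},
]

funded, spent = build_formulary(therapies, budget=420_000)
print(funded, spent)
```

This sketch stops at the first therapy that does not fit, mirroring the row-by-row procedure described above; a decision maker might instead continue down the list looking for cheaper options that still fit the remaining budget.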

These listings must be used with caution because there are a number of criticisms of rankings with league tables, including:
• Different reports use different methods
• What the comparators were (e.g., which drugs, which surgeries)
• Difficult to be flexible about future comparators
• Orphan and rare disease versus more prevalent diseases
• Randomized prospective trials versus retrospective studies
• Regional and international differences in clinical resource use
• Regional and international differences in direct and indirect costs of treatment
• Statistical confidence intervals of cost and outcomes results
• Difficult to test statistical significance between the pharmacoeconomic ratios of treatments listed
1.8 Incremental Analysis and Quadrants
Whether one is dealing with cost analyses or decision analysis, it is important to properly compare one treatment with another, and one should understand the concepts in incremental analysis. Incremental analysis does not mean that one is adding a second therapy to the patient’s regimen, but it is a technique for comparing one therapy with another. The basic incremental formulas are as follows:
CEA: (Cost_1 - Cost_2) / (Effectiveness_1 - Effectiveness_2)
or
CUA: (Cost_1 - Cost_2) / (QALYs_1 - QALYs_2)
An interesting way of displaying this information is illustrated in Figure 1.2.

By displaying this information in quadrants, one can more easily visualize the relationship between therapies. Drugs that are cheaper and more effective would fall in the “accept” or “dominant” sector, while drugs that are more expensive and less effective would be “dominated.” The slopes of the lines represent the incremental cost–effectiveness ratios and, in general, therapies costing between $20,000 and $100,000 per life year saved (or per QALY) are often considered acceptable in public policy reports.
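A minimal sketch of the incremental calculation and the quadrant logic follows. The function names, the example costs and QALYs, and the $50,000 threshold are assumptions made for illustration; the threshold simply falls inside the range quoted above.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio of a new therapy versus a comparator."""
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("Equal effectiveness: compare costs directly instead.")
    return (cost_new - cost_old) / delta_effect


def classify(cost_new, cost_old, effect_new, effect_old, threshold):
    """Place the new therapy in a quadrant relative to the comparator."""
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        return "equal effectiveness: choose the cheaper option"
    if delta_effect > 0 and delta_cost <= 0:
        return "dominant"
    if delta_effect < 0 and delta_cost >= 0:
        return "dominated"
    if delta_effect > 0:  # more effective but more costly
        ratio = delta_cost / delta_effect
        return "acceptable trade-off" if ratio <= threshold else "above threshold"
    return "cheaper but less effective: trade-off"


# Hypothetical therapies: cost in dollars, effect in QALYs.
print(icer(60_000, 20_000, 6.0, 5.0))                        # 40000.0 per QALY
print(classify(60_000, 20_000, 6.0, 5.0, threshold=50_000))  # acceptable trade-off
print(classify(15_000, 20_000, 6.0, 5.0, threshold=50_000))  # dominant
```

Only the two off-diagonal quadrants require a threshold judgment; dominant and dominated therapies can be accepted or rejected without computing a ratio.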
A classic paper involving incremental analysis deals with the comparison of tissue plasminogen activator (TPA) to streptokinase. In this study, the important question did not involve looking at the CEA ratio of each drug individually; instead, it analyzed the incremental differences of the new drug, TPA, over the standard therapy at the time. The analysis demonstrated that TPA, when compared with streptokinase, had an incremental cost per life year saved of about $40,000, which was considered a socially acceptable value.
1.9 Fourth Hurdle and Drug Approvals
The classic basic elements required for approval of new drugs are:
1) therapeutic efficacy,
2) drug safety,
3) product quality.
But more recently, with the realization of limited national and global financial resources, another drug approval step has been added that considers factors related to pricing and reimbursement. Therefore, in at least two dozen countries, there is an additional jump before the marketing of pharmaceuticals that is often called “the fourth hurdle.” This criterion, usually involving cost-effectiveness and pharmacoeconomic analyses, is required even when efficacy, safety, and quality have been demonstrated. Such a fourth hurdle was initially introduced in Austria for the reimbursement of new drugs. Despite the extra development costs to conduct these studies, and concern from the pharmaceutical industry, this fourth step can also be viewed as a positive opportunity to better support more innovative medicines over me-too drugs. Pharmacoeconomic analyses can provide quantitative evidence for more rational new drug approvals. And with postmarketing surveillance and patient registries, pharmacoeconomics should be able to help sustain cost-effective drug utilization throughout the life cycle of the therapy.
1.10 From Board Room to Bedside


Figure 1.3 provides a basic consult form that suggests a framework for pharmacoeconomic assessments. If a decision between alternative treatments needs to be made, this form could help structure the calculations and considerations related to pharmacoeconomics. With the current technology and resources in most facilities, it would certainly be impossible to find sufficient time to apply detailed calculations to each individual patient. Evolving e-health technologies and the Internet may facilitate patient applications in the future. This consult worksheet is, then, a basic template for evaluating therapeutic options for a drug formulary or for framing a formal pharmacoeconomic study. In an ideal pharmacoeconomic world, it could be used as a basic calculation sheet to be discussed with a physician or patient and maintained in a patient’s medical record.
Although a pharmacoeconomic analysis of a new treatment may indicate that the intervention is cost-effective versus existing therapy, the continued clinical success of the new treatment is paramount. The least cost-effective drug, from an individual patient perspective, is the drug that does not work. Substantially more research remains to be performed not only on future drugs in the pipeline but also on existing interventions in the marketplace so that we can maximize patient outcomes and enhance cost-effectiveness. Computer technology and the Internet are tremendous resources for disseminating and applying pharmacoeconomic techniques, and then continually documenting outcomes for practitioners and patients. It is expected that reimbursement plans will include more incentives (paying for performance) for improvements in these economic, clinical, and humanistic outcomes. Thus, pharmacoeconomics reaches from the societal (macro) and board room level out to the clinical and patient (micro) level, as envisioned in Figure 1.4.

Even health practitioners will be increasingly expected to allocate scarce resources based on pharmacoeconomic principles. Using pharmacoeconomics and disease management concepts, health providers can produce more cost-effective outcomes in a number of ways. For example:
• Decrease drug–drug and drug–lab interactions.
• Increase the percentage of patients in therapeutic control.
• Reduce the overall costs of the treatment by utilizing more efficient modes of therapy.
• Reduce the unnecessary use of emergency rooms and medical facilities.
• Reduce the rate of hospitalization attributable to or affected by the improper use of drugs.
• Contribute to better use of health manpower by utilizing automation, telemedicine, and technicians.
• Decrease the incidence and intensity of iatrogenic disease, such as adverse drug reactions.
By improved monitoring and assessment of drug therapy outcomes, practitioners can provide early detection of therapy failure and provide cost-effective prescribing.
In this chapter, a general introduction to pharmacoeconomics has been provided. There are many reports in the literature that demonstrate that the benefit of medicines is worth the cost to the payer(s) for numerous disease states. Still, it must be realized that even though most research is positive, there is a need to continue to develop interventions and services that maximize the benefit‑to‑cost ratio to society. Even though new drugs can demonstrate positive ratios of benefit to cost, society or agencies will ultimately invest their resources in programs that have the higher benefit‑to‑cost or the best cost–utility ratio. Similarly, the health system must be convinced that any new therapy is worth utilizing, with a resultant modification or even deletion of other, less effective, therapeutic options, if necessary. All sectors of society, and certainly the pharmaceutical arena, must fully understand pharmacoeconomics if everyone around the globe is to have optimal health care and a better future.
2 Outcome of use Pharmacoeconomics
The key aim of all economic analyses is to make the best choice within defined parameters. Pharmacoeconomics is the branch of economics related to the most economical and efficient use of pharmaceuticals; economic approaches are applied to pharmaceuticals to guide the use of limited resources to yield maximum value to patients, health care payers and society in general.
Cost-effectiveness studies are of utmost importance to justify expenditure in all fields of health care (fig. 2).

Figure 2 When are pharmacoeconomic analyses needed?
The main components that must be considered in any pharmacoeconomic evaluation are (fig. 2):
• perspective (health trust, governmental body, insurance company, patients, society in general);
• time horizon;
• cost (direct medical costs, direct nonmedical costs, indirect costs, intangible costs);
• outcome (years of life saved, years of disease-free survival, cure rate).
It is generally considered that there are four principal outcomes of pharmacoeconomic studies:
1. Lower cost, better outcome
2. Higher cost, better outcome
3. Lower cost, poorer outcome
4. Higher cost, poorer outcome
It is of course conceivable that a given drug could cost more or less than its competitor while resulting in the same outcome. Clearly, the first outcome is the most favourable for the use of the new treatment. Conversely, the last outcome does not favour the use of the new drug.
When the second or third outcome arises, the choice of treatment is up to the prescribing doctor, prescribing policy and the budget available. In such situations, it is also necessary to look at other differences between agents that may sway the decision to prescribe one drug rather than another. These differences may include pharmacokinetic and pharmacodynamic traits, the risk of drug interactions and compliance rates.
5.1 Retrospective Database Analysis
Retrospective databases are a rich source of data for pharmacoeconomic analyses, whether they are created de novo from pre-existing sources, such as patients’ written charts, or drawn from pre-existing electronic datasets, such as medical and pharmacy claims databases, electronic medical records, national insurance administrative data, hospital medical records, disease-specific patient registries, and patient and provider survey data. A listing of some population-based data sources (Table 5.1) and data sources available commercially or from the U.S. government (Table 5.2) is provided. In addition to health economic analyses, the data collected from these datasets can be used for outcomes research (such as analysis of healthcare practice patterns, epidemiologic analysis of disease progression, and the prevalence and characteristics of patient populations), for evaluation of populations to predict future events, for formulary evaluation, and to supplement prospective datasets, among other uses. When evidence is not available for a decision that is imminent, analyses utilizing retrospective databases can provide decision support that is real-time, relevant, and comprehensive, provided that precautions are taken to address statistical considerations that may be inherent in these data sources. Indeed, several studies have found that treatment effects in observational studies were neither quantitatively nor qualitatively different from those obtained in “well-designed” randomized, controlled trials (RCTs). Compared with RCTs, for example, retrospective analyses are relatively inexpensive, quickly done, reflective of different populations, and organizationally specific; they encompass a realistic time frame, can be used for benchmarking purposes, include large sample sizes, and can capture real-world prescribing patterns.




5.2 Claims and Medication Databases
Health care administrative claims data, generally developed and maintained by third-party payers, offer a convenient and unique approach to studying health care resource utilization and associated cost. These databases represent a convenient alternative because data already are collected and stored electronically by health insurance companies. Claims data include outpatient, inpatient, and emergency room services, along with cost of outpatient prescription drugs. Computerized health insurance claims databases are maintained largely for billing and administrative purposes. Unlike studies with primary data collection, claims data are not collected to meet specific research objectives. Nevertheless, these databases are useful for describing health care utilization, patterns of care, disease prevalence, drug and disease outcomes, medication adherence, and cost of care. Administrative claims data are thus an important source of information about major processes of care.
Administrative claims databases tend to be highly representative of a large, defined population. Large sample sizes permit enhanced precision and are particularly useful for studying rare events. As the data already are collected and computerized, data analysis is inexpensive, particularly in relation to prospective studies. Claims data also include outpatient drug information for patients younger than 65 years and, in some instances, for patients aged 65 years or older. This is very useful for studying drug outcomes and drug safety. An added benefit of using claims data is that it precludes any imposition on the patient, physician, or other provider.
However, claims data are affected by certain biases that may compromise the internal validity and, thereby, the robustness of the data.
The most important benefit of using claims databases to analyze clinical and economic outcomes is ease and convenience. The need to examine clinical, economic, and humanistic outcomes usually is limited by practical considerations, such as financial and time constraints, as well as concerns about patient privacy. Given these practical realities, the use of a claims database for some or all data collection offers an attractive alternative. Claims databases offer a number of important advantages for conducting health outcomes research. As mentioned, unlike RCTs, they reflect routine clinical “real world” practice. RCTs include carefully selected populations of particular ages and disease severity with few or no comorbidities. In addition, RCT procedures and protocols are often not representative of routine clinical care. Patient compliance typically is greater in RCTs than in the “real world” because of the support services available to treat adverse effects and the tendency of RCT participants to be more compliant than the population at large. Claims databases are also unobtrusive and relatively inexpensive to use once the information system is in place. Further, databases provide a timely means of analyzing a problem. Answers can be found in days or weeks, rather than months or years. Finally, databases offer a great deal of flexibility. Rare diseases or specific subpopulations can be researched, or a problem can be approached in a number of different ways.
Claims databases allow for the measurement of clinical and economic outcomes (e.g., hospital and emergency room visits). Beyond such high-level outcome measures, the availability of diagnosis, procedure, and revenue codes allows for further specification of a patient’s outcome. ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) codes provide diagnostic information allowing for identification of patients with a particular diagnosis or combination of diagnoses. Physicians’ Current Procedural Terminology, 4th Edition (CPT-4) codes identify the procedures that are used to bill physician and other professional services. For example, CPT-4 codes could be used to determine whether a depressed patient received hypnotherapy. The Healthcare Common Procedure Coding System (HCPCS) can be used to provide further information on physician and non-physician services that are not included in the CPT-4, such as whether a patient obtaining care in a physician’s office for asthma received an injection of epinephrine.
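Identifying patients by a combination of diagnoses can be sketched as follows. The record layout and field names (`patient_id`, `dx_codes`) are hypothetical, since real claims files differ by payer; the example uses the ICD-9-CM categories 250 (diabetes mellitus) and 401 (essential hypertension) and matches codes by category prefix.

```python
from collections import defaultdict


def patients_with_all(claims, required_prefixes):
    """Return IDs of patients whose claims, taken together, contain at
    least one diagnosis code under every required category prefix."""
    codes_by_patient = defaultdict(set)
    for claim in claims:
        codes_by_patient[claim["patient_id"]].update(claim["dx_codes"])
    return sorted(
        patient_id
        for patient_id, codes in codes_by_patient.items()
        if all(any(code.startswith(prefix) for code in codes)
               for prefix in required_prefixes)
    )


# Hypothetical claim lines (illustrative records only).
claims = [
    {"patient_id": "P1", "dx_codes": ["250.00", "401.9"]},
    {"patient_id": "P2", "dx_codes": ["401.9"]},
    {"patient_id": "P3", "dx_codes": ["250.02"]},
    {"patient_id": "P3", "dx_codes": ["401.1"]},
    {"patient_id": "P4", "dx_codes": ["250.01"]},
]

print(patients_with_all(claims, ["250"]))          # any diabetes code
print(patients_with_all(claims, ["250", "401"]))   # diabetes and hypertension
```

Pooling codes across all of a patient's claims matters here: patient P3 is coded for diabetes on one claim and hypertension on another, and would be missed by a claim-by-claim check.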
The processes of care also can be assessed from a claims database. For example, the number of outpatient physician visits might be considered a good measure of the quality of care received by hypertension patients. Procedure codes allow for the measurement of additional processes of care such as whether or not atrial fibrillation patients are receiving annual electrocardiograms or electrical cardioversion. A typical example of using medical databases for human papillomavirus (HPV) vaccine-associated studies would be to get a preliminary estimate of the burden of cervical cancer within a particular region. One such study by Watson et al. used multiple databases to estimate the burden of cervical cancer in the United States. This study used data from two federal cancer surveillance programs, the Centers for Disease Control and Prevention (CDC)’s National Program of Cancer Registries, and the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) Program to estimate cervical cancer incidence among different sub-populations. Identification of the study patients through diagnosis codes obtained in medical databases, incidence and prevalence rates among different age populations, race and gender mix, and across various geographical regions11 can be easily accomplished through such databases. Another example would be a study examining the cervical cancer incidence before the HPV vaccine was introduced in the United States market. Patients who are provided HPV vaccines for prevention of certain cancers and those who are not could also be studied to evaluate the incidence of future complications and associated total health care costs through most medical databases that provide clinical and economic data. However, most measures of the structure of care are not found in the database itself but within the patient benefit manual or other records held by the managed care organization (MCO). Important examples include copayment amount, formulary coverage of specific drugs, prescription quantity limits, and limits on mental health benefits.
Although databases offer a number of advantages for conducting outcomes management, they are not without their limitations. It is widely recognized that the diagnosis found in databases is not always valid or reliable. While some overcoding does occur, in most cases undercoding of actual diagnoses is more common. Undercoding is an even bigger problem with chronic diseases, which are notoriously underreported. The principal finding in the Kern study was that identification of veteran diabetes patients with comorbid chronic kidney disease with a low glomerular filtration rate was severely underreported in Medicare administrative records. Similarly, the Icen study found misclassification of patients diagnosed with psoriasis. Several potential reasons for this misclassification would include the psoriasis diagnosis being differential in initial and follow-up physician visits, wrong initial diagnosis followed by actual psoriasis treatment, and the use of a nonspecific psoriasis code that does not specify the type of psoriasis. Given these limitations, it is helpful to know for which disease states the coding is insufficient, calling for a review of the medical record. Unfortunately, there is no published research to provide guidance on this issue.
Another important consideration is patients’ severity of illness. The goal often is to compare the outcomes of care for persons receiving different treatments or receiving care from different types of providers. Zhao et al. used a claims database to analyze the prevalence of diabetes-associated complications and comorbidities and its impact on health care costs among patients with diabetic neuropathy. This study identified the various complications and comorbidities through diagnosis codes and health care costs in the claims data. However, there may be important differences in the patients being compared that cannot be measured or controlled when using the information in the database. Other significant indicators of a patient’s disease severity, including smoking status, alcohol consumption status, laboratory values, and results of other diagnostic tests, are sometimes not available for analysis in the database. Pharmacy use described in the claims databases usually provides information about prescription medications. However, over-the-counter medications that are being used are generally not captured in such databases.
5.2.1 Description of Claims Database Files
Medication or claims databases usually have several files that characterize different patient settings where care is provided. These include, among others, inpatient, outpatient, emergency room, and pharmacy (medication) files. The outpatient file, for example, contains final action claims data submitted by institutional outpatient providers. Outpatient claims provide detailed information on the date of service, site of service (e.g., home care), provider specialty, type of service, and reimbursed charges. These variables allow us to calculate the frequency of health care utilization and its respective cost. Of the many variables listed in outpatient files, those most relevant here are date of service, amount billed and amount paid, and provider information. Each outpatient visit record in the outpatient file usually includes the following information: date of visit, whether the respondent/patient saw a physician, type of care received, type of services received, medicines prescribed, flat-fee information, imputed sources of payment, total payment, and total charge, among others.
Similarly, claims data for hospitalizations can be an extremely valuable source for evaluating health outcomes in terms of incidence and frequency of hospitalization episodes and severity of the hospitalization episode in terms of length of stay and hospitalization costs. Inpatient claims data are also useful to assess the hospitalization costs associated with a condition or disease in a population. For each claim during a hospitalization episode, the file contains fields such as patient identification number, provider number, ICD-9 code of diagnosis for which the service was provided, CPT code for procedures and services provided, Diagnosis-Related Group (DRG) codes, date of hospital admission, date of discharge, location of service (outpatient, emergency room, or inpatient), total amount billed, and total amount paid.
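The inpatient measures described above (number of episodes, length of stay, and hospitalization cost) can be sketched from admission and discharge dates and paid amounts. The field names and sample stays below are hypothetical, and length of stay is taken here as discharge date minus admission date, which is only one of several conventions in use.

```python
from datetime import date


def summarize_stays(stays):
    """Return (number of stays, mean length of stay in days, total amount paid)."""
    # Length of stay here is discharge minus admission; a same-day stay counts as 0.
    lengths = [(s["discharge"] - s["admission"]).days for s in stays]
    total_paid = sum(s["amount_paid"] for s in stays)
    mean_los = sum(lengths) / len(lengths) if lengths else 0.0
    return len(stays), mean_los, total_paid


# Hypothetical inpatient records, one per hospitalization episode.
stays = [
    {"patient_id": "A1", "admission": date(2008, 3, 1),
     "discharge": date(2008, 3, 5), "amount_paid": 8200.0},
    {"patient_id": "B2", "admission": date(2008, 6, 10),
     "discharge": date(2008, 6, 12), "amount_paid": 3100.0},
]

n_stays, mean_los, total_paid = summarize_stays(stays)
```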
The prescription drug file in a claims dataset contains useful information on medications prescribed and taken by patients. Information is captured when the patient fills the prescription and a claim is then filed by the pharmacy. Importantly, the primary focus of the claim is the fill transaction; claims will show the activity of when the fills occur, but they will not show whether the patient actually took the medications. Thus, while claims serve as a proxy for compliance and adherence due to their ability to show fills, primary research may be used as an adjunct to determine if the patient actually used the medications when at home. Each record in the prescription drug file represents one reported prescribed medicine that was purchased for a particular episode. Only prescribed medicines that were purchased for a particular episode are usually represented in this file. Medication refills are also usually captured in this file, which allows for tracking medication usage by the patient longitudinally. The typical descriptors for medications on record include an identifier for each unique prescribed medicine; detailed characteristics associated with the event (e.g., national drug code (NDC), medicine name, etc.); conditions, if any, associated with the medicine; the date on which the person first used the medicine; total expenditure and sources of payments; and the types of pharmacies that filled the household’s prescriptions.
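To illustrate how fill records can serve as a proxy for adherence, the sketch below computes a medication possession ratio: the days of medication supplied by fills in an observation period, divided by the days in that period. The source does not name a specific adherence metric, so this ratio is swapped in as one common fill-based convention; the field names (fill_date, days_supply) and the sample fills are assumptions. As the text cautions, the result reflects fills, not actual consumption.

```python
from datetime import date


def medication_possession_ratio(fills, period_start, period_end):
    """Days supplied by fills within the period divided by days in the period, capped at 1.0."""
    period_days = (period_end - period_start).days + 1  # inclusive of both end dates
    supplied = sum(
        f["days_supply"]
        for f in fills
        if period_start <= f["fill_date"] <= period_end
    )
    return min(supplied / period_days, 1.0)


# Hypothetical fills for one patient and one drug.
fills = [
    {"fill_date": date(2008, 1, 1), "days_supply": 30},
    {"fill_date": date(2008, 2, 5), "days_supply": 30},
    {"fill_date": date(2008, 3, 20), "days_supply": 30},
]

# 90 days supplied over a 120-day observation period.
mpr = medication_possession_ratio(fills, date(2008, 1, 1), date(2008, 4, 29))
```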
Similarly, information provided by the emergency room visits file includes date of the visit, whether the patient saw the doctor, type of care received, type of services (i.e., lab test, sonogram or ultrasound, x-rays, etc.) received, medicines prescribed during the visit, cost information, imputed sources of payment, total payment, and total charge.
5.3 Electronic Medical Records and Medical Charts
5.3.1 Medical Chart/Medical Record in General
A medical chart or record is a confidential document that contains detailed, comprehensive, and current information about a patient’s health care experience, including diagnoses, treatment, tests, and treatment responses, in addition to other factors that might play a significant role in his or her health condition. This document summarizes the overall collected information of an individual related to health status. Once a patient enters a health care setting, be it a hospital or a clinic, documentation in a medical chart or record begins. Different medical settings follow different types of such documentation practices; however, there are certain aspects of such a document that remain universal.
Some of the most common entries in a medical chart or record include the following: admission information, medical history and physical information, medication and treatment orders, medications and other treatments received, procedures, diagnostic and other tests, insurance, consultations, patient consents, and discharge information. Documentation in the chart or record is usually done by the physician or the nurse.
5.3.2 Electronic Medical Records (EMRs) or Charts
With recent advances in technology, written medical charts or records are gradually being converted to computerized or electronic versions. The electronic version, similar to the paper version of the medical record or chart, serves the same purpose of communication and documentation of an individual’s contact with a health care provider and the decisions made by the provider regarding the patient, including diagnoses and treatments provided.
5.3.2.1 Advantages/Disadvantages
Several advantages of EMRs over print medical records or charts could recommend their use by a medical institution. These include ease of chart or record accessibility, reduction of medical errors and task automation, legible medical notes, continuity of care and accountability, availability of an organized chart, and increased security. Other advantages include generation of patient reports for certain screening methods, including mammography and cholesterol screening, and for patients taking medications that have been recalled; easily accessible computerized practice or treatment guidelines; adequate alert systems that notify the health care provider about certain adverse results that require prompt action; improved documentation and care management; and potential cost savings. However, certain disadvantages of EMRs also should be noted. There have been instances where a patient’s laboratory and other clinical data have not been integrated with the computerized system. This affects the comprehensiveness of the medical record, as key elements pertaining to the patient’s health are missing. Efforts must be made to integrate all detailed and pertinent patient information. Another significant disadvantage is that a system crash during a patient visit makes patient information unavailable for that period. Adequate back-up measures should be in place in the event of such crashes or system malfunction.
5.3.2.2 Current Use of EMRs
Though EMRs offer potential benefits to healthcare organizations that adopt them, according to a recent study, only 4% of U.S. physicians have had access to an EMR system. Moreover, primary care physicians and those working in large groups are more likely than physicians in other medical specialties and smaller size practices, respectively, to use EMRs. In another study that researched the use of EMRs in ambulatory care practices in the state of Massachusetts, only 18% of the surveyed office practices reported using one. Some prominent reasons for this low uptake of technology include, among others, the significant direct and indirect cost for licensing the EMR software. Indirect costs include staff training to use the software and system maintenance. Cost also helps explain the higher adoption rate of technological advances, including EMRs, in large practices: large physician practices have greater financial and technological resources than smaller practices and solo physician practices. Other factors include data entry obstacles, lack of trained staff, lack of uniformity, legal issues, and patient confidentiality and security concerns. Similarly, another study found a higher adoption of EMRs among physicians in practices owned by health maintenance organizations (HMOs).
Some specific examples of how EMRs have been used as databases to provide insights into various therapeutic areas are provided below. The main advantages of using EMRs as databases to conduct pharmacoeconomic analyses include the richness and comprehensiveness of the data to estimate prevalence, incidence, physician treatment patterns, and cost of various prevention and treatment strategies available to medical practitioners. One example would be a study that estimated the tobacco-use prevalence using EMRs. The availability of data needed to analyze the study objective eliminates the need to do expensive multiple surveys of different sub-populations to get the needed answer. This particular study used the EMR database of a large medical group in Minnesota. The study showed that out of the overall included population, 19.7% were tobacco users during the year March 2006 to February 2007, of which 24.2% were aged 18–24 years, 16% were pregnant women, 34.3% were Medicaid enrollees, 40% were American Indians, and 9.5% were Asians.
Another study used an EMR to analyze associations between cardiometabolic risk factors and body mass index based on diagnosis and treatment codes. This particular study used the General Electric (GE) Centricity research database, which is a rich source of data used by more than 20,000 physicians to manage about 30 million patient records in 49 states. The availability of data, including clinical data captured in the practice setting, such as diagnoses, patient complaints, medication orders, medication lists, laboratory orders and results, and biometric readings, was a significant factor in the appropriateness of this dataset for the particular study. The Kaiser Permanente EMR was used to evaluate the complications associated with dysglycemia and medical costs associated with non-diabetic hyperglycemia. The EMR database used for this study provided information on all inpatient admissions, outpatient visits, pharmacy medication dispenses, and results of laboratory tests. As the study was based on diabetes patients, clinical information on isolated impaired fasting glucose (available in the database) was the primary factor used in classifying the study diabetes patients. The study found that more than half of the studied dysglycemia patients had at least one associated complication as compared with only 34% of normoglycemic patients (p<0.001). The study also found that macrovascular and microvascular complications had an incremental annual cost of $3,863 (p<0.0001) and $1,874 (p<0.0001) for dysglycemic patients versus normoglycemic patients. A final example would be a study evaluating the acceptance of HPV vaccine by gynecologists in an urban setting.29 This study found that the overall vaccination rate was 28% (6%–55.8%) for the initial 3-month period when the vaccine became available to the health plan.
5.4 Patient Reported Outcomes
A patient-reported outcome (PRO) is a measurement and assessment of a patient’s health status coming directly from the patient rather than from a physician or any other health care provider. The Food and Drug Administration refers to a PRO as any report coming from patients about a health condition and its treatment. An important feature that differentiates a PRO from any other measurement is that the measurement is done directly from the patient. A PRO thus provides a patient’s perspective on treatment effectiveness, adverse events, etc. Health-related quality of life (HRQoL), a term closely related to PRO, specifically refers to measures that are not only patient reported, but also include the impact of the disease and its treatment on the patient’s well-being and functioning. A PRO measure includes various facets of disease treatment and its effectiveness as reported directly by the patient. These include, among others, reports of symptoms such as pain, fatigue, physical functioning, and well-being in the physical, mental, and social domains of life. Many health behaviors, including use of tobacco and alcohol, participation in exercise programs, etc., are also included in a typical PRO. Other end points captured in a PRO include patient preferences for a particular treatment and treatment satisfaction. A PRO measure can include patient satisfaction with treatment, medication adherence, and other aspects of disease treatment, functional status, psychological well-being, and health status in addition to HRQoL.
5.4.1 Use of PRO Instruments in Pharmacoeconomic Studies: Focus on HPV Vaccine Studies
Although PROs usually consist of specific health-related questionnaires or instruments, providing a simple survey questionnaire for patient response also makes up a simpler form of PRO. This section provides examples of how such PRO questionnaires have been used in HPV vaccine-related issues and studies. Gerend and Magloire assessed the awareness, knowledge, and beliefs about HPV in a racially diverse sample of young adults. The authors used a survey to get respondent-reported responses among 124 students aged 18–26 years from two southeastern universities. The survey assessed demographics, sexual history, awareness and knowledge of HPV, HPV-related beliefs, and interest in the HPV vaccine (women only). This study reported some interesting findings that could be used for further economic studies on HPV vaccine, including great knowledge of HPV, greater awareness among women of HPV as compared with men, and a greater interest in HPV education among blacks and sexually active respondents. Another study examined the stage of adoption of the HPV vaccine among college women aged 18–22 years at a New England University. This study used an online survey as a means to complete the PRO instrument. The survey examined knowledge of HPV, perceived susceptibility, severity, vaccine benefits or barriers, and stage of vaccine adoption. The use of such PRO measures provides a useful means to get responses directly from patients (in this case, women) who have used HPV vaccine or have potential to use one in the future. The analyzed results indicated that the acceptance of the vaccine was high among the study respondents and that the importance of Pap smears was also high. Yet another study analyzed the acceptance of HPV vaccine among mid-adult women. This particular study used a convenience sample of 472 mid-adult women who completed a survey that examined the demographic, knowledge, and behavioral variables associated with HPV vaccine acceptance.
This study assumes clinical significance, as some of the variables that were found to be associated with vaccination among the study respondents could be useful to clinicians to identify potential female patients who might be more receptive to the vaccine. These variables included women who were younger than 55 years, had had an abnormal Papanicolaou test, understood the association of HPV and cervical cancer, and those who felt at risk for HPV infection.
Though HPV-related diseases are more common among women, men are also exposed to the virus in varying forms and severity. A study similar to the Ferris study based on women examined the variables associated with HPV vaccine acceptance among men. Similar results were obtained from this study in that the (male) respondents with a higher education and knowledge about HPV were more likely to accept HPV vaccination than others.
5.5 Alternative Population-Based Data Sources
As mentioned in Table 5.1, numerous datasets are available either commercially or from the U.S. government. These include:
1. Thomson/Medstat
• MarketScan claims database
• Hospital/drug database (formerly Solucient)
MarketScan contains claims and encounters data representing commercially insured, Medicare supplemental (Medigap), and Medicaid patients. It covers approximately 17 million lives in any given year. Longitudinal tracking, across health plans and across payers, is possible. Subsets of patients may be linked to laboratory test results. Approximately 350,000 discharges have been linked between the Hospital/drug (inpatient) and MarketScan (outpatient) databases. Intensive care unit (ICU) length of stay (LOS) is also available.
2. IMS/PharMetrics
The PharMetrics Patient-Centric Database comprises medical and pharmaceutical claims for a very large number of patients from more than 90 health plans across the United States. The database includes both inpatient and outpatient diagnoses (in ICD-9-CM format) and procedures (in CPT-4 and HCPCS formats), as well as both retail and mail order prescription records. Available data on prescription records include the NDC code, as well as quantity dispensed. Charges, allowed and paid amounts are available for all services rendered, as well as dates of service for all claims. The inpatient data are less comprehensive than the Thomson Hospital/drug database, as drug-specific data and ICU LOS are not available. However, full Medicare data are available.
3. Medicare Datasets
Available from the Centers for Medicare and Medicaid Services (CMS), a benefit of using the Medicare databases is that they include inpatient and outpatient data for most U.S. hospitals, with the exception of VA (Veterans Affairs) and military hospitals. These data are readily available for transformation to a usable form for comparative purposes. A limitation is that they are primarily constituted by an elderly sector of the population (approximately 40 million patients), so are not generalizable to younger populations.
There are several types of encrypted general-use Medicare datasets, available in 5% or 100% segments, which are described below:
• LDS (Limited Dataset) Standard Analytical Files (SAFs): contain payment information for each institutional (inpatient, outpatient, skilled nursing facility, hospice, or home health agency) and non-institutional (physician and durable medical equipment providers) claim type
• LDS MEDPAR (Medicare Provider Analysis and Review) Files: contain inpatient hospital “final action stay” records, summarizing all services received by a patient from admission through discharge
• LDS Denominator File: contains demographic and enrollment data about each beneficiary in the Medicare and Medicare Managed Care Organizations
• LDS Outpatient Hospital Prospective Payment System (PPS): contains select claim level data from the Hospital Outpatient PPS claims
4. Geisinger
Data from Geisinger hospitals and physicians, both general practice and specialists, comprise data available from MedMining, a Geisinger Health System business. It has 7-plus years of longitudinal, full clinical data on over 3 million patients and 10-plus years of lab results that are captured electronically in an electronic health record, as well as other clinical, financial, and administrative systems. The data include an associated reason code for every prescription. Dispensing information and drug cost are not available. Being from hospitals and community-based physicians throughout rural Pennsylvania, the data may not be generalizable to all U.S. patients. In addition, patient co-payment information is not available.
5. Cerner
Cerner Health Facts™ contains inpatient and hospital outpatient data on over 12 million patients; the Cerner dataset also contains lab results data. However, no longitudinal (claims) data are available from community-based outpatient settings.
6. Premier
Premier’s Perspective Hospital Database is a comprehensive hospital utilization database that includes patient-level clinical and financial data reflecting an accurate national representation of the U.S. hospital market in over 6 million patients. It contains hospital drug data and some patient subsets have inpatient and outpatient hospital data; laboratory test results are currently not available. Premier also can combine its inpatient records with i3 Innovus, which provides an integrated database of enrollment, inpatient and outpatient medical claims, pharmaceutical claims, and laboratory results. However, this database combination appears to be proprietary.
7. Ingenix/IHCIS
The former IHCIS business, now part of Ingenix, constructs a database comprising commercial plan payers and contains a large number of Managed Medicare beneficiaries. Although it has data for approximately 3 million lives from community-based labs, there are no patient co-pay data or original paid amounts, as they have standardized, rather than actual, financials.
8. General Practice Research Database
The General Practice Research Database, or GPRD, dataset is a near-complete electronic record of all care for 5.5% of the United Kingdom, containing more than 3 million active patients. The most current format of GPRD is termed FF-GPRD. Because these are data collected by general practitioners (GPs), while the community data are very detailed (labs, medications), the hospital data are not very comprehensive.
9. THIN (The Health Improvement Network)
For The Health Improvement Network, or THIN, data collection commenced in January 2003, using information extracted from Vision, a widely used general practice management software package developed by In Practice Systems. The database is regularly updated and currently contains data on over 5 million individuals living in the U.K. THIN was developed as a replacement for the GPRD, because the EPIC version of the GPRD was discontinued in April 2002. Meanwhile, the GPRD is maintained by the U.K. Medicines and Healthcare Products Regulatory Agency (MHRA) in London. THIN’s strengths and limitations are the same as those of the GPRD.
10. General Electric Centricity Research Database
GE Centricity is an EMR. The database comprises de-identified electronic patient records from users of the EMR software and currently consists of data from over 8 million unique patients. A potential positive to this database is the availability of patient-reported over-the-counter medication use, while a potential negative to this database is its lack of inpatient data.
11. Framingham Offspring Study (FOS) Database
The Framingham Study is a longitudinal population-based observational study that began in 1948 in Framingham, MA. In 1971, a second-generation cohort was recruited into the Framingham Offspring/Spouse (FOS) study. Cohort members are examined in the clinic every 4 years, on average, where they undergo a standardized protocol for data collection approved by the Boston University Institutional Review Board. This database provides a rich source of information related to cardiovascular disease, including coronary heart disease, stroke, hypertension, peripheral arterial disease, and congestive heart failure.
12. Atherosclerosis Risk in Communities (ARIC) Database
The Atherosclerosis Risk in Communities (ARIC) Study, sponsored by the U.S. National Heart, Lung, and Blood Institute (NHLBI) of the National Institutes of Health, is a prospective observational biracial follow-up of 15,792 men and women between the ages of 45 and 64, recruited from Forsyth County, North Carolina; Jackson, Mississippi; suburbs of Minneapolis, Minnesota; and Washington County, Maryland. This database provides key clinical information on the etiology and risk factors associated with atherosclerosis, along with differences in medical care obtained by patients of different race and gender as well as those residing in different locations.
5.6 Issues and Challenges
Although numerous advantages exist with use of retrospective databases over RCTs, considerations of internal validity (reproducibility of results) and external validity (generalizability of results) must be addressed. For example, with RCTs, because they are protocol-based, it is relatively easy to reproduce the results of a trial of a hypertension drug using an identical protocol in a patient population following the same inclusion and exclusion criteria. With retrospective databases, however, confounding factors, such as a center effect or regional variation in the prevalence of hypertension, may limit the ability to duplicate these results between different populations, such as between two MCOs or even between two locations of the same MCO. However, the very measure that helps to ensure reproducibility, namely, the protocol, may reduce the study’s use in the real world, as any analysis would have to consider protocol-induced (artificial) resource use and costs. Generalizability refers to the ability to extrapolate results across health care settings or even countries. A pharmacoeconomic analysis must provide segregated healthcare resource units (e.g., numbers of MRIs) and costs per unit (e.g., cost of an individual MRI), so that if a resource is not used the same way in the United States and Canada or the costs are very dissimilar, each country can use the resource data, but customize it to its own cost structure. The caveat here, of course, is to determine whether the resource utilization itself is similar across the two countries.
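The separation of resource units from unit costs described above can be sketched as follows: the same resource-use profile is priced against two different cost structures. All resource counts and unit costs are hypothetical placeholders, as are the function and variable names; the sketch also assumes that resource utilization itself is comparable across the two settings, which the text notes must be verified.

```python
def total_cost(resource_units, unit_costs):
    """Multiply each resource count by the setting's unit cost and sum the results."""
    missing = set(resource_units) - set(unit_costs)
    if missing:
        # Fail loudly rather than silently pricing a resource at zero.
        raise KeyError(f"no unit cost for: {sorted(missing)}")
    return sum(units * unit_costs[name] for name, units in resource_units.items())


# Hypothetical per-patient resource use reported by a study.
resource_units = {"mri": 2, "physician_visit": 5, "hospital_day": 3}

# Hypothetical unit costs for two settings, each in its own currency.
unit_costs_country_a = {"mri": 1200.0, "physician_visit": 90.0, "hospital_day": 2000.0}
unit_costs_country_b = {"mri": 700.0, "physician_visit": 60.0, "hospital_day": 1100.0}

cost_a = total_cost(resource_units, unit_costs_country_a)
cost_b = total_cost(resource_units, unit_costs_country_b)
```

Reporting units and unit costs separately, rather than a single total, is what lets a second country reuse the resource data with its own prices.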
To determine if a dataset is appropriate to answer a pharmacoeconomic question, key attributes of the population (such as demographics), covered services, benefit design (e.g., nationalized or private insurance, deductibles, patient co-payments), formulary design (e.g., open [allowing any drug], closed [allowing only specific drugs]) and any special programs (e.g., physician detailing, disease management initiatives) that might affect its generalizability should be enumerated. Johnson outlines a six-step process for conducting outcomes analyses using administrative databases, as seen in Table 5.3.

Since practice, including available treatments and procedures, changes over time, it is essential to use retrospective data to continuously inform health policy decisions. A case in point is the use of data from a pharmacy benefits management claims database to evaluate two decision-analytic models of the cost-effectiveness of therapeutic regimens to eradicate Helicobacter pylori in ulcer patients. The authors found that model results overstated the cost-effectiveness of the previously more cost-effective regimen and underestimated the cost-effectiveness of the other regimen such that the model assumptions and, ultimately, the outcomes, were not supported by the data.
Regardless of how the data are used, issues of data quality must be addressed. A checklist detailing many of these issues was published by an International Society for Pharmacoeconomics and Outcomes Research (ISPOR) Task Force convened to examine the quality of published studies using retrospective databases. It is important to have plans to examine a representative number or percentage of source documents (e.g., patient charts) to determine that diagnosis and procedure codes are reasonably accurate. For example, Fivenson, Arnold, and colleagues determined that approximately 10% of diagnosis codes in an atopic dermatitis study utilizing a claims dataset were inaccurate.3 Moreover, coding may change over time, such as use of different versions of the ICD-9-CM coding set, differing frequencies of use of codes according to reimbursement policies or varying regional codes (e.g., HCPCS codes). In a study to evaluate the coding data quality of the Healthcare Cost & Utilization Project (HCUP) National Inpatient Sample, claims data failed to identify more than 50% of patients with prognostically important conditions, and miscoding of diagnoses resulted in nonspecific disease identification or coexisting conditions. Coding error rates were found to vary widely among states, hospitals within states, geographic location, and hospital characteristics. Coding errors were significantly different among patient demographic groups and whether the state used billing versus abstract data.
In addition, services may not be captured in the database because they are administered elsewhere (e.g., carved out, such as mental health services). It is important to minimize missing and out-of-range values, ensure consistency of data (e.g., no menopausal men), control duplication of records, assure continuous enrollment, ascertain the availability of the continuum of care, and make certain that data have been recorded uniformly because if there is inconsistency in coding, there is inconsistency in the resulting judgments derived from those data. Sax mentions the pharmacy field “days supply” as potentially problematic as an indicator of patient adherence to a medication regimen due to dose titrations (e.g., gradual reductions in prednisone “burst” during asthma exacerbation), unknown actual use, as-needed medications, and possible unknown sources of additional medication, such as from an unrelated pharmacy. As with prospective data collection, benchmarking values against established norms, such as the SF-36 for quality of life, will assure researchers that the data are representative of the population at large.
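Several of the data-quality checks listed above (missing values, out-of-range values, internal consistency, and duplicate records) can be sketched as simple rules applied to each record. The field names, the age limits, and the sample records are assumptions; ICD-9-CM code 627.2 is used here only as an illustrative menopausal-state diagnosis for the sex/diagnosis consistency rule.

```python
def quality_flags(records):
    """Return a list of (record index, problem) pairs for simple data-quality rules."""
    flags = []
    seen = set()
    for i, r in enumerate(records):
        # Missing and out-of-range values.
        if r.get("age") is None:
            flags.append((i, "missing age"))
        elif not 0 <= r["age"] <= 120:
            flags.append((i, "age out of range"))
        # Internal consistency: a menopausal-state diagnosis on a male patient.
        if r.get("sex") == "M" and r.get("dx") == "627.2":
            flags.append((i, "sex/diagnosis inconsistency"))
        # Duplicate records: same patient, service date, and diagnosis seen before.
        key = (r.get("patient_id"), r.get("service_date"), r.get("dx"))
        if key in seen:
            flags.append((i, "duplicate record"))
        seen.add(key)
    return flags


# Hypothetical claim records.
records = [
    {"patient_id": "A1", "age": 52, "sex": "F", "dx": "627.2", "service_date": "2008-01-10"},
    {"patient_id": "B2", "age": 47, "sex": "M", "dx": "627.2", "service_date": "2008-02-03"},
    {"patient_id": "C3", "age": None, "sex": "F", "dx": "493.90", "service_date": "2008-02-11"},
    {"patient_id": "A1", "age": 52, "sex": "F", "dx": "627.2", "service_date": "2008-01-10"},
    {"patient_id": "D4", "age": 140, "sex": "F", "dx": "401.9", "service_date": "2008-03-15"},
]

flags = quality_flags(records)
```

Flagging rather than deleting problem records keeps the decision about exclusion with the analyst, who can then review source documents where needed.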
It is also important that data links across relational databases be consistent. For example, there should be unique identifiers for each family member. Many times, data must be concatenated (or joined) from several fields in a database to make sure that this is the case. Moreover, events may not be recorded at the same time that they actually occurred for the patient, as with provider charges occurring perhaps 6 months to a year after a procedure for a Medicare patient, so it is essential that this lag time is considered when evaluating an episode of care.
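As a small illustration of concatenating fields to obtain a unique identifier for each family member, the sketch below joins a subscriber ID with a per-member suffix and confirms that the resulting keys are unique. The field names, the key format, and the sample rows are hypothetical; actual databases differ in how family members are distinguished.

```python
def member_key(subscriber_id, member_suffix):
    """Join a subscriber ID and a per-member suffix into one patient-level key."""
    # Normalize stray whitespace and suffix formatting so "1" and "01" link consistently.
    return f"{str(subscriber_id).strip()}-{int(member_suffix):02d}"


# Hypothetical enrollment rows: two members under one subscriber, one under another.
rows = [
    {"subscriber_id": "12345 ", "member_suffix": "1"},
    {"subscriber_id": "12345", "member_suffix": "02"},
    {"subscriber_id": "67890", "member_suffix": "1"},
]

keys = [member_key(r["subscriber_id"], r["member_suffix"]) for r in rows]
all_unique = len(set(keys)) == len(keys)
```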
In addition, temporal factors may play a role in analyses using preexisting data, either in terms of hypothesis testing or as a confounder. For example, Arnold and colleagues used clinical trials, published literature, and a modified Delphi panel to establish the effect of timing of administration of a thrombin inhibitor, argatroban, on its cost-effectiveness in patients with heparin-induced thrombocytopenia (that is, heparin hypersensitivity). It is also necessary to define and identify disease-related costs. For example, in patients with asthma, should claims be related only to the various ICD-9 diagnosis codes for the various types of asthma, or should there be the added requirement of an asthma medication or diagnostic testing sometime during the index or eligibility period? It is useful to be able to “tease out” costs during a hospitalization related specifically to the diagnosis of interest; however, this is often not possible because of potential overlap between the diagnosis of interest and concomitant illness, e.g., pneumonia in the case of asthma. It is also important to account for the natural history of disease progression and for medical and technological advances that may have affected the course of the disease, in terms of the index date (beginning of data collection) and duration of data inclusion. Indeed, Motheral and colleagues discuss the idea of censoring, or the time limits placed at the beginning (left censoring, period prior to initiation of therapy of interest) or end (right censoring, follow-up time) of the study period.
5.7 Statistical Issues
Bias is a significant problem that must be addressed. The types of biases include selection bias, measurement bias, length of measurement bias, misspecification bias, interdependence of observations, diagnostic ascertainment bias, autocorrelation, omitted variables, quasi-omitted variables, investigator bias, obsolescence bias, vintage bias (human and physical capital), claims vs. encounter bias and recall bias. The reader is referred to a lengthy review of these types of bias by Sackett and colleagues.
The previously discussed ISPOR checklist has categorized many of the statistical issues facing users of retrospective databases in general.4 These are reviewed below. The first is control variables. It is important to account for the effects of all variables so that biased estimates of treatment effects, or confounding bias, do not occur. For example, it is important to control for the likelihood of prescribing certain compounds given a patient’s history of comorbid conditions. Common approaches to adjust for confounding bias include stratification of the cohort by different levels of the confounding variables with comparison of the treatments within potential confounders, such as demographic variables; the use of multivariate statistical techniques; cohort matching and propensity adjustment. Multivariate regression can be used to estimate the association among the intervention, confounders, and the outcome of interest. Stratification divides the study population into subgroups on the basis of confounding characteristics to reduce confounding. With cohort matching, a comparator cohort is generated based on characteristics associated with confounding bias. A Chronic Disease Score or the Charlson Index can be used to control for comorbidities or disease severity, respectively. Moreover, instrumental variable techniques can be used to group patients by choice of treatment, but without unmeasured confounders.
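Stratification, as described above, can be illustrated with a small Python sketch. The records below are invented solely to show the mechanics: sicker patients are both more likely to be treated and more costly, so the crude comparison suggests a treatment effect that disappears once the comparison is made within severity strata. The field names and the size-weighted averaging of stratum-specific differences are assumptions of this example, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical records: severity confounds the treatment-cost relationship.
records = (
    [{"severity": "low", "treated": 1, "outcome": 100.0}]
    + [{"severity": "low", "treated": 0, "outcome": 100.0}] * 3
    + [{"severity": "high", "treated": 1, "outcome": 500.0}] * 3
    + [{"severity": "high", "treated": 0, "outcome": 500.0}]
)


def mean(values):
    return sum(values) / len(values)


def crude_difference(records):
    """Treated minus untreated mean outcome, ignoring confounders."""
    treated = [r["outcome"] for r in records if r["treated"] == 1]
    control = [r["outcome"] for r in records if r["treated"] == 0]
    return mean(treated) - mean(control)


def stratified_difference(records, stratum_key):
    """Average of within-stratum differences, weighted by stratum size."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for r in records:
        strata[r[stratum_key]][r["treated"]].append(r["outcome"])
    weighted, total = 0.0, 0
    for groups in strata.values():
        if not groups[0] or not groups[1]:
            continue  # no within-stratum comparison is possible
        n = len(groups[0]) + len(groups[1])
        weighted += n * (mean(groups[1]) - mean(groups[0]))
        total += n
    if total == 0:
        raise ValueError("no stratum contains both treated and untreated patients")
    return weighted / total


crude = crude_difference(records)
adjusted = stratified_difference(records, "severity")
```

In this constructed example the crude difference is 200 and the stratified difference is 0, which is the pattern confounding by severity would be expected to produce.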
Selection bias may be introduced by the inclusion and exclusion criteria used in the study design, especially considering that missing data, such as a diagnosis code, may cause records not to be chosen for analysis. Thus, the population selected may not be representative of all patients that should be included. A method that is frequently used to account for potential inherent differences in treatment assignment due to selection bias in retrospective databases is propensity scoring. The propensity score, defined as the conditional probability of being treated given the covariates, or the probability that a patient would have been treated, can be used to balance the covariates in the groups, thereby adjusting the estimate of the treatment effect. To estimate the propensity score, one models the distribution of the treatment indicator variables, considering the observed covariates. The propensity score is then estimated using logistic regression or discriminant analysis. Once estimated, the propensity score can be used to reduce bias through matching, stratification (subclassification), regression adjustment, or some combination of all three. All of these methods are an attempt to effect a “quasi-randomized” treatment allocation.
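The steps described above (model treatment assignment from observed covariates, estimate each patient's propensity score by logistic regression, then match on the score) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the two covariates, the twelve invented patients, the gradient-ascent fitting routine, and the 0.2 caliper are all hypothetical choices, and an actual analysis would use a validated statistical package and check covariate balance after matching.

```python
import math

# Hypothetical covariates per patient: [age 65 or older (0/1), number of comorbidities]
X = [[0, 0], [0, 0], [0, 1], [0, 1], [1, 0], [1, 1],
     [1, 2], [1, 2], [0, 2], [1, 3], [0, 0], [1, 1]]
t = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0]  # 1 = received the treatment of interest


def propensity(w, x):
    """Estimated probability of treatment given covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr=0.1, epochs=5000):
    """Fit a logistic regression by batch gradient ascent on the log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)  # intercept followed by one coefficient per covariate
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            err = ti - propensity(w, xi)
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def match_nearest(scores, t, caliper=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement."""
    controls = [i for i, ti in enumerate(t) if ti == 0]
    pairs = []
    for i in (i for i, ti in enumerate(t) if ti == 1):
        if not controls:
            break
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


weights = fit_logistic(X, t)
scores = [propensity(weights, x) for x in X]
pairs = match_nearest(scores, t)  # (treated index, matched control index)
```

Treated patients with no control inside the caliper are left unmatched, which trades sample size for comparability; stratification on score quantiles or regression adjustment on the score are the alternatives named in the text.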
Since much data in retrospective databases is expected to be skewed in its distribution, techniques such as log-transformation and two-part models should be considered. Methods such as hierarchical linear modeling may be appropriate when using pooled data from several different health plans or multiple sites from a single health plan to account for center (that is, facility) effects.
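A two-part model with a log transformation can be illustrated in its simplest form, without covariates: the first part estimates the probability of incurring any cost, and the second part estimates the mean cost among patients with positive costs on the log scale. The Python sketch below uses invented cost data. The smearing retransformation (in the style of Duan) is a technique introduced here for illustration rather than one named in the text; it is included because simply exponentiating the mean of the logged costs yields a geometric mean, which understates the arithmetic mean of skewed data.

```python
import math

# Hypothetical annual costs per patient: many zeros and one extreme value.
costs = [0.0, 0.0, 0.0, 120.0, 250.0, 400.0, 900.0, 15000.0]


def two_part_mean(costs):
    """Expected cost = P(cost > 0) x E[cost | cost > 0], no covariates."""
    positive = [c for c in costs if c > 0]
    p_any = len(positive) / len(costs)            # part 1: any use
    logs = [math.log(c) for c in positive]        # part 2: log-scale costs
    mu = sum(logs) / len(logs)
    naive = math.exp(mu)                          # geometric mean, biased low
    smear = sum(math.exp(v - mu) for v in logs) / len(logs)
    return {
        "p_any": p_any,
        "naive_mean_pos": naive,
        "smeared_mean_pos": naive * smear,
        "expected_cost": p_any * naive * smear,
    }


result = two_part_mean(costs)
```

Without covariates the smeared estimate reproduces the arithmetic mean exactly, so the sketch mainly shows the size of the retransformation error; with covariates each part would be fitted as a regression (for example, logistic regression for part one and a linear regression of log cost for part two).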
Outliers are another issue that must be addressed in economic analyses using retrospective databases. As mentioned above, and particularly when costs rather than quantities of units (such as hospital days or physician office visits) are used to measure resource use, just a small number of outliers can greatly skew the analysis. Logarithmic transformations that have been used previously to reduce skewness can create difficulties when results must be expressed as non-log-transformed costs. For this reason, it is often prudent to record unit costs and quantities separately and, if a high degree of skewness is present, to use the quantities for the statistical calculation and then multiply by a set dollar amount from a fee schedule.
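The approach of analyzing quantities and then applying a fixed unit price can be sketched briefly in Python. The fee schedule amounts, unit names, and patient records below are hypothetical and serve only to show the separation of quantities from prices.

```python
# Hypothetical fee schedule (dollars per unit); not taken from any real schedule.
fee_schedule = {"hospital_day": 1500.0, "office_visit": 90.0}

# Hypothetical resource use per patient, recorded as quantities rather than charges.
utilization = [
    {"patient_id": "P1", "hospital_day": 2, "office_visit": 4},
    {"patient_id": "P2", "hospital_day": 0, "office_visit": 6},
    {"patient_id": "P3", "hospital_day": 1, "office_visit": 2},
]


def mean_quantity(utilization, unit):
    """Average number of units used per patient."""
    return sum(r.get(unit, 0) for r in utilization) / len(utilization)


def standardized_cost(record, fee_schedule):
    """Cost of one patient's resource use at fee-schedule prices."""
    return sum(record.get(unit, 0) * price for unit, price in fee_schedule.items())


# Statistics are computed on quantities; prices are applied afterward.
mean_cost = sum(
    mean_quantity(utilization, unit) * price for unit, price in fee_schedule.items()
)
patient_costs = {r["patient_id"]: standardized_cost(r, fee_schedule) for r in utilization}
```

Because every unit is priced identically, differences between groups reflect differences in resource use rather than variation in billed charges.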
5.8 Non-U.S. Countries
As with U.S. data sources, international retrospective databases encompass such sources as national insurance administrative data, hospital medical records, disease-specific patient registries, and provider survey data. Table 5.1 contains two (U.K.) sources of such data from a study that qualitatively reviewed the methodological challenges of using non-U.S. databases to conduct retrospective economic and outcomes research studies. The researchers conducted a MEDLINE search to obtain a sample of literature published after the year 2000 on retrospective analyses incorporating non-U.S. databases, assessed it using the ISPOR checklist, and found that few economic studies included information on indirect cost components because of a lack of relevant data. Moreover, they found that the quality of non-U.S. retrospective database analyses varied, leading to problems of internal validity, that is, study design errors that could compromise conclusions. The economic datasets were from Italy, Australia, United Kingdom, Switzerland, Singapore, seven other European countries, Canada, Japan, and France. Only two of the 12 studies reviewed included indirect costs. Ten of the 12 economic studies reviewed made adjustments for confounders or sampling schemes (i.e., to reduce selection bias), typically with some form of regression model. The authors thought that five studies did not sufficiently address external validity. Sensitivity analysis was the most common approach to dealing with uncertainty in the studies. Five studies extensively discussed study limitations; however, all of the study authors, as well as the review author, advised caution regarding the external validity of the studies.
5.9 The Future in Use of Retrospective Databases
What is the future for use of retrospective databases to inform pharmacoeconomic analyses? Stallings and colleagues developed a decision-analytic model to test the likely cost impact of a hypothetical pharmacogenomic test to determine a preferred initial therapy in patients with asthma. They compared annualized per patient cost distributions using a “test all” strategy for a nonresponse genotype prior to treating versus “test none.” They found that the cost savings per patient of the testing strategy simulation ranged from US$200 to US$767 (95% confidence interval) and concluded that upfront testing costs were likely to be offset by avoided nonresponse costs. This shows the potential use of retrospective database studies in analytic data mining and improved hypothesis testing.
Indeed, there is an increasing likelihood that genomics will play a role in decisions about drug use. For example, a recent theoretical Markov model showed pharmacogenomic-guided dosing for anticoagulation with warfarin not to be cost-effective in patients with nonvalvular atrial fibrillation. Interestingly, another recently published algorithm using logistic regression from international retrospective databases showed that incorporating pharmacogenetic information was more likely to result in a therapeutic international normalized ratio (INR), the major method of determining anticoagulation, than use of clinical data alone. However, the data used to inform the Markov model were published studies that did not include the latter study, and the algorithm did not indicate the clinical diagnoses, nor the clinical outcomes, of the patients who were more or less likely to be within a therapeutic INR. Therefore, more research is needed to reconcile these two somewhat conflicting results. Indeed, another potential use of such easily available databases is in validation studies. Testing the same hypothesis in several databases increases the validity of the study results, thereby increasing the credibility of the findings. However, in the near future, retrospective databases are more likely to continue being used for quick identification of treatment patterns; prevalence and incidence of a medical condition; medication adherence and persistence; and health care resource utilization and associated costs related to a particular medical condition. With clinical trials becoming more and more time consuming and expensive, retrospective databases offer an attractive alternative to provide this “real-life” medical information.