Published 26 Aug 2008
Demonstrating Performance is written as a primer for expenditure reviews and lays out principles (conditions) that underpin the effective design and management of major policy programmes. A framework is provided for telling Ministers, managers and the public about the efficiency and value of baseline (and new) spending. Real examples of measures and results are used to illustrate the approach.
This paper lays out a framework for analysing the performance of major service and regulatory interventions. The framework is used in expenditure and baseline reviews, and to help agencies and the Government to assess and understand performance.
Leaders need performance reporting systems to support decision-making processes. Effective managers ensure that their major interventions work as they are meant to.
An agency’s major interventions must work to make its overall approach work. Major interventions are characterised by high leverage, cost, risk and/or value to the public. Major interventions therefore warrant careful design, delivery, analysis, and monitoring.
Ministers, boards and managers wanting timely reports on performance invest up front in design, data and reporting systems, and ensure reports feed into decision-making. Ex ante, major interventions worthy of significant effort or expenditure are expected to:
- address needs and aspirations that are relevant to our peoples and institutions
- have ends (outcomes and goals) and means (outputs and coverage) mapped to those needs and performance aspirations, and
- have systems in place to validate performance against the ex ante specification.
To justify ongoing funding, every major intervention should be supported by credible information showing (allowing for fundamental measurement constraints) that outputs:
- are produced efficiently and cost-effectively (eg, Public Finance Act 1989, s40)
- meet quantity and quality standards (s41)
- reach (s41) and positively influence the groups they are designed to affect, and
- demonstrably reduce needs and improve the outcomes used to justify funding (s40).
Reports on major interventions should summarise key results, and be timed to support leadership decisions on strategy, priority, output, capability and budget. This ‘primer’ shows ways in which New Zealand agencies have demonstrated the performance of major interventions against the criteria listed above, and how that performance can be reported in accountability documents.
If performance reporting systems are weak, now is the time to improve them.
Fortune favours the prepared. Performance reporting systems should be set up ahead of delivery. They identify and gather data needed to demonstrate performance. Reports help leaders identify and resolve problems at an early stage, and to focus on results.
Where reporting on the performance of major interventions is considered inadequate, reports on Expenditure Reviews should recommend to Ministers that robust reporting systems be set up, and that specific, SMART measures be reported by set times.
This paper provides a framework for telling Ministers, managers and the public about the efficiency and value of baseline (and new) spending. While the paper is not written to assess policy advice, well-designed and effective policies will meet the conditions laid out here.
The paper is written for departments and Crown entities with major service delivery or regulatory outputs, and agencies participating in spending reviews or baseline reviews. The framework applies to social, economic development and environmental agencies.
During expenditure reviews, and as a normal business process, agencies are expected to demonstrate efficient management and effective intervention. In practice, this means reporting the delivery and outcomes of major interventions to external stakeholders.
Measurement informs ministers, boards and managers, and protects consumers. Graphs and tables in this paper show how Crown agencies reported on performance.
Major interventions warrant careful design, delivery, analysis, and monitoring. Results are assessed first and foremost to inform leaders and drive decisions. Under the Public Finance Act and other legislation, major results must be reported externally.
Focussing on results helps leaders to look critically at major elements of baseline spending, protect interventions that work, and change or exit poor interventions sooner. Every major intervention embodies a theory that activity and spending produce results. Monitoring shows whether results occurred as predicted, and when change is needed.
Objective measures help Ministers affirm or reshape major interventions and budgets. This paper outlines a ‘logic’ approach to demonstrating performance from interventions, and clarifying the value of baseline spending. In applying this approach, you should:
- Start by looking at 1-3 interventions that dominate the budget, strategy and SoI.
- Over time, apply and adapt the framework to manage all major interventions.
- Show major interventions are relevant, economical and effective (or change them).
- Show relevance by establishing ongoing need or opportunity, establishing the gains expected from major outputs, and measuring those gains.
- Do not assess everything. Focus on 3-5 major results, and reducing uncertainty.
- Build management systems to inform decision making, particularly where limited information is now available to demonstrate performance and improvement is feasible.
Leaders ensure concise summary reports are delivered as decisions are made.
Pre-condition #1: Address Needs and Aspirations That Remain Relevant
Effective policies are grounded in real community and government needs, and have goals, outcomes and outputs relevant to those needs. Proponents must link the goals, scale and design of major interventions to real needs.
Major expenditures which are not grounded in today’s needs should be queried. The onus is on the agency receiving funding to show ongoing need using the most objective means available, or to provide a credible explanation of why this was not done.
Where measurement is feasible, agencies should show the distribution and scale of the need, and how this drives the goals, design, coverage and scale of interventions. Targets may be set when needs are specific, tangible and measurable.
Need is often assessed by comparing outcome indicators or looking at demand, eg:
- Benchmarking to establish and compare levels of need, or establish whether there is real scope for improvement (eg, victimisation rates; incident rates; costs).
- Disaggregation to see who - or what - needs support or observation (eg, is at risk).
- Risk assessment to prioritise access, quantify the costs of not intervening and, in retrospect, refine targeting criteria (eg, for road safety, regulatory and health work).
- Forecasting to predict changes in demand (eg, schools; investigations; benefits).
Results often paint simple pictures of where major effort may (or may not) be warranted.
- Top 10 Avoidable Causes of Disability and Death (MoH, 2001)
But remember ‘needs’ depend on how data is interpreted, and data can be used badly:
- Needs must be defined in terms of outcomes (not as ‘improved’ outputs or delivery).
- ‘Needs’ without effective solutions are not priorities - outputs must improve lives!
- Poor ability to target effective outputs on areas of need will limit value-for-money.
- Check comparisons are valid. (Could results be skewed by culture, wealth, etc?)
- Distributions matter. (Averages conceal equity issues and targeting choices).
- In this paper ‘needs’ include the need to pursue opportunities, as well as the need to remedy problems.
Pre-condition #2: Clear Ends (Outcomes) and Means (Outputs and Coverage)
Basic logic checks provide critical information, and are essential when inadequate data is available to confirm performance. Four basic checks are common:
- Outcome goals must match the needs identified in #1 (above). The best goals state where (or for whom) and by how much specific, measurable outcomes will improve.
- The major outputs must be supported by evidence – or at least robust expectation – that they will improve core outcomes in the areas specified.
- Outputs must be clearly specified, if only to help managers meet quality standards.
- Coverage should be limited to areas or groups with the needs identified above. (But coverage may well be narrower, eg, due to responsiveness or to logistic barriers.)
Logic checks are based on ‘results chains’. Results chains set out in stark terms what must happen before the outcomes sought can eventuate. They provide a succinct summary of the policy’s design, make its goals explicit, and show how it is meant to work (Appendix 1). If the design appears weak, improvement must precede funding.
Results chains provide the specification and measures needed to verify performance. A clear results chain allows leaders to surface assumptions, and confirm funding is warranted. But remember, logic checks are only as good as the underlying theory:
- Complexity can mask major problems (“If you cannot understand it, don’t buy it!”).
- Superior outputs may exist (a separate policy process is needed to look for these).
- Vital assumptions can be omitted (reveal them by looking from many perspectives).
- The ultimate logic check is ex post proof of effectiveness (i.e. fact trumps theory).
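The results chain idea above can be sketched as a simple checkable structure. This is only an illustrative sketch; the intervention steps and measures named below are hypothetical, not drawn from any actual programme.

```python
# A minimal sketch of a results chain as a checkable structure; the
# steps and measures below are hypothetical examples.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    name: str                        # eg "Outputs delivered"
    measure: str                     # how the step will be verified
    achieved: Optional[bool] = None  # None = not yet measured

results_chain = [
    Step("Inputs funded", "budget drawn down as planned"),
    Step("Outputs delivered", "% output meeting specification"),
    Step("Coverage achieved", "% of target group receiving output"),
    Step("Intermediate outcome", "% completers retaining core messages"),
    Step("End outcome", "reduction in incident rate in target group"),
]

def first_failure(chain):
    """Failure at any step flags risk to every step further down the chain."""
    for step in chain:
        if step.achieved is False:
            return step.name
    return None
```

Laying the chain out this way forces each link, and the measure that will verify it, to be made explicit before delivery begins.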
Pre-condition #3: Systems Are in Place to Verify Performance
Departments and Crown entities must declare how they will report on performance in Statements of Intent. Annual reporting intentions must be declared in the information supporting the Estimates. Results chains lay out the results expected, and help identify appropriate measures for major interventions. Failure at any step in a results chain can signal non-performance, flag risk, and drive managers to make improvements.
(Crown entities are subject to similar requirements under s.141 of the Crown Entities Act.)
Ideally, effectiveness is proven by measuring impact. Even when impact is measured, other indicators are often reported to speed feedback up and to build a richer ‘performance story’. A typical system for demonstrating performance will thus:
- Require proponents to lay out how the intervention is meant to work (#2, above).
- Identify what results should be visible and measurable if the intervention is working.
- Show how results will be measured (i.e. specify measures and comparison groups).
- Be implemented in advance, so that data can be collected as intervention occurs.
A robust system for managing and demonstrating performance will assess whether:
- Inputs were used efficiently (or, ideally, cost-effectively).
- Outputs were delivered in the right quantity, without compromising on quality.
- Outputs reached those in need who are likely to respond to intervention (coverage).
- Core outcomes improved by the quantum expected, and in the groups ‘treated’.
- Intermediate and near-term outcomes are reported (as well as end outcomes) to hasten feedback, confirm output quality, improve attribution and pinpoint problems.
- Superior performance has been achieved by other jurisdictions, agencies or means.
What gets measured depends on what must be assessed (Appendix 2; table below). This includes key aspects of performance, and risks of unintended consequences.
| Type of Result | Class | Focus On | Examples of Common Measures |
| --- | --- | --- | --- |
| EFFICIENCY OF PROCESS | Input | Cost and utilisation | Real output price trend (inflation adjusted); price per unit, vs. benchmarks; % prison beds full / max. capacity used; trend in real price (eg, per cop or nurse) |
| DELIVERY | Output | Quantity delivered | People receiving training / rehabilitation; cases / complaints processed |
| | | Quality of delivery | % output fully meeting specification; % ministerials / passports / etc on time; % who would use again / recommend use |
| | | Coverage | % population in need receiving output; % in ‘treated’ group who met entry criteria; % targets who did not access / use service; time in queue (or other ‘big’ barrier to use) |
| INTERMEDIATE or NEAR-TERM | Outcome | Early signs of change | Reduction in queue; receipt of benefits; % finishing / getting qualified / in service; % core messages remembered; average wait time / number in queue; % impoverished with more money; % believing regulatory change matters; % aware of risks / able to use new idea; % investing / saving / quitting / working; fewer drunken drivers / ‘bad’ incidents; % in jobs / new career / crime free; % alive after 30 days / time event-free |
| END or FINAL | Outcome | More good stuff; less bad stuff | Greater health / wealth / happiness; less difference across deciles / areas; fewer deaths / accidents / kids in care; cost per unit of improvement in outcome |
| UNINTENDED CONSEQUENCES | Outcome | Risks and side-effects | Higher incident or reduced survival rates; graduates migrating or excessive uptake; increased welfare dependency, risk, etc |
Remember: ‘major interventions warrant a major effort to demonstrate performance’:
- Demonstrate ongoing need, efficiency, good delivery, coverage and outcomes.
- Credible performance stories tie an intervention logic to key performance measures.
- Focus on robust measures that help leaders reduce uncertainty about performance.
- You cannot measure everything. But the literature shows what can be measured.
- Repeat measures may be required before managers accept a problem is real. Triangulation helps.
Condition #4: Produced Cost Effectively and Efficiently
| Type of Analysis | Measurement of Benefits |
| --- | --- |
| Cost-Minimisation | Benefits found to be equivalent |
| Cost-Effectiveness | Physical units (eg, life years gained) |
| Cost-Utility | Healthy years (eg, quality-adjusted life years) |
- Cost-effectiveness of PHARMAC investments each year, 1998/99 to 2004/05
The Public Finance Act requires departments to identify the measures that will be used to report cost-effectiveness. The way benefit is measured will favour different methods of economic analysis (see table above).
Economic analysis will show whether a given intervention is worth funding, and can show whether it is better than other options. Measured regularly over time, economic ratios show how, or whether, overall efficiency and value have improved.
Economic measures must thus be reported if benefits and costs can be linked in a robust way. When this is not possible, cost-effectiveness can be inferred by benchmarking or real price analysis (below), and by proving that the intervention works as intended (see #5-7).
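Where benefits can be counted in physical units, the cost-effectiveness calculation itself is simple division. The figures below are purely hypothetical, chosen only to show the comparison:

```python
def cost_effectiveness(total_cost, units_of_benefit):
    """Dollars per unit of benefit, eg per life year gained
    (cost-effectiveness) or per quality-adjusted life year (cost-utility)."""
    return total_cost / units_of_benefit

# Hypothetical interventions: a lower ratio means more benefit per dollar.
a = cost_effectiveness(1_200_000, 400)  # $3,000 per life year gained
b = cost_effectiveness(900_000, 250)    # $3,600 per life year gained
```

Tracked over successive years, the same ratio shows whether the value delivered per dollar is improving or deteriorating.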
An efficient producer maintains or reduces prices, after inflation is allowed for. As efficiency improves real prices for homogenous inputs and outputs fall (and vice versa).
- Real Price Per Unit of Output, 1995-2004
Real price analysis requires time series information on price and volume for major:
- outputs (eg, cases, passports, patients or children processed), and/or
- assets (eg, km of road, classroom or prison bed built), and/or
- inputs (eg, per cop, fireman or analyst).
A spreadsheet for exploring price and volume data is on the SSC’s Expenditure Review Portal. The charted results (above) showed periods when the agency demonstrated it could manage costs downwards, and periods when costs rose much faster than inflation.
Remember: a 1% price increase in major Votes costs ~$200 million, without adding value.
- Good managers manage costs: look for efficient production and economies of scale.
- Quality alone does not justify a price rise: improved results must also be shown.
- Utilisation rates also reveal efficiency (eg, cases per worker; % houses occupied).
- Inefficiency is implied by spikes in year-end spending, and by persistent or major under-spends.
- Improved quality usually shows as step changes in prices, not slow upwards drift.
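The real price calculation behind this kind of analysis is a simple deflation to a base year. The price series and inflation index below are hypothetical, for illustration only:

```python
# Real price = nominal price deflated to a base year. If efficiency is
# improving, the real price per unit should hold steady or fall.
nominal_price = {2002: 100.0, 2003: 104.0, 2004: 106.0}  # $ per unit of output
cpi = {2002: 1.00, 2003: 1.03, 2004: 1.06}               # index, 2002 = 1.00

real_price = {year: nominal_price[year] / cpi[year] for year in nominal_price}
# 2004: 106.0 / 1.06 = 100.0 -> costs merely tracked inflation; no real rise
```

The same arithmetic applies per output, per asset, or per input, provided the unit is reasonably homogenous over the period compared.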
Cost-effectiveness cannot be inferred from efficiency data alone. Agencies must show that their major interventions are also effective (see Conditions #6 and #7, below).
- Treasury staff should read TSY #855600, which lists ‘tips & traps’ for using the spreadsheet.
Condition #5: Delivery Meets Quantity and Quality Standards
Good delivery makes good results possible. Effective delivery must be demonstrated. Both the Public Finance Act (s41) and Crown Entities Act (s142) require agencies to describe outputs, provide measures and standards, and report on delivery.
Three dimensions of delivery can be reported against the output specification (plans) for major interventions:
- Quantity
- Quality
- Coverage (discussed separately in #6).
Quantity measures are widely reported in New Zealand. This brief document will not review their use.
A far greater challenge is to show that delivery met expectation. Without assurance on output quality, users and stakeholders may refuse to use services, compliance and the claimed benefits become less likely or smaller, efficiency cannot be proven, and value-for-money is reduced.
Quality can be assessed in three main ways:
- Against surrogate measures, typically intermediate outcomes, drawn from the logic model (eg, completion rate; readmissions; 90 day survival rates; user satisfaction).
- Against delivery attributes in the output specification (eg, timeliness; checklists).
- Process evaluation (where quality criteria are poorly specified or hard to specify).
- HNZC Property Condition Benchmark (properties meeting it)
- Mortality Rates Down – Quality Inferred (National healthcare quality report 2005, USA)
Remember that delivery measures only show delivery met specification or expectation:
- Surrogates provide good, cost-effective measures as: (a) intermediate outcomes provide some assurance that interventions worked, (b) meticulous documentation and review of outputs are not needed, and (c) results are valued by delivery staff.
- Direct measures of delivery quality may be costly, or may be resisted by staff.
- Process evaluation should establish quality measures for reporting into the future.
- Claims of efficient cost management (#4) are only credible if quality was maintained.
Condition #6: Reach and Influence the Groups that Interventions Must Affect
Maximum benefit occurs when interventions reach and change those most in need. Failure to deliver output on target flags a major performance problem. Even if output was delivered well in all other respects, impact will be reduced if coverage is poor.
Coverage really matters. Producing output does not, by itself, improve outcomes. Output must reach the groups or areas where needs exist (see #1). Eg, to protect victims, regulation must affect those who are not inclined to self-regulate; literacy programmes must reach the illiterate, education the uneducated, food the hungry, etc.
Coverage is crucial in assessing interventions designed to promote equity or access. Coverage is often critical in risk management (eg, road safety; preventative) outputs.
The good news is that coverage can be assessed easily, cheaply and very quickly, if:
- output specifications clearly identify who should receive what (‘conditions of entry’)
- records are kept of who did receive output, and how they met entry conditions.
When coverage is good, leaders know quickly that delivery is on target, and a major prerequisite to achieving results has been met. When coverage is variable or poor, analysis may help leaders identify where, and ways in which delivery can be improved.
Management objectives shape how coverage is reported. Common measures show:
- how many receiving output met entry criteria (objective: allocative efficiency)
- how many meeting entry criteria actually received output (rightsizing; access)
- numbers in queues exceeding different levels of need (rightsizing; rationing).
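The first two measures above reduce to simple ratios. The delivery numbers below are hypothetical, chosen only to show the two perspectives:

```python
received_output = 1000      # people who received the output
eligible_recipients = 900   # of those, how many met entry criteria
eligible_population = 1500  # everyone meeting the entry criteria

# Share of output that went to people in need (allocative efficiency).
allocative_efficiency = eligible_recipients / received_output  # 0.9
# Share of people in need who were actually reached (rightsizing; access).
reach = eligible_recipients / eligible_population              # 0.6
```

In this sketch, 90% of output went to people meeting entry criteria, but only 60% of those in need were reached: a well-targeted programme may still be a rightsizing or access problem.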
Remember that improving output distribution can improve outcomes in critical areas:
- Coverage of interventions should be clearly stated and explained in specifications.
- Arguments for broader (and tighter) coverage depend on need and proven effectiveness.
- Scope may exist to manage costs using tighter allocative principles and entry criteria.
- Good coverage is a precondition for both effectiveness and cost-effectiveness.
- The literature details several simple ways of reporting on Type 1 and Type 2 coverage issues, e.g. for the UK’s Sure Start family assistance programmes (www.beacon-dodsworth.co.uk/allocative efficiency.pdf)
Condition #7: Demonstrably Reduce Need by Improving Outcomes
The ultimate and only reason to fund an intervention is to improve outcomes. Impact measures validate interventions by showing how outcomes improved, and for whom. Ongoing funding is warranted only while there is every expectation – preferably a level of proof – that the results promised by its proponents were achieved.
Impact is gauged against goals set in #2, using the same measures used to establish need for intervention (#1). Intermediate outcomes specific to the intervention are often assessed in parallel. Impact is assessed by assessing outcomes for situations in which the intervention was, and was not, delivered (see Appendix 3 on comparison groups).
Two main methods are used to demonstrate improved outcomes:
- Impact is assessed directly when end outcomes are highly measurable, and robust comparison groups allow us to gauge the reduction in need due to the intervention.
- Near term results and intermediate outcomes are assessed to confirm interventions at least induce some changes predicted by proponents of the intervention. (This method gives fast feedback and triangulation, and improves measurability and attribution.)
Using both methods often improves the results’ credibility with decision-makers:
- End outcomes justify ongoing delivery (but give little insight into how to improve).
- Intermediate and near-term outcomes give early, more specific warnings of issues.
- Multiple measures build support when individual measures are weak or challenged.
- Multiple measures help test the logic model (and may suggest other approaches).
- With output information, results help leaders gauge problems and how to respond.
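One simple way to use a comparison group (see Appendix 3) is a before/after difference-in-differences calculation. The incident rates below are hypothetical, for illustration only:

```python
# Incidents per 10,000 population, before and after delivery (hypothetical).
treated_before, treated_after = 50.0, 38.0
comparison_before, comparison_after = 52.0, 48.0

change_treated = treated_after - treated_before           # -12.0
change_comparison = comparison_after - comparison_before  # -4.0

# The comparison group nets out the background trend; the remainder is
# the change attributable to the intervention.
impact = change_treated - change_comparison               # -8.0
```

The calculation is only as good as the comparison group: the netting-out logic assumes both groups would have followed the same trend without the intervention.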
| Measure | Change |
| --- | --- |
| Fatal accidents | Down 20-28% |
| Hospital bed-days | Down >22% |
| Safety related tickets | Up 10-400% |
| Exceeding 100 km/h | Down 14% |
| Speed related deaths | Down 8-14% |
| Rear seat belts worn | Up 40% |
| Deaths avoidable by belt | Down 28% |
| Breath tests | Up 23% |
| Drunk in charge (per test) | Down 30% |
- Results Show Success AND Help Managers Improve Coverage
- Leaders must pay special attention to results that challenge ‘conventional wisdom’.
- Acknowledging (or anticipating) the need to change often works better than denial.
- A decision to change is not a bad result; the worst result is a failure to change.
- Multiple measures and repeat measures build confidence that the impact is real.
- Impact measures are only as good as the counterfactual used to produce them.
- Impact differs across groups and areas. If feasible, disaggregate results (see above).
- Impact measures must focus on outcomes and groups used to establish need in #1.
So … Do Major Interventions Perform Credibly?
Strong commitment is needed from ministers, boards and managers to back the evidence and sustain the outcomes focus of major policies and programmes. This is particularly true when results do not sit well with prevailing opinions, activities or budgets.
Credible performance stories can be told if major interventions worked as planned. The results chain lays out what was expected from the programme (see #2). Quantitative and qualitative measures, while not comprehensive or conclusive, should reveal whether major expectations are being met.
Credible performance stories mirror the shape of this paper, ie:
- Indicators will show the intervention addresses needs that remain relevant today
- Major results sought, goals and measures were clear, and are being reported on
- Managers have been delivering major outputs efficiently, while maintaining quality
- People or institutions with the need got the vast majority of the output produced
- And most importantly of all, outcomes improved as predicted in areas of need
Good leaders strive to protect what works, while improving or changing what doesn’t. Good leaders always look for new ways to achieve their mission: ‘improve outcomes’.
Parts of a ‘good news’ story from transport are presented in #7: ‘A strategy that works!’ Good news stories typically focus on outcomes, and efficient production and coverage.
‘Mixed news’ stories are more valuable, as they show where improvement is needed. Appendix 1 illustrates some performance stories that mix ‘good news’ and ‘bad news’.
Leaders seldom take issue with positive results or positive reviews. But optimism bias or distorted reporting can make ministers, boards and managers conclude things are going well when stark evidence exists to the contrary. Warnings of problems include:
- sudden silence about measurement approaches (eg, that had been hyped)
- burying bad news (sometimes while grandstanding on selected, positive results)
- lengthy debate on the merits of ‘bad’ measures (without action to improve them)
- sudden termination of measurement systems, or personal attacks on ‘measurers’
- regressive focus on output measures, without demonstrating results for citizens.
When results are not as expected, it is crucial to look first at outcomes and ask why. If efficiency, delivery or coverage is an issue, perhaps the intervention can be improved. Assess risk, and then decide whether to redesign, downsize, replace or stop the intervention. Too often, bad news breeds indecision (perhaps masked as ‘review’).
One hallmark of great leaders is their use of contrary information to create the sense of urgency needed to improve public services. Think about it. Good news just encourages staff to preserve what they do, or do even more of the same. Bad news takes us all out of our comfort zone, and can be used to drive management, innovation and progress.
Appendix 1: What Could a Credible ‘Performance Story’ Look Like?
Appendices 1 and 2 present complementary perspectives on what could and should be reported to demonstrate performance. Neither perspective is prescriptive. What it is sensible to report about major interventions, and what can be measured, will vary. But the absence of performance information will raise questions about why leaders place high reliance on the major intervention as part of their strategy and intervention mix.
| | | Regulation / Taxation / Inspection | Rehabilitative |
| --- | --- | --- | --- |
| Need | Targets reduced death and debility from larger cause of ill health; we know who is most at risk (because?) | Targets selected persons or agents least prone to voluntary compliance; incident rates known | Targets individuals with > $10,000 future cost if left untreated; clear rehab goal |
| Efficient and Cost Effective | Delivered at same or lower real cost vs. prior years; no output or mix of outputs was available with higher VfM. Emerging technology will soon allow us to … | Real compliance costs kept low; real enforcement costs kept low; total costs | Delivered at same or lower real cost vs. prior years; high cost-effectiveness modelled (assumptions supported by impact measures); no more cost-effective output known |
| Quality | Q and Q were maintained; high patient satisfaction; low 90 day mortality rate vs. prior years/other DHB | Investigations as contract; similar or lower complaint rates vs. prior years; high risk persons aware of higher risk of detection | Delivery met specification 95% of time; 80% of starters complete programme; core messages retained by 80% of completers after 30 days |
| Coverage | 90% of treatments went to people most at risk and/or experiencing most debility; waiting times for crucial treatment(s) same or lower | Detection rate of unwanted events falling; this may be due to (a) poor targeting or (b) improved deterrence. Impact measures suggest (b) not cause; reviewing (a) | 95% of spaces on course filled with people meeting or exceeding entry criteria, and likely to respond to rehab; targeting model validated (or being improved by …) |
| Impact | Using evidence-based ‘best’ practices endorsed by NZGG. Indicators show reduced incidence in target groups; high risk-adjusted survival rates for patients | Incident rate in high risk target group is unchanged, but would deteriorate without attention (testing by reducing volume in area X). (Alternate approach being piloted in Y to test impact) | Z% fewer bad outcomes vs. comparison groups; most CBRs as predicted; some rehab programmes under-performing - will modify / adjust programme mix |

CBR: Cost Benefit Ratio. Q and Q: Quantity and Quality. DHB: District Health Board. NZGG: New Zealand Guidelines Group.
- Intensive Rehab Programmes Are Effective (Must Adjust Mix)
Appendix 2: Is the Right Information Being Reported?
Following the advice in this paper should result in a tight ‘basket of measures’ conforming to the United Kingdom’s FABRIC criteria (modified to reflect the Expenditure Review’s focus on major interventions). The FABRIC criteria can be used to confirm that the set of performance measures being reviewed is part of a credible performance management system:
- Focussed on the intervention’s (and agency’s) main aims and objectives
- Appropriate to, and useful for, stakeholders likely to use the performance information
- Balanced, giving a picture of what the intervention is achieving in significant areas
- Robust in order to withstand institutional change and challenge
- Integrated into the agency’s decision making and business planning processes
- Cost effective, balancing the benefits of improved information against its costs.
Moving from system attributes to qualities of individual measures, FABRIC has another set of principles that can be used to identify (and perhaps eliminate) weaker measures:
- Be relevant: to what the organisation is trying to achieve through the intervention
- Avoid creating perverse incentives: so as not to encourage wasteful behaviour
- Be attributable: to the agency’s actions, so it should be clear where accountability lies
- Be well-defined: with an unambiguous definition, so data is collected consistently, similar measures get compared, and the measure is easy to understand and use
- Be timely: so progress can be tracked quickly enough for the data still to be useful
- Be reliable: accurate enough for the intended use, and responsive to change
- Be comparable: with past periods, similar programmes, or a valid reference group
- Be verifiable: with clear documentation, so measures can be validated.
Initial reviews of major interventions may rely on available data. Remember that perfect measures are rare. Measures that perform badly against the FABRIC criteria could be dropped. But this depends on whether a big gap is left in your information. ‘It is better to be roughly right, than to be perfectly ignorant.’ Partial information is better than none.
Source: UK Audit Office
The FABRIC criteria are also useful in reviewing future reporting requirements. Major gaps in information about the performance of major interventions may make a strong statement about how well the agency has managed its affairs. The deciding factor is whether the information sought can be produced at reasonable cost. When information could have been produced (but was not), and would have helped the team develop a clearer view of the management or performance of the intervention, you are expected to propose its inclusion in future reporting requirements for consideration by Ministers.
- ‘Choosing the right FABRIC’, http://www.hm-treasury.gov.uk/media/EDE/5E/229.pdf
Appendix 3: Comparison Group Choices for Assessing Impact
Appropriate comparison groups are needed to attribute changes in outcome measures to a major output (or group of outputs). No one comparison group choice is ‘best’ in all situations. Ethical and delivery obligations limit how services get allocated, and may therefore influence your choice of how comparison groups are set up.
Good comparison groups must be similar to the treatment groups in all respects, except with respect to the intervention being tested for effectiveness.
Appendix 4: Additional Resources
The following resources provide useful performance indicators, discuss measurement approaches, and provide useful ways of analysing performance and presenting results:
Australia, Productivity Commission: efficiency, equity and effectiveness measures for major Government services (http://www.pc.gov.au/gsp/reports/rogs/2006/index.html)
Canada, Alberta, Measuring Up – financial and outcome reporting on major objectives
NZ, Pathfinder: principles for identifying relevant outcome indicator and impact measures (http://io.ssc.govt.nz/pathfinder/information.asp, Building Blocks 1-5 and Lessons Learnt)
NZ, Ministry of Health, 2001: Evidence-based health objectives for the New Zealand Health Strategy
NZ, State Services Commission: Performance Measurement: Advice and examples on how to develop effective frameworks (http://www.ssc.govt.nz/performance-measurement)
NZ Treasury: The Strategy Primer lists measurement and governance considerations
UK, Treasury: PSA Public Performance Reporting – measures, targets and interpretation (http://www.hm-treasury.gov.uk/documents/public_spending_and_services/publicservice_performance/pss_perf_index.cfm)
UN World Food Programme: Logic model-based monitoring and evaluation guidelines (http://documents.wfp.org/stellent/groups/public/documents/ko/mekb_module_7.pdf)
USA, federal departments, performance reports: output and effectiveness measures with time series comparisons (eg, http://www.dot.gov/perfacc2005/toc.htm)
USA, making managing for results manageable (http://www.resultsaccountability.com/)