2. BACKDROP
• In an expanding range of tech, AI, and advanced
manufacturing industries, there are growing issues and
concerns for assuring:
• Product safety and safety-related quality
• User/patient safety
• Employee/contractor safety
• As well as the wellbeing of other stakeholders
• For individual businesses, safety and related quality
concerns pertain to products, services, workflows and the
company overall, both present and future
• With the growth of complexity generally, safety and quality
are becoming more challenging to achieve and maintain over
time
3. BACKDROP
• Safety and quality are moving to the forefront of many
companies’ risk registers, even among tech firms which in
the past may have had fewer such concerns
• One leading reason among many: The rapid and widespread
ascent of AI in products and services, as well as AI’s
incorporation into enterprise workflows, raising the profile of
safety and safety-related quality
• Add to this the present news cycle around Boeing’s product
safety concerns, high profile self-driving vehicle issues,
youth harm from social media, medical device recalls, and
lapses in some generative and analytical/diagnostic AIs
• Safety and quality are on many people’s minds these days
• A lot of scale-up stage tech company leaders are looking for
help
4. MAKING THE CASE
FOR CHANGE
• There are usually three main ways to build the case for improving
safety and quality, apart from event-driven workplace, regulatory,
insurance, social license or media coverage/brand reasons:
1. Develop outside-in views of what is attainable in quality and safety,
and then benchmark internal performance
2. Conduct focus groups among staff and management about what is
causing any shortcomings, underexamined risks, and why. Extend
the focus groups as necessary to customers, users, suppliers and
other relevant stakeholders
3. Create an actionable strategic plan for improving quality and safety
• Give people a roadmap for how to get to a better place together so
they can see how to contribute to the upgrade or fix with a believable
path forward
5. LEVEL SET
• Get everybody on the same page about the current state
and recent past, typically using:
• Data and metrics
• Training and awareness building about the consequences
of strengths and shortfalls
• Centralized information hubs for updates and ongoing
access to further data and knowledge
• Emphasizing or reestablishing transparency, especially if
recent history involved opacity or blind spots in safety and
quality matters
6. LEVEL SET
• Be sure that the company’s governance and most senior
leadership are on board and aligned, as gaps there will
often grow as time goes on and expectations build for
results
• Especially if significant improvement is needed, make sure
that the short-term hit to productivity and output associated
with the J-curve typical of reforming a deficient safety and
quality culture is understood and agreed upon
7. PREPARE FIRST AND
SECOND LEVEL
SUPERVISORS
• Leadership at the front lines, typically first and second
level supervisors, becomes critical to effect change
• These people are the ones close enough to the
on-the-ground mechanics of action to see and change what
is going on with most of the day-to-day work
• If they don’t understand, agree or feel capable of acting
and leading in new ways, it will be very difficult to achieve
lasting change across the workforce
8. PREPARE FIRST AND
SECOND LEVEL
SUPERVISORS
• Provide the tools, training, resources and visible displays
of commitment they need.
• These elements are prerequisites to reforming safety &
quality if significant change is required
• First and second level supervisors will be the ones who
establish new standards of performance with the front line
team, such as:
• Authoring or revising SOPs
• Norms
• Problem solving methods
• Communicating the right way to handle difficult situations,
and,
• How to initiate dialog about doubtful circumstances
9. SEPARATE HUMAN FACTORS
FROM PROCESSES AND
SYSTEMS
• A common distinction to help organize diagnostics and
inferences for action is to differentiate latent from active
failures of safety and safety-related quality:
• Latent failures are generally caused by flaws or omissions
in the organization, its processes, communication channels
and decision mechanisms
• Latent failures create the conditions where active failures
can occur, but are not sufficient to cause them
• Active failures usually involve human factors as the
catalyst that triggers the failure
10. VISUALIZATION
• “Swiss Cheese” diagram of error dynamics
• Shows how several latent failures, each individually
insufficient to cause an incident, can contribute to an
adverse event
Image Credit: ABC of Patient Safety, Sandars and Cook, Blackwell
11. NEAR MISSES
• Near misses need to be part of the dialog to make
progress quickly
• Attention to both near misses and ambiguous situations
requiring clarification together signals that the
organization is serious about information exchange,
learning and improvement in safety and quality
• Anything less than open exploration of close calls and
digging into borderline situations will come to be seen as
values and priorities being elsewhere, which ultimately
fosters institutional laxity or blindness to warning signs and
other early indicators of growing risk
12. FALSE ALARMS
• Even false alarms have to be seen as valuable learning
moments, and a way to show that all voices need to be heard
• When people have doubts, they need to feel that they are
welcome to stop the line, figuratively or literally, to sort out
the best way to proceed
• No (more) sending questionable work forward, and hoping for
the best
• No reprisals for people asking questions or expressing doubts
• When staff and management are comfortable raising their
concerns this way, the number of queries may be high, but
once the practice is routine, the vast majority are resolved
very quickly
13. FALSE ALARMS
• With persistence in quick, routine information exchanges
to resolve most issues, throughput comes back up and
typically will ultimately exceed prior productivity with a
more knowledgeable, adaptive workforce
• This completes the beneficial back end of the productivity
J-curve referenced earlier, when enhancing or restoring
safety and quality cultures
14. GETTING STARTED
• If any major changes among the leadership or staff are
required, do those quickly at the beginning of a safety and
safety-related quality turnaround
• Some leaders or individuals may be so heavily associated
with the values and events which led to the prior weak state
or decline in safety and quality that it is nearly impossible for
stakeholders to truly believe in a turnaround with them still in
place
• Once those few personnel changes have been made, most of
a safety and quality turnaround needs to be done without fear
of retribution or firings among existing staff and management,
in order to have the candor and honesty that it takes to
improve rapidly
• Systems thinking needs to move to the foreground: work on
processes, tools, training, incentives and rewards, rather
than blaming individuals
15. LEADERSHIP
PRINCIPLES
• Leadership principles need to change
• Safety and quality turnarounds have to start at the top, with
a tangibly new or renewed commitment to leadership by
example for the new ways of working together, setting the
standard and modeling the behaviour expected from the
rest of the organization
• Any sense of a double standard or inconsistency where
safety and quality are concerned between one group within
the business and others will make a turnaround much
harder to achieve
16. REWARD HONESTY
AND OPENNESS
• It is very difficult to make progress if individual, group or
institutional self-deception, blind spots, or taboos persist
• People have to feel that it will not be career threatening to talk
openly about safety, quality and close calls
• Broad participation from everyone is required
• People have to feel engaged and that their views and voices
are important
• There can’t be any residual fears of management backlash,
critical peer judgment or a lack of support for speaking up
• The new normal has to be a form of the US Department of
Homeland Security slogan: If you see something, say something
• People need to see their individual responsibilities as more
than fulfilling just the minimum criteria for their jobs
17. FOSTER DIALOG
• Bring in speakers on good practices used externally
• Initiate blameless reporting systems
• Position reporting as a way to uncover and curtail
breakdowns in processes and systems, not to cast blame
• This requires backing away from seeing incidents as
results of individual incompetence
• Include a “Good Catch” portal or even just a simple log for
individuals to record near misses
• Review and rank the risk profile of close calls continually
• Act on the most significant
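
As a minimal sketch of how a simple "Good Catch" log might be reviewed and ranked, the Python below scores each near miss as severity times likelihood and surfaces the highest-scoring entries. The `NearMiss` class, the 1 to 5 scales, the scoring rule and the sample entries are all illustrative assumptions, not part of the original material; a real program would use its own risk matrix.

```python
from dataclasses import dataclass


@dataclass
class NearMiss:
    """One entry in a hypothetical 'Good Catch' log."""
    description: str
    severity: int    # assumed scale: 1 (minor) to 5 (catastrophic)
    likelihood: int  # assumed scale: 1 (rare) to 5 (frequent)

    @property
    def risk_score(self) -> int:
        # Assumed scoring rule: a simple severity x likelihood product.
        return self.severity * self.likelihood


def rank_near_misses(log, top_n=3):
    """Return the top_n entries, highest risk score first."""
    return sorted(log, key=lambda e: e.risk_score, reverse=True)[:top_n]


# Invented sample entries, for illustration only.
log = [
    NearMiss("Unlabeled test build shipped to staging", 2, 4),
    NearMiss("Model served outside validated input range", 5, 3),
    NearMiss("Guard rail bypassed during maintenance", 4, 2),
]

ranked = rank_near_misses(log, top_n=2)
for entry in ranked:
    print(entry.risk_score, entry.description)
```

Re-running the ranking each time the log changes is one way to keep the review continual, so that attention goes to the most significant close calls first.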
18. FOSTER DIALOG
• Change the terminology about issue reporting and follow-
up to a more blameless form:
• Investigation -> Study
• Error -> Incident or Event
• Blame -> Accountability
• Judgement -> Learning
• etc.
19. STEERING
COMMITTEE
• This working group is to ensure that all parts and levels of
the organization are heard in ongoing steering decisions
• Typically, have a C-level officer chair the committee
• Incorporate representatives from all major functions,
operating sites, business units, and multiple levels to
avoid information gaps or filtering
• The committee also is to actively counter any deficiencies
in inter-group communication or misunderstandings
about safety and quality technology, tools, policies and
procedures
20. CREATE A PLAYBOOK
FOR HOW TO CONDUCT
EVENT STUDIES
• Create a process and tools for a standardized way to
study events of concern
• This protocol should be applicable to both specific
incidents and near misses
• Start with the timeline of events
• Make room for multiple contributory factors, since
respondents may only have seen or known one small part
in the larger picture of confluences
• Fishbone (Ishikawa) diagram kinds of constructs can be
useful mental and visualization models for multi-factor
incidents
21. EVENT STUDY –
GETTING UNDERWAY
• Initially, a standardized toolkit and approach for event
studies may need to be pushed into regular usage
• Usually this push starts top-down, but it can also come
from a motivated constituency elsewhere in the
organization as long as that group has strong, visible
backing from leadership to pilot, tune, and then spread the
practice widely
22. EVENT STUDY -
TRACTION SIGNAL
• With time, as a safety and quality transformation takes
hold, the event study protocol will often start to be used
spontaneously, even by self-organized event study teams,
as part of routine practice when people see anomalies that
need to be acted upon
• This signal of pull for the event study protocol shows that
positive momentum to improve is building
23. ACTION PLANS
• Map complicated processes, break them down into simple elements,
and spot the lapses in reliability by which to drive improvement
• A lot like failure mode and effects analysis (FMEA) methods
• The benefit of this sort of cause and effect cascade discipline is that
it moves the thinking from the purely reactive to a more proactive view
of what could likely cause deficient safety, not just what has caused
issues
• This kind of workflow blueprinting is often done after event study
through a sequence of planning, workshopping and training
• Then, typically, use plan-do-check-act (PDCA) or a similar rapid
iteration methodology to implement changes, and make further
adaptations toward the ultimate desired state
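
The FMEA-style prioritization mentioned above can be sketched as follows. This assumes the conventional risk priority number (RPN), the product of severity, occurrence and detection ratings on 1 to 10 scales; the `FailureMode` type, the process steps and the ratings are invented for illustration and do not come from the original material.

```python
from typing import NamedTuple


class FailureMode(NamedTuple):
    """One row of a hypothetical FMEA worksheet (ratings on 1-10 scales)."""
    step: str
    failure: str
    severity: int    # how bad the effect is
    occurrence: int  # how often the cause is expected to arise
    detection: int   # 10 = very unlikely to be caught before it matters


def rpn(fm: FailureMode) -> int:
    """Conventional risk priority number: severity x occurrence x detection."""
    return fm.severity * fm.occurrence * fm.detection


# Invented process steps and ratings, for illustration only.
modes = [
    FailureMode("Data labeling", "Mislabeled samples reach training set", 7, 5, 6),
    FailureMode("Deployment", "Rollback script untested", 8, 2, 4),
    FailureMode("Shift handoff", "Open issues omitted from notes", 5, 6, 3),
]

# Highest RPN first: candidates for the first improvement cycle.
prioritized = sorted(modes, key=rpn, reverse=True)
for fm in prioritized:
    print(rpn(fm), fm.step, "-", fm.failure)
```

The ranked list is only a starting point for discussion; after each improvement cycle the ratings would be revisited and the list re-sorted.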
24. VISIBILITY, INCENTIVES
& ALIGNMENT
• Make the safety and quality improvement work visible
• Set concrete goals, and regularly communicate about
progress
• Recognize and reward advancement
• Include or increase monetary incentives for safety and
quality
• Equally, sanction and otherwise act on any backsliding or
reversion to older, problematic ways of behaving
• Periodically do pull-back reviews to track progress,
identify remaining gaps, and make plans for further course
corrections
25. ONBOARDING, TRAINING
& DEVELOPMENT
• In a turnaround situation, onboarding, training and
development all need to shift to put much more emphasis
on safety and quality practices
• Onboarding is especially important in high turnover, high
growth or seasonally staffed businesses
• Basic elements of onboarding, training & development
include tools and techniques for the new ways of working
• More advanced: Getting new team members into the right
mindset
26. ONBOARDING, TRAINING
& DEVELOPMENT
• Context about the appropriate frame of reference to have
should come from recent case studies of things done well,
as well as some perspective about what exemplified
insufficient or bad practice in the past
• To help everyone get up the learning curve quickly,
provide opportunities for learning through doing,
explaining and teaching
• Don’t just push a lot of content at new arrivals and hope
for the best
• Alignment, comprehension and retention are sharply better
when new teachings are immediately exercised in different
ways
27. CONTINUOUS
IMPROVEMENT
• If there is not a continuous improvement (CI) system in
place already, implement at least a basic one to keep
safety and quality matters moving ahead as the company,
environment and product/service continue to evolve
• The goal needs to be to get things right the first time; auditing
and downstream checks of quality and safety should mostly
be confirmatory, rather than the first line of defence against
shortfalls
• The three starting points for a bare bones CI program,
borrowed from lean methodologies, are usually to:
a. Error-proof processes,
b. Standardize work, and,
c. Eliminate waste
28. QUALITATIVE SIGNS OF
CULTURAL PROGRESS
• It is typically a reliable sign that a cultural transformation
for safety and safety-related quality is taking hold when:
• People no longer push back about the need for quality and
safety; instead, they spontaneously ask questions about
how to improve and how to prevent near misses
• Quality and safety come to be seen as a vehicle for
organizational success, rather than a source of friction,
supporting and driving improvements in other areas of the
business
29. EXAMPLE – AI SAFETY
TYPICAL PRACTICES
• Robust Testing and Validation
• Throughout the development, design and maintenance
process, including edge cases and unconventional inputs, as
well as known errors and exceptions
• Ethical design principles
• Ex: Fairness, transparency, accountability, and interpretability
• Ongoing monitoring and feedback to detect anomalies or
unexpected behaviours, and improve performance
• Watch out especially for creeping or more sudden expansion
of the operational design domain (ODD), or other unforeseen
usage scenario dilation
• Human in the loop or on the loop
• Provide for human monitoring or intervention
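
One way to watch for expansion of the operational design domain is to track how often live inputs fall outside the ranges the system was validated on. The sketch below is a simplified illustration under stated assumptions: the feature names, bounds, sample inputs and the 10% alert threshold are all hypothetical, and real ODD definitions are usually richer than per-feature ranges.

```python
def out_of_domain_rate(samples, odd_bounds):
    """Fraction of samples with at least one feature outside its validated range."""
    if not samples:
        return 0.0
    outside = 0
    for sample in samples:
        for name, (low, high) in odd_bounds.items():
            if not (low <= sample[name] <= high):
                outside += 1
                break  # count each sample at most once
    return outside / len(samples)


# Hypothetical validated ranges, e.g. for a driver-assistance feature.
ODD_BOUNDS = {"speed_kph": (0, 60), "visibility_m": (200, 10_000)}
ALERT_THRESHOLD = 0.10  # assumed tolerance before a human reviews the trend

# Invented recent inputs, for illustration only.
recent = [
    {"speed_kph": 45, "visibility_m": 800},
    {"speed_kph": 72, "visibility_m": 900},   # too fast
    {"speed_kph": 30, "visibility_m": 150},   # visibility too low
    {"speed_kph": 50, "visibility_m": 500},
]

rate = out_of_domain_rate(recent, ODD_BOUNDS)
if rate > ALERT_THRESHOLD:
    print(f"ODD alert: {rate:.0%} of recent inputs fall outside validated bounds")
```

An alert like this does not decide anything by itself; it routes the trend to a person, in keeping with the human in the loop or on the loop practice.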
30. EXAMPLE – AI SAFETY
TYPICAL PRACTICES
• Explainability, to end users and customers, and not just to
other computer scientists or ML engineers
• Data quality and governance, to ensure training data is
accurate, representative and bias free
• Regulatory compliance, such as for safety, privacy,
security and ethical use
• Cross-functional collaboration, spanning data scientists,
engineers, domain experts, ethicists, legal advisors and
end users
• Continuous improvement
• For ongoing evolutionary learning and adaptation
31. EXAMPLE – AI SAFETY
• Each of these practices can be seen as one of the layers
of Swiss cheese safeguards
• Each layer will prevent many incidents, but not all
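
To make the layering arithmetic concrete, the sketch below multiplies per-layer miss rates to estimate how often an incident would slip through every safeguard. It rests on a strong assumption that the layers fail independently, which often does not hold in practice because latent failures can align holes across layers. The layer names and miss rates are invented for illustration.

```python
import math


def residual_risk(miss_probabilities):
    """Probability an incident passes every layer, assuming independent layers."""
    return math.prod(miss_probabilities)


# Hypothetical chance that each safeguard fails to catch a given problem.
layers = {
    "testing and validation": 0.10,
    "ongoing monitoring": 0.20,
    "human review": 0.25,
}

baseline = residual_risk(layers.values())

# Adding one more imperfect layer still lowers the combined miss rate.
with_extra_layer = residual_risk([*layers.values(), 0.50])

print(f"three layers: {baseline:.4f}")
print(f"four layers:  {with_extra_layer:.4f}")
```

Under these assumed numbers, no single layer is close to perfect, yet the combination is far stronger than any one of them. Correlated holes would make the real figure worse than this estimate.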
32. EXAMPLE – AI SAFETY
• Cultural context, regulation, standards of care, industry
standards, technology, data and personnel all evolve over
time
• The number and size of holes in each layer of Swiss cheese
safeguards can change, as well as the length and width of
each or all safeguard slices
• Furthermore, different types of triggers can emerge
• The outcome can be that practices which were sufficient for
safety and quality at one point in time can often become
inadequate in the future
• Sometimes, the drift of safety and quality away from
acceptable levels happens slowly at first, and is hard to
definitively detect, leading to some complacency and delayed
reaction
33. EXAMPLE – AI SAFETY
• Proactive:
• The practices for enhancing or restoring safety and quality
discussed in this seminar can be used proactively to keep AI
safety in trim form
• Reactive:
• Should AI safety and quality degrade, and a sharp rebound be
required, the techniques discussed in this talk can also be
used reactively to put a program in place to get things back
up to par
34. EXAMPLE – AI SAFETY
• The human element:
• As with all monolithic complex technologies, especially those
like DNNs that have a black box element and are never fully
knowable, it can be challenging for people to feel capable of
taking full ownership of improvement initiatives
• In addition, changes in personnel, tools, training and
external context mean that there are ongoing vulnerabilities
to human error and to shifting standards of individual and
team performance that can compromise AI safety and quality
• An ongoing system for improvement of both technical and
human mechanisms for building in and assuring safety and
quality will always be required
35. TAKEAWAYS
• A company’s approach to near misses and borderline case
handling usually says as much about its safety and quality
culture as anything else
• Attention to detail is both a cause and an effect of safety
and quality, which can be leveraged advantageously
• Safety incidents often have less scope for debate about
their interpretation vs. other aspects of quality or
thoroughness
• This can make it easier and faster to use safety to get
people aligned and moving toward change with broader
salutary effects through widespread attention to detail for
the organization and its stakeholders
36. TAKEAWAYS
• The tools and methods of lifting safety and quality generally
have a lot of carryover benefit to other parts of the business
once people become practiced with them
• Capacity for change, development, curiosity and learning are
core to improving safety and safety-related quality
• These same attributes extend out widely and advantageously
to other facets of the enterprise, its products and services
• First and second level supervisors have a big role to play
getting the organization to where it needs to be
• Get people with the right capacities in those roles
• Make sure they have what they need
• Audit their concerns and adjust accordingly throughout the
evolutionary journey for safety and quality
37. TAKEAWAYS
• The human element is one of the biggest sources of both
risk and opportunity, in safety and quality as in many
other realms
• Especially as complexity grows, purely technical, logical or
physical safeguards will never be entirely sufficient on a
lasting basis
• People, their curiosity, aptitude to learn and desire to
improve always have the largest ongoing role to play