Ensuring the Quality, Reliability and Ethics of AI-Powered Applications at Scale

 

Author: Chris Sheehan, EVP High Tech and AI, Applause

Reviewed by: Vaishnavi Nashte

Generative AI (Gen AI) has taken the world by storm. New Gen AI services designed to meet the requirements of almost every use case and profession imaginable are appearing all the time. Analysts estimate the market value will reach around $52.2B by 2028. However, to fully the benefits of Gen AI, organisations should implement the key elements of Responsible AI. Without adherence to Responsible AI, organisations run a high degree of risk across inaccuracy, safety, compliance, security and fairness.  

In order to effectively mitigate these risks, organisations should develop a Responsible AI testing and evaluation process that generates trust internally and externally. Human-based testing is an essential part of this process and should be implemented at various stages of development, as well as after the application is released. Including wide-ranging, diverse perspectives is critical to reliable, inclusive output, but it’s a significant challenge to apply at scale.

The wisdom of crowds

Many businesses have attempted to train their Gen AI systems in-house, using only their employees to add the “human touch.” But even those with large workforces are unable to produce datasets that are large and diverse enough to meet the needs of their product teams. This is why organisations are turning to crowdtesting solutions that give them access to a global community of independent testers who can provide the levels of diversity and scrutiny required to test and validate their Gen AI systems.

These experts and end users provide the necessary input and feedback to navigate complex ethical issues and fine-tune Gen AI responses to meet human expectations on a scale that would be impossible to achieve in-house. The concept of incorporating crowd-sourced insights ahead of the release of a Gen AI application can uncover issues and opportunities for improvement that development and QA teams wouldn’t necessarily think of themselves.

Harnessing data and human feedback  

High-quality training and testing data from experts and end users is often needed to power the large language models (LLMs) that underpin Gen AI services. LLMs generally require an incredible amount of data and information to make accurate predictions. Different datasets serve different purposes. Take financial services for example – preparing an LLM or algorithm to make predictions means collecting lots of quality data from experts within the financial services sector. As businesses increasingly trust AI in their decision-making processes, it’s even more important to make sure the dataset is accurate, large and diverse enough to reduce any biases that can lead to harmful conclusions.

Another layer of testing and validation can be achieved using a process known as red teaming, an adversarial technique designed to find points of failure in Gen AI. Red teaming involves a team of experts who execute a series of tests to identify problems related to security, safety, accuracy, functionality or performance. Red teaming can unearth points of failure that would be difficult to identify through regular testing methods.

 

Gen AI inclusivity and accessibility

Building ethical Gen AI systems requires proactive testing practices to uncover biases and mitigate risks. Risk reduction can be achieved by employing diverse testing practices, like recruiting testers from a range of demographic, geographic and psychographic backgrounds, including different skin tones, genders, ages, languages and more. For example, facial recognition systems trained on non-representative datasets can produce discriminatory results. Diverse testing is a strategy that can help mitigate various risks, although the technology will always have some level of risk associated with it.

Another consideration is accessibility – digital products, including Gen AI tools, can’t be considered inclusive if they are not accessible. Companies should optimise their applications for people with disabilities (PWD) through accessibility testing and inclusive design. Organisations can provide genuine value for users with both permanent and temporary disabilities by including PWD in testing and research. This approach naturally helps organisations deliver more reliable and inclusive AI systems.

The value of applying rigorous testing and training techniques involving expert testers and end users at scale is clear. Organisations would prefer to identify and resolve potential Gen AI safety and ethical issues during testing, before they occur in the real world. Incorporating elements of Responsible AI throughout development can help avoid unintended harms. By involving a large number of participants from a range of backgrounds and locations, who are willing to test and provide input early and often, organisations can achieve the level of diversity required for less biased, more effective results.

Privacy Overview
Software Testing News

When you visit any website, it may store or retrieve information on your browser, mostly in the form of cookies. This information might be about you, your preferences or your device and is mostly used to make the site work as you expect it to. The information does not usually directly identify you, but it can give you a more personalized web experience. Because we respect your right to privacy, you can choose not to allow some types of cookies. Click on the different category headings to find out more and change our default settings. However, blocking some types of cookies may impact your experience of the site and the services we are able to offer.

Strictly Necessary Cookies

Strictly Necessary Cookie should be enabled at all times so that we can save your preferences for cookie settings.

Performance Cookies

These cookies allow us to count visits and traffic sources so we can measure and improve the performance of our site. They help us to know which pages are the most and least popular and see how visitors move around the site. All information these cookies collect is aggregated and therefore anonymous. If you do not allow these cookies we will not know when you have visited our site, and will not be able to monitor its performance.

Targeting Cookies

These cookies may be set through our site by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant adverts on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. If you do not allow these cookies, you will experience less targeted advertising.