In this case study, we outline how we used large language models (LLMs) to automate the classification of unstructured market research data and customer feedback, improving accuracy and reducing operational costs for our client.
Market Research
Natural Language Processing
Challenge
Our client collects large amounts of market research data and customer feedback, much of it unstructured text.
The status quo was a mix of manual classification by humans and expensive special-purpose classification APIs.
Goals
The goal was to implement a feedback-classification API based on large language models that delivers human-level accuracy, supports multiple languages, respects the GDPR, and keeps operational costs low.
Details
Human-level accuracy without training data across a wide variety of topics (no special-purpose solution)
Automatic multi-language support
Data privacy (avoid leaking personal data to external providers)
Lower operational cost per categorization than the current solution
Solution
We built the solution on OpenAI’s GPT-3.5-turbo model due to its low price and high quality. The model is capable enough that it requires no training data (it works as a zero-shot learner) and supports multiple languages out of the box. The main challenges were removing personal data and steering the model to understand more complex classes.
The API was delivered as a Docker container with comprehensive telemetry to guarantee the performance and maintainability of the application.
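A minimal sketch of what such a zero-shot classification prompt can look like. The class list, wording, and helper function are illustrative, not the client's actual prompt:

```python
# Sketch of a zero-shot feedback-classification prompt for a chat model.
# Class names and descriptions are illustrative examples.

CLASSES = {
    "high_pricing": "complains about high pricing regardless of other issues",
    "voucher_not_accepted": "if a voucher wasn't accepted",
    "other": "anything that fits no other class",
}

def build_messages(feedback: str) -> list[dict]:
    """Build chat messages asking the model to pick exactly one class."""
    class_lines = "\n".join(f'- "{name}": {desc}' for name, desc in CLASSES.items())
    system = (
        "You classify customer feedback into exactly one of these classes:\n"
        f"{class_lines}\n"
        "Reply with the class name only."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": feedback},
    ]

# With the OpenAI client, this would be sent roughly as:
#   client.chat.completions.create(model="gpt-3.5-turbo",
#                                  messages=build_messages(feedback))
```

Because the model sees only class names plus one-line explanations, the quality of those names and explanations directly drives accuracy, which is what the two challenges below are about.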
Challenge #1 Steering the model
The model did not always follow the instructions. The prompt initially contained a list of classes with explanations, for example:
“pricing”: complains about high pricing regardless of other issues.
“voucher_not_accepted”: if a voucher wasn’t accepted.
The following issues occurred:
1. GPT-3.5 sometimes ignores the explanations and relies only on the class name, e.g. applying “pricing” to any type of pricing issue, not just high pricing. Solution: use descriptive class names instead of internal class IDs:
“high_pricing”: complains about high pricing regardless of other issues.
“voucher_not_accepted”: if a voucher wasn’t accepted.
2. GPT-3.5 has issues with overlapping concepts.
For example, a piece of feedback could be: “In my last order with a voucher, I had missing items in the delivery”.
In some cases, GPT-3.5 classifies it as “voucher_not_accepted”. Although the feedback is related to a voucher, it’s not the actual issue. Better explanations don’t reliably help.
Solution: we provide an extra placeholder category that lets the model classify such feedback as voucher-related, while internally we don’t count it as an issue.
"voucher_used": if a coupon was fully or partially used.
“voucher_not_accepted”: if a voucher wasn’t accepted.
“high_pricing”: complains about high pricing regardless of other issues.
Instead of the wrong class voucher_not_accepted, the model now picks voucher_used in these scenarios. All we had to do was ignore these placeholder classes in post-processing.
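That post-processing step can be as simple as dropping the placeholder labels before counting issues (the class names here are illustrative):

```python
# Placeholder classes absorb voucher mentions that are not actual issues;
# they are dropped before issue counts are computed. Names are illustrative.

PLACEHOLDER_CLASSES = {"voucher_used"}

def count_issues(predicted_labels: list[str]) -> dict[str, int]:
    """Count predicted classes, ignoring placeholder categories."""
    counts: dict[str, int] = {}
    for label in predicted_labels:
        if label in PLACEHOLDER_CLASSES:
            continue  # voucher mentioned, but no voucher issue
        counts[label] = counts.get(label, 0) + 1
    return counts
```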
Demo
Challenge #2 Data privacy / GDPR
We needed to avoid leaking any personal data to OpenAI or other external APIs. For this particular use case, we decided to use Named-Entity Recognition (NER).
We integrated an open-source NER library to detect names, IDs, addresses, etc., and either removed the data or replaced it with synthetic data that doesn’t harm the accuracy of the large language model.
In a second step, we benchmarked the anonymization to verify its effectiveness and produced documentation that supports GDPR-related compliance.
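A heavily simplified sketch of the replacement step. The production pipeline used a full NER library to find names, addresses, and similar entities; this example covers only patterns expressible as regexes (email addresses and long numeric IDs) and swaps them for synthetic stand-ins so the text stays natural for the language model:

```python
import re

# Simplified PII-scrubbing sketch. Only regex-detectable entities are
# covered here; an NER library handles names, addresses, etc. in practice.
# Detected personal data is replaced with synthetic values rather than
# deleted, so sentence structure stays intact for the classifier.

REPLACEMENTS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "jane.doe@example.com"),  # emails
    (re.compile(r"\b\d{6,}\b"), "123456"),                             # long numeric IDs
]

def scrub(text: str) -> str:
    """Replace detected personal data with synthetic values."""
    for pattern, synthetic in REPLACEMENTS:
        text = pattern.sub(synthetic, text)
    return text
```

Replacing rather than deleting is the key design choice: the model still sees a grammatically complete sentence, so classification accuracy is unaffected.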
Outcomes
In terms of quality and cost, the solution outperformed both human classification and the previously used API: