In this case study, we outline how we used large language models (LLMs) to automate the classification of unstructured market research data and customer feedback, improving accuracy and reducing operational costs for our client.
Market Research
Natural Language Processing
Challenge
Our client collects large amounts of market research data and customer feedback, much of it unstructured text.
The status quo was a mix of manual classification by humans and expensive special-purpose classification APIs.
Goals
The goal was to implement a feedback-classification API based on large language models that delivers human-level accuracy, supports multiple languages, respects the GDPR, and keeps operational costs low.
Details
Human-level accuracy without training data across a wide variety of topics (no special-purpose solution)
Automatic multi-language support
Data privacy (avoid leaking personal data to external providers)
Lower operational cost per categorization than the current solution
Solution
We built the solution on OpenAI’s GPT-3.5-turbo model due to its low price and high quality. The model is capable enough that it requires no training data (it works as a zero-shot learner) and supports multiple languages out of the box. The main challenges were removing personal data and steering the model to understand more complex classes.
The API was delivered as a Docker container with comprehensive telemetry to guarantee the performance and maintainability of the application.
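A minimal sketch of what such a zero-shot classification prompt can look like. The class list, wording, and helper function are illustrative, not the client's actual prompt:

```python
# Sketch of a zero-shot feedback-classification prompt for a chat model.
# Class names and descriptions are illustrative examples.

CLASSES = {
    "high_pricing": "complains about high pricing regardless of other issues",
    "voucher_not_accepted": "if a voucher wasn't accepted",
    "other": "anything that fits no other class",
}

def build_messages(feedback: str) -> list[dict]:
    """Build chat messages asking the model to pick exactly one class."""
    class_lines = "\n".join(f'- "{name}": {desc}' for name, desc in CLASSES.items())
    system = (
        "You classify customer feedback into exactly one of these classes:\n"
        f"{class_lines}\n"
        "Reply with the class name only."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": feedback},
    ]

# With the OpenAI client, this would be sent roughly as:
#   client.chat.completions.create(model="gpt-3.5-turbo",
#                                  messages=build_messages(feedback))
```

Because the model sees only class names plus one-line explanations, the quality of those names and explanations directly drives accuracy, which is what the two challenges below are about.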
Challenge #1 Steering the model
The model did not always follow the instructions. The prompt initially contained a list of classes with explanations, for example:
“pricing”: complains about high pricing regardless of other issues.
“voucher_not_accepted”: if a voucher wasn’t accepted.
The following issues occurred:
1. GPT-3.5 sometimes ignores the explanations and relies only on the class name, e.g. applying “pricing” to any type of pricing issue, not just high pricing. Solution: use descriptive class names instead of internal class IDs:
“high_pricing”: complains about high pricing regardless of other issues.
“voucher_not_accepted”: if a voucher wasn’t accepted.
2. GPT-3.5 has issues with overlapping concepts.
For example, a piece of feedback could be: “In my last order with a voucher, I had missing items in the delivery”.
In some cases, GPT-3.5 classifies it as “voucher_not_accepted”. Although the feedback is related to a voucher, it’s not the actual issue. Better explanations don’t reliably help.
Solution: we provide an extra placeholder category that lets the model classify such feedback as voucher-related, while internally we don’t count it as an issue.
"voucher_used": if a coupon was fully or partially used.
“voucher_not_accepted”: if a voucher wasn’t accepted.
“high_pricing”: complains about high pricing regardless of other issues.
Instead of the wrong class voucher_not_accepted, the model now picks voucher_used in these scenarios. All we had to do was ignore these placeholder classes in post-processing.
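That post-processing step can be as simple as dropping the placeholder labels before counting issues (the class names here are illustrative):

```python
# Placeholder classes absorb voucher mentions that are not actual issues;
# they are dropped before issue counts are computed. Names are illustrative.

PLACEHOLDER_CLASSES = {"voucher_used"}

def count_issues(predicted_labels: list[str]) -> dict[str, int]:
    """Count predicted classes, ignoring placeholder categories."""
    counts: dict[str, int] = {}
    for label in predicted_labels:
        if label in PLACEHOLDER_CLASSES:
            continue  # voucher mentioned, but no voucher issue
        counts[label] = counts.get(label, 0) + 1
    return counts
```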
Demo
Challenge #2 Data privacy / GDPR
We needed to avoid leaking any personal data to OpenAI or other external APIs. For this particular use case, we decided to use Named-Entity Recognition (NER).
We integrated an open-source NER library to detect names, IDs, addresses, etc., and either removed the data or replaced it with synthetic data that doesn’t harm the accuracy of the large language model.
In a second step, we benchmarked the anonymization to verify its effectiveness and produced documentation that supports GDPR-related compliance.
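A heavily simplified sketch of the replacement step. The production pipeline used a full NER library to find names, addresses, and similar entities; this example covers only patterns expressible as regexes (email addresses and long numeric IDs) and swaps them for synthetic stand-ins so the text stays natural for the language model:

```python
import re

# Simplified PII-scrubbing sketch. Only regex-detectable entities are
# covered here; an NER library handles names, addresses, etc. in practice.
# Detected personal data is replaced with synthetic values rather than
# deleted, so sentence structure stays intact for the classifier.

REPLACEMENTS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "jane.doe@example.com"),  # emails
    (re.compile(r"\b\d{6,}\b"), "123456"),                             # long numeric IDs
]

def scrub(text: str) -> str:
    """Replace detected personal data with synthetic values."""
    for pattern, synthetic in REPLACEMENTS:
        text = pattern.sub(synthetic, text)
    return text
```

Replacing rather than deleting is the key design choice: the model still sees a grammatically complete sentence, so classification accuracy is unaffected.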
Outcomes
In terms of quality and cost, the solution outperformed both human classification and the previously used API: