Scalr.ai

5 Best Use Cases For Product Matching In Ecommerce & How You Can Implement Each One

Matt Payne
·
August 24, 2021

Let's take a look at what product matching is and 5 different ways you can use product matching software in your ecommerce business to generate positive ROI.

What is product matching in ecommerce?

Product matching is the process of using machine learning and multiple data sources to match products based on their similarity. In most cases, the comparison is between our own products and our competitors' products. Large retailers like Walmart also use it to compare products already in their store against a new product a seller is trying to list. In the past, retailers relied on attribute information such as SKUs, titles, GTINs, and other data points to compare two products. As you can imagine, this approach is neither efficient nor accurate at scale, whether across a large catalog or against every competitor on the market.

Two different adidas jackets

These two jackets would be nearly impossible to match using the available attributes, even though they are the same product:

  1. They don't offer the same colors, which can throw off algorithms that take color into account.
  2. The titles are too different to compare based on the words or substrings they contain.
  3. The descriptions provide very little information.
  4. The listed prices are not close to each other, and price is often used as a similarity signal.

As we'll see throughout this article, product matching is an extensive topic in retail and ecommerce that covers many different use cases that produce ROI.

Machine Learning Approach To Product Matching

Modern product matching combines many features and machine learning algorithms to compare the similarity of products. Because so many similarity algorithms are available, we can build our comparison tools around whatever level of product data we have. The components below are common in product matching today.

Product Title Similarity

Using deep learning tools such as spaCy and GPT-3, Scalr.ai builds a title similarity module that learns to recognize contextually similar titles even when the title strings themselves are very different. Here are four titles for the exact same product:

Garmin nuvi 2699LMTHD GPS Device

nuvi 2699LMTHD Automobile Portable GPS Navigator

Garmin NUVI 2699LMTHD — GPS navigator — automotive 6.1 in

Garmin (nuvi) 2699LMT HD 6" GPS with Lifetime Maps and HD Traffic (010-01188-00)

The same product can have very different-looking titles, which makes matching difficult with exact string comparisons or even with string similarity measures like Damerau-Levenshtein distance.
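To see why, here is a minimal sketch that scores two of the Garmin titles above with the optimal string alignment variant of Damerau-Levenshtein distance (a common restricted form of the full algorithm), normalized to a 0-1 similarity, next to a simple word-overlap (Jaccard) score. This is not Scalr.ai's title model, just an illustration of how surface-level string measures behave on titles that describe the same product.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def edit_similarity(a: str, b: str) -> float:
    """1.0 for identical strings, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - osa_distance(a, b) / longest


def token_jaccard(a: str, b: str) -> float:
    """Share of lowercase word tokens the two titles have in common."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


titles = [
    "Garmin nuvi 2699LMTHD GPS Device",
    "nuvi 2699LMTHD Automobile Portable GPS Navigator",
]
print(round(edit_similarity(*titles), 2), round(token_jaccard(*titles), 2))
```

Both scores land well below 1.0 for what is the same GPS, which is the gap a contextual title model is meant to close.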

Price Comparison

Price comparison is one of the features we can use when matching products in a larger algorithm. We mostly use two data analysis algorithms to help us map price similarity in our matching:

  1. Price outlier detection: Checking whether a single price is an outlier against a larger group of similarly priced items that are otherwise matched helps us judge how similar it really is to that group.
  2. Clustering: Clustering algorithms such as K-means help us understand the size of the market for similar products based on pricing, and the information they learn can be used as a feature in our overarching product matching system.
clustering example
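The sketch below illustrates both ideas on made-up prices. Outliers are flagged with a modified z-score based on the median absolute deviation (MAD), and prices are grouped with a basic one-dimensional K-means. The MAD rule, the 3.5 cutoff, and the deterministic initialization are our assumptions for the example, not necessarily what a production matcher uses.

```python
from statistics import median


def mad_outliers(prices: list[float], threshold: float = 3.5) -> list[bool]:
    """Flag prices whose modified z-score (based on the median absolute
    deviation) exceeds `threshold`; 3.5 is a common rule-of-thumb cutoff."""
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # At least half the prices equal the median; anything else stands out.
        return [p != med for p in prices]
    return [abs(0.6745 * (p - med) / mad) > threshold for p in prices]


def kmeans_1d(values: list[float], k: int, iterations: int = 100) -> tuple[list[float], list[int]]:
    """Plain Lloyd's K-means in one dimension. Centroids start at evenly
    spaced positions in the sorted data so results are deterministic."""
    ordered = sorted(values)
    centroids = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each price to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members (keep it if empty).
        new = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


# Made-up prices of listings already matched to one of our products.
matched_prices = [19.99, 21.50, 20.75, 22.00, 89.99]
print(mad_outliers(matched_prices))

# Made-up prices across a market of similar products.
market = [9.99, 11.49, 10.25, 24.99, 26.50, 25.00, 59.99, 64.00]
centroids, labels = kmeans_1d(market, k=3)
print([round(c, 2) for c in centroids], labels)
```

In a full matching system, the outlier flag and the cluster label would be additional input features rather than final decisions.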

Image Similarity

Image similarity is one of the most powerful and important deep learning techniques we can use to measure how similar two products are. As we'll see in the use cases, image similarity can be used for many different tasks when matching ecommerce and retail products. Scalr.ai's image module learns the similarity between products regardless of angle, image quality, design size, or background.

Comparing the top product image (white women's dresses) to similar products, ranked from most similar (left) to least similar

We built this high-powered solution using the most up-to-date image recognition architectures and fine-tune it for each specific business use case. This fine-tuning lets us tailor the results to each ecom brand and produce model accuracy that far outperforms prebuilt solutions.
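Once an image model has turned each product photo into an embedding vector, ranking candidates like the dress comparison above reduces to sorting by a similarity score. The minimal sketch below uses cosine similarity over hypothetical, hand-written 4-dimensional embeddings. In practice the vectors would come from a fine-tuned CNN, and the product IDs here are made up.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: list[float], catalog: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (product_id, score) pairs, most similar first."""
    scored = [(pid, cosine(query, emb)) for pid, emb in catalog.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings; real ones would come from an image model
# and typically have hundreds of dimensions.
query = [0.9, 0.1, 0.3, 0.0]
catalog = {
    "dress_a": [0.8, 0.2, 0.3, 0.1],
    "dress_b": [0.1, 0.9, 0.0, 0.4],
    "dress_c": [0.7, 0.0, 0.5, 0.0],
}
for pid, score in rank_by_similarity(query, catalog):
    print(f"{pid}: {score:.3f}")
```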

Deep Learning Attribute Extraction

Product attributes like brand, size, condition, model number, colors available, description, and more can still be used as effective data points to match our products. We can split our attributes into two different categories:

  1. Limited Range Values: Product attributes with a fixed set of possible values, such as colors or clothing sizes (small, medium, large). With one-hot encoding, we can transform these values into vectors that an AI model can work with.
  2. Endless Values: An attribute fits into this category if its possible values have no natural fixed range. While these data points can in theory take unlimited values, the accuracy of our models takes a hit as the range grows.


We use custom-built neural networks to learn the relationship between two products' attributes and how similar the products are.
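As a rough illustration of how limited-range attributes can be fed to such a network, the sketch below one-hot encodes a clothing size, multi-hot encodes the set of available colors (a common extension when an attribute holds several values), and builds a pairwise feature vector for two products. The vocabularies, attribute names, and the two overlap features at the end are hypothetical choices for the example, not Scalr.ai's actual inputs.

```python
# Hypothetical vocabularies for two limited-range attributes.
SIZES = ["small", "medium", "large"]
COLORS = ["black", "white", "red", "blue"]


def one_hot(value: str, vocabulary: list[str]) -> list[int]:
    """Single-valued attribute -> one-hot vector (all zeros if the value is unknown)."""
    return [1 if value == option else 0 for option in vocabulary]


def multi_hot(values: set[str], vocabulary: list[str]) -> list[int]:
    """Attributes such as 'colors available' can hold several values at once."""
    return [1 if option in values else 0 for option in vocabulary]


def pair_features(a: dict, b: dict) -> list[float]:
    """Hypothetical pairwise input for a similarity network: both products'
    encodings side by side, plus a size-agreement flag and color overlap."""
    size_a, size_b = one_hot(a["size"], SIZES), one_hot(b["size"], SIZES)
    colors_a, colors_b = multi_hot(a["colors"], COLORS), multi_hot(b["colors"], COLORS)
    shared = sum(x & y for x, y in zip(colors_a, colors_b))
    union = sum(x | y for x, y in zip(colors_a, colors_b)) or 1
    encodings = [float(x) for x in size_a + size_b + colors_a + colors_b]
    return encodings + [float(size_a == size_b), shared / union]


jacket_a = {"size": "medium", "colors": {"black", "red"}}
jacket_b = {"size": "medium", "colors": {"black", "blue"}}
features = pair_features(jacket_a, jacket_b)
print(len(features), features[-2:])
```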

UGC Analysis

UGC (user-generated content) analysis lets us use product reviews as a module in our product matching solution. We build a custom GPT-3-based tool that learns the key talking points and keywords in the reviews left for a specific product. This information helps us learn how similarly two products are perceived by customers, not just how they are presented for sale. That comes in handy when a product matching use case is more customer-focused than strictly presentation-focused.
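GPT-3 isn't something we can show in a few self-contained lines, so the sketch below swaps in a much simpler technique: build a keyword frequency profile from each product's reviews and compare the two profiles with cosine similarity. The stopword list and reviews are made up, and this bag-of-words approach only approximates the idea of comparing how two products are perceived.

```python
import math
import re
from collections import Counter

# Small, made-up stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "for",
             "this", "my", "i", "but", "very", "was", "bit", "little"}


def keyword_profile(reviews: list[str]) -> Counter:
    """Count non-stopword words across all reviews for one product."""
    words = (w for review in reviews for w in re.findall(r"[a-z']+", review.lower()))
    return Counter(w for w in words if w not in STOPWORDS)


def profile_similarity(p: Counter, q: Counter) -> float:
    """Cosine similarity between two keyword count vectors (0 to 1)."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Made-up reviews for two listings we suspect are the same jacket.
reviews_a = ["Warm jacket, runs a little small.", "Very warm and fully waterproof."]
reviews_b = ["Waterproof and warm, but the sleeves run small.", "Warm, a bit heavy."]
score = profile_similarity(keyword_profile(reviews_a), keyword_profile(reviews_b))
print(round(score, 3))
```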


Use Cases For Product Matching In Ecommerce That Produce Business Value

cornell product similarity matching
Product Similarity Matching Using CNNs

Product Matching To Automate Copyright Infringement Search

Large retailers and ecom brands spend thousands of hours and deploy entire teams to scour the internet for sellers passing off their designs, logos, or products as their own. We know several brands by name that run this search manually every day and have made automating it a priority. Often these designs are not visually identical to ours and come with completely different titles and product information, which makes old-school Google search methods ineffective.

starry night product matching deep learning
Imagine we owned Starry Night. Our search can find product photos (right) that use our design regardless of the background or the size of the design in the photo.

What components do we use?

Our product matching system for identifying copyright strikes starts with an image similarity model and uses title similarity to reinforce the results. This multi-head approach lets us rely mostly on the image we find while using the title as an input that adjusts the final similarity score. The main benefit of focusing on images is that it lets us find designs similar to ours even when the competitor has changed everything else about the product. Stolen designs are often changed only slightly in appearance but heavily changed in the accompanying information, to hide the theft. Here's a breakdown of the two components:

Scalr.ai Design Similarity Model

Scalr.ai has built a custom image similarity model for copyright strikes that focuses on understanding product design and graphics. The model can be pulled apart and customized for any specific use case or industry. Best of all, you don't need to include your product images in the training data each time, so you can quickly run new images through without retraining the entire model.

Siamese network architecture
Siamese Network used to learn similarity

Our model has learned to identify what matters in a product image rather than what the product actually is. This improves generalization and greatly outperforms out-of-the-box options.

shoe heatmap
Image similarity models learn the important features of our product, seen here with a shoe.
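The core idea of a Siamese network is that both images pass through the same encoder with shared weights, and training pulls matching pairs together while pushing non-matching pairs apart in embedding space. The sketch below shows that structure with a hypothetical single linear layer standing in for the CNN, plus one common form of contrastive loss. The weights, feature vectors, and margin are illustrative, and the gradient-based training loop is omitted.

```python
import math

# Hypothetical shared encoder weights (2 outputs x 3 inputs). In a real Siamese
# network this would be a trained CNN; what matters is that BOTH inputs pass
# through the SAME weights.
WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.0, 0.3, 0.8],
]


def encode(features: list[float]) -> list[float]:
    """The shared 'twin' branch: one linear layer applied to either input."""
    return [sum(w * x for w, x in zip(row, features)) for row in WEIGHTS]


def distance(x1: list[float], x2: list[float]) -> float:
    """Euclidean distance between the two embeddings."""
    e1, e2 = encode(x1), encode(x2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


def contrastive_loss(d: float, same: bool, margin: float = 1.0) -> float:
    """Penalize distance for matching pairs; penalize non-matching pairs only
    while they are closer than `margin`."""
    if same:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Made-up feature vectors standing in for two photos of the same design
# and one photo of an unrelated product.
design, design_cropped, unrelated = [1.0, 0.2, 0.4], [0.9, 0.25, 0.4], [0.1, 0.9, 0.0]
print(contrastive_loss(distance(design, design_cropped), same=True))
print(contrastive_loss(distance(design, unrelated), same=False))
```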

Title Similarity

GPT-3- or spaCy-based similarity models are our go-to choice for learning the relationship between sentence structure and word placement. What this component adds is a standardized way to decide whether a listing is a strike when the image match is close. Adding it boosts our overall accuracy considerably and removes edge cases that cause false positives.
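One simple way to combine the two components as described, relying mostly on the image and letting the title settle borderline cases, is a weighted score with thresholds. Every weight and threshold in this sketch is a hypothetical placeholder that would need tuning on labeled examples; it is not Scalr.ai's decision rule.

```python
from dataclasses import dataclass

# Hypothetical weights and thresholds; real values would be tuned on labeled strikes.
IMAGE_WEIGHT = 0.8
TITLE_WEIGHT = 0.2
CLEAR_MATCH = 0.90       # image similarity this high is convincing on its own
BORDERLINE = 0.70        # below this, the image rules out a strike
STRIKE_THRESHOLD = 0.75  # cutoff for the blended score in the borderline band


@dataclass
class Candidate:
    url: str
    image_score: float  # 0-1 from the image similarity model
    title_score: float  # 0-1 from the title similarity model


def is_possible_strike(c: Candidate) -> bool:
    """Rely mostly on the image; use the title only to settle close calls."""
    if c.image_score >= CLEAR_MATCH:
        return True
    if c.image_score < BORDERLINE:
        return False
    blended = IMAGE_WEIGHT * c.image_score + TITLE_WEIGHT * c.title_score
    return blended >= STRIKE_THRESHOLD


for cand in [
    Candidate("https://example.com/listing-1", 0.95, 0.05),
    Candidate("https://example.com/listing-2", 0.80, 0.90),
    Candidate("https://example.com/listing-3", 0.80, 0.10),
]:
    print(cand.url, is_possible_strike(cand))
```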

Price Intelligence For Automated Competitive Pricing

Price intelligence lets us understand how competitors price products that are similar to or compete with ours, and track how they adjust those prices over time. These insights are incredibly valuable because most customers compare prices across multiple competitors before making a decision. This tool lets you stay price competitive automatically and boost revenue by 9.3%.

The Challenge Faced Today

Too often today, price intelligence is done poorly, in a way that is neither efficient nor effective:

  • Manual solutions that require people to research and track competitor products, then extract and enter the data by hand. You can imagine how slow and inaccurate this is, especially as you scale up your own product catalog.
  • Poorly constructed automation, such as website scrapers that can't be standardized across multiple competitor websites, which makes it impossible to scale to large volumes.


Standardize The Process & Gather Real Pricing Insights

Our system produces product matches based on all the components laid out in the first section, focusing on the relationships between a retailer's item attributes to form a group of matches. With this data aggregated, we provide insights that let you optimize your own product pricing in real time. Over time, this price optimization raises revenue by winning more of the customers who compare your price against others.
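To make "aggregated insights" concrete, here is a minimal sketch that takes hypothetical match groups (our SKU mapped to the prices of matched competitor listings) and summarizes where each of our prices sits. The SKUs, prices, and the median-based price index are illustrative assumptions.

```python
from statistics import median

# Hypothetical match groups: our SKU -> prices of matched competitor listings.
matches = {
    "SKU-100": [24.99, 27.50, 26.00, 29.99],
    "SKU-200": [12.00, 11.50, 13.25],
}
our_prices = {"SKU-100": 28.99, "SKU-200": 11.99}


def price_report(sku: str) -> dict:
    """Summarize our price against the matched competitor prices for one SKU."""
    competitor = sorted(matches[sku])
    ours = our_prices[sku]
    typical = median(competitor)
    return {
        "sku": sku,
        "ours": ours,
        "competitor_min": competitor[0],
        "competitor_median": typical,
        # Above 1.0 means we are priced above the typical matched competitor.
        "price_index": round(ours / typical, 3),
        "cheaper_competitors": sum(p < ours for p in competitor),
    }


for sku in matches:
    print(price_report(sku))
```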

transformer based model

Text models such as GPT-3 and BERT, along with NLP libraries like spaCy, let us analyze titles, descriptions, product categories, and much more as the matching works to understand how a competitor's product is positioned.

Our custom image model uses popular architectures such as ResNet and Siamese networks, built with frameworks like Keras, to learn the regions of interest in a product image. These include designs, colors, product type, and logos, and they are the backbone of how we scale our competitor search to millions of products.

burger king
We can search for logos or product designs no matter the background, color, angle, or size of what we want to match

Optimize Your Product Listing Based On Automated Competitor Research

Use deep-learning-based matching algorithms to discover information gaps in your listings that cause you to lose potential customers to competitors. The system analyzes competing listings from matched stores and, through training data, learns which information in descriptions, titles, UPC codes, Google Analytics data, and other identifiers leads to higher conversion rates for you.

Compare our product image (top) to others and rank them in order from most similar (left) to least

Analyze your listing description vs other brands

Once we've gathered the competing products, we use multiple AI tools to analyze the different sections of a listing. We start with our GPT-3-based model to digest and make sense of the listing information. The system not only extracts the key talking points and keywords used across successful listings but also understands language norms and sentence structure, so it can compare competitors' descriptions to ours. Our GPT-3 solution is tailored to your business's specific use case, which produces better accuracy and satisfaction than out-of-the-box options.

face scrubber listings
Analyze key information in competitor listings to learn what attracts the most customers
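A much simpler stand-in for the GPT-3 analysis is a keyword gap check: find terms that most competitor listings use but ours doesn't. The sketch below does this with plain document frequency. The listings, stopword list, and 50% cutoff are hypothetical, and it captures none of the language-structure understanding described above.

```python
import re
from collections import Counter

# Made-up stopword list for the example.
STOPWORDS = {"the", "a", "an", "and", "for", "with", "your", "of", "to", "in", "is"}


def keywords(text: str) -> set[str]:
    """Lowercase words longer than two characters, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS and len(w) > 2}


def keyword_gaps(ours: str, competitors: list[str], min_share: float = 0.5) -> list[str]:
    """Keywords that appear in at least `min_share` of competitor listings
    but nowhere in ours, most widely used first."""
    doc_freq = Counter(w for listing in competitors for w in keywords(listing))
    needed = len(competitors) * min_share
    mine = keywords(ours)
    gaps = [w for w, n in doc_freq.items() if n >= needed and w not in mine]
    return sorted(gaps, key=lambda w: (-doc_freq[w], w))


ours = "Silicone face scrubber for daily cleansing"
competitors = [
    "Soft silicone face scrubber, gentle exfoliating brush for sensitive skin",
    "Waterproof face scrubber brush, exfoliating and gentle on sensitive skin",
    "Manual facial cleansing brush with exfoliating silicone bristles",
]
print(keyword_gaps(ours, competitors))
```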

Product Title Intelligence

We can use GPT-3 to extract important keywords and copywriting knowledge from titles the same way we do for descriptions. Identifying gaps in your titles, where other websites have already figured out what information to include to see more "product xyz sold" emails come through, is one of the most important conversion optimizations you can make. Best of all, we can eliminate the manual guessing game and base these decisions on raw market data.

Title information extraction using GPT-3, straight from our system: a small preview of the information we extract when analyzing our title against the market as a whole. "Text:" is the input; the two lines below it are generated.

Listing Attributes Taxonomy

One challenge in filling listing gaps is knowing how many colors, SKUs, categories, etc. we need to increase customer conversions. Neural-network-driven learning lets us track and identify which attributes are must-haves for the product market we are trying to dominate. Once our model learns the relationship between important attribute identifiers and the market leaders in that exact product space, we can optimize our own listing based on which gaps to fill.

BERT example for classification

Market Data Collection For Recommendation Systems

We build a lot of recommendation systems for ecommerce and retail, and collecting valuable training data is always a hurdle we must account for. When we build on-site recommendations, we can use competitor products and the products they recommend as training data for our own website, since product matching tells us which of our products correspond to theirs.

Amazon recommendation system based on product similarity
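Product matching is what makes competitor recommendations reusable: once each competitor product maps to one of ours, their "recommended with" pairs become pairs in our own catalog. The sketch below shows that translation with hypothetical IDs and a hypothetical match map. The resulting co-recommendation counts could serve as one training signal for a recommender.

```python
from collections import Counter

# Hypothetical output of product matching: competitor product ID -> our SKU.
match_map = {
    "comp-1": "SKU-A",
    "comp-2": "SKU-B",
    "comp-3": "SKU-C",
    "comp-4": "SKU-B",
}

# Hypothetical scraped "recommended with" pairs from competitor product pages.
competitor_recs = [
    ("comp-1", "comp-2"),
    ("comp-1", "comp-4"),
    ("comp-3", "comp-2"),
    ("comp-1", "comp-9"),  # comp-9 has no match in our catalog
]


def translate_recommendations(pairs: list[tuple[str, str]], mapping: dict[str, str]) -> Counter:
    """Map competitor recommendation pairs onto our catalog and count how
    often each (source, recommended) pair of our SKUs appears."""
    counts = Counter()
    for source, rec in pairs:
        ours_src, ours_rec = mapping.get(source), mapping.get(rec)
        if ours_src and ours_rec and ours_src != ours_rec:
            counts[(ours_src, ours_rec)] += 1
    return counts


counts = translate_recommendations(competitor_recs, match_map)
print(counts.most_common())
```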

Automated Data Quality Processes

Not only are manual data extractions slow and a waste of human resources, they also lead to more data quality issues and mistakes when data must follow a standard format. Product data requires far more accuracy and standardization than general market or customer data, given its wide variety of sources and features. Whatever the use case, this can't be done at scale without automated data quality processes. Any time we want to match products from other stores to ours, we need a process to extract, clean, parse, and format the collected data before passing it into our matching system.

Our data quality module lets a retailer or online brand plug this custom piece into any product matching use case and start cleaning and standardizing its data with NLP patterns, attribute standardization, feature parsing, and many other data science pipelines that automate data collection.
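Here is a minimal sketch of the kind of cleaning step such a module performs, using regular expressions to standardize a scraped record: price strings become numbers, a screen size is parsed out of a free-text title, and Unicode dash variants in part numbers are unified. The field names and parsing rules are hypothetical examples, not the module's actual pipeline.

```python
import re
from typing import Optional

# Matches sizes like 6.1 in, 6", or 7 inches (case-insensitive).
INCH_PATTERN = re.compile(r'(\d+(?:\.\d+)?)\s*(?:"|in\b|inch(?:es)?\b)', re.IGNORECASE)


def parse_price(raw: str) -> Optional[float]:
    """'$1,299.99' -> 1299.99; returns None when no number is present."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    return float(match.group().replace(",", "")) if match else None


def parse_screen_inches(title: str) -> Optional[float]:
    """Pull a screen size out of a free-text title, if one is present."""
    match = INCH_PATTERN.search(title)
    return float(match.group(1)) if match else None


def normalize_part_number(raw: str) -> str:
    """Replace Unicode dash variants (U+2010 to U+2015) and surrounding
    spaces with a plain ASCII hyphen."""
    return re.sub(r"\s*[\u2010-\u2015-]\s*", "-", raw.strip())


def clean_record(record: dict) -> dict:
    """Hypothetical scraped record -> standardized fields for the matcher."""
    title = " ".join(record.get("title", "").split())  # collapse whitespace
    return {
        "title": title.lower(),
        "price": parse_price(record.get("price", "")),
        "screen_in": parse_screen_inches(title),
        "part_number": normalize_part_number(record.get("part_number", "")),
    }


raw = {"title": "  Garmin NUVI  2699LMTHD GPS navigator automotive 6.1 in ",
       "price": "$149.99", "part_number": "010\u201301188\u201300"}
print(clean_record(raw))
```

Rules like the screen-size pattern are brittle (a title such as "2 in 1 charger" would be read as 2 inches), so rule-based parsing needs careful testing against real listings.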

gpt-3 fine tuning for matching
GPT-3 training files must follow this format. If we have raw, unstructured features, we must clean and format our input data first.
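GPT-3's original fine-tuning API expected a JSONL file with one {"prompt": ..., "completion": ...} object per line. The sketch below turns labeled title pairs into that shape. The fixed prompt separator, the leading space in the completion, and the " END" stop sequence follow conventions OpenAI recommended for that legacy format, but treat them, along with the prompt wording and labels, as assumptions to check against current documentation.

```python
import json

SEPARATOR = "\n\n###\n\n"  # assumed convention: fixed marker ending every prompt
STOP = " END"              # assumed convention: stop sequence ending every completion


def to_example(title_a: str, title_b: str, is_match: bool) -> dict:
    """Turn one labeled pair of cleaned titles into a prompt/completion record.
    The prompt wording and the 'match'/'no match' labels are hypothetical."""
    prompt = f"Product A: {title_a}\nProduct B: {title_b}{SEPARATOR}"
    completion = " " + ("match" if is_match else "no match") + STOP
    return {"prompt": prompt, "completion": completion}


def write_jsonl(pairs: list[tuple[str, str, bool]], path: str) -> int:
    """Write one JSON object per line; returns the number of records written."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for title_a, title_b, is_match in pairs:
            handle.write(json.dumps(to_example(title_a, title_b, is_match)) + "\n")
            count += 1
    return count


example = to_example(
    "garmin nuvi 2699lmthd gps device",
    "nuvi 2699lmthd automobile portable gps navigator",
    True,
)
print(json.dumps(example))
```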

Still Interested In Using AI To Enhance Your Brand?

Scalr.ai builds custom AI software solutions that deliver clear-cut ROI to your business and give you a new competitive edge in your market. Ecom is slowly moving toward using AI to gain an edge, and we've built and used all the models needed to put huge increases in AOV, LTV, and revenue right in your lap. Let's talk today -> www.scalr.ai/contact