Let's take a look at what product matching is and 5 different ways you can use product matching software in your ecommerce business to generate positive business ROI.
What is product matching in ecommerce?
Product matching is the process of using machine learning and multiple data sources to match products based on similarity. In most cases, the comparison is between our own products and our competitors', but large retailers like Walmart also use it to compare products already in their store against a new product a seller is trying to list. In the past, retailers used attribute information such as SKUs, titles, GTINs, and other data points to compare two products. As you can imagine, this is neither an efficient nor an accurate way to compare products at scale, or against every competitor on the market.
These two jackets would be nearly impossible to compare using only the available attributes, even though they are the same product:
They don't offer the same colors, which can throw off algorithms that take color into account.
The titles are too different to compare based on the words or substrings they contain.
Very little information is provided in the descriptions.
The listed prices are not close to each other, and price is often used when comparing similarity.
As we'll see throughout this article, product matching is an extensive topic in retail and ecommerce that covers many different use cases that produce ROI.
Machine Learning Approach To Product Matching
Modern product matching uses many different features and machine learning algorithms to compare the similarity of products. The wide range of available similarity algorithms lets us build comparison tools around whatever level of product data is available. The components below are common in product matching today.
Product Title Similarity
Using deep learning tools such as spaCy and GPT-3, Scalr.ai builds a title similarity module that learns to recognize contextually similar titles even when the title strings themselves are very different. Here are 4 titles for the exact same product:
Garmin nuvi 2699LMTHD GPS Device
nuvi 2699LMTHD Automobile Portable GPS Navigator
Garmin NUVI 2699LMTHD — GPS navigator — automotive 6.1 in
Garmin (nuvi) 2699LMT HD 6" GPS with Lifetime Maps and HD Traffic (010–01188–00)
The same product can have very different-looking titles, which makes matching difficult with exact string comparison or even with similarity measures like Damerau–Levenshtein distance.
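To make the point concrete, here is a minimal Python sketch (standard library only) that scores three of the titles above with a normalized optimal string alignment distance, a restricted form of Damerau–Levenshtein, and with a simple token-overlap (Jaccard) baseline. The token baseline is an illustrative stand-in of our own, not Scalr.ai's spaCy/GPT-3 title module.

```python
import re


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def edit_similarity(a: str, b: str) -> float:
    """Scale the case-insensitive edit distance to a 0..1 similarity."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - osa_distance(a.lower(), b.lower()) / longest


def tokens(title: str) -> set:
    """Lowercase a title and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def token_jaccard(a: str, b: str) -> float:
    """Shared tokens divided by all distinct tokens across both titles."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


titles = [
    "Garmin nuvi 2699LMTHD GPS Device",
    "nuvi 2699LMTHD Automobile Portable GPS Navigator",
    'Garmin (nuvi) 2699LMT HD 6" GPS with Lifetime Maps and HD Traffic (010–01188–00)',
]
for other in titles[1:]:
    print(f"edit={edit_similarity(titles[0], other):.2f} "
          f"tokens={token_jaccard(titles[0], other):.2f}")
```

Note that the last title writes the model number as "2699LMT HD", so even token overlap misses the shared identifier. Cases like this are what a learned title model is meant to handle.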
Price comparison is one of the features we can use when matching products within a larger algorithm. We mostly use two data analysis techniques to map price similarity in our matching:
Price outlier detection - Outliers help us judge similarity when we compare a single price against a larger group of similarly priced items that are otherwise matched.
Clustering - Clustering algorithms such as K-means help us understand the size of the market for similar products based on pricing, and the information learned can be used as a feature in our overarching product matching system.
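A minimal sketch of both ideas in plain Python, assuming prices for a group of otherwise-matched items are already collected. Outliers are flagged with a median-absolute-deviation (modified z-score) rule, a common choice picked here for illustration; the 3.5 cutoff and the evenly spaced k-means initialization are assumptions, not settings from Scalr.ai's system.

```python
from statistics import median


def price_outliers(prices: list, threshold: float = 3.5) -> list:
    """Flag prices whose modified z-score exceeds `threshold`.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme price cannot drag the baseline toward itself.
    """
    med = median(prices)
    mad = median([abs(p - med) for p in prices])
    if mad == 0:
        # Most prices are identical; treat anything different as an outlier.
        return [p != med for p in prices]
    return [abs(0.6745 * (p - med) / mad) > threshold for p in prices]


def kmeans_1d(values: list, k: int, iters: int = 100):
    """Lloyd's k-means on a single feature (price).

    Centroids start at evenly spaced positions in the sorted data so the
    result is deterministic. Returns (centroids, labels).
    """
    ordered = sorted(values)
    step = max(k - 1, 1)
    centroids = [ordered[(len(ordered) - 1) * i // step] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each price joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical prices for one group of otherwise-matched listings.
matched_group = [19.99, 21.50, 20.75, 22.00, 18.95, 89.00]
print(price_outliers(matched_group))

# Hypothetical prices across a wider market with two price tiers.
market = [19.99, 21.50, 20.75, 54.00, 56.50, 52.25]
print(kmeans_1d(market, k=2))
```

Running outlier detection inside a price cluster, rather than across the whole market, keeps a legitimate premium tier from being flagged as a mismatch.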
Image similarity is one of the most powerful and important deep learning techniques we can use to measure how similar two products are. As we'll see in the use cases, image similarity supports many different tasks when matching ecommerce and retail products. Scalr.ai's image module learns the similarity between products regardless of angle, image quality, design size, or background.
We've built this solution on up-to-date image recognition architectures and fine-tune it for each specific business use case. This fine-tuning lets us tailor the results to each ecommerce brand and reach accuracy well beyond prebuilt solutions.
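The fine-tuning procedure isn't spelled out here. One common approach for pairwise similarity is a Siamese setup trained with contrastive loss, with embeddings compared by cosine similarity at inference time. The sketch below shows just those two pieces in plain Python, assuming the embedding vectors come from some image encoder (for example a ResNet backbone) that is not shown; the vectors themselves are hypothetical.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def contrastive_loss(a: list, b: list, same_product: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one Siamese pair (Euclidean distance, no 1/2 factor).

    Matching pairs are penalized by their squared distance; non-matching
    pairs are penalized only while they sit closer than `margin`.
    """
    d = math.dist(a, b)
    if same_product:
        return d ** 2
    return max(0.0, margin - d) ** 2


# Hypothetical embeddings; in practice these would come from an image encoder.
jacket_front = [0.9, 0.1, 0.3]
jacket_side = [0.8, 0.2, 0.35]
boot = [0.1, 0.9, 0.0]
print(cosine_similarity(jacket_front, jacket_side), cosine_similarity(jacket_front, boot))
print(contrastive_loss(jacket_front, jacket_side, same_product=True))
print(contrastive_loss(jacket_front, boot, same_product=False))
```

During training, matching pairs (the same jacket shot from different angles) are pulled together and non-matching pairs are pushed at least `margin` apart; at inference time only the similarity score is needed.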
Deep Learning Attribute Extraction
Product attributes like brand, size, condition, model number, colors available, description, and more can still be used as effective data points to match our products. We can split our attributes into two different categories:
Limited Range Values: Product attributes with a fixed range of possible values, such as colors or clothing sizes (small, medium, large). Using one-hot encoded vectors, we can transform these data points into a format machine learning models can work with.
Endless Values: An attribute fits this category if its possible values have no natural fixed range. While these data points can theoretically take unlimited values, our models' accuracy takes a hit as the range grows.
We use custom-built neural networks to learn the relationship between product similarity and the two products' attributes.
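As a rough illustration of the two attribute categories, the sketch below one-hot (or multi-hot) encodes limited-range attributes and reduces an endless-value attribute, a model number, to a normalized exact-match flag. The vocabularies, field names, brand, and the `pair_features` helper are all hypothetical; the resulting vector is the kind of input a neural network could learn product similarity from.

```python
# Hypothetical fixed vocabularies for limited-range attributes.
COLORS = ["black", "blue", "green", "grey", "red"]
SIZES = ["small", "medium", "large"]


def one_hot(value: str, vocabulary: list) -> list:
    """One-hot encode a single value; unknown values become all zeros."""
    return [1 if value == v else 0 for v in vocabulary]


def multi_hot(values: list, vocabulary: list) -> list:
    """Encode a set of values, e.g. every color a product is offered in."""
    return [1 if v in values else 0 for v in vocabulary]


def normalize_code(code: str) -> str:
    """Endless-value attributes such as model numbers can't be one-hot
    encoded, so reduce them to a canonical form for exact comparison."""
    return "".join(ch for ch in code.lower() if ch.isalnum())


def pair_features(p: dict, q: dict) -> list:
    """Concatenate both products' encoded attributes plus exact-match flags
    into one vector a similarity network could be trained on."""
    return (
        multi_hot(p["colors"], COLORS) + multi_hot(q["colors"], COLORS)
        + one_hot(p["size"], SIZES) + one_hot(q["size"], SIZES)
        + [float(normalize_code(p["model"]) == normalize_code(q["model"])),
           float(p["brand"].strip().lower() == q["brand"].strip().lower())]
    )


# Hypothetical listings for the same jacket from two stores.
ours = {"brand": "NorthPeak", "model": "NP-2041", "colors": ["black"], "size": "medium"}
theirs = {"brand": "northpeak ", "model": "np 2041", "colors": ["black", "blue"], "size": "medium"}
print(pair_features(ours, theirs))
```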
UGC analysis lets us use product reviews as a module in our product matching solution. We build a custom GPT-3-based tool that learns the key talking points and keywords in the reviews left for a specific product. This information helps us learn how similarly two products are perceived, rather than just how they are presented for sale. This comes in handy when a product matching use case is more customer-focused than presentation-focused.
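The review module described above is GPT-3-based; the sketch below swaps in a much simpler technique, counting how many reviews mention each word, to show the underlying idea of comparing how two products are perceived. The stopword list, the `top_k` value, and the sample reviews are illustrative assumptions.

```python
import re
from collections import Counter

# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"a", "after", "and", "but", "for", "in", "is", "it", "me",
             "my", "not", "of", "the", "this", "to", "very", "with"}


def review_keywords(reviews: list, top_k: int = 5) -> set:
    """Return the `top_k` words mentioned in the most reviews.

    Each word counts once per review so one long review cannot dominate;
    ties are broken alphabetically to keep the result deterministic.
    """
    counts = Counter()
    for review in reviews:
        counts.update(set(re.findall(r"[a-z']+", review.lower())) - STOPWORDS)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word for word, _ in ranked[:top_k]}


def perception_similarity(reviews_a: list, reviews_b: list, top_k: int = 5) -> float:
    """Jaccard overlap between the two products' top review keywords."""
    ka, kb = review_keywords(reviews_a, top_k), review_keywords(reviews_b, top_k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


# Hypothetical reviews for two jackets.
jacket_a = ["Warm and waterproof, great zipper",
            "Very warm, zipper feels sturdy",
            "Waterproof in heavy rain"]
jacket_b = ["Keeps me warm, zipper broke after a month",
            "Not waterproof but warm"]
print(review_keywords(jacket_a, top_k=3), perception_similarity(jacket_a, jacket_b, top_k=3))
```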
Use Cases For Product Matching In Ecommerce That Produce Business Value
Product Matching To Automate Copyright Infringement Search
Large retailers and ecom brands spend thousands of hours and deploy entire teams to scour the internet for brands that use their designs, logos, or products and sell them as their own. We know a few brands by name that do this search manually every day and have made automating it a priority. Often these designs will not be visually identical to ours and can have completely different titles and product information, which makes old-school Google search methods hard to use.
What components do we use?
Our product matching system for identifying copyright strikes starts with an image similarity model and uses title similarity to reinforce the results. This multi-head approach lets us rely mostly on the image we find while using the title as an input that adjusts the final similarity score. The main benefit of focusing on images is that it lets us find designs similar to ours even when the competitor has changed everything else about the product. Stolen designs are often changed slightly in appearance but heavily in the accompanying information, to hide the copying. Here's a breakdown of the two components:
Scalr.ai Design Similarity Model
Scalr.ai has built a custom image similarity model for copyright strikes that focuses on understanding product design and graphics. The model can be pulled apart and customized for any specific use case or industry. Best of all, you don't need to include your product images in the training data each time, so you can quickly run new images through without retraining the entire model.
Our model has learned to identify what matters in a product image, not what the product actually is. This improves generalization and greatly outperforms out-of-the-box options.
Similarity models based on GPT-3 or spaCy are our go-to models for learning the relationship between sentence structure and word placement. The key knowledge we want from this component is a standardized way to decide "strike or not" when the image match is close. Adding this component noticeably boosts our overall accuracy and removes edge cases that cause false positives.
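The decision logic described above, where the image score leads and the title score only settles close calls, could look roughly like this. The band limits, weight, and threshold are hypothetical placeholders that would be tuned on labeled strike/no-strike pairs, and both scores are assumed to come from upstream image and title similarity models.

```python
def is_probable_strike(image_sim: float, title_sim: float,
                       high: float = 0.90, low: float = 0.70,
                       image_weight: float = 0.8, threshold: float = 0.78) -> bool:
    """Decide whether a candidate listing likely copies our design.

    Both scores are assumed to be in 0..1. All cutoffs are hypothetical
    placeholders that would be tuned on labeled examples.
    """
    if image_sim >= high:
        # Near-identical design: titles are often rewritten to hide copying,
        # so a low title score should not veto the match.
        return True
    if image_sim < low:
        return False  # Designs clearly differ.
    # Uncertain band: let the title score nudge the decision.
    combined = image_weight * image_sim + (1 - image_weight) * title_sim
    return combined >= threshold


print(is_probable_strike(0.95, 0.10))  # image alone decides
print(is_probable_strike(0.85, 0.90))  # close image, similar title
print(is_probable_strike(0.85, 0.10))  # close image, unrelated title
```

Keeping the title score out of the high-confidence band reflects the point above: copied designs usually arrive with rewritten titles.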
Price Intelligence For Automated Competitive Pricing
Price intelligence lets us understand how competitors price products similar to or competing with ours, and track how they adjust those prices over time. These insights are valuable because most customers compare prices across multiple competitors before making a decision. This tool lets you automatically stay price-competitive and boost revenue by 9.3%.
The Challenge Faced Today
Too often today, price intelligence is done poorly, in ways that rule out an efficient and effective process:
Manual solutions that require people to research and track competitor products, and to extract and enter data by hand. You can imagine how slow and inaccurate this is, especially as your own catalog grows.
Poorly constructed automation, such as website scraping tools that can't be standardized across multiple competitor websites, which makes scaling to large volumes impossible.
Standardize The Process & Gather Real Pricing Insights
Our system produces product matches using all the components laid out in the first section and focuses on understanding the relationship between a retailer's item attributes to form groups of matches. With this data aggregated, we provide insights that let you optimize your own product pricing in real time. Over time, this price optimization raises revenue by winning more of the customers who compare your price to others.
Text models such as GPT-3, BERT, and spaCy let us analyze titles, descriptions, product categories, and more, so the matching system understands how the competitor product is positioned.
Our custom image model uses popular architectures such as ResNet and Siamese networks, built with frameworks like Keras, to learn the regions of interest in a product image. These include designs, colors, product type, logos, etc., and they are the backbone of how we scale our competitor search to millions of products.
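Once matches are grouped, the pricing insights come down to straightforward aggregation. The sketch below, with hypothetical function names, store names, and prices, summarizes where our price sits among matched competitor prices and lists competitor price moves between time-ordered snapshots.

```python
from statistics import median


def price_position(our_price: float, competitor_prices: list) -> dict:
    """Summarize where our price sits among matched competitor prices."""
    mid = median(competitor_prices)
    cheaper = sum(1 for p in competitor_prices if p < our_price)
    return {
        "lowest": min(competitor_prices),
        "median": mid,
        "share_cheaper_than_us": cheaper / len(competitor_prices),
        "gap_to_median": our_price - mid,
    }


def price_changes(snapshots: list) -> list:
    """List competitor price moves between consecutive snapshots.

    `snapshots` is a time-ordered list of (date, {store: price}) tuples.
    """
    moves = []
    for (_, before), (day, after) in zip(snapshots, snapshots[1:]):
        for store, price in after.items():
            if store in before and before[store] != price:
                moves.append((day, store, before[store], price))
    return moves


# Hypothetical matched prices for one product.
print(price_position(24.99, [19.99, 22.50, 25.00, 27.99]))
history = [
    ("2024-01-01", {"store_a": 25.00, "store_b": 22.50}),
    ("2024-01-08", {"store_a": 23.00, "store_b": 22.50, "store_c": 21.00}),
]
print(price_changes(history))
```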
Optimize Your Product Listing Based On Automated Competitor Research
Use deep-learning-based matching algorithms to discover information gaps in your listings that cause you to lose potential customers to competitors. This system analyzes competing listings from matched stores and, through training data, learns which information in descriptions, titles, UPC codes, Google Analytics, and other identifiers leads to higher conversion rates for you.
Analyze your listing description vs other brands
Once we've gathered competing products, we use multiple AI tools to analyze the different sections of a listing. We start with our GPT-3-based model to digest and make sense of the listing information. The system not only extracts key talking points and keywords used across successful listings but also understands language norms and sentence structure, so it can compare competitors' descriptions to ours. Our GPT-3 solution is tailored to your business's use case, which produces better accuracy and satisfaction than out-of-the-box options.
Product Title Intelligence
Titles can be mined for important keywords and copywriting knowledge in the same way, using GPT-3. Identifying gaps in your titles, where other websites have figured out what information to include to see more "product xyz sold" emails come through, is one of the most important conversion optimizations you can make. Best of all, we can eliminate the manual guessing game and rely on real market data instead.
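A simple, non-GPT-3 way to surface title gaps is document frequency: find words that appear in at least half of the matched competitor titles but not in ours. The competitor titles and the `min_share` cutoff below are made up for illustration.

```python
import re
from collections import Counter


def title_tokens(title: str) -> set:
    """Lowercase a title and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))


def title_gaps(our_title: str, competitor_titles: list, min_share: float = 0.5) -> list:
    """Return words used in at least `min_share` of competitor titles
    that our title does not contain, sorted alphabetically."""
    ours = title_tokens(our_title)
    doc_freq = Counter()
    for title in competitor_titles:
        doc_freq.update(title_tokens(title))
    n = len(competitor_titles)
    return sorted(t for t, c in doc_freq.items() if c / n >= min_share and t not in ours)


# Hypothetical competitor titles for the GPS example.
competitors = [
    "Garmin nuvi 2699LMTHD GPS Navigator with Lifetime Maps",
    "nuvi 2699LMTHD 6 inch GPS Navigator Lifetime Maps HD Traffic",
    "Garmin 2699LMTHD Automotive GPS Navigator",
]
print(title_gaps("Garmin nuvi 2699LMTHD GPS Device", competitors))
```

Frequency alone can't tell which words drive sales, which is why the approach above learns from successful listings rather than counting words.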
Listing Attributes Taxonomy
A challenge when filling listing gaps is understanding how many colors, SKUs, categories, etc., we need to increase customer conversions. Neural-network-driven learning lets us identify which attributes are must-haves for a product market we want to dominate. Once our model learns the relationship between important attribute identifiers and market leaders in the exact product space, we can optimize our own listing based on which gaps to fill.
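Before any model is trained, a plain coverage count already shows which attributes market leaders consistently fill in. The sketch below is a frequency-based stand-in for the neural approach described above; the listing fields, sample data, and the 80% coverage cutoff are hypothetical.

```python
from collections import Counter
from statistics import median


def has_value(value) -> bool:
    """Treat None, empty strings, and empty lists as missing."""
    return value not in (None, "", [])


def must_have_gaps(leaders: list, ours: dict, min_coverage: float = 0.8) -> dict:
    """Attributes filled in by at least `min_coverage` of market-leader
    listings but missing from our listing, with their coverage share."""
    counts = Counter()
    for listing in leaders:
        counts.update(k for k, v in listing.items() if has_value(v))
    n = len(leaders)
    return {k: c / n for k, c in sorted(counts.items())
            if c / n >= min_coverage and not has_value(ours.get(k))}


def option_count_gap(leaders: list, ours: dict, attribute: str) -> float:
    """How many more options (e.g. colors) the median leader offers than we do."""
    leader_counts = [len(listing.get(attribute, [])) for listing in leaders]
    return median(leader_counts) - len(ours.get(attribute, []))


# Hypothetical market-leader listings and our own listing.
leaders = [
    {"title": "A", "colors": ["black", "blue", "red"], "size_chart": "yes", "material": "nylon"},
    {"title": "B", "colors": ["black", "green"], "size_chart": "yes", "material": "polyester"},
    {"title": "C", "colors": ["black", "blue", "grey", "red"], "size_chart": "yes", "material": ""},
]
ours = {"title": "X", "colors": ["black"], "material": "nylon"}
print(must_have_gaps(leaders, ours), option_count_gap(leaders, ours, "colors"))
```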
Market Data Collection For Recommendation Systems
We build many recommendation system solutions for ecommerce and retail, and collecting valuable training data is always a hurdle. When we build on-site recommendation solutions, we can use competitor products and the products they recommend alongside them as training data for our own website.
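The key step is translating competitor recommendations into our own catalog through the product matches. A minimal sketch, assuming matches are stored as a mapping from competitor product IDs (prefixed here with a hypothetical store code) to our product IDs; all IDs are made up.

```python
from collections import Counter


def training_pairs(competitor_recs: dict, matches: dict) -> Counter:
    """Translate competitor recommendations into pairs of our own products.

    competitor_recs: competitor product ID -> competitor IDs it recommends.
    matches: competitor product ID -> our matched product ID.
    Returns a Counter of (our_product, our_recommended_product) pairs.
    """
    pairs = Counter()
    for source, recommended in competitor_recs.items():
        if source not in matches:
            continue  # We have no equivalent product, so the signal is unusable.
        for rec in recommended:
            target = matches.get(rec)
            if target is not None and target != matches[source]:
                pairs[(matches[source], target)] += 1
    return pairs


matches = {"a:1": "our-jacket", "a:2": "our-gloves",
           "b:7": "our-jacket", "b:9": "our-gloves", "b:10": "our-beanie"}
competitor_recs = {"a:1": ["a:2", "a:5"], "b:7": ["b:9", "b:10"], "a:5": ["a:1"]}
print(training_pairs(competitor_recs, matches))
```

The pair counts can then serve as implicit co-recommendation signals when training an item-to-item recommender.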
Automated Data Quality Processes
Manual data extraction is not only slow and a waste of human resources; it also leads to more data quality issues and mistakes when following a standard data format. Product data requires far more accuracy and standardization than general market or customer data, given the wide variety of sources and features. Whatever the use case, this can't be done at scale without automated data quality processes. Any time we match products from various stores against ours, we need a process to extract, clean, parse, and format the collected data before passing it to our matching system.
The data quality module lets a retail store or online brand plug this custom piece into any product matching use case and start cleaning and standardizing its data with NLP patterns, attribute standardization, feature parsing, and many other data science pipelines that automate data collection.
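A few of these cleaning steps are easy to sketch. The example below, assuming US-style price strings and a hypothetical size alias table, collapses whitespace in titles, parses prices, standardizes sizes, and validates GTINs with the GS1 mod-10 check digit.

```python
import re

# Hypothetical alias table mapping scraped size labels to a standard vocabulary.
SIZE_ALIASES = {"s": "small", "sm": "small", "m": "medium", "med": "medium",
                "l": "large", "lg": "large"}


def parse_price(raw: str):
    """Parse a US-style price string such as '$1,299.00'; None if no number."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    return float(match.group().replace(",", "")) if match else None


def valid_gtin(digits: str) -> bool:
    """Check the GS1 mod-10 check digit for GTIN-8/12/13/14."""
    if not digits.isdigit() or len(digits) not in (8, 12, 13, 14):
        return False
    body, check = digits[:-1], int(digits[-1])
    # Weights alternate 3, 1, 3, ... starting next to the check digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def clean_record(raw: dict) -> dict:
    """Normalize one scraped product record into a standard format."""
    size = raw.get("size", "").strip().lower()
    gtin = re.sub(r"\D", "", raw.get("gtin", ""))
    return {
        "title": " ".join(raw.get("title", "").split()),
        "price": parse_price(raw.get("price", "")),
        "size": SIZE_ALIASES.get(size, size) or None,
        "gtin": gtin if valid_gtin(gtin) else None,
    }


print(clean_record({"title": "  Garmin   nuvi 2699LMTHD  ", "price": "$1,299.00",
                    "size": " M ", "gtin": "4006381333931"}))
```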
Still Interested In Using AI To Enhance Your Brand?
Scalr.ai builds custom AI software solutions that deliver clear-cut ROI to your business and give you a new competitive edge in your market. Ecom is slowly moving toward using AI to gain an edge, and we've built and used all the models needed to deliver big increases in AOV, LTV, and revenue. Let's talk today -> www.scalr.ai/contact
AI & machine learning consulting company focused on increasing revenue for clients. We specialize in data science and deep learning development that gives businesses a better understanding of their revenue streams, and in building tools to make them more profitable.