“Modeling” Shouldn’t Be a Bad Word in Retail Data

Why modeled data performs better than deterministic data in many marketing use cases

We often encounter customer data teams that want to purchase deterministic data. The data science profession has a bias toward deterministic data, although in many retail use cases the brand would be better off with a modeled data set. In this blog post we’ll explain the differences between deterministic and modeled data, and why modeled data often performs better.

A simple explanation of the two types of data

Deterministic data represents an absolute truth, normally based on an irrefutable source. For example, if you want deterministic data about a person’s salary, a tax return or credit report could provide it. Modeled data, by contrast, is the result of a computer making an educated guess by looking at one or more inputs. The Zillow real estate search site has a now-famous model, the Zestimate, that predicts the price of your house if you’re considering selling. Its algorithms look at many inputs, including recent sales in your neighborhood, the square footage of your house and of the recently sold homes, and your county’s tax records. How good is the model? Many real estate appraisers say the Zestimate performs better than the typical real estate appraiser.

How can deterministic data hurt a retailer?

One popular genre of retail targeting campaigns is the repurchase campaign: a campaign that targets someone with an offer for a brand the person has previously purchased. This approach typically works in only a few categories. Consumers are very loyal and likely to repurchase cosmetics, fragrances, and personal care items. Conversely, our analysis of over 5 billion apparel and footwear transactions indicates that consumers have weak repurchase rates for most brands. This is particularly true for young consumers in the Millennial and Gen Z cohorts. This growing population of shoppers, now larger than the combination of Baby Boomers and Gen X, typically has negative repurchase rates for brands. This means that if they buy Nike shoes this year, they become less likely to choose Nike next year. It’s important to note this isn’t due to dissatisfaction with Nike. Research indicates that younger consumers are novelty seekers who enjoy trying new products and brands. A retailer that targets these shoppers using deterministic purchase history is therefore promoting the very brands they have become less likely to choose.
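To make the idea of a negative repurchase rate concrete, here is a minimal sketch in Python. It assumes one possible definition, which is ours for illustration and not necessarily the one used in the analysis: the share of this year’s buyers who choose the brand again next year, minus the share of non-buyers who choose it next year. The shopper records and the brand name brand_a are made-up toy data.

```python
# Toy data (hypothetical): the brands each shopper bought this year and next year.
shoppers = [
    {"this_year": {"brand_a"}, "next_year": {"brand_a"}},
    {"this_year": {"brand_a"}, "next_year": {"brand_b"}},
    {"this_year": {"brand_a"}, "next_year": {"brand_c"}},
    {"this_year": {"brand_a"}, "next_year": {"brand_b"}},
    {"this_year": {"brand_b"}, "next_year": {"brand_a"}},
    {"this_year": {"brand_c"}, "next_year": {"brand_a"}},
    {"this_year": {"brand_b"}, "next_year": {"brand_c"}},
    {"this_year": {"brand_c"}, "next_year": {"brand_b"}},
]


def repurchase_lift(shoppers, brand):
    """Difference in next-year purchase rate between this year's buyers
    of `brand` and everyone else. A negative value means that buying the
    brand this year goes with being LESS likely to buy it next year."""
    buyers = [s for s in shoppers if brand in s["this_year"]]
    others = [s for s in shoppers if brand not in s["this_year"]]
    if not buyers or not others:
        raise ValueError("need at least one buyer and one non-buyer")
    buyer_rate = sum(brand in s["next_year"] for s in buyers) / len(buyers)
    other_rate = sum(brand in s["next_year"] for s in others) / len(others)
    return buyer_rate - other_rate


lift = repurchase_lift(shoppers, "brand_a")
print(f"repurchase lift for brand_a: {lift:+.2f}")
```

In this toy data one of the four brand_a buyers comes back, while two of the four non-buyers pick brand_a next year, so the lift is negative. A repurchase campaign aimed at the first group would be spending on the weaker audience.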

A second problem with using deterministic data for retail targeting is that it is inadequate for traditional personalization algorithms. The most popular personalization algorithm is collaborative filtering, which discovers co-purchase relationships. You see it all the time on websites, where a site will indicate that people who clicked on Product A also clicked on Product C. This approach is used by Netflix, Amazon, and most retail sites. When won’t it work with deterministic data? Let’s say you have a brand-new product that has recorded almost no sales yet. Your deterministic data about old products, many of which are probably no longer available, won’t intersect with the new product. In other words, the people who clicked on Product A, which is now obsolete, have never clicked on Product New. This is known as the cold start problem, and it is particularly harmful to the retailer who wants to sell the newest full-price merchandise.
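The cold start problem is easy to see in code. The sketch below is a deliberately simplified, co-occurrence flavor of collaborative filtering in Python, not the algorithm any particular retailer runs. The shopper histories and product names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical click histories: shopper -> set of products clicked.
histories = {
    "shopper_1": {"product_a", "product_c"},
    "shopper_2": {"product_a", "product_b", "product_c"},
    "shopper_3": {"product_a", "product_b"},
}


def co_occurrence(histories):
    """Count how often each pair of products shows up in the same history."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in histories.values():
        for x, y in combinations(sorted(items), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def also_clicked(counts, product, top_n=3):
    """Products most often seen alongside `product`, most frequent first
    (ties broken alphabetically)."""
    neighbors = counts.get(product, {})
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]


counts = co_occurrence(histories)
recs_for_a = also_clicked(counts, "product_a")
recs_for_new = also_clicked(counts, "product_new")  # never clicked by anyone
print(recs_for_a)
print(recs_for_new)
```

Product A gets recommendations because it shares histories with other products. Product New has no history at all, so it has no neighbors, and it can never appear as anyone else’s recommendation either. That is the cold start in miniature.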

How can modeled data solve this problem?

We’ve illustrated two problems: 1) younger consumers are brand changers, and 2) high-velocity assortment changes, now in vogue with fast fashion, don’t lend themselves to traditional personalization algorithms. Our analysis also indicates that consumers are remarkably consistent in buying products with similar attributes. They might change brands, but they nearly always stay within the white lines of their personal tastes. I suspect the jeans in your closet are different brands, but I’m fairly certain they share commonalities like price range, style, size, and color. You have a personal taste in jeans, although it spans an array of brands with similar attributes.

When our analysis revealed that consumers care much more about taste than brands, we set about building a different type of model. Our models are designed to learn a consumer’s taste in a category by analyzing the features of the products that consumer has purchased. Using machine learning techniques, we analyzed the 5 billion purchases to learn consumer tastes. The inputs typically include the size, style, price, color, fabric, and other features of the product. To keep the model current, we designed it to analyze the current product universe each day and to “fit” the new products to each individual’s taste. This solved the cold start problem and delivered much more relevant products to the consumer.
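For readers who want a feel for attribute-based matching, here is a minimal sketch in Python. It is not our production model, which uses machine learning over far more features. It simply assumes a taste profile can be approximated by how often each style and color appears in a shopper’s past purchases, plus the price range they have paid. All product records, attribute names, and the scoring rule are hypothetical.

```python
from collections import Counter

# Hypothetical purchase history for one shopper (jeans from three brands).
purchases = [
    {"brand": "brand_x", "style": "slim", "color": "dark blue", "price": 80},
    {"brand": "brand_y", "style": "slim", "color": "dark blue", "price": 95},
    {"brand": "brand_z", "style": "slim", "color": "black", "price": 90},
]

# Hypothetical products added to the assortment today, with no sales yet.
new_products = {
    "new_slim_dark": {"brand": "brand_w", "style": "slim", "color": "dark blue", "price": 85},
    "new_baggy_white": {"brand": "brand_x", "style": "baggy", "color": "white", "price": 200},
}

# Brand is left out on purpose: the profile describes taste, not loyalty.
CATEGORICAL = ("style", "color")


def build_taste(purchases):
    """Summarize past purchases as attribute frequencies and a price range."""
    taste = {attr: Counter(p[attr] for p in purchases) for attr in CATEGORICAL}
    prices = [p["price"] for p in purchases]
    taste["price_range"] = (min(prices), max(prices))
    taste["n"] = len(purchases)
    return taste


def taste_score(taste, product):
    """Score from 0 to 1: average of per-attribute match rates, plus a
    point when the price falls inside the shopper's observed range."""
    score = 0.0
    for attr in CATEGORICAL:
        score += taste[attr][product[attr]] / taste["n"]
    low, high = taste["price_range"]
    if low <= product["price"] <= high:
        score += 1.0
    return score / (len(CATEGORICAL) + 1)


taste = build_taste(purchases)
scores = {name: taste_score(taste, p) for name, p in new_products.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The new slim dark-blue jeans rank first even though the shopper has never bought that brand and the product has no sales history. The baggy white pair scores zero despite coming from a brand the shopper bought before. Because scoring needs only product attributes, a product can be matched to shoppers on the day it is added.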

Does it work? We’ve learned that consumers are 25% likely to respond to an offer that matches their taste. In head-to-head testing, our taste-based model has consistently outperformed repurchase campaign models that are based on deterministic data.