Better decision making with alternative data

Published on: 2024-08-10 18:48:28

“Garbage in, garbage out. Or rather more felicitously: the tree of nonsense is watered with error, and from its branches swing the pumpkins of disaster.”

This line from Nick Harkaway's The Gone-Away World applies to decision making too.

Decisimo decision engine

Try our decision engine.

Enhanced Decision Making

The authenticity and integrity of the data used in the decision making process determine the quality of each decision and, through it, the outcome. Data sources fall into two categories: primary and secondary.

The primary source is the customer or user, who enters data and answers questions in the data flow. Secondary sources include alternative data providers and data brokers. The less data you collect from the customer, the more you need to collect or augment from secondary sources.

Data used for enrichment can include traditional personal identifiers such as phone numbers, email addresses, and physical addresses. It can also include technical data such as device information, browser data, and IP addresses.

Assisted login data collection

Data brokers use sources that include API requests and “assisted login data collection”. API requests are straightforward: you send a request and receive data in the response. Assisted login data collection creates more challenges for data quality and integrity.

Assisted login data collection can range from Open Banking API standards to account scraping. Account scraping often happens when users provide login credentials on a site. In the background, data providers log in and scrape the data. They clean it and send it to the business where the user provided the credentials. Some websites try to prevent scraping, so no data is collected in those cases.

Challenges to alternative data sourcing

Because of how alternative data is collected and processed, it can be incomplete. Its quality can be questionable, and it may lack the integrity needed to be useful. Sometimes the whole service is unavailable, for example when the scraped site blocks data collection.

Traditionally, alternative data is used for better segmentation and customer profiling in marketing. It is used to train predictive credit scoring models, enrich user profiles, and verify user identity. It can also reduce fraud and data misuse.

Alternative data sourcing has two main challenges: legality and outages. Although the customer owns the information, the platforms being scraped often oppose its collection. They actively try to stop scraping and may describe it as data “theft”.

The second challenge is service outages, which surface as technical errors such as “service unavailable”. Responses may also be incomplete even when they appear successful, with some values missing. Response times are sometimes longer than the business process allows, which makes it impossible to collect the required data in time.
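A response that looks successful can still be unusable, so it can help to check completeness explicitly before the data reaches the decision logic. The following is a minimal sketch in Python (3.9 or later), not a definitive implementation. The field names and the helper functions are hypothetical and would need to match your provider's actual schema.

```python
from typing import Any, Mapping

# Hypothetical field names; a real provider's schema will differ.
REQUIRED_FIELDS = ("email_first_seen", "phone_carrier", "ip_country")


def missing_fields(payload: Mapping[str, Any],
                   required: tuple = REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent or null in the payload."""
    return [name for name in required if payload.get(name) is None]


def is_usable(status_code: int, payload: Mapping[str, Any]) -> bool:
    """Usable only if the call succeeded AND every required field is present."""
    return status_code == 200 and not missing_fields(payload)
```

Logging the result of missing_fields per provider also gives you a simple way to track how often each source returns partial data.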

With this in mind, a few good practices and controls can help when working with alternative data:

  • Expect data problems.
  • Build decision flows that still execute when data is missing.
  • When using multiple data sources, call them in parallel to reduce execution time.
  • Cache data for the period in which you are sure it could not have changed from a business point of view. Sometimes it helps to ask the source how often the data is refreshed.
  • When building predictive models and categorizing variables, separate “no response”, “no data”, and “null value”. Each has a different business meaning and can have different predictive power. If you combine them into one “null value”, you can lose useful predictive signal.
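Several of these practices can be combined in one small helper. The sketch below, in Python (3.9 or later), calls all providers in parallel under one shared deadline and keeps “no response”, “no data”, and “null value” apart. It is an illustration under assumptions, not a reference implementation: the names Outcome, classify, and collect are made up for this example, and each provider is assumed to be a function that takes a lookup key and returns a dict, or None when it has no record.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from enum import Enum
from typing import Any, Callable, Optional

# Assumed provider shape: takes a lookup key, returns a record or None.
Provider = Callable[[str], Optional[dict]]


class Outcome(Enum):
    NO_RESPONSE = "no_response"  # timeout or technical error
    NO_DATA = "no_data"          # provider answered, but has no record
    NULL_VALUE = "null_value"    # record exists, but the field is empty
    VALUE = "value"              # a usable value came back


def classify(record: Optional[dict], field: str) -> tuple[Outcome, Any]:
    """Map a provider answer to one of the distinct outcomes."""
    if record is None:
        return Outcome.NO_DATA, None
    if record.get(field) is None:
        return Outcome.NULL_VALUE, None
    return Outcome.VALUE, record[field]


def collect(providers: dict[str, Provider], key: str, field: str,
            timeout_s: float) -> dict[str, tuple[Outcome, Any]]:
    """Call all providers in parallel under one shared deadline."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(providers)))
    futures = {name: pool.submit(fn, key) for name, fn in providers.items()}
    deadline = time.monotonic() + timeout_s
    results = {}
    for name, future in futures.items():
        remaining = max(0.0, deadline - time.monotonic())
        try:
            results[name] = classify(future.result(timeout=remaining), field)
        except Exception:  # timeout, connection error, malformed payload
            results[name] = (Outcome.NO_RESPONSE, None)
    # Do not block the decision flow on providers that are still running.
    pool.shutdown(wait=False, cancel_futures=True)
    return results
```

Because every provider always ends up with an outcome, the decision flow can continue even when a source fails, and the outcome itself can be used as a separate category when building predictive models.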

Using alternative data sources does not have to harm the quality of the decision making process or its outcome. You can put safeguards in place to avoid the “garbage in, garbage out” scenario.
