Better decision making with alternative data

Published on: 2024-08-10 18:48:28

“Garbage in, garbage out. Or rather more felicitously: the tree of nonsense is watered with error, and from its branches swing the pumpkins of disaster.”

This line from The Gone-Away World fits decision making too.

The authenticity and integrity of the data used in the decision making process determine the quality of the decision and, in turn, its outcome. Data sources fall into two categories: primary and secondary.

Primary data comes from the customer or user, who enters data and answers questions in the dataflow. Secondary sources include alternative data providers and data brokers. The less data you collect from the customer, the more you need to collect or augment from secondary sources or brokers.

Data used for enrichment can include traditional personal identifiers such as phone numbers, email addresses, and physical addresses. It can also include technical data such as device information, browser data, and IP addresses.

Assisted login data collection

Data brokers use sources that include API requests and “assisted login data collection”. API requests are straightforward. Send a request and receive data in the response. Assisted login data collection creates more challenges for data quality and integrity.

Assisted login data collection can range from Open Banking API standards to account scraping. Account scraping often happens when users provide login credentials on a site. In the background, data providers log in with those credentials and scrape the data. They clean it and send it to the business where the user provided the credentials. Some websites try to prevent scraping, and when they succeed, no data is collected.

Challenges to alternative data sourcing

Because of how alternative data is collected and processed, it can be incomplete. Its quality can be questionable, and it may lack the integrity needed to be useful. Sometimes the whole service is unavailable, for example when the scraped site blocks data collection.

Traditionally, alternative data is used for better segmentation and customer profiling in marketing. It is used to train predictive credit scoring models, enrich user profiles, and verify user identity. It can also reduce fraud and data misuse.

Alternative data sourcing has two main challenges: legality and outages. The first is legality. Although the customer owns the information, the platforms being scraped often oppose scraping. They actively try to stop it and may describe it as data “theft”.

The second challenge is service outages, which surface as technical errors such as “service unavailable”. Responses may also be incomplete even when they appear successful, with some values missing. Response times are sometimes longer than the business process allows, which makes it impossible to collect the required data in time.
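A response that looks successful can still be checked for completeness before it reaches the decision logic. The following is a minimal sketch in Python. The field names, the sample payload, and the helper `find_missing_fields` are hypothetical and only illustrate the idea; they are not taken from any particular provider.

```python
from __future__ import annotations

from typing import Any


def find_missing_fields(payload: dict[str, Any], required: list[str]) -> list[str]:
    """Return the required fields that are absent, None, or blank strings."""
    missing = []
    for field in required:
        value = payload.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


# Hypothetical provider response: the call succeeded, but values are missing.
response = {"email": "jane@example.com", "phone": None, "device_id": ""}
REQUIRED = ["email", "phone", "device_id", "ip_address"]

missing = find_missing_fields(response, REQUIRED)
is_complete = not missing
print(missing, is_complete)
```

Here the response would be flagged as incomplete, so the decision flow can route it to a fallback path instead of treating it as a full answer.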

With this in mind, a few good practices and controls can help when working with alternative data:

Expect data problems.
Build decision flows that still execute when data is missing.
When using multiple data sources, call them in parallel to reduce execution time.
Cache data for the period in which you are sure it could not have changed from a business point of view. Sometimes it helps to ask the source how often the data is refreshed.
When building predictive models and categorizing variables, separate “no response”, “no data”, and “null value”. Each has a different business meaning and can have different predictive power. If you combine them into one “null value”, you can lose useful predictive signal.
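Several of the practices above can be combined in a few lines. The sketch below, written in Python with only the standard library, calls sources in parallel under one shared time budget and keeps “no response”, “no data”, and “null value” as separate statuses. The source functions, field names, and status labels are assumptions made for illustration; a real integration would call actual provider APIs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

NO_RESPONSE = "no_response"  # source timed out or raised an error
NO_DATA = "no_data"          # source answered but holds no record
NULL_VALUE = "null_value"    # record exists, but the field is empty
OK = "ok"


def classify(result, field):
    """Map one source result to a (status, value) pair for a single field."""
    if result is None:
        return NO_RESPONSE, None
    if not result:  # empty dict: the source has no record
        return NO_DATA, None
    if result.get(field) is None:
        return NULL_VALUE, None
    return OK, result[field]


def call_sources(sources, timeout_s):
    """Call all sources in parallel; a slow or failing source yields None."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    results = {}
    try:
        futures = {name: pool.submit(fn) for name, fn in sources.items()}
        deadline = time.monotonic() + timeout_s  # one budget shared by all sources
        for name, future in futures.items():
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[name] = future.result(timeout=remaining)
            except Exception:  # timeout or an error raised by the source
                results[name] = None
    finally:
        # Do not block the decision flow on sources that are still running.
        pool.shutdown(wait=False, cancel_futures=True)
    return results


# Hypothetical sources standing in for real provider calls.
def bureau():
    return {"score": None}  # record found, field empty


def device_check():
    return {"risk": "low"}


def broker():
    return {}  # no record for this customer


def slow_scraper():
    time.sleep(0.6)  # slower than the time budget below
    return {"balance": 100}


sources = {"bureau": bureau, "device": device_check,
           "broker": broker, "scraper": slow_scraper}
raw = call_sources(sources, timeout_s=0.2)

features = {
    "score": classify(raw["bureau"], "score"),
    "risk": classify(raw["device"], "risk"),
    "income": classify(raw["broker"], "income"),
    "balance": classify(raw["scraper"], "balance"),
}
print(features)
```

The flow still produces a full set of features when one source is too slow, and each feature carries its own status. A model or rule can then treat the three kinds of missing data differently instead of collapsing them into one null.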

Using alternative data sources does not have to harm the quality of the decision making process or its outcome. You can put safeguards in place to avoid the “garbage in, garbage out” scenario.

