The new artificial intelligence systems that can chat with us — "large language models" — devour data.
LexisNexis Risk Solutions runs one of the AIs' favorite cafeterias.
It helps life insurance and annuity issuers, and many other clients, use tens of billions of data records to verify people's identities, underwrite applicants, screen for fraud, and detect and manage other types of risk.
The company's corporate parent, RELX, estimated two years ago that it stores 12 petabytes of data, or enough data to fill 50,000 laptop computers.
Patrick Sugent, a vice president of insurance data science at LexisNexis Risk Solutions, has been a data science executive there since 2005. He has a bachelor's degree in economics from the University of Chicago and a master's degree in predictive analytics from DePaul University.
He recently answered questions, via email, about the challenges of working with "big data." The interview has been edited.
THINKADVISOR: How has insurers' new focus on AI, machine learning and big data affected the amount of data being collected and used?
PATRICK SUGENT: We're finding that data continues to grow rapidly, in multiple ways.
Over the past few years, clients have invested significantly in data science and compute capabilities.
Many are now seeing speed to market through advanced analytics as a true competitive advantage for new product launches and internal learnings.
We're also seeing clients invest in a wider variety of third-party data sources to provide further segmentation, increased prediction accuracy and new risk indicators, as the range of data types collected on entities (people, cars, property, etc.) continues to grow.
The completeness of that data continues to improve, and, perhaps most significantly, new types of data are becoming available and more accessible through automated solutions such as AI and machine learning, or AI/ML.
As just one example, electronic health records are new to the industry, contain incredibly complex and detailed data, and have become dramatically more accessible in recent years.
At LexisNexis Risk Solutions, we have always worked with large data sets, but the amount and types of data we're working with are growing.
As we work with carriers on data appends and tests, we're seeing an increase in the size of the data sets they send us and want to work with. Files that may have contained thousands of records in the past are now exponentially larger as carriers look to better understand their customers and risk in general.
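(For readers unfamiliar with the term, a data append in this context generally means joining a carrier's record file to third-party attributes on a shared key. The sketch below, in Python with pandas, is purely illustrative; the file names, columns and match logic are hypothetical, not LexisNexis' actual process.)

```python
import pandas as pd

# Hypothetical carrier file: one row per applicant or policyholder.
carrier = pd.read_csv("carrier_records.csv")          # e.g., columns: record_id, name, dob, zip

# Hypothetical third-party attribute file keyed on the same record_id.
attributes = pd.read_csv("appended_attributes.csv")   # e.g., columns: record_id, attr_1 ... attr_n

# The "append": left-join the attributes onto the carrier's records,
# keeping every carrier record even when no attributes were matched.
appended = carrier.merge(attributes, on="record_id", how="left")

# Match rate is a common first quality check on an append.
match_rate = appended["attr_1"].notna().mean()
print(f"Matched {match_rate:.1%} of {len(carrier):,} records")
```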
When you're working with data sets in the life and annuity sector, how big is big?
The biggest AI/ML project we work with in the life and annuity sector is a core research and benchmarking database that we use, among other things, to conduct most of our mortality research for the life insurance industry.
This data set contains data on over 400 million individuals in the United States, both living and deceased. It aggregates a wide variety of diverse data sources including a death master file that very closely matches U.S. Centers for Disease Control and Prevention data; Fair Credit Reporting Act-governed behavior data, including driving behavior, public records attributes and credit-based insurance attributes; and medical data, including electronic health records, payer claims data, prescription history data and clinical lab data.
We also work with transactional data sets where the data comes from operational decisions clients make across different decision points.
This data must be collected, cleaned and summarized into attributes that can drive the next generation of predictive solutions.
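(As a rough illustration of what "collected, cleaned and summarized into attributes" can look like, the sketch below rolls a hypothetical transaction-level file up to one row of attributes per person. All file and field names are invented; this is not LexisNexis' pipeline.)

```python
import pandas as pd

# Hypothetical transaction-level file: one row per operational decision event.
tx = pd.read_csv("decision_events.csv", parse_dates=["event_date"])
# e.g., columns: person_id, event_date, event_type, amount

# Basic cleaning: drop exact duplicates and rows missing the key or the date.
tx = tx.drop_duplicates().dropna(subset=["person_id", "event_date"])

# Summarize the event stream into one row of attributes per person,
# the kind of flat feature table a predictive model can consume.
attributes = tx.groupby("person_id").agg(
    event_count=("event_type", "size"),
    distinct_event_types=("event_type", "nunique"),
    first_event=("event_date", "min"),
    last_event=("event_date", "max"),
    total_amount=("amount", "sum"),
)
attributes["days_active"] = (attributes["last_event"] - attributes["first_event"]).dt.days

print(attributes.head())
```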
How has the nature of the data in the life and annuity sector data sets changed?
There has been rapid adoption of new types of data over the last several years, including new types of medical and non-medical data that are FCRA-governed and predictive of mortality. Existing sources of data are expanding in use and applicability as well.
Often, these data sources are entirely new to the life underwriting environment, but, even when the data source itself isn't new, the depth of the fields (attributes) contained in the data is often significantly greater than has been used in the past.
We also see clients ask for multiple models and large sets of attributes, both transactionally and retrospectively.
Retrospective data is used to build new solutions, with hundreds or thousands of attributes often analyzed, while the additional models provide performance benchmarks for the new solutions.
Transactional data provides similar benchmarking against previous decision points, while the attributes allow clients to support multiple decisions.
The types and sources of data we're working with are also changing and growing.
We find ourselves working with more text-based data, which requires new capabilities around natural language processing. This will continue to grow as our use of text-based data expands, including connecting to social media sites to understand more about risk and prevent fraud.
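(As a rough illustration of the kind of natural language processing involved, the sketch below turns a few invented free-text notes into numeric features and fits a simple fraud classifier with scikit-learn's standard text tools. The example data, labels and features are hypothetical, and this is not how any LexisNexis model is built.)

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy example: free-text notes labeled as fraud-related (1) or not (0).
notes = [
    "applicant history consistent with prior records",
    "address and employer could not be verified",
    "multiple applications filed under similar names",
    "routine renewal, no discrepancies noted",
]
labels = [0, 1, 1, 0]

# Turn raw text into numeric features (TF-IDF) and fit a simple classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(notes, labels)

# Score a new note; the output is a probability-like fraud signal.
print(model.predict_proba(["employer listed does not exist"])[0][1])
```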
Where do life and annuity companies with AI/ML projects put the data?