UMD iSchool and Google/Chronicle Utilize Commercial Data Mining Techniques to Develop Cybercrime Predictor

Hayleigh Moore - April 3, 2020

To many, accessing the internet comes with the expected price of forgoing data privacy. Once you’ve accepted the terms and conditions of a social media app or consented to a website’s use of cookies, your personal data has likely been mined by companies that use it to market their products or services to you, influence your political stance, or profit by selling it to other companies. Although many users take a dim view of this extraction and use of their personal data, these profit-driven methods may have led to an important national security solution in the war against cybercrime.

Chronicle, a Google-owned subsidiary and the company’s leading security branch, works with a tremendous amount of data to understand cybercrime worldwide. Over the last four to five years, Chronicle has been archiving malware occurrences; the archive now holds 2.1 billion samples and is growing. Chronicle’s data processing tools reveal the predominant types of cybercrime, such as illegally mining cryptocurrency or stealing passwords, but they are limited in their ability to extract socioeconomic insights or identify predictive factors.

Researchers at the University of Maryland College of Information Studies (UMD iSchool) are partnering with Chronicle to tackle this data challenge using methods not unlike those of commercial companies. Led by UMD iSchool researcher and professor Dr. Charlie Harry, the team is using machine-learning methods to extract socioeconomic characteristics from the data and use them to predict and weigh cyber threats. “The eventual goal is, if we have a better understanding of the targeting decisions of criminal syndicates, can we do a better job of preparing countries to combat cyber crime?” said Harry.

The team is approaching this in two steps. First, it clusters countries by the types of crimeware found in their networks: more than 2 billion individual malware samples are aggregated into five broad categories, and countries are sorted into clusters based on their crimeware exposure. Next, the team explores socioeconomic factors that might predict membership in one of the clusters. For example, does the lack of a developed banking sector predict that credential-stealing crimeware is less likely to be found?
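To make the clustering step concrete, here is a minimal sketch in Python. It is not the team's actual pipeline: the article does not name the clustering algorithm or the five categories, so the use of k-means, the category names, the country names, and every count below are assumptions made up for illustration.

```python
import math

# Hypothetical category names; the article says only that there are five broad categories.
CATEGORIES = ["credential_stealer", "cryptominer", "ransomware", "banking_trojan", "other"]

# Made-up malware sample counts per country, one value per category above.
COUNTS = {
    "Country A": [800, 50, 60, 70, 20],
    "Country B": [750, 80, 40, 100, 30],
    "Country C": [40, 900, 30, 10, 20],
    "Country D": [60, 850, 50, 20, 20],
}

def to_shares(counts):
    """Convert raw counts to proportions so large and small countries are comparable."""
    total = sum(counts)
    return [c / total for c in counts]

def distance(a, b):
    """Euclidean distance between two exposure profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a list of profiles."""
    return [sum(col) / len(points) for col in zip(*points)]

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from every centroid chosen so far.
        centroids.append(max(points, key=lambda p: min(distance(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each country to its nearest centroid, then recompute the centroids.
        labels = [min(range(k), key=lambda j: distance(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = mean(members)
    return labels, centroids

countries = list(COUNTS)
profiles = [to_shares(COUNTS[name]) for name in countries]
labels, centroids = kmeans(profiles, k=2)
clusters = dict(zip(countries, labels))
print(clusters)
```

With this toy data, the two countries dominated by credential stealers land in one cluster and the two dominated by cryptominers in the other. Normalizing counts to shares is a design choice of the sketch, so that a country's profile reflects its mix of crimeware and not the sheer volume of samples collected there.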

“We’re looking at [economic and social factors] like ‘How many ATMs per one thousand people? How many bank branches per one thousand people?’ as well as primary, secondary, college and tertiary level of education,” said Harry. Understanding these clusters and how they correlate with such predictors will help countries better gauge the threat that specific types of cyber attacks pose to their national security. For instance, the team’s research is revealing that some countries are more likely to face a ransomware problem, whereas others are more likely to deal with crypto mining.

To make these predictions, the team is using both traditional regression methods and an artificial neural network, or neuronet, which is loosely modeled on the human brain. The neuronet processes each piece of data and makes a connection, or prediction. For example, based on a criminal syndicate’s socioeconomic characteristics, was it likely that they had hacked global banking data? Each prediction is compared with the known outcome, and the connections that lead to correct predictions are strengthened. After analyzing millions of pieces of current and historic data, the neuronet becomes a more accurate predictor of future cybercrime.
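The prediction step can be sketched in the same spirit. The example below uses logistic regression trained by gradient descent, which is one traditional regression method and is mathematically equivalent to a neural network with a single neuron; it is not the team's model. The two features echo the indicators Harry mentions, but the numbers, the cluster labels, and the function names are all hypothetical.

```python
import math

# Made-up rows: (ATMs per 1,000 people, bank branches per 1,000 people, cluster label).
# Label 1 = hypothetical "credential-stealing" cluster, 0 = any other cluster.
ROWS = [
    (1.20, 0.30, 1), (0.90, 0.25, 1), (1.00, 0.28, 1),
    (0.10, 0.03, 0), (0.20, 0.05, 0), (0.15, 0.04, 0),
]

def column_stats(rows):
    """Mean and population standard deviation of each feature column."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) for c, m in zip(cols, means)]
    return means, stds

def scale(row, means, stds):
    """Standardize one row so both features are on a comparable scale."""
    return [(v - m) / s for v, m, s in zip(row, means, stds)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(features, labels, lr=0.5, epochs=1000):
    """Batch gradient descent on the logistic loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Error is the gap between the predicted probability and the known label.
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        # Nudge each weight against its average error: the "strengthening" step.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

raw_features = [row[:2] for row in ROWS]
labels = [row[2] for row in ROWS]
means, stds = column_stats(raw_features)
weights, bias = train([scale(r, means, stds) for r in raw_features], labels)

def predict_probability(raw_row):
    """Estimated probability that a country belongs to the label-1 cluster."""
    x = scale(raw_row, means, stds)
    return sigmoid(sum(w * v for w, v in zip(weights, x)) + bias)

print(predict_probability((1.10, 0.27)))  # dense banking sector
print(predict_probability((0.12, 0.03)))  # sparse banking sector
```

In this toy setup, a country with a dense banking sector receives a probability above 0.5 of falling into the credential-stealing cluster, and one with a sparse banking sector a probability below 0.5. A full neural network stacks many such units in layers, but the training loop follows the same pattern of predicting, comparing with the known outcome, and adjusting the weights.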

“What is needed in the field is a greater degree of nuance and a greater understanding of what types of threats should be of primary concern versus ones that aren’t that big of a deal,” said Harry. “Having a better sense of what threat profiles look like will help leaders and policymakers to better allocate resources and potentially build a better critical defensive infrastructure. I think what it really boils down to is having a better understanding of the problem, better understanding of where criminals are actually targeting broadly speaking, so that we are more efficient in allocating resources for defense.”