UA Researcher Wins NSF CAREER Award to Improve Big Data Management
Ravi Tandon is on a mission to help industries make the most of their big data.
The UA electrical and computer engineering assistant professor has received a $500,000 Faculty Early Career Development Award, the National Science Foundation’s highest honor for junior faculty members, to advance his research on information theory and coding. His latest quest is focused on making data exchange in distributed cloud computing more efficient.
In the NSF study, Tandon -- who was a postdoctoral fellow at Princeton University and faculty researcher at Virginia Tech before joining the UA in 2015 -- is developing adaptable codes and algorithms to minimize communication among machines and help them work together more effectively. His work has potential implications for financial solvency, weather prediction, international trade, public health and national security.
Clusters in the Cloud
Big data is increasingly used to improve business processes, health care, public safety, cybersecurity, and scientific research. Online retailers like Amazon are mining data from users’ shopping cart activity and wish lists to predict future purchasing choices. Law enforcement agencies use big data to detect patterns of criminal behavior and solve crimes. The National Weather Service uses it for climate forecasting. Banks use it to identify customers’ preferences for banking on mobile devices, at ATMs or with bank tellers.
Traditionally, when companies and agencies needed more space for collecting and storing data, they put more servers on the ground. Now, as their daily tallies for data collection regularly reach into terabytes, they are looking for faster, less-costly ways to handle the deluge.
An increasingly popular solution, especially among giants like Google, Amazon, IBM and Microsoft, is distributed cloud systems, in which clusters of wirelessly connected machines share resources and workload. Data is stored in a master node, or server, then distributed in smaller chunks to “worker” machines for processing and analysis.
While distributed cloud systems typically are faster and cost significantly less than centralized networks, they are not without their challenges. Many machines performing at varying levels are working simultaneously on different parts of the same big data task, and they must communicate with one another at every stage of data reorganization and exchange.
“For example, machines in a cluster are not all the same; one might be slower than another,” Tandon said. “This leads to bottlenecks of data that slow down the computing. In addition, all of the data reshuffling results in excess communications, which have no value but increase cost.”
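The bottleneck Tandon describes can be illustrated with a small Python sketch. It is a toy model, not his method: the function names, the chunk sizes and the worker speeds (items per second) are all made-up assumptions. It assumes a master splits the data evenly and each step finishes only when the slowest worker is done.

```python
def split_into_chunks(data, num_workers):
    """Split data into num_workers nearly equal contiguous chunks (hypothetical helper)."""
    base, extra = divmod(len(data), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def job_time(chunks, speeds):
    """Time for one synchronous step: the master waits for the slowest worker."""
    return max(len(chunk) / speed for chunk, speed in zip(chunks, speeds))


data = list(range(1200))              # stand-in for a large data set
chunks = split_into_chunks(data, 4)   # 300 items per worker

# Assumed speeds in items per second; the last worker in the second run is a straggler.
uniform = job_time(chunks, [100, 100, 100, 100])
straggler = job_time(chunks, [100, 100, 100, 25])
print(uniform, straggler)  # 3.0 12.0
```

In this toy setup a single slow machine makes the whole step take four times as long, even though the other three workers finish on time.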
Engaging Future Engineers
“This project provides exciting opportunities for students to work on urgent problems in large-scale data processing and analysis that will hopefully inspire them to pursue educations and jobs in this area,” Tandon said.
“CAREER: Communication-Efficient Distributed Computation: Information-Theoretic Foundations and Algorithms” is supported by the National Science Foundation under grant No. 1651492.