Cyberinfrastructure

Storing, retrieving, analyzing, and visualizing Network Science data pose formidable research and development challenges. In particular, the underlying data model for Network Science applications is graph-structured, and queries on such datasets depend on the structural properties of the graph in addition to attribute values, which demands new solutions. The datasets are also large. For example, an available Wikipedia dataset is approximately 180 GB gzipped, a Twitter dataset contains 467 million posts, and the California Highway Sensor Network dataset covers 6 years of data from 14 thousand sensors in 12 districts, totaling 600 GB gzipped. Some datasets are also dynamic: the Wikipedia dataset has over 4 million articles, edited by about 40 thousand editors who perform about 4 million edits per month, and Wikipedia has 0.5 billion unique visitors and 9 billion views per day.
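To make the idea of a query that depends on graph structure as well as attribute values concrete, here is a minimal sketch in Python. The toy graph, the node names, the `role` attribute, and the helper functions `within_hops` and `structural_query` are all hypothetical and chosen only for illustration; they do not come from any of the datasets named above.

```python
from collections import deque

# Hypothetical toy graph: directed adjacency lists plus per-node attributes.
edges = {
    "alice": ["bob", "carol"],
    "bob": ["carol", "dave"],
    "carol": ["dave"],
    "dave": [],
    "erin": ["alice"],
}
attrs = {
    "alice": {"role": "editor"},
    "bob": {"role": "reader"},
    "carol": {"role": "editor"},
    "dave": {"role": "editor"},
    "erin": {"role": "reader"},
}


def within_hops(graph, source, max_hops):
    """Return the nodes reachable from source in at most max_hops edges."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand past the hop limit
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    dist.pop(source)  # the source itself is not part of the answer
    return set(dist)


def structural_query(graph, attributes, source, max_hops, role):
    """Combine a structural condition (hop distance) with an attribute filter."""
    near = within_hops(graph, source, max_hops)
    return sorted(n for n in near if attributes[n]["role"] == role)


result = structural_query(edges, attrs, "erin", 2, "editor")
print(result)
```

The attribute filter alone could be answered by a conventional index, but the hop-distance condition requires traversing the graph. At the scale of the datasets described above, that traversal is the expensive part.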

Managing, programming, and visualizing complex networked datasets is another significant research problem. Graph-based applications from areas such as the social sciences and biology each pose their own challenges, which makes designing a generic graph-oriented data management solution a critical research challenge. Scalable and elastic management of large network science applications requires large data centers for both processing and managing the data. Data center management is itself a complex problem: manual approaches are no longer tenable given the sheer scale and number of components in such infrastructures. The general consensus is that these systems should be self-monitored through continuous observation, and that machine learning and data mining technologies can potentially be deployed for autonomic control of cloud computing infrastructures. Furthermore, it is critical to provide programming tools and techniques that enable scientists to write their own software applications for analyzing large networks. Finally, high-level analysis of networks is often facilitated by visual presentations, most commonly node-link diagrams, but also matrix representations or other space-filling and hybrid approaches. Visualization is one of the main means of exploratory graph analysis. Considerable research progress has been made on the interactive visualization of small-to-medium-scale graph structures. The effective presentation and interactive analysis of very large-scale graph structures, on the other hand, remains largely uncharted territory.

Related Training Modules

Affiliated Faculty

Databases, Distributed Systems, Cloud Computing, Social Networks

Control theory, Multi-agent networks, Robotic coordination, Power systems

Software engineering, Web software and services, Automated verification

Network visualization

Semantic Networks, Cultural Sociology, Diversity in Organizations

Electrical and Computer Engineering

Evolutionary bioinformatics, Phylogenetics

Databases, Data mining