Storage, retrieval, analysis, and visualization of Network Science data pose formidable research and development challenges. The underlying data model for Network Science applications is graph-structured data, and queries on such datasets depend on the structural properties of the graph in addition to the values of attributes. Such queries demand new solutions. The datasets are also large. For example, one available Wikipedia dataset is approximately 180GB gzipped, a Twitter dataset contains 467 million posts, and the California Highway Sensor Network dataset covers 14 thousand sensors in 12 Districts over 6 years, amounting to 600GB gzipped. Some of the datasets are also dynamic. Wikipedia, for example, has over 4 million articles, edited by about 40 thousand editors who perform about 4 million edits per month, and it receives 0.5 billion unique visitors and 9 billion views per day.
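To make concrete what it means for a query to depend on graph structure as well as attribute values, here is a minimal Python sketch over a small, hypothetical toy graph (the node names, the `category` attribute, and the helper functions are all illustrative assumptions, not part of any dataset above). The query asks for nodes within a given number of hops of a source node whose attribute matches a value: the hop constraint requires a graph traversal (breadth-first search here), while the attribute test is an ordinary value filter.

```python
from collections import deque

# Hypothetical toy graph: adjacency lists plus per-node attributes.
# Node names and attribute values are illustrative only.
edges = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "e"],
    "d": ["b"],
    "e": ["c", "f"],
    "f": ["e"],
}
attrs = {
    "a": {"category": "science"},
    "b": {"category": "art"},
    "c": {"category": "science"},
    "d": {"category": "science"},
    "e": {"category": "art"},
    "f": {"category": "science"},
}


def within_hops(graph, source, max_hops):
    """Structural part: BFS returning {node: hop_distance} for nodes at most max_hops away."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nbr in graph.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def structural_attribute_query(graph, attributes, source, max_hops, key, value):
    """Combine the structural constraint (hop distance) with an attribute predicate."""
    reachable = within_hops(graph, source, max_hops)
    return sorted(
        n for n in reachable
        if n != source and attributes.get(n, {}).get(key) == value
    )


print(structural_attribute_query(edges, attrs, "a", 2, "category", "science"))
# prints ['c', 'd']  (f is also "science" but lies 3 hops from a)
```

An attribute-only filter would return every "science" node, including `f`; it is the structural constraint that excludes it, which is the kind of query that purely value-based indexing does not answer directly.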
Managing, programming, and visualizing complex networked datasets is another significant research problem. Graph-based applications from areas such as the social sciences and biology each pose their own unique set of challenges, which makes designing a generic graph-oriented data management solution a critical research goal. Scalable and elastic management of large Network Science applications requires large data centers for both processing and managing the data.

Datacenter management is itself a complex problem, since manual approaches are no longer tenable given the sheer scale and number of components in such infrastructures. The general consensus is that these systems should be self-monitored through continuous observation, and that machine learning and data mining techniques can potentially be deployed for autonomic control of cloud computing infrastructures. Furthermore, it is critical to provide programming tools and techniques that enable scientists to write their own software applications for analyzing large networks.

Finally, high-level analysis of networks is often facilitated by visual presentations, most commonly node-link diagrams, but also matrix representations and other space-filling or hybrid approaches. Visualization is one of the main means of exploratory graph analysis. Considerable research progress has been made on the interactive visualization of small- to medium-scale graph structures. The effective presentation and interactive analysis of very large-scale graph structures, on the other hand, remains largely uncharted territory.
Related Training Modules
- M7: Discovery of the Emerging Dynamical Phenomena in Distributed Systems
- M11: Symbolic Execution with a Graph Database
- M29: FLoRa Framework
- M33: First Order Linear Temporal Logic as a Query Language
- M44: Motif distributions in daily Bitcoin transaction graphs
- M47: Visualizing Unknown Variables at Varying Scales in a GIS
- M48: Mapping slums using machine learning, remote sensing, and volunteered geographic information
- M50: Theoretical Conditions for the Regions of Attraction in Kuramoto Systems