A two-week boot camp for new trainees will be organized right before the beginning of the academic year to introduce (or refresh) the necessary programming, software, and data-handling skills. A senior IGERT trainee or a graduate student will lead the boot camp under the supervision of the Operations Committee. A preliminary schedule for the boot camp is as follows.
Week 1: Introduction to large text file processing. This module will introduce trainees to programming and software support for the collection, cleaning, and initial processing of large data repositories. For data collection, introductory examples of online data crawlers and the use of public APIs (Application Programming Interfaces) will be discussed. Tools and software covered include Linux tools such as Bash and Awk; scripting in Python and Perl; and analysis in MATLAB and R.
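As an illustration of the kind of exercise the Week 1 module might include, the following minimal Python sketch streams a text source line by line, normalizes case, strips punctuation, and counts tokens. The function name `count_tokens`, the token pattern, and the in-memory sample are assumptions made for illustration only; they are not part of the planned curriculum.

```python
import io
import re
from collections import Counter

# Illustrative token pattern: runs of lowercase letters, digits, or apostrophes.
TOKEN = re.compile(r"[a-z0-9']+")


def count_tokens(lines):
    """Count cleaned, lowercased tokens from an iterable of text lines.

    Lines are consumed one at a time, so an open file handle can be passed
    directly and the whole file never has to fit in memory.
    """
    counts = Counter()
    for line in lines:
        counts.update(TOKEN.findall(line.lower()))
    return counts


# Hypothetical in-memory sample standing in for a large file on disk.
sample = io.StringIO("Hello, World!\nhello   again, world.\n")
counts = count_tokens(sample)
print(counts.most_common(2))
```

For a real corpus, the same function could be called on a file object, for example `with open(path, encoding="utf-8") as f: count_tokens(f)`, where `path` is whatever file the trainee is working with.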
Week 2: Targeted hands-on mini-labs.
- Network algorithms lab: distance/proximity, traversal, connected components, partitioning, betweenness centrality; tools: Java/C++ using graph libraries and packages such as SNAP (Stanford) and METIS.
- Cluster/distributed network processing lab: Hadoop, MapReduce, and specialized graph-oriented libraries for distributed processing; tools: Hadoop, Pregel, MPI.
- Visualization and exploration lab: software for visualization of both networks and rich attributes such as text content; tools: NodeXL, Cytoscape, GraphViz, R (topic modeling and other packages).
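To give a sense of the traversal, distance, and connected-components topics listed for the network algorithms lab, the sketch below uses breadth-first search on a small undirected graph stored as an adjacency dictionary. It is written in plain Python for brevity rather than with the Java/C++ libraries named above, and the function names and the toy graph are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:  # first visit yields the shortest hop count
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    """Group the nodes of an undirected graph into connected components."""
    seen = set()
    components = []
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            components.append(component)
    return components


# Hypothetical toy graph: a path a-b-c and a separate edge d-e.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
print(bfs_distances(adj, "a"))
print(connected_components(adj))
```

The adjacency dictionary is assumed to list each undirected edge in both directions; dedicated graph libraries would handle this bookkeeping and scale to far larger networks.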