Students apply their skills to develop innovative solutions to help address national cyber threats. Students work in small teams on a cyber research project, guided by mentors with scientific and computing expertise in the cyber domain.
Projects may include:
- cyber analytics (scalable “big data” processing, statistical inference, anomaly detection, deep learning)
- data integrity (steganography, encryption, adversarial machine learning)
- intrusion detection and analysis (malware reverse engineering, network/protocol analysis)
- cyber-physical system security (complex systems, physical attacks against cyber systems)
For more information on research areas, see csr.lanl.gov. Students gain experience in communicating their work through posters and oral presentations. In addition, students attend seminars by LANL researchers and external visitors and are given the opportunity to take short courses in core cyber subjects outlined in the Incident Response Track.
Applications will be reviewed on a rolling basis and will close on January 14, 2020 at 11:00 PM.
LANL Cyber Toaster 2020 Research Projects
All projects require the creation of a final report on your research findings, and project mentors will encourage publication of results.
1. Computer Operating System Data Research
In recent years, the importance of capturing data on the endpoints in computer networks has drastically increased, providing rich data sources. With this increase, new research opportunities are emerging. Data sources contain information including authentication, process hierarchies and system level changes. The aim of this project is to develop unsupervised modelling capabilities on a specific aspect or feature of the data with a goal of characterizing the way users and computers behave. At the end of this project, your tool will increase network awareness and ultimately lead to new anomaly detection capabilities.
A project can range from statistical and machine learning research to tool development, depending on the skill sets and interests of the candidate.
- Statistics and/or machine learning, with experience in Python or R
- Strong programming skills
- Some knowledge of cyber security concepts
Your Role in the Project
- Parsing audit logs from computers (Windows, OSX, potentially Linux)
- Standardizing logs across different operating systems
- Implementing statistical models for the data and testing predictive performance
- Implementing tools to analyze and/or visualize anomaly detection results
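As a rough illustration of the tasks above, the sketch below standardizes process-launch records from two operating systems into one schema and then scores events with a simple unsupervised model. Everything specific in it is an assumption made for the example: the raw field names (`TimeCreated`, `Image`, `ts`, `uid_name`, `path`), the sample records, and the choice of a smoothed per-user frequency model. Real audit logs carry many more fields, and the project may use entirely different models.

```python
import math
from collections import Counter
from datetime import datetime, timezone
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical raw records; the field names are illustrative assumptions.
WINDOWS_EVENTS = [
    {"TimeCreated": "2020-01-14T09:00:00Z", "User": "ALICE",
     "Image": r"C:\Windows\System32\cmd.exe"},
    {"TimeCreated": "2020-01-14T09:05:00Z", "User": "ALICE",
     "Image": r"C:\Windows\System32\cmd.exe"},
    {"TimeCreated": "2020-01-14T09:07:00Z", "User": "ALICE",
     "Image": r"C:\Windows\System32\cmd.exe"},
]
MACOS_EVENTS = [
    {"ts": 1578992400, "uid_name": "alice", "path": "/bin/zsh"},
    {"ts": 1578992460, "uid_name": "bob", "path": "/bin/zsh"},
    {"ts": 1578992520, "uid_name": "bob", "path": "/bin/zsh"},
]


def normalize_windows(rec):
    """Map a hypothetical Windows record onto the common schema."""
    t = datetime.strptime(rec["TimeCreated"], "%Y-%m-%dT%H:%M:%SZ")
    return {"time": t.replace(tzinfo=timezone.utc), "os": "windows",
            "user": rec["User"].lower(),
            "process": PureWindowsPath(rec["Image"]).name.lower()}


def normalize_macos(rec):
    """Map a hypothetical macOS record onto the common schema."""
    return {"time": datetime.fromtimestamp(rec["ts"], tz=timezone.utc),
            "os": "macos",
            "user": rec["uid_name"].lower(),
            "process": PurePosixPath(rec["path"]).name.lower()}


def fit_surprise_model(events, alpha=1.0):
    """Return a function scoring how unusual a process is for a user."""
    vocab = {e["process"] for e in events}
    counts = {}
    for e in events:
        counts.setdefault(e["user"], Counter())[e["process"]] += 1

    def surprise(user, process):
        user_counts = counts.get(user, Counter())
        total = sum(user_counts.values())
        # One extra vocabulary slot reserves probability for unseen processes.
        p = (user_counts[process] + alpha) / (total + alpha * (len(vocab) + 1))
        return -math.log(p)  # higher means more anomalous

    return surprise


events = ([normalize_windows(r) for r in WINDOWS_EVENTS]
          + [normalize_macos(r) for r in MACOS_EVENTS])
surprise = fit_surprise_model(events)
for proc in ("cmd.exe", "zsh", "nc.exe"):
    print(f"alice -> {proc}: {surprise('alice', proc):.2f}")
```

In this toy data the never-observed process receives the highest score for the user, which is the basic behavior an anomaly detector built on such a model would rely on.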
2. High-Fidelity Simulated Cyber Arms Races for Automated Red & Blue Teaming of Computer Networks by Artificially Intelligent Agents
The Nation is under continuous cyber attack by adversarial nation states, and the wars of the future are expected to have increasingly significant cyber components, if not be waged entirely in cyberspace. At the same time, there is a rising threat of adversarial cyber AI operating at speeds much faster than human defenders can counter. The CoEvolving Attacker and Defender Strategies for Large Infrastructure Networks (CEADS-LIN) project aims to address these critical National Security threats by simulating a cyber arms race in a high-fidelity computer network emulation. Each skirmish between a defender AI and one or more attacker AIs effectively automates a red team / blue team exercise. Within this framework, computational intelligence techniques automate the development of increasingly sophisticated AI agents and novel network security capabilities. This process approximates the game-theoretic Nash equilibria representing the highest-threat attacks and the corresponding defenses.
For all subprojects: strong programming skills and an understanding of data structures and algorithms. In addition, each subproject (indicated below) requires specific skills:
- For subprojects 1-3: implementation-level knowledge of computational intelligence techniques
- For subproject 4: understanding of network security
- For subprojects 4-5: understanding of computer network design
- For subproject 6: understanding of parallel computation
Your Role in the Project
- Automating the Design of Novel/Improved Graph-based Network Security Capabilities (e.g., network intrusion detection, access control policy optimization, graph-based network attack modeling)
- Increasing the Sophistication of the Hyper-heuristics for the Automated Design of Algorithms (e.g., primitive granularity control)
- Creating Attacker/Defender Agents that can Learn
- Expanding the Attacker/Defender Capabilities with more Realistic Cyber Ops, including Integration with the Automated Design of Security Capabilities
- Adding/Expanding on the Virtual Network Emulation Environment
- Integrating existing systems for use with High Performance Computing (HPC) Systems
3. Learning Critical Infrastructure Structure in the Presence of Adversaries and Data Corruption
Several critical infrastructure networks (power grid, gas network, building controls) optimize resource usage over time scales of minutes to hours. For example, the power grid market is settled every 15 minutes, and building controls include day-ahead and hourly planning. A malicious attack on system variables (voltages, sensor measurements) can disrupt economic network operation or, worse, make the system unstable and lead to breakdown. A linear dynamical system is used to model the ambient operation of infrastructure networks. While network identification under clean data has been well studied, such clean data is often not available. In particular, learning from noisy or, worse, adversarially corrupted data can produce multiple erroneous connections that do not exist in the underlying network. This proposal aims to determine optimal techniques (both active and passive) to detect the source of adversarial data corruptions and to identify the correct underlying network once corruptions are corrected.
- Programming skills for implementing and testing learning algorithms
- Familiarity with statistics and experience working with data are desirable but not mandatory
Your Role in the Project
- Implementing learning algorithms based on statistical estimators
- Building test cases
- Generating synthetic data and performing tests
- Applying the algorithms to real data
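To make the clean-data baseline concrete, here is a minimal sketch of network identification for a linear dynamical system: it generates synthetic data from x[t+1] = A x[t] + noise, estimates A by ordinary least squares, and reads the network edges off the large off-diagonal entries. The 3-node matrix `A_TRUE`, the noise level, the edge threshold of 0.15, and all function names are assumptions chosen for illustration; this is not the project's estimator, and it does nothing to handle corrupted data.

```python
import random


def simulate(A, steps, noise=1.0, seed=0):
    """Simulate x[t+1] = A x[t] + Gaussian noise, starting from the origin."""
    rng = random.Random(seed)
    n = len(A)
    x = [0.0] * n
    traj = [x]
    for _ in range(steps):
        x = [sum(A[i][j] * x[j] for j in range(n)) + rng.gauss(0.0, noise)
             for i in range(n)]
        traj.append(x)
    return traj


def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def estimate_A(traj):
    """Least-squares estimate of A from consecutive state pairs."""
    n = len(traj[0])
    X, Y = traj[:-1], traj[1:]
    # Gram matrix X^T X, shared by every row of the estimate.
    G = [[sum(x[i] * x[j] for x in X) for j in range(n)] for i in range(n)]
    A_hat = []
    for i in range(n):
        rhs = [sum(x[j] * y[i] for x, y in zip(X, Y)) for j in range(n)]
        A_hat.append(solve(G, rhs))  # row i solves the normal equations
    return A_hat


def edges(A, threshold=0.15):
    """Directed edges (i, j) whose off-diagonal weight exceeds the threshold."""
    n = len(A)
    return {(i, j) for i in range(n) for j in range(n)
            if i != j and abs(A[i][j]) > threshold}


# Hypothetical 3-node chain network: node 1 couples to nodes 0 and 2.
A_TRUE = [[0.5, 0.3, 0.0],
          [0.3, 0.5, 0.3],
          [0.0, 0.3, 0.5]]
traj = simulate(A_TRUE, steps=5000)
A_hat = estimate_A(traj)
print(sorted(edges(A_hat)))
```

A corruption experiment could reuse the same pieces by altering some entries of `traj` before calling `estimate_A` and comparing the recovered edge set against `edges(A_TRUE)`.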
4. Supervisory Control and Data Acquisition System Applications, Analytics, and Simulation
The modern power grid is a complex, large-scale cyber-physical system composed of generation, transmission, and distribution elements. Coupled with the proliferation of renewable energy sources, the electric power grid is in transition to a smarter grid, enabling better use of our energy resources but also introducing new vulnerabilities. LANL has the unique opportunity to turn the local power grid into a working testbed. While there are several well-developed supervisory control and data acquisition (SCADA) testbeds in our nation, to our knowledge, this is the only one in operation with an actual, connected grid. Furthermore, LANL is exploring the use of the Facility Control Shield (FCS), which allows operators to independently verify the SCADA system and simultaneously detect cyber events such as intrusion, data injection, hijacking attempts, or the start of sabotage attempts by a rogue insider. FCS can, in real time, capture ground truth and independent data from monitored processes and evaluate operations using data analytic methods. This information is used to identify anomalies that could be associated with possible insider and outsider threats, and to evaluate the reliability and legitimacy of predictions of future process activities. A project can range from statistical analysis of SCADA system data sets, to machine learning for cyber security, to simulation-based SCADA testbed development, depending on the skill sets and interests of the candidate.
- Programming skills
- Knowledge of principles of internet-of-things
- Communication networks
- (Bonus) Machine learning/statistics
- (Bonus) Power Grid infrastructure and operations
Your Role in the Project
- Analyzing SCADA-based systems and data sets
- Performing simulation-based studies of SCADA networks
- Potentially implementing statistical and machine learning techniques
5. Multi-Modal Generative Models for Resilient Detection of Fake Data
Today, all modalities of sensor data are easier to spoof than ever before. Realistic images, audio, video, and text can be synthesized efficiently by machine learning algorithms, which learn their realism directly from data. It is only a matter of time until these generative algorithms are deployed by adversaries in a wide range of sensing domains. To prepare for this shifting reality, we need to measure the detectability of fake data in domains of critical importance to the Los Alamos mission.
The detection of synthetic/altered content typically relies on tell-tale digital signatures embedded in the file meta-data, or on anomalous statistical structure such as incompatible lighting in spliced image regions. However, well-financed, technologically sophisticated adversaries will increasingly employ counter-forensic strategies to mitigate or erase such signatures. Thus, there exists a need to develop methods for detecting synthetic/altered content based on a combination of video, audio, time series, and text. New detection methodologies should be resilient to counter-forensic attacks. The goal of this project will be to develop strong multi-modal baselines for real data streams, and to develop models that can effectively detect fake or manipulated data. This project will include specific applications of fake data detection to nuclear nonproliferation efforts.
- Proficiency in programming for data analytics (Python preferred)
- Basic knowledge of machine learning and statistics
- (Bonus) Knowledge of high performance computing
- (Bonus) Experience with signal processing and/or neural computing
- (Bonus) Knowledge of nuclear facility monitoring and/or neutron coincidence/multiplicity counting
Your Role in the Project
Depending on the background of the student, this project can be focused on:
- Resilient machine learning models for detecting fake multi-modal data
- Sparse coding models for multi-modal data
- Applied analytics for detecting AI-generated data in the application area of nuclear nonproliferation
6. Secure & Assured Systems Research using Post-Quantum Cryptography & Zero-Knowledge Proofs
Theoretical cryptography has evolved enormously over the last decade with the realization of fully homomorphic encryption, post-quantum cryptography, and efficient zero-knowledge proofs. The general thrust of these cutting-edge techniques is to make it possible to compute blindly on encrypted data without knowing the key and to prove the correctness of systems without revealing sensitive information. There is enormous untapped potential for applying these advances to ensure the security and robustness of our nation’s critical infrastructure and capabilities. LANL is interested in applying these techniques to solve difficult challenges in cybersecurity. Students will be exposed to the wide applicability of these techniques, architect a cyber-security solution, and then engineer proof-of-concept software tooling that demonstrates security improvements over modern systems. Potential areas of impact include, but are certainly not limited to, assured, distributed machine learning, nuclear treaty verification, supply chain integrity, and remotely verifiable information provenance.
- Strong programming skills (C++ preferred)
- Strong background in pure mathematics
- Understanding of algorithms, computational complexity, and other theoretical computer science topics
- Willingness and ambition to learn advanced research-level topics at a fast pace
- (Bonus) Understanding of cyber-physical system security
- (Bonus) Experience with machine learning
- (Bonus) Prior exposure to theoretical cryptography
- (Bonus) Interest in provable security (e.g. formal methods)
Your Role in the Project
- Review and learn from recent and relevant literature
- Architect a security solution to a real-world problem using cutting-edge cryptographic approaches
- Develop, test, and benchmark proof of concept code
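For readers new to the idea of computing on encrypted data without the key, the toy sketch below may help. It uses the Paillier cryptosystem, which is a swapped-in, simpler technique: Paillier is only additively homomorphic (not fully homomorphic) and is not post-quantum, so it is not one of the schemes named in this project. The primes, function names, and message values are assumptions chosen for readability, and the parameters are far too small to be secure.

```python
import math
import random

# Toy primes: far too small for real security, chosen so numbers stay readable.
P, Q = 293, 433
N = P * Q
N2 = N * N
G = N + 1  # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1), private
MU = pow(LAM, -1, N)  # modular inverse (Python 3.8+); valid because G = N + 1


def encrypt(m, rng=random):
    """Encrypt an integer 0 <= m < N using only the public values N and G."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c):
    """Decrypt with the private values LAM and MU."""
    u = pow(c, LAM, N2)
    return ((u - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Add two plaintexts by multiplying ciphertexts; no key is needed."""
    return (c1 * c2) % N2


c_sum = add_encrypted(encrypt(1200), encrypt(34))
print(decrypt(c_sum))  # prints 1234
```

The party calling `add_encrypted` never sees 1200, 34, or the private key, yet the key holder recovers their sum. Fully homomorphic schemes extend this idea to arbitrary computations.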