What Is the SMART FIRES Data Lake?

Instead of rainbow trout, this lake is stocked with massive amounts of research data in all shapes, sizes, and formats. If you’re new to the data lake concept, or if you’ve heard the term tossed around in SMART FIRES circles and wondered what it was, this article is for you.

So… What Is a Data Lake?

A data lake is a scalable digital storage area that can hold any kind of research data: raw or cleaned, tiny or huge, structured or unstructured. Unlike a database, which needs tidy, well-organized information, a data lake happily stores research files exactly as they are. The SMART FIRES data lake is designed to support a large, multi-institutional research community working across disciplines, file types, and computational needs.

Where This Lake Lives

Our data lake lives on Blackmore, Montana State University’s high-performance research storage infrastructure. It’s designed to handle fast, heavy, and growing research workloads:

80 gigabits per second of throughput for high-speed data access
Nightly full backups to protect project data
An offsite backup for disaster resilience

Think of Blackmore as the secure vault that holds the lake—one built for scale, speed, and ease of access.
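
To put the 80 gigabits per second figure in perspective, here is a back-of-the-envelope sketch of the shortest possible time to move a dataset at that rate. It assumes decimal units (1 terabyte = 10^12 bytes) and a single transfer using the full rate. Real transfers will be slower, because the network path, disks, and other users all share capacity. The function name is ours, for illustration only.

```python
def min_transfer_seconds(size_bytes, gigabits_per_second=80):
    """Lower bound on transfer time at a given line rate (decimal units)."""
    bits = size_bytes * 8                          # 8 bits per byte
    return bits / (gigabits_per_second * 1e9)      # 1 gigabit = 10^9 bits


one_terabyte = 10**12
print(min_transfer_seconds(one_terabyte))          # 100.0 seconds
```

In other words, a terabyte could in principle move in under two minutes, which is why the storage layer is unlikely to be the bottleneck for most workflows.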

Why SMART FIRES Needs a Data Lake

SMART FIRES involves more than 50 researchers across six Montana universities, working in four major interdisciplinary thrust areas. That means lots of:

Data (imagery, models, field measurements, social science data, code)
Formats (CSV, NetCDF, TIFF, Python scripts, logs, and more)
Collaborators across institutions
Back-and-forth between raw data, high-performance computing (HPC) processing, and analysis

This diversity creates some challenges—coordination, consistency, access, sharing, protection, and long-term storage—all of which the data lake is designed to solve.

Here’s what it enables:

Cross-Institution Collaboration - The lake provides a secure, shared space accessible to project participants across different institutions, eliminating access barriers caused by local storage, institutional silos, differing authentication and permission systems, or email-based file transfers.
Scalable Storage for Large and Growing Datasets - The project will generate huge datasets over its five-year span. The data lake can expand to meet those needs.
Smooth Integration With High-Performance Computing - The data lake integrates with Tempest, Hellgate, and other HPC systems through Globus, allowing researchers to automatically pull data into a compute job and push results back when processing finishes. This keeps workflows manageable and saves time (and headaches).
Long-Term Preservation and Public Access - SMART FIRES has a data management plan that includes archiving and sharing results in public data repositories. The data lake is the backbone that will make that possible.

So How Do We Access This Lake?

You don’t need a kayak—just an account with Globus, the secure research data transfer platform.

Most Montana University System researchers can log in with their institutional credentials. Researchers whose institutions do not have a Globus subscription can create a free Globus account and then be granted access.

You can access and manage data using:

The Globus Web App (point-and-click uploads and downloads)
The Globus CLI (necessary for HPC workflows)
Python or JavaScript SDKs (for programmatic workflows, automation, or scripts)

Researchers can even embed data transfers directly into compute jobs—automatically pulling raw data to Tempest, running a batch job, and sending processed results back to the lake.
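
As a rough illustration of that pull, compute, push pattern, the sketch below builds the Globus CLI commands a job wrapper might run. It only assembles and prints the commands; nothing is transferred. The collection IDs, paths, labels, and script name are hypothetical placeholders, and the `sbatch --wait` step assumes a Slurm scheduler, which this article does not confirm. Check the actual values and scheduler with the data lake support team.

```python
import shlex

# Hypothetical placeholders, not real identifiers: substitute the collection
# UUIDs and paths assigned to your project.
LAKE = "LAKE-COLLECTION-UUID"
TEMPEST = "TEMPEST-COLLECTION-UUID"


def transfer_cmd(src_id, src_path, dst_id, dst_path, label):
    """Build a `globus transfer` command that copies a directory recursively.

    `--jmespath task_id --format unix` makes the CLI print only the task ID,
    so a wrapper script can capture it and pass it to `globus task wait`.
    """
    return [
        "globus", "transfer",
        f"{src_id}:{src_path}",
        f"{dst_id}:{dst_path}",
        "--recursive",
        "--label", label,
        "--jmespath", "task_id",
        "--format", "unix",
    ]


def wait_cmd(task_id):
    """Build a `globus task wait` command that blocks until the task ends."""
    return ["globus", "task", "wait", task_id]


def workflow(raw_dir, scratch_dir, results_dir, job_script):
    """Return the ordered steps: stage in, compute, stage out."""
    return [
        transfer_cmd(LAKE, raw_dir, TEMPEST, scratch_dir + "input/", "stage-in"),
        # Assumption: a Slurm scheduler; `sbatch --wait` blocks until the job ends.
        ["sbatch", "--wait", job_script],
        transfer_cmd(TEMPEST, scratch_dir + "output/", LAKE, results_dir, "stage-out"),
    ]


# Print the commands so they can be reviewed before anything is run.
for step in workflow("/raw/site-a/", "/scratch/site-a/", "/processed/site-a/", "process.sh"):
    print(shlex.join(step))
```

Globus transfers run asynchronously, so a real wrapper would capture the task ID printed by each transfer and run the `wait_cmd` command before moving to the next step. Otherwise the compute job could start before its input has arrived.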

Behind the Scenes

Managing the data lake is a joint effort.

Montana State University IT handles:

Identity and access management
Integration and technical infrastructure
HPC support and training
Security and backup systems

The Montana State University Library handles:

Data curation
Metadata standards
Preservation planning
Training and quality assurance

It’s a partnership designed to support both technical resilience and good data stewardship, the foundations of research integrity and public trust in research.

How It Fits Into the Data Lifecycle

Research data management isn’t linear—researchers frequently revisit earlier stages of the research process as methods evolve or new insights emerge. The data lake supports this iterative cycle by providing a stable, centralized environment for:

Raw data storage
Iterative cleaning and processing
HPC analysis
Collaborative data stewardship
Active data sharing and future reuse

This structure supports both day-to-day science and long-term project goals.

In Short: Why the Data Lake Matters

The SMART FIRES data lake helps us:

Work together across institutions
Store and access massive datasets
Process data efficiently on HPC
Protect and document research
Prepare for future sharing, publication, and preservation

No fishing poles required—just better science, smoother collaboration, and a system designed to grow with us.
