A data lake is a collection of data or files, generally stored in cloud file or blob storage. The data may be raw and unprocessed, or it may already be cleaned. A data lake is typically the landing zone for raw data coming out of other systems before it makes its way into the data warehouse.
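
The landing-zone role can be illustrated with a minimal sketch. It assumes a local directory stands in for cloud blob storage and an in-memory SQLite database stands in for the data warehouse. The function names, the `raw/<source>/` path layout, and the `orders` table are hypothetical, chosen only for illustration.

```python
import json
import sqlite3
from pathlib import Path


def land_raw(lake_root: Path, source: str, name: str, payload: str) -> Path:
    """Write a payload into the lake unchanged, under a per-source prefix.

    The raw/<source>/<name> layout is an assumed convention, not a standard.
    """
    target = lake_root / "raw" / source / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(payload, encoding="utf-8")
    return target


def load_to_warehouse(raw_file: Path, conn: sqlite3.Connection) -> int:
    """Clean a raw JSON-lines file and insert valid rows into a warehouse table.

    Returns the number of rows loaded. Malformed records are skipped.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL)"
    )
    loaded = 0
    for line in raw_file.read_text(encoding="utf-8").splitlines():
        try:
            record = json.loads(line)
            row = (int(record["order_id"]), float(record["amount"]))
        except (ValueError, KeyError, TypeError):
            continue  # the lake keeps the bad record; the warehouse does not
        conn.execute("INSERT OR REPLACE INTO orders VALUES (?, ?)", row)
        loaded += 1
    conn.commit()
    return loaded
```

In this sketch the lake stores every record exactly as it arrived, including malformed ones, while cleaning and typing happen only on the way into the warehouse.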