Malick A. Sarr

Data Scientist

Data Analyst


How to Unzip Files with Python?


Large datasets can be a real space hog on servers. To save room and make downloads faster, data providers often compress them into ZIP archives. The good news? Python’s got your back when it comes to unzipping those files and getting at the contents. Ready to see how? Let’s roll.


Unzip files with Python using the zipfile module

Unzipping files using Python is a common task, and you can achieve it using the built-in zipfile module. This module allows you to manipulate ZIP archives.

Here’s a step-by-step guide on how to unzip files using Python:

 

  1. Import the zipfile module: First, you must import the zipfile module to work with zip files.
  2. Specify the zip file path: Provide the path to the zip file you want to unzip.
  3. Specify the extraction directory: Choose the directory where you want to extract the zip file’s contents. You can use the os module (for example, os.makedirs()) if you want to create that directory ahead of time.
  4. Unzip the file: Use the zipfile.ZipFile class to open the zip file, then call its extractall() method to unzip every file in the archive into the extraction directory.

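import os
import zipfile

zip_file_path = 'path/to/your/file.zip'
extraction_path = 'path/to/extraction/directory'

# Make sure the extraction directory exists before unzipping
os.makedirs(extraction_path, exist_ok=True)

# Open the archive for reading and extract all of its contents
with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    zip_ref.extractall(extraction_path)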

The 'r' mode indicates that you’re opening the zip file for reading. It is also the default mode, so you can leave it out.

And that’s it! With these steps, you can unzip the specified file using Python.
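
If you unzip files often, the steps above can be wrapped into a small reusable function. Here is a minimal sketch, assuming a hypothetical helper name unzip_file (it is not part of the zipfile module). It checks that the path really is a ZIP archive before extracting and returns the names of the archive's members.

```python
import zipfile
from pathlib import Path


def unzip_file(zip_file_path, extraction_path):
    """Extract every member of one ZIP archive and return the member names.

    `unzip_file` is a hypothetical helper name used for illustration only.
    """
    # is_zipfile() returns False for missing files and for non-ZIP files.
    if not zipfile.is_zipfile(zip_file_path):
        raise ValueError(f"Not a valid ZIP archive: {zip_file_path}")

    # Create the target directory up front so it exists even for an empty archive.
    Path(extraction_path).mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(extraction_path)
        return zip_ref.namelist()
```

Calling unzip_file('path/to/your/file.zip', 'path/to/extraction/directory') would then do the same job as the snippet above, with an early and readable error if the path does not point to a ZIP archive.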

 

Unzip files with Python using the zipfile module with glob 

 

Time to dive in and get your hands dirty – let’s unzip some files! First things first, crack open a fresh script or Jupyter notebook. You’ll want to call in the big guns, so make sure you import the glob and zipfile modules. Think of glob like a detective searching for files, and zipfile as your trusty unzipping sidekick.

import glob
import zipfile

Now, let’s talk about finding those zip file paths. Picture glob() as a magic spell that lets you peek into your chosen directory, which holds all those downloaded zip files. Sneaky trick: add *.zip to the formula, and watch as glob rounds up all files with that snazzy .zip ending. The result? A neat list of paths to all the zipped-up files.

files = glob.glob('data/*.zip')
files

 

Your console might light up with something like this:

['data/part1.zip',
 'data/part2.zip',
 'data/part3.zip',
 'data/part4.zip',
 'data/part5.zip',
 'data/part6.zip']

 

Now, the grand finale: time to unleash the unzipping magic just like in the previous section. This calls for a classic loop over each of the files that glob kindly gathered for you. On each pass, the ZipFile class opens the archive for reading, while its extractall() method does the heavy lifting of unzipping everything into a folder called data/output.

for file in files:
    print('Unzipping:', file)

    with zipfile.ZipFile(file, 'r') as zip_ref:
        zip_ref.extractall('data/output')
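
One thing to watch for: when every archive lands in the same folder, files that share a name across archives overwrite each other. Here is a minimal sketch of one way around that, assuming a hypothetical helper name unzip_all (not part of glob or zipfile) that gives each archive its own subfolder named after the zip file.

```python
import glob
import zipfile
from pathlib import Path


def unzip_all(pattern, output_dir):
    """Extract every archive matching `pattern` into its own subfolder.

    `unzip_all` is a hypothetical helper name used for illustration only.
    Returns the list of folders that were written to.
    """
    extracted_to = []
    # glob.glob() makes no promise about ordering, so sort for repeatable runs.
    for file in sorted(glob.glob(pattern)):
        # e.g. 'data/part1.zip' is extracted into '<output_dir>/part1'
        target = Path(output_dir) / Path(file).stem
        with zipfile.ZipFile(file, 'r') as zip_ref:
            zip_ref.extractall(target)
        extracted_to.append(str(target))
    return extracted_to
```

With the example files from this section, unzip_all('data/*.zip', 'data/output') would create one folder per archive, such as data/output/part1 and data/output/part2.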

 

Unzip files with Python without using glob

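Suppose you do not want to use glob to filter files in a directory based on a pattern. In that case, make sure the directory contains only the zip files you want to extract. The code below uses os.listdir() to save every file name in that directory to a list, which you can then loop over with the zipfile module as in the previous section. Keep in mind that the list holds bare file names, so join each one with directory_path before opening it.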

import os

directory_path = 'path/to/your/directory'

# List all files in the directory
file_list = os.listdir(directory_path)

# Filter out directories and save file names to a list
file_names = [filename for filename in file_list if os.path.isfile(os.path.join(directory_path, filename))]

print(file_names)
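
To finish the job, you still need to loop over that list and extract each archive. Here is a minimal sketch of the full no-glob version, assuming a hypothetical helper name unzip_directory. As a safety net, it uses zipfile.is_zipfile() to skip anything in the directory that is not actually a ZIP archive.

```python
import os
import zipfile


def unzip_directory(directory_path, extraction_path):
    """Extract every ZIP archive found directly inside `directory_path`.

    `unzip_directory` is a hypothetical helper name used for illustration only.
    Returns the file names of the archives that were extracted.
    """
    extracted = []
    for filename in sorted(os.listdir(directory_path)):
        # os.listdir() returns bare names, so rebuild the full path first.
        full_path = os.path.join(directory_path, filename)
        # Skip subdirectories and any file that is not a ZIP archive.
        if not os.path.isfile(full_path) or not zipfile.is_zipfile(full_path):
            continue
        with zipfile.ZipFile(full_path, 'r') as zip_ref:
            zip_ref.extractall(extraction_path)
        extracted.append(filename)
    return extracted
```

For example, unzip_directory('path/to/your/directory', 'path/to/extraction/directory') would extract each archive in the directory and hand back the list of zip file names it processed.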

 

Let’s rock those files open! 🚀📂

If you made it this far in the article, thank you very much.

 

I hope this information was of use to you. 

 

Feel free to use any information from this page. I’d appreciate it if you can simply link to this article as the source. If you have any additional questions, you can reach out to malick@malicksarr.com  or message me on Twitter. If you want more content like this, join my email list to receive the latest articles. I promise I do not spam. 

 


 

 

 

 
