
Working with Emails: Fetching and Preprocessing

After running his first AI script, Zack was excited. His assistant could summarize a test email and even output JSON. But something was missing.

“This is cool,” Zack thought, “but what’s the point of summarizing made-up emails? I want this to actually read my inbox.”

That’s the challenge we’ll tackle in this lesson. By the end, Zack (and you) will have a Python script that:

  • Connects to Gmail securely using OAuth2
  • Fetches the latest 10 emails from the inbox
  • Cleans the text (no long signatures, no “Forwarded message” noise)
  • Prints out the subject and cleaned body

This is the step where Zack’s assistant starts becoming useful in daily life.

Step 1: Gmail needs OAuth2

Zack’s first thought was: “Can’t I just log in with my email and password?”

The answer is no. For security, Google no longer lets third-party apps sign in with your regular Gmail password, and the Gmail API requires OAuth2 instead. OAuth2 is a secure system where:

  • The app asks Google for permission to read your emails.
  • Google asks you to log in through a browser.
  • You review the request and grant permission.
  • The app receives a token that lets it read emails, but only with the permissions you granted.

Think of OAuth2 as a guest pass: you hand the assistant a badge that says “read emails only,” and it can’t go beyond that.
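Under the hood, the "guest pass" is requested through a URL that sends you to Google's consent page. The stdlib sketch below builds one by hand, purely for illustration. The endpoint and parameter names come from Google's OAuth 2.0 flow for installed apps; the client ID and redirect port are placeholders. In the real script, InstalledAppFlow builds this URL (with extra parameters such as state) and handles Google's reply for you.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Google's OAuth 2.0 authorization endpoint
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

# The "read emails only" badge: the same scope the script requests
GMAIL_READONLY = "https://www.googleapis.com/auth/gmail.readonly"


def build_auth_url(client_id, redirect_uri, scopes):
    """Return the consent-page URL where a user grants the requested scopes."""
    params = {
        "client_id": client_id,        # identifies your app (from credentials.json)
        "redirect_uri": redirect_uri,  # where Google sends the user back afterwards
        "response_type": "code",       # ask for a one-time code, later swapped for tokens
        "scope": " ".join(scopes),     # exactly the permissions being requested
        "access_type": "offline",      # also ask for a refresh token
    }
    return AUTH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder values, for illustration only
    url = build_auth_url("YOUR_CLIENT_ID.apps.googleusercontent.com",
                         "http://localhost:8080/", [GMAIL_READONLY])
    print(url)
    # Decoding the query shows the app asks for nothing beyond read-only access
    print(parse_qs(urlparse(url).query)["scope"])
```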

Step 2: Setting up Gmail API in Google Cloud

Here’s exactly what Zack did. Follow each click carefully:

  • Go to console.cloud.google.com.
  • Sign in with the Google account that has Gmail.
  • At the top, click the project dropdown and select New Project.
  • Name it something like EmailAssistant and click Create.

Now you have a Google Cloud project.

Next, enable Gmail API:

  • In the left menu, go to APIs & Services > Library.
  • Search for “Gmail API.”
  • Click it and press Enable.

Now the Gmail API is active for your project.

Step 3: Configure the OAuth consent screen

Before creating credentials, you need to set up the consent screen:

  • In the sidebar, click APIs & Services > OAuth consent screen.
  • Choose External (since you’re using a personal account).
  • Fill out:
    • App name: Email Assistant Local
    • User support email: your Gmail
    • Developer contact email: your Gmail
  • Save and continue.
  • For scopes, skip for now.
  • For test users, add your Gmail account.
  • Save.

This keeps the app in Testing mode, so only the test users you added can authorize it. That is all you need for your own personal use.

Step 4: Create an OAuth client ID

  • Go to APIs & Services > Credentials.
  • Click Create Credentials > OAuth client ID.
  • For Application type, choose Desktop app.
  • Name it something like Email Assistant Desktop.
  • Click Create.
  • Download the JSON file and rename it to credentials.json.

Move this file into your project folder (email_assistant/).
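Before going further, you can sanity-check the file you just moved. This sketch assumes the usual layout of a Desktop-app client file: a top-level "installed" object containing client_id, client_secret, and token_uri. check_client_file() is a hypothetical helper name. If your file has a "web" key instead, the client was most likely created with the wrong Application type.

```python
import json
from pathlib import Path


def check_client_file(path="credentials.json"):
    """Return the client_id if `path` looks like a Desktop-app OAuth client file.

    Assumes the Desktop-app layout: {"installed": {"client_id": ..., ...}}.
    Raises ValueError with a hint when the shape is different.
    """
    data = json.loads(Path(path).read_text())
    if "installed" not in data:
        # A "web" key usually means the wrong Application type was chosen
        found = ", ".join(data.keys()) or "no keys"
        raise ValueError(f"expected an 'installed' (Desktop app) client, found: {found}")
    client = data["installed"]
    missing = [key for key in ("client_id", "client_secret", "token_uri") if key not in client]
    if missing:
        raise ValueError(f"client file is missing: {', '.join(missing)}")
    return client["client_id"]


if __name__ == "__main__":
    if Path("credentials.json").exists():
        print("Client ID:", check_client_file())
    else:
        print("credentials.json not found in", Path.cwd())
```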

Step 5: Install the Gmail libraries

Back in your terminal, install Google’s libraries:

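```
pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib beautifulsoup4
```

The three Google packages handle login, token refresh, and Gmail requests. beautifulsoup4 (imported as bs4) converts HTML email bodies to plain text, and the script below needs it too.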
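To confirm the install landed in the same Python environment you'll run the script from, a quick stdlib check helps. The mapping below pairs each pip name with the module name you import (google-api-python-client is imported as googleapiclient, beautifulsoup4 as bs4, and so on). missing_packages() is a hypothetical helper name.

```python
import importlib.util

# pip package name -> module name used in `import` statements
REQUIRED_MODULES = {
    "google-api-python-client": "googleapiclient",
    "google-auth-httplib2": "google_auth_httplib2",
    "google-auth-oauthlib": "google_auth_oauthlib",
    "beautifulsoup4": "bs4",
}


def missing_packages(required=REQUIRED_MODULES):
    """Return the pip names whose modules can't be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Still missing:", " ".join(missing))
    else:
        print("All Gmail and HTML libraries are installed.")
```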

Step 6: Writing the Gmail quickstart script

Zack copied this starter code into gmail_quickstart.py:

```python
import os.path
import base64

from bs4 import BeautifulSoup
from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Read-only scope (safe): the script can read mail but not send, delete, or modify it.
# If you change SCOPES, delete token.json so you are asked to consent again.
SCOPES = ['https://www.googleapis.com/auth/gmail.readonly']


def get_service():
    """Return an authorized Gmail API client, running the browser login if needed."""
    creds = None
    # token.json holds the access and refresh tokens saved after the first login
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())  # get a fresh access token without a browser
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)  # opens the Google sign-in page
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    return build('gmail', 'v1', credentials=creds)


def clean_text(body):
    """Remove quoted replies, the signature block, and forwarded messages."""
    new_lines = []
    for line in body.splitlines():
        stripped = line.strip()
        # Gmail marks forwards with "---------- Forwarded message ---------"
        if 'forwarded message' in stripped.lower():
            break
        # "--" on its own line starts a signature; drop it and everything after
        if stripped == '--':
            break
        # Lines starting with ">" quote earlier messages in the thread
        if stripped.startswith('>'):
            continue
        new_lines.append(line)
    return '\n'.join(new_lines).strip()


def decode_data(part):
    """Decode a part's base64url body data to text (empty if it has none)."""
    data = part.get('body', {}).get('data')
    if not data:
        return ''
    return base64.urlsafe_b64decode(data).decode('utf-8', errors='replace')


def get_body(payload):
    """Find a text/plain body anywhere in the MIME tree; fall back to HTML."""
    html = ''
    stack = [payload]  # single-part messages keep the body on the payload itself
    while stack:
        part = stack.pop()
        mime = part.get('mimeType', '')
        if mime == 'text/plain':
            text = decode_data(part)
            if text:
                return text
        elif mime == 'text/html' and not html:
            html = decode_data(part)
        stack.extend(part.get('parts', []))  # walk nested multipart/* sections
    return BeautifulSoup(html, 'html.parser').get_text('\n') if html else ''


def main():
    service = get_service()
    # labelIds=['INBOX'] limits the list to the inbox; without it, sent and archived mail appear too
    results = service.users().messages().list(
        userId='me', labelIds=['INBOX'], maxResults=10).execute()
    for m in results.get('messages', []):
        msg = service.users().messages().get(userId='me', id=m['id'], format='full').execute()
        headers = msg['payload'].get('headers', [])
        # Some messages have no Subject header, so fall back to a placeholder
        subject = next((h['value'] for h in headers if h['name'].lower() == 'subject'),
                       '(no subject)')
        body = get_body(msg['payload'])
        print("\n=== EMAIL ===")
        print("Subject:", subject)
        print("Cleaned Body:", clean_text(body)[:300])  # first 300 chars


if __name__ == '__main__':
    main()
```
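The trickiest part of the script is finding the body. Gmail returns a message as a tree of MIME parts. A simple message keeps its text on the payload itself. Others nest parts, for example multipart/mixed wrapping multipart/alternative, which holds a plain-text and an HTML version. Each part's body.data is base64url-encoded. This offline sketch walks such a tree using a hand-built, hypothetical payload, so you can experiment without calling the API. The padding fix in b64url_decode() is defensive, since base64.urlsafe_b64decode() rejects input whose "=" padding has been stripped.

```python
import base64


def b64url_decode(data):
    """Decode base64url text to a string, restoring '=' padding if it was stripped."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def iter_parts(part):
    """Yield a MIME part and every part nested inside it, depth-first."""
    yield part
    for child in part.get("parts", []):
        yield from iter_parts(child)


def find_body(payload):
    """Return (mime_type, text) for the first text/plain part, else the first text/html part."""
    with_data = [p for p in iter_parts(payload) if p.get("body", {}).get("data")]
    for wanted in ("text/plain", "text/html"):
        for part in with_data:
            if part.get("mimeType") == wanted:
                return wanted, b64url_decode(part["body"]["data"])
    return None, ""  # e.g. an attachment-only message


def enc(text):
    """Encode text the way Gmail stores body.data (base64url)."""
    return base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical payload: multipart/mixed -> multipart/alternative (HTML + plain) + a PDF
sample_payload = {
    "mimeType": "multipart/mixed",
    "body": {"size": 0},
    "parts": [
        {
            "mimeType": "multipart/alternative",
            "body": {"size": 0},
            "parts": [
                {"mimeType": "text/html", "body": {"data": enc("<p>Hi Zack</p>")}},
                {"mimeType": "text/plain", "body": {"data": enc("Hi Zack")}},
            ],
        },
        # Attachments usually carry an attachmentId instead of inline data
        {"mimeType": "application/pdf", "filename": "agenda.pdf",
         "body": {"attachmentId": "hypothetical-id", "size": 1234}},
    ],
}

if __name__ == "__main__":
    print(find_body(sample_payload))  # ('text/plain', 'Hi Zack')
```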

Step 7: Running the script

When Zack ran:

```
python gmail_quickstart.py
```

Here’s what happened:

  • A browser window opened, asking him to log in with Google.
  • Google showed a warning: “This app isn’t verified.” He clicked Advanced > Continue (since it was his own app).
  • He signed in and granted read-only access.
  • The script saved a file token.json.

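From now on, the script reuses token.json and refreshes the access token automatically, so it won’t ask again. One catch: while the consent screen is in Testing mode, Google expires the refresh token after 7 days. If the script then fails with a refresh error (invalid_grant), delete token.json and run it again to sign in.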

On the terminal, he saw:

```
=== EMAIL ===
Subject: Meeting update
Cleaned Body: Hi Zack, we need to move the project call to Thursday. Let me know if that works.
```

It worked! Zack’s script could now fetch and print his real emails.
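If the script ever asks you to sign in again, or fails while refreshing, it helps to look inside token.json. The file is plain JSON. This sketch assumes the field names that google-auth's Credentials.to_json() writes, such as refresh_token and scopes, so check them against your own file. token_status() is a hypothetical helper. It only inspects the file and never contacts Google, so a token that looks fine here can still be rejected if access was revoked or has expired.

```python
import json
from pathlib import Path

GMAIL_READONLY = "https://www.googleapis.com/auth/gmail.readonly"


def token_status(path="token.json", required_scope=GMAIL_READONLY):
    """Summarize a saved token file without contacting Google.

    Assumes the field names google-auth writes (refresh_token, scopes).
    """
    token_file = Path(path)
    if not token_file.exists():
        return "no token yet: the next run opens the browser sign-in"
    data = json.loads(token_file.read_text())
    if not data.get("refresh_token"):
        return "no refresh token: delete token.json and sign in again"
    if required_scope not in data.get("scopes", []):
        # Tokens are tied to the scopes granted at sign-in; new scopes need new consent
        return "scope mismatch: delete token.json and sign in again"
    return "ok: refresh token present for the required scope"


if __name__ == "__main__":
    print(token_status())
```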

Step 8: Cleaning email format

The first time Zack ran the script, one email looked like this:

```
Hi Zack,
Please confirm your availability.

Thanks,
Ali
--
Ali Khan
Senior Project Manager
XYZ Ltd.

> On Mon, Feb 5, Zack wrote:
> Sure, let’s do 3 PM.
```

Without cleaning, the AI assistant would waste time on:

  • Signatures (name, title, phone)
  • Old quoted replies (lines starting with >)

That’s why clean_text() targets the signature (marked by --), quoted replies, and forwarded messages. Removing that noise makes the summaries sharper and more relevant.
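Here is that idea applied to the email above. This is a self-contained sketch in the same spirit as clean_text(). It stops at the "--" signature delimiter or a forwarded-message marker, and it skips quoted lines. Stopping at the delimiter, rather than dropping only the "--" line, is a deliberate choice: it also removes the name, title, and company listed below it.

```python
def clean_email(body):
    """Keep only the new message: drop quoted lines, the signature, and forwards."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if stripped == "--" or "forwarded message" in stripped.lower():
            break  # everything below a signature delimiter or forward marker is noise
        if stripped.startswith(">"):
            continue  # quoted text from an earlier reply
        kept.append(line)
    return "\n".join(kept).strip()


SAMPLE = """Hi Zack,
Please confirm your availability.

Thanks,
Ali
--
Ali Khan
Senior Project Manager
XYZ Ltd.

> On Mon, Feb 5, Zack wrote:
> Sure, let's do 3 PM."""

if __name__ == "__main__":
    print(clean_email(SAMPLE))
```

Only the greeting, the request, and the sign-off ("Thanks, Ali") survive, and that is what the summarizer should see.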

Step 9: Exercise for practice

Now it’s your turn.

  • Run Zack’s script.
  • Fetch the latest 10 emails.
  • Print only:
    • Subject
    • Cleaned body (first 200 characters)

Example output:

```
=== EMAIL ===
Subject: Invoice Reminder
Cleaned Body: Hi, this is a reminder that your invoice is due tomorrow. Please complete payment at your earliest convenience.
```
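One way to produce exactly that layout is a small formatting helper that you call in main() in place of the three print() calls. format_email() is a hypothetical name. Collapsing whitespace and trimming at the last space before the limit, instead of mid-word, are optional choices. The preview is at most 200 characters plus "...".

```python
def format_email(subject, cleaned_body, limit=200):
    """Render one email as the exercise asks: a header, the subject, and a short body."""
    # Collapse newlines and runs of spaces so the preview fits on one line
    preview = " ".join(cleaned_body.split())
    if len(preview) > limit:
        cut = preview[:limit]
        # Back up to the last space so a word isn't cut in half (if there is a space)
        if " " in cut:
            cut = cut[:cut.rfind(" ")]
        preview = cut + "..."
    return f"=== EMAIL ===\nSubject: {subject}\nCleaned Body: {preview}"


if __name__ == "__main__":
    print(format_email("Invoice Reminder",
                       "Hi, this is a reminder that your invoice is due tomorrow.\n"
                       "Please complete payment at your earliest convenience."))
```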

If you see long footers or old replies, tweak the clean_text() function. Try removing lines with URLs or legal disclaimers.
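As a starting point for that tweak, here is a sketch of an extra filter you could run on the output of clean_text(), as strip_footers(clean_text(body)). The URL pattern and the disclaimer phrases are assumptions. Replace the phrase list with footers that actually appear in your inbox.

```python
import re

# A line that is nothing but a link, optionally wrapped in <...>
URL_ONLY = re.compile(r"^\s*<?https?://\S+>?\s*$")

# Hypothetical footer openers; replace them with phrases from your own inbox
DISCLAIMER_STARTS = (
    "this email and any attachments",
    "this message is intended only for",
    "to unsubscribe",
    "sent from my iphone",
)


def strip_footers(text):
    """Drop link-only lines and cut the text at the first known disclaimer line."""
    kept = []
    for line in text.splitlines():
        if URL_ONLY.match(line):
            continue  # a bare tracking or "view in browser" link
        if any(phrase in line.lower() for phrase in DISCLAIMER_STARTS):
            break  # disclaimers usually run to the end of the email
        kept.append(line)
    return "\n".join(kept).strip()


if __name__ == "__main__":
    sample = ("Invoice attached, due Friday.\n"
              "https://example.com/pay\n"
              "Sent from my iPhone")
    print(strip_footers(sample))  # Invoice attached, due Friday.
```

Phrase matching is blunt. If a line in the real message happens to contain one of the phrases, the body ends early, so keep the list specific.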

Zack’s feedback

When Zack saw his own emails printed in the terminal, he felt a sense of progress.

“This is real now,” he said. “I’m not just playing with toy examples. My assistant can read my inbox.”

It also made him realize the importance of preprocessing. Without cleaning, the summaries would be noisy. With cleaning, the assistant could focus on the main point.

Conclusion

In this lesson, you walked with Zack as he:

  • Learned why Gmail uses OAuth2 for secure access
  • Set up a Google Cloud project and enabled Gmail API
  • Fetched the latest 10 emails with Python
  • Cleaned email text to remove signatures, forwards, and noise
  • Printed subjects and cleaned bodies

This is a big milestone. From here, Zack’s assistant can start producing daily digests of his inbox. In the next lesson, he’ll connect this directly to GPT-4o to generate summaries automatically.

Frequently Asked Questions

Why can’t I just log in with my Gmail password?

Google requires OAuth2 for security. It lets you log in through a browser and grant your script permission to read emails without sharing your password.

How do I get the credentials.json file?

You create a Google Cloud project, enable the Gmail API, set up an OAuth consent screen, and then create a Desktop app OAuth client ID and download its credentials. The script needs this file to authenticate.

Why clean the email text before summarizing it?

Emails often contain signatures, forwards, and quoted replies. Cleaning removes this noise so your summaries focus only on the main message.

Can I fetch more than 10 emails?

Yes. In the Gmail API call, change maxResults from 10 to any number up to 500, the API’s per-request maximum. For more than that, page through the results with nextPageToken.
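Paging works through a token. When more results exist, messages().list() includes a nextPageToken in its response, and passing it back as pageToken returns the next batch. This sketch assumes service comes from get_service() in the script above. fetch_message_ids() is a hypothetical helper that only collects IDs, so you'd still call messages().get() on each one as before.

```python
def fetch_message_ids(service, total=250, page_size=100, label="INBOX"):
    """Collect up to `total` message IDs from one label, following nextPageToken."""
    ids = []
    page_token = None  # None on the first request (the client library omits it)
    while len(ids) < total:
        response = service.users().messages().list(
            userId="me",
            labelIds=[label],
            maxResults=min(page_size, total - len(ids)),  # the API caps a page at 500
            pageToken=page_token,
        ).execute()
        ids.extend(m["id"] for m in response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no more pages
    return ids
```

For example, in an inbox with enough messages, fetch_message_ids(get_service(), total=250) makes three requests of 100, 100, and 50.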

What does the script print?

The script prints each email’s subject and the first 300 characters of the cleaned body (200 in the practice exercise). This helps confirm that the cleaning step worked.
