Downloading Gmail attachments in PHP – an update

Over the last few years I’ve received quite a few queries about the article Downloading Gmail attachments using PHP published here. After procrastinating for some time (a long time, actually), I’ve answered some of them in this updated post.

As mentioned in the earlier post, automatically extracting attachments from Gmail can be useful when you need to process the attached files periodically with a CRON job, or otherwise handle them programmatically. It can also be used to automatically archive important attachments.

Below is a simple proof-of-concept in plain, procedural PHP (no object-oriented features) that extracts attachments from your Gmail account. It uses PHP’s IMAP extension to access the inbox, so make sure the extension is enabled in your php.ini. Before you proceed, also make sure that IMAP is enabled on your Gmail settings page. All attachments downloaded by the following code are saved in the current folder, which you can easily change to point to another directory.

The complete code listing is given below. Downloadable code is given at the end of the post.

<?php
 
 
/**
 *	Gmail attachment extractor.
 *
 *	Downloads attachments from Gmail and saves them to files.
 *	Uses the PHP IMAP extension, so make sure it is enabled in your php.ini:
 *	extension=php_imap.dll (Windows) or extension=imap.so (Linux)
 *
 */
 
 
set_time_limit(3000); 
 
 
/* connect to gmail with your credentials */
$hostname = '{imap.gmail.com:993/imap/ssl}INBOX';
$username = 'YOUR_GMAIL_USERNAME'; # e.g. somebody@gmail.com
$password = 'YOUR_GMAIL_PASSWORD';
 
 
/* try to connect */
$inbox = imap_open($hostname,$username,$password) or die('Cannot connect to Gmail: ' . imap_last_error());
 
 
/* get all emails. Setting the criteria to 'NEW' instead
 * of 'ALL' retrieves only new emails. Retrieving 'ALL' can be
 * resource intensive, so the following variable,
 * $max_emails, limits the number of emails processed.
 */
$emails = imap_search($inbox,'ALL');
 
/* maximum number of emails to process; mostly useful when the search above is 'ALL' */
$max_emails = 16;
 
 
/* if any emails found, iterate through each email */
if($emails) {
 
    $count = 1;
 
    /* put the newest emails on top */
    rsort($emails);
 
    /* for every email... */
    foreach($emails as $email_number) 
    {
 
        /* get information specific to this email */
        $overview = imap_fetch_overview($inbox,$email_number,0);
 
        /* get mail message, not actually used here. 
           Refer to http://php.net/manual/en/function.imap-fetchbody.php
           for details on the third parameter.
         */
        $message = imap_fetchbody($inbox,$email_number,2);
 
        /* get mail structure */
        $structure = imap_fetchstructure($inbox, $email_number);
 
        $attachments = array();
 
        /* if any attachments found... */
        if(isset($structure->parts) && count($structure->parts)) 
        {
            for($i = 0; $i < count($structure->parts); $i++) 
            {
                $attachments[$i] = array(
                    'is_attachment' => false,
                    'filename' => '',
                    'name' => '',
                    'attachment' => ''
                );
 
                if($structure->parts[$i]->ifdparameters) 
                {
                    foreach($structure->parts[$i]->dparameters as $object) 
                    {
                        if(strtolower($object->attribute) == 'filename') 
                        {
                            $attachments[$i]['is_attachment'] = true;
                            $attachments[$i]['filename'] = $object->value;
                        }
                    }
                }
 
                if($structure->parts[$i]->ifparameters) 
                {
                    foreach($structure->parts[$i]->parameters as $object) 
                    {
                        if(strtolower($object->attribute) == 'name') 
                        {
                            $attachments[$i]['is_attachment'] = true;
                            $attachments[$i]['name'] = $object->value;
                        }
                    }
                }
 
                if($attachments[$i]['is_attachment']) 
                {
                    $attachments[$i]['attachment'] = imap_fetchbody($inbox, $email_number, $i+1);
 
                    /* 3 = BASE64 encoding */
                    if($structure->parts[$i]->encoding == 3) 
                    { 
                        $attachments[$i]['attachment'] = base64_decode($attachments[$i]['attachment']);
                    }
                    /* 4 = QUOTED-PRINTABLE encoding */
                    elseif($structure->parts[$i]->encoding == 4) 
                    { 
                        $attachments[$i]['attachment'] = quoted_printable_decode($attachments[$i]['attachment']);
                    }
                }
            }
        }
 
        /* iterate through each attachment and save it */
        foreach($attachments as $attachment)
        {
            if($attachment['is_attachment'] == 1)
            {
                $filename = $attachment['name'];
                if(empty($filename)) $filename = $attachment['filename'];
 
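                if(empty($filename)) $filename = time() . ".dat";

                /* keep only the base name, so a crafted attachment name
                 * such as "../../evil.php" cannot be written outside
                 * the target folder.
                 */
                $filename = basename($filename);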
 
                /* prefix the email number to the filename in case two emails
                 * have the attachment with the same file name.
                 */
                $fp = fopen("./" . $email_number . "-" . $filename, "w+");
                fwrite($fp, $attachment['attachment']);
                fclose($fp);
            }
 
        }
 
        if($count++ >= $max_emails) break;
    }
 
} 
 
/* close the connection */
imap_close($inbox);
 
echo "Done";
 
?>
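The listing above only looks at the top-level parts of a message, so an attachment that sits inside a nested multipart part is skipped. Below is a minimal sketch of a recursive walk over the object returned by imap_fetchstructure(), assuming the same parts, parameters and dparameters properties the main code reads. The function find_attachment_sections() is hypothetical, not part of the IMAP extension; it returns the dotted section numbers ("2", "1.2", ...) that imap_fetchbody() expects.

```php
<?php
/**
 * Hypothetical helper (not part of the IMAP extension): recursively walks a
 * structure object shaped like imap_fetchstructure() output and collects
 * every part that carries a file name.
 *
 * Each entry holds the section number for imap_fetchbody() ("2", "1.2", ...),
 * the file name and the part's transfer encoding (3 = BASE64,
 * 4 = QUOTED-PRINTABLE, as in the main listing).
 * message/rfc822 parts (forwarded emails) are NOT special-cased.
 */
function find_attachment_sections(object $part, string $prefix = ''): array
{
    $found = array();
    if (empty($part->parts)) {
        return $found;                 // leaf part: nothing to descend into
    }

    foreach ($part->parts as $i => $sub) {
        // IMAP numbers parts from 1 and joins nesting levels with dots.
        $section = ($prefix === '') ? (string)($i + 1) : $prefix . '.' . ($i + 1);

        // Same preference as the main listing: the Content-Type "name"
        // parameter first, then the Content-Disposition "filename".
        $name = '';
        if (!empty($sub->ifparameters)) {
            foreach ($sub->parameters as $p) {
                if (strtolower($p->attribute) === 'name') $name = $p->value;
            }
        }
        if ($name === '' && !empty($sub->ifdparameters)) {
            foreach ($sub->dparameters as $p) {
                if (strtolower($p->attribute) === 'filename') $name = $p->value;
            }
        }

        if ($name !== '') {
            $found[] = array(
                'section'  => $section,
                'filename' => $name,
                'encoding' => isset($sub->encoding) ? (int)$sub->encoding : 0,
            );
        }

        // Descend into nested multiparts such as multipart/alternative.
        $found = array_merge($found, find_attachment_sections($sub, $section));
    }
    return $found;
}
```

Each entry's section can then be passed as the third argument of imap_fetchbody($inbox, $email_number, $entry['section']) and decoded using $entry['encoding'] exactly as in the main loop. Emails forwarded as attachments (message/rfc822 parts) number their sub-parts differently, which this sketch does not handle.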

Search by criteria

As mentioned in the code comments, we currently download attachments from ‘ALL’ emails. The second parameter of the imap_search function specifies the search criteria.

$emails = imap_search($inbox,'ALL');

However, we may only want to download attachments from emails with certain text in the subject, or from a certain sender. We can do this by specifying search criteria that narrow down the emails. Some examples are given below.

/* Only read emails which have the string 'gmail attach' in the subject field */
$emails = imap_search($inbox,'SUBJECT "gmail attach"');
 
/* Only read emails from a certain email address */
$emails = imap_search($inbox,'FROM "ted1234@yahoo.com"');

There are various criteria keywords that you can use and combine to narrow down your search. Some examples are shown below.

/* Only read emails which have 'gmail attach' in the subject field,
   received on or after 21 July 2014
 */
$emails = imap_search($inbox,'SUBJECT "gmail attach" SINCE "21 July 2014"');
 
/* UNSEEN emails since 29th July 2014 */
$emails = imap_search($inbox, 'UNSEEN SINCE "29 July 2014"');
 
/* ALL emails since 29th July 2014 */
$emails = imap_search($inbox, 'ALL SINCE "29 July 2014"');
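When the script runs from cron, the date in these criteria usually needs to be computed rather than typed in. The sketch below builds the criteria string from an array; build_search_criteria() is a hypothetical helper, and it writes dates in the d-Mon-yyyy form used by IMAP (e.g. 21-Jul-2014) instead of the "21 July 2014" style above. To stay short, it removes double quotes and backslashes from values rather than escaping them, and because array keys are unique each keyword can appear only once.

```php
<?php
/**
 * Hypothetical helper: turns an array of keyword => value pairs into an
 * imap_search() criteria string.
 *   null               -> flag-only keyword, e.g. UNSEEN
 *   DateTimeInterface  -> date in d-Mon-yyyy form, e.g. SINCE "21-Jul-2014"
 *   anything else      -> quoted string, e.g. SUBJECT "gmail attach"
 */
function build_search_criteria(array $criteria): string
{
    $terms = array();
    foreach ($criteria as $keyword => $value) {
        $keyword = strtoupper((string)$keyword);
        if ($value === null) {
            $terms[] = $keyword;
        } elseif ($value instanceof DateTimeInterface) {
            $terms[] = $keyword . ' "' . $value->format('j-M-Y') . '"';
        } else {
            // Simplification: remove characters that would break the
            // quoting instead of escaping them.
            $clean = str_replace(array('"', '\\'), '', (string)$value);
            $terms[] = $keyword . ' "' . $clean . '"';
        }
    }
    return implode(' ', $terms);
}

// Unread mails with "gmail attach" in the subject from the last 7 days.
$criteria = build_search_criteria(array(
    'UNSEEN'  => null,
    'SUBJECT' => 'gmail attach',
    'SINCE'   => new DateTime('-7 days'),
));
echo $criteria, PHP_EOL;
// $emails = imap_search($inbox, $criteria);
```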

The complete list of criteria that can be combined is given below. Pay careful attention to the quotes in the criteria string: it is a space-delimited list of the following keywords, and any multi-word argument (e.g. FROM "hello world") must be enclosed in double quotes. A message is returned only if it matches all of the criteria.

  ALL - return all messages matching the rest of the criteria
  ANSWERED - match messages with the \ANSWERED flag set
  BCC "string" - match messages with "string" in the Bcc: field
  BEFORE "date" - match messages with Date: before "date"
  BODY "string" - match messages with "string" in the body of the message
  CC "string" - match messages with "string" in the Cc: field
  DELETED - match deleted messages
  FLAGGED - match messages with the \FLAGGED flag set
  FROM "string" - match messages with "string" in the From: field
  KEYWORD "string" - match messages with "string" as a keyword
  NEW - match new messages
  OLD - match old messages
  ON "date" - match messages with Date: matching "date"
  RECENT - match messages with the \RECENT flag set
  SEEN - match messages that have been read (the \SEEN flag is set)
  SINCE "date" - match messages with Date: on or after "date"
  SUBJECT "string" - match messages with "string" in the Subject:
  TEXT "string" - match messages with text "string"
  TO "string" - match messages with "string" in the To:
  UNANSWERED - match messages that have not been answered
  UNDELETED - match messages that are not deleted
  UNFLAGGED - match messages that are not flagged
  UNKEYWORD "string" - match messages that do not have the keyword "string"
  UNSEEN - match messages which have not been read yet

You can use any number of criteria to filter the emails returned; just make sure you quote the string arguments correctly and that the criteria do not conflict with one another.

$emails = imap_search($inbox, 'ALL SUBJECT "sale" SINCE "28 July 2014"');

IMAP2 search criteria are defined in RFC 1176, in the section “tag SEARCH search_criteria”. Refer to that document for the full details.

Opening in READONLY mode

In the original code, once the emails have been fetched they are marked as ‘read’ and will be shown as such in your Gmail account. You can, however, tell imap_open to open the mailbox in ‘read only’ mode, so the emails are not marked as ‘read’.

$inbox = imap_open($hostname,$username,$password, OP_READONLY);

Reading mail headers

The imap_fetch_overview function reads an overview of the information in the headers of the given message.

/* get information specific to this email */
$overview = imap_fetch_overview($inbox,$email_number,0);

This returns the following array for a sample email. You can use this information to decide whether to download or skip an email’s attachments based on its headers.

Array
(
    [0] => stdClass Object
        (
            [subject] => What is it like to be short? - Quora
            [from] => Quora Digest <digest-noreply@quora.com>
            [to] => metapix@gmail.com
            [date] => Tue, 15 Jul 2014 01:39:09 +0000 (UTC)
            [message_id] => <46asd.27f.3fe@ismtpd-005.iad1.sendgrid.net>
            [size] => 92951
            [uid] => 19783
            [msgno] => 22561
            [recent] => 0
            [flagged] => 0
            [answered] => 0
            [deleted] => 0
            [seen] => 0
            [draft] => 0
            [udate] => 6415338474
        )
 
)

You can now grab the subject of the email if required.

echo $overview[0]->subject;

A more extensive set of header information can be read using the imap_headerinfo function.

$headers = imap_headerinfo($inbox,$email_number);
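Because imap_fetch_overview() returns plain objects, the download decision can be moved into a small function and called at the top of the foreach loop. The sketch below is only one possible rule, assuming you want emails from a single sender domain and below a size limit; the function name wants_attachments() and the example thresholds are illustrative.

```php
<?php
/**
 * Hypothetical filter: decides from one imap_fetch_overview() entry whether
 * an email's attachments should be downloaded. This example rule accepts
 * only senders whose From: header contains "@<domain>" and messages no
 * larger than $max_bytes (the overview's "size" field, in bytes).
 */
function wants_attachments(stdClass $overview, string $sender_domain, int $max_bytes): bool
{
    $from = isset($overview->from) ? strtolower($overview->from) : '';
    $size = isset($overview->size) ? (int)$overview->size : 0;

    $from_ok = strpos($from, '@' . strtolower($sender_domain)) !== false;
    return $from_ok && $size <= $max_bytes;
}

// Possible use in the main loop, right after imap_fetch_overview():
// if (!wants_attachments($overview[0], 'quora.com', 5 * 1024 * 1024)) continue;
```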

Running the code on Ubuntu

The above code was tested on both Windows and Ubuntu 14.04 LTS. If you do not have the PHP IMAP extension installed, you can install it with the following command.

sudo apt-get install php5-imap

However, it is not enabled by default, so enable it with:

sudo php5enmod imap

If the above does not work, add the following line manually to ‘/etc/php5/apache2/php.ini’ in the extension section.

extension=imap.so

Then restart Apache:

sudo service apache2 restart
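The command line and Apache can load different php.ini files, so it is worth confirming that the extension is visible to the PHP that actually runs the script. A minimal check using only extension_loaded(), function_exists() and php_ini_loaded_file() is sketched below; the function name imap_status() is just for illustration. Note also that as of PHP 8.4 the IMAP extension is no longer bundled with PHP and is distributed through PECL instead, so the php5-imap package above applies only to older releases.

```php
<?php
/* Reports whether the IMAP extension is available to the PHP runtime
 * executing this file, and which php.ini that runtime loaded. */
function imap_status(): string
{
    $ini = php_ini_loaded_file();
    $ini = ($ini === false) ? 'no php.ini loaded' : $ini;

    if (extension_loaded('imap') && function_exists('imap_open')) {
        return 'IMAP extension loaded (php.ini: ' . $ini . ')';
    }
    return 'IMAP extension missing (php.ini: ' . $ini . ')';
}

echo imap_status(), PHP_EOL;
```

Running the file once from the command line and once through Apache shows whether both use the same php.ini.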

Additional tips

By default, the script saves the downloaded files in the current directory. However, you can change the path to any folder you need.

/* Save the downloaded files in the 'attachments' folder (the folder must already exist). */
$fp = fopen("./attachments/" . $email_number . "-" . $filename, "w+");
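fopen() does not create missing folders, and attachment names can contain spaces, path separators or characters that are awkward on some file systems. The hypothetical helper save_attachment() below creates the folder if necessary, strips any directory part of the name and replaces every character outside letters, digits, '.', '-' and '_' with an underscore. That character set is an arbitrary, conservative choice (non-ASCII names are flattened too); if names arrive MIME-encoded, such as =?UTF-8?B?...?=, you would decode them first, for example with imap_utf8().

```php
<?php
/**
 * Hypothetical helper: saves one attachment under $dir as
 * "<email number>-<sanitized name>" and returns the path written.
 */
function save_attachment(string $dir, int $email_number, string $filename, string $data): string
{
    // Create the target folder (and any parents) if it is missing.
    if (!is_dir($dir) && !mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory $dir");
    }

    // Drop any path components, then replace everything except letters,
    // digits, dot, dash and underscore (spaces included) with "_".
    $safe = preg_replace('/[^A-Za-z0-9._-]/', '_', basename($filename));
    if ($safe === '' || trim($safe, '.') === '') {
        $safe = time() . '.dat';       // same fallback as the main listing
    }

    $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . $email_number . '-' . $safe;
    if (file_put_contents($path, $data) === false) {
        throw new RuntimeException("Cannot write $path");
    }
    return $path;
}
```

In the main loop, the three fopen/fwrite/fclose lines could then be replaced by save_attachment('./attachments', $email_number, $filename, $attachment['attachment']);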

Security changes

Note that accessing Gmail from PHP requires that IMAP is enabled in your Gmail account.

Go to your Gmail account and enable IMAP in your Gmail settings:

1. Sign in to Gmail.
2. Click the gear in the top right.
3. Select Settings.
4. Click Forwarding and POP/IMAP.
5. Select Enable IMAP.
6. Click Save Changes.

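Also enable ‘Access for less secure apps’ at the following URL. Blocking less secure apps is an additional layer of security that Gmail provides, so disable this access again once you are done using the PHP code above. (Google has since retired this setting for personal accounts; an app password, which requires 2-Step Verification, is the usual replacement.)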

https://www.google.com/settings/security/lesssecureapps

Download Code file

