Google webMail Filer for Thunderbird:
A better approach to Gmail Labels?


Last Updated:  Thursday, 2011.06.09

This page is outdated and is kept only for reference. The updated page, with downloads, is here.


Features

- Single standalone executable: no installation, no DLLs. Uses its own IMAP implementation.
- Requires .NET Framework 3.5.
- Runs only on Windows (XP or later), 32-bit and 64-bit editions.
- Executable size below 66 KB.
- Handles large Thunderbird mail files.
- Optional IMAP verbose logging.
- Checks whether Google mail quota usage exceeds 85% and, if so, disables message upload while still performing any required labeling.
- Accounts for duplicates on your web mailbox; uploads only what is needed.
- Batch-process Thunderbird mail files.
- Scans for and detects Thunderbird mail folders.
- Can cancel, pause, and resume jobs when needed; for instance, when a job is eating your bandwidth.

Tip: To process a single isolated file outside its normal location, create an empty text file in the same folder as the file, then change its extension to .msf.


Feedback

Send all feedback to admincraft@admincraft.net.


Version History

Version 1.1.6.0

+ By popular demand, removed the default check for native Thunderbird files. Now works with all mbox files.

Version 1.1.5.0

+ Now complies with the generic mbox format, supporting truly generic, non-Thunderbird mbox files.

Version 1.1.4.1

+ Added an option for advanced users to disable validation of detected mbox files, forcing the tool to work on them.

Version 1.1.4.0

- Some peculiar date headers were being missed. Fixed.

Version 1.1.3.0

- Now complies with the free Gmail message size limit of 25 MB; it was set to 50 MB. Fixed.

Version 1.1.2.1

- The IMAP.Append implementation was incomplete: it omitted the "Internal Date" argument, which caused message dates to display incorrectly on Google. Fixed.
+ Removed the sticky tooltip that showed mbox file paths. It got really annoying and was no longer needed!

Version 1.1.1.1

+ Program window is now resizable, allowing better file list visibility.
+ A few extra lines of code for truly minor enhancements.

Version 1.1.1.0

- Some IMAP commands contained an extra CR, causing unexpected failures.

Version 1.1.0.0

+ Now supports using message Tags as a source for Labels, *plus* folder names.
+ Hypothetically faster WMI search!
+ Added help messages with adequate explanations.

Version 1.0.3.1

+ Enhanced IMAP verbose-log readability.
+ Longer random names for on-disk messages to avoid possible overlap.

Version 1.0.3.0

- Fixed a bug when exiting while searching for mbox files.

Version 1.0.2.4

+ Now uses flushing, which might speed up IMAP communication!

Version 1.0.2.3

+ A few cosmetic changes!

Version 1.0.2.1

+ Added a group-selector checkbox for mbox search results.

Version 1.0.2.0

- Dropped the buggy native .NET file search in favor of a more robust WMI search. This handles cases where access-denied errors caused the mbox file search to stop without warning.
+ Added an update notifier.

Version 1.0.1.0

- Fixed minor bugs with read buffer and Override-Label validation.

Version 1.0.0.0

First public release.



The Need

Naturally, people get accustomed to their mail clients. They use them with caution at first, then get used to dumping all sorts of email, and, believe it or not, non-email stuff into them until they are garbled to death. The mail clients, that is; don't get me wrong!

One of my clients had this very issue. Almost all of his users used Thunderbird as their mail client, each with a very unique "filing" system that built endless hierarchies of folders and subfolders, many thousands of them, into poor Mozilla Thunderbird. This was not a luxury; in most cases, the nature of the work mandated it. I have to say that Mozilla did an awesome job with Thunderbird: it kept up with this kind of use the whole time, with the expected degradation in performance, but never to the point of crippling users.

The problem popped up when my client wanted his users to move their daily mail work from the martyr Thunderbird to Google's web mail interface. The mail was already there, but not the way users were accustomed to. Not with the killer trees! They now needed some way to filter mail in Google's web interface so they could "file" their messages the way they were used to, or at least as close to it as possible.

Google's own web mail interface has a search component with filters that can match criteria on searched mail and apply labels as desired. But it is weak. Very weak, actually. Given how personalized the filing within Thunderbird was, search had no chance of helping. On top of that, thousands of other messages had been sent through a mail gateway other than Google's SMTP, so those messages were missing from the web mailbox. They needed uploading.

One option was Gmail Uploader. It is nice, but its interface is single-threaded and user-unfriendly (the GUI deforms while working), it does not support Thunderbird by default, and it does not recognize Thunderbird folder trees as it should.
Another option was to still use Gmail Uploader, but only after going the very long way: loading the Thunderbird mailbox files into Netscape Communicator 4.x, importing them with Outlook Express, and finally using Gmail Uploader to push them all the way up to Gmail. It is more crazy than it is tedious.

The final solution was to build a custom, home-made application that would grab Thunderbird mbox files, parse them, create a user-defined label for them on Gmail, then upload the messages it found to Gmail. That should have been easy for a regular programmer, and my client had an in-house team of programmers, but once the task was assigned to them, it kind of jumped off a cliff.

As it turned out, the team was so busy with other work that about a month passed before they started on the tool. Then I had to push progress myself for the tool to come true. It did not, at least not the way it should have. While reviewing test versions of the tool against the requirements, I decided to rise to the challenge and see what I could do with this case and what my own tool would look like, motivated by the fact that I already play with, and love, anything connected to IMAP.

So, I admit I started working on the tool 3 weeks ago, but only in my own free time; I could only make big leaps on precious weekends. I started out with a single-mailbox processor in mind and ended up with something I'm really proud of: a multi-mailbox scanner/processor that is, I dare say, more flexible and user-friendly than Gmail Uploader. I do not really know about code efficiency; I will leave that to you. You can use it and/or study it and give me feedback. I would love feedback.
One thing I'm sure of about my tool: it does what it says it does, and it does it well :)

The Requirements

As this was a challenge, some requirements were must-meet:
1- As fast as possible, since upload/IMAP interaction is slow enough already.
2- As lightweight and memory-efficient as possible, since mbox files could very well be gigabytes in size.
3- As informative as possible, since many decisions are made in a single process.
4- As error-free as possible, since errors could occur at many levels in a single run.
5- No duplicate messages on the web.
6- Respect Google's limits on mailbox size and single-message size.
7- Label matched web messages, even when upload is not applicable (duplicates/over-quota).
8- A user-friendly, responsive interface.
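Requirements 6 and 7 boil down to a per-message decision: upload and label, or only label. The Python sketch below illustrates that decision under stated assumptions; it is not the tool's own .NET code. It assumes the server reports usage through the IMAP QUOTA extension (RFC 2087), whose STORAGE figures are in units of 1024 octets, and it uses the 85% threshold from the feature list and the 25 MB per-message limit from version 1.1.3.0, taken here as 25 * 1024 * 1024 bytes. The names `parse_storage_quota` and `may_upload` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed limits: 85% quota threshold and a 25 MB (25 * 1024 * 1024 bytes) message cap.
QUOTA_THRESHOLD = 0.85
MAX_MESSAGE_BYTES = 25 * 1024 * 1024

# Matches the STORAGE resource in an RFC 2087 response such as:
#   * QUOTA "" (STORAGE 870 1000)
STORAGE_RE = re.compile(r"STORAGE (\d+) (\d+)")


def parse_storage_quota(response_line: str) -> Optional[Tuple[int, int]]:
    """Return (used, limit) in 1024-octet units, or None if STORAGE is absent."""
    match = STORAGE_RE.search(response_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def may_upload(used: int, limit: int, message_size: int) -> bool:
    """True only while under the quota threshold and within the message cap.

    Callers are expected to still label any matching web copy when this
    returns False (requirement 7); this function only gates the upload.
    """
    over_quota = limit > 0 and used / limit > QUOTA_THRESHOLD
    too_large = message_size > MAX_MESSAGE_BYTES
    return not over_quota and not too_large
```

Keeping the check as a pure function makes it easy to re-evaluate before each batch upload, since usage grows as messages are appended.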

After taking all of the above into consideration, I moved on to the implementation. I thought I could:
1- Design the app as a single mbox processor. (Expanded later, causing many changes, but not to the core logic.)
2- Read the mbox file into a string, then use String.Split() to parse it.
3- Read large files in chunks (how big a chunk was the question!), splitting each one to parse it.
4- For each recognized message, take its Message-ID (or, failing that, its X-UIDL) and match it on the web.
5- Label, or upload and label, as required.
6- Take every precaution at every step for Google limits and other error sources.
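Step 4 can be sketched as follows, in Python rather than the tool's .NET code. The sketch reads the Message-ID header, falls back to X-UIDL, and builds an IMAP search key using the standard HEADER criterion from RFC 3501. Both function names are hypothetical, and how the tool actually phrases its web search is not described on this page.

```python
from email.parser import BytesHeaderParser
from typing import Optional, Tuple


def message_key(raw: bytes) -> Optional[Tuple[str, str]]:
    """Return (header name, value) for Message-ID, else X-UIDL, else None."""
    # Parse headers only; the body is left unparsed.
    headers = BytesHeaderParser().parsebytes(raw)
    for name in ("Message-ID", "X-UIDL"):
        value = headers.get(name)
        if value:
            return name, str(value).strip()
    return None


def search_criterion(key: Tuple[str, str]) -> str:
    """Build an RFC 3501 'HEADER <field-name> <string>' key with a quoted string."""
    name, value = key
    # Backslash and double quote must be escaped inside an IMAP quoted string.
    quoted = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'HEADER {name} "{quoted}"'
```

A message with neither header has no reliable key, so returning None lets the caller decide whether to treat it as new.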

I set a limit of 150 MB for splitting a whole file at once. Larger files, I thought, could be read in 100 MB chunks, splitting each one. The whole-file split worked within that limit. Chunk-splitting, however, was a total failure.
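The whole-file approach is easy to picture. In standard mbox, each message starts with a line beginning with "From ", and body lines that begin the same way are escaped (for example as ">From "). Assuming that convention, a Python analogue of the String.Split() idea splits the text just before every such line. The function name is hypothetical, and the whole text must fit in memory, which is why a size cap was needed.

```python
import re
from typing import List

# Zero-width match at the start of any line that begins with "From ".
FROM_LINE = re.compile(r"^(?=From )", re.MULTILINE)


def split_mbox_text(text: str) -> List[str]:
    """Split a fully loaded mbox text into messages, each keeping its "From " line."""
    # Splitting on an empty match (Python 3.7+) yields a leading "" that is dropped.
    return [part for part in FROM_LINE.split(text) if part]
```

One plausible pitfall when applying the same split to fixed-size chunks is that a chunk boundary can fall in the middle of a message, or even in the middle of a "From " line, leaving incomplete pieces at both ends of every chunk.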

Mark, Mark Not!

So, I had to read large files in chunks without raising OutOfMemory exceptions. I thought of 2 techniques:
1- Use a FileStream, .Seek()'ing along its length like a bee to "mark" (record) message boundaries, clear all used objects, then process each message individually, clearing up after each one. It sounded good, but it was really cumbersome, and it failed. The reason it was so difficult is the behavior of FileStream.Read(): it does not guarantee to fill your buffer. If you request 15 bytes, for example, FileStream.Read() may return fewer than 15. It only guarantees at least one byte (or zero at the end of the stream); beyond that, it may return any byte count it pleases!
2- Use the standard technique from the test versions I reviewed: a FileStream read continuously, with String storage in memory. In the end, this was the only approach I could get to work.
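The short-read behavior behind technique 1 is usually handled with a loop that keeps reading until the requested count arrives or the stream ends. Below is a Python sketch of that loop, not the tool's code; `read_exactly` and `TrickleStream` are hypothetical names, the latter being a test stream that hands out at most one byte per call to mimic the worst case described above.

```python
from typing import List, Protocol


class Readable(Protocol):
    def read(self, size: int) -> bytes: ...


def read_exactly(stream: Readable, count: int) -> bytes:
    """Call read() repeatedly until `count` bytes arrive or the stream ends."""
    parts: List[bytes] = []
    remaining = count
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # b"" signals end of stream, so return what we have
            break
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)


class TrickleStream:
    """Hypothetical stream that returns at most one byte per read() call."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    def read(self, size: int) -> bytes:
        chunk = self._data[self._pos:self._pos + min(size, 1)]
        self._pos += len(chunk)
        return chunk
```

In .NET, the same pattern applies to Stream.Read(), which returns the number of bytes actually read and returns 0 only at the end of the stream.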

Having decided which technique to use, I focused on memory consumption and garbage collection. I managed, I think, to do that well; at least it was way better than the test versions I reviewed. This technique forces really tight coding: you must pay attention to every object you use to keep memory usage at a minimum.

Implement!

When I finished the single-mbox processor, its logic was:
1- Open FileStream for mbox file.
2- Continuously read the file into a small byte buffer.
3- Use StringBuilder with the byte buffer for in-memory String storage/split-parsing.
4- Process messages on the fly: store those that need uploading to disk, and label the others on the web.
5- Batch-upload the on-disk messages, then clear everything to get ready for the next mbox file.
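Steps 1 to 4 above can be sketched as a small streaming splitter. This Python version is only an analogue of the described logic: a bytearray stands in for the StringBuilder, the 64 KB buffer size is an arbitrary assumption, and it assumes standard mbox, where each message starts with a "From " line and body lines starting with "From " are escaped. `iter_mbox_messages` is a hypothetical name. It holds roughly one in-progress message plus one read buffer at a time, never the whole file.

```python
import io
from typing import BinaryIO, Iterator

SEPARATOR = b"\nFrom "  # a newline followed by the "From " line of the next message


def iter_mbox_messages(stream: BinaryIO, buffer_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield raw messages from an mbox stream using a small, fixed read buffer."""
    pending = bytearray()
    scan_from = 0  # where the next separator search starts inside `pending`
    while True:
        chunk = stream.read(buffer_size)
        if not chunk:
            break
        pending += chunk
        while True:
            idx = pending.find(SEPARATOR, scan_from)
            if idx == -1:
                # A separator may straddle the next read, so rescan the tail.
                scan_from = max(0, len(pending) - len(SEPARATOR) + 1)
                break
            # Emit the finished message, including its final newline.
            yield bytes(pending[: idx + 1])
            del pending[: idx + 1]  # `pending` now starts with "From "
            scan_from = 0
    if pending:
        yield bytes(pending)  # last message has no separator after it
```

Typical use would open the file with `open(path, "rb")` and loop over `iter_mbox_messages(f)`, handing each message to the label-or-upload logic of step 4.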

After I finished the single-mbox version, I realized I needed to handle situations where a user might want to batch-process many mbox files at once. With the single-file version, a user had to wait until one file finished processing before starting another. That raised two more challenges. First, I had to recognize whether a selected folder really was a Thunderbird folder. Second, I needed to give advanced users the freedom to target isolated mbox files and still have the tool work on them.

The second requirement actually solved the first. You don't have to restrict users to the Thunderbird mail folder, as many other tools do, and this also frees your code from trying to parse the existing Thunderbird configuration, if any! I decided to just look for "signs" that a folder is a Thunderbird mail folder, or an isolated one. The most reliable sign is the presence of .msf files: index files whose names match the mailbox file names. If these are present, the tool adds all files in the same path to its list of "possible" mbox files and lets the user choose among them. I did not have to worry about wrong folders, since a file is never processed unless its content is verified as a "mostly valid" mbox file; in other words, a false positive should only occur if someone deliberately writes such a file, which other software is unlikely to do.

And, yes, this means that if you are an advanced user, you do not need real .msf index files. Just isolate the mbox files you want and create a dummy .msf file (an empty text file with its extension changed to .msf) in the same folder as your mbox files!
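The folder detection just described could look roughly like this in Python. It is a sketch under assumptions: any .msf file in a folder marks it as a mail folder, every other regular file there becomes a candidate, and a file counts as plausible mbox only if it begins with "From ", which is a much cruder check than the tool's "mostly valid" verification. Both function names are hypothetical.

```python
from pathlib import Path
from typing import List


def candidate_mbox_files(folder: Path) -> List[Path]:
    """List possible mbox files, but only if the folder holds at least one .msf file."""
    files = [p for p in folder.iterdir() if p.is_file()]
    if not any(p.suffix.lower() == ".msf" for p in files):
        return []  # no .msf "sign", so not treated as a mail folder
    # Every non-index file is a candidate; content validation happens later.
    return sorted(p for p in files if p.suffix.lower() != ".msf")


def looks_like_mbox(path: Path) -> bool:
    """Crude content check: an mbox file starts with a "From " line."""
    with path.open("rb") as f:
        return f.read(5) == b"From "
```

Because the dummy .msf file only has to exist, an empty file is enough to make an isolated folder qualify, matching the tip at the top of this page.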

The IMAP implementation is also simple: it is limited to what the tool needs, but it can be expanded just as easily. I hope you enjoy using this tool, or at least find it useful. You are always welcome to send me your feedback and/or suggestions.
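As an example of how small such an IMAP implementation can be, here is a Python sketch that builds an APPEND command line carrying an INTERNALDATE taken from the message's Date header, the argument whose absence was fixed in version 1.1.2.1. It relies on the standard library's `imaplib.Time2Internaldate` and `email.utils.parsedate_to_datetime`; `append_command` and the tag values are hypothetical, and non-ASCII label names (which IMAP encodes as modified UTF-7) are not handled.

```python
import imaplib
from email.parser import BytesHeaderParser
from email.utils import parsedate_to_datetime


def append_command(tag: str, mailbox: str, raw: bytes) -> bytes:
    """Build 'tag APPEND "mailbox" [date-time] {size}' followed by CRLF."""
    quoted = mailbox.replace("\\", "\\\\").replace('"', '\\"')
    parts = [tag, "APPEND", f'"{quoted}"']
    date_header = BytesHeaderParser().parsebytes(raw).get("Date")
    if date_header:
        try:
            sent = parsedate_to_datetime(str(date_header))
        except (TypeError, ValueError):
            sent = None  # unparsable Date header: let the server pick the date
        if sent is not None and sent.tzinfo is not None:
            # Produces a quoted string such as "09-Jun-2011 10:00:00 +0000".
            parts.append(imaplib.Time2Internaldate(sent))
    # Literal size; the message bytes follow after the server's "+" reply.
    parts.append("{%d}" % len(raw))
    return (" ".join(parts) + "\r\n").encode("ascii")
```

Without the date-time argument the server stamps the message with the upload time, which is consistent with the wrong dates described in the version history.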

This article, with its source code and files, is licensed under the GNU General Public License (GPL).