Is that really a good idea?

I’ve been itching to write an app using the Dropbox API for quite a while, and I’ve had an idea bouncing around in my head for well over a year. I started a new job back in February of 2010 and didn’t have Dropbox set up on my work computer (and I wasn’t sure I wanted to set it up), but there was a file I wanted to upload. I could have just downloaded it locally and then uploaded it using the Dropbox website, but then I thought to myself: why can’t I just enter the URL of the file into the Dropbox website and have Dropbox download it for me? In my typical OCD fashion I spent months coming up with a good name and design before ever writing a single line of code. As I started to actually work on it, I ran into a few hurdles which made me think it wasn’t such a good idea after all.

The first problem I ran into was that not every site sends the same HTTP headers with its file downloads. This made it difficult to consistently get the information I wanted to store about the download. I ended up using PHP’s get_headers function to retrieve the response headers and parse out the filename, direct download link, file size, content type, etc. This seemed to work for most downloads, and for those that did not send a Content-Disposition header, I was able to extract the filename from the URL easily enough.
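To make that fallback concrete, here is a minimal sketch in Python (not the PHP code described above). The function name `filename_from_response` and the `"download"` default are made up for illustration; the idea is simply to prefer the Content-Disposition filename and fall back to the last segment of the URL path.

```python
import posixpath
from email.message import Message
from urllib.parse import unquote, urlsplit


def filename_from_response(url, headers):
    """Pick a filename from Content-Disposition, else from the URL path.

    `headers` is assumed to be a plain dict of response headers.
    """
    # Header names are case-insensitive, so normalize the keys first.
    lowered = {key.lower(): value for key, value in headers.items()}
    disposition = lowered.get("content-disposition")
    if disposition:
        # Let the standard library parse the header parameters.
        msg = Message()
        msg["Content-Disposition"] = disposition
        name = msg.get_filename()
        if name:
            # Keep only the final path component of whatever the server sent.
            return posixpath.basename(name)
    # Fallback: last segment of the (percent-decoded) URL path.
    name = posixpath.basename(unquote(urlsplit(url).path))
    return name or "download"  # hypothetical default when nothing usable is found
```

Calling it with an empty header dict and a URL ending in `/files/a%20b.zip` falls through to the URL branch and yields `a b.zip`. Hosts that hide the real file behind a landing page defeat both branches.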

Then I tried to use a file download from MediaFire. None of the workarounds I had in place worked. I found a few posts on Stack Overflow about parsing the HTML/JS to get the direct download link, but even when I had the link I still wasn’t able to get the file correctly. Then I thought, does MediaFire have an API? Yes, yes they do. Great! I can parse the URL to look for MediaFire files and use their API to download them. But then what about all the other file hosting services? I haven’t done any testing with RapidShare or other services, but I’ll admit, just the thought of having to develop a customized solution for each host makes me cringe.
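One way to keep those per-host special cases contained is a small lookup from hostname to handler. The sketch below is only an assumption about how that could be laid out: `fetch_generic` and `fetch_mediafire` are placeholder functions that return a label instead of downloading anything, and no real MediaFire API call is shown.

```python
from urllib.parse import urlsplit


def fetch_generic(url):
    """Placeholder for the plain HTTP download path."""
    return ("generic", url)


def fetch_mediafire(url):
    """Placeholder for a host-specific path (no real API call here)."""
    return ("mediafire", url)


# Hypothetical registry: one entry per host that needs special treatment.
HANDLERS = {
    "mediafire.com": fetch_mediafire,
}


def pick_handler(url):
    """Return the handler registered for the URL's host, else the generic one."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, handler in HANDLERS.items():
        # Match the domain itself or any subdomain such as www.
        if host == domain or host.endswith("." + domain):
            return handler
    return fetch_generic
```

Each new host would still need its own handler written and maintained, which is exactly the part that makes me cringe; the table only keeps the mess in one place.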

I haven’t even touched on the logistics of queueing up downloads and sending them to Dropbox for however many people might find this useful. Then I have to consider the storage space required to host the downloads (or the memory to stream them) and the bandwidth to send them to Dropbox. If I were to continue with the project I would probably use Amazon’s cloud services to host the site. It would be a good learning opportunity as I’ve never worked with any of Amazon’s services before and I feel that this idea is perfect for AWS/S3 (or EB)/SWF/SQS.

Next up, my itch to use the Dropbox API in something is pushing me to integrate it into a project using django-celery!

Is that really a good idea?
