The Time When Sony Music Used Instagram Proxy API and Got Banned
The last couple of hours have been crazy, and I am not even kidding! This is a worthy first post of 2018.
I hope you found my Instagram Proxy API project interesting. In a nutshell, it lets you access Instagram's public data programmatically as a CORS-compliant JSON API. The project started gaining traction after Instagram disabled access to its public data. I implemented a workaround that has been pretty stable, and you can find out more about it on the GitHub project page.
This post is not about how the API works, but about the unusual amount of traffic, especially scraping requests, that I have been getting. Getting around most of the scraping requests was easy. I used the rate limiter from Cloudflare and then allowed access based on the HTTP Referer header to make sure I was getting requests from actual websites. This is not foolproof, but it kinda works. The real pain in the butt has been the commercial, for-profit websites trying to use the public endpoints hosted at igpi.ga on their pages. Since the API is running on a free Heroku instance (I did not want to pay to run a free service), I clearly mentioned on the project page that it may only be used on personal websites, blogs and portfolios. However, I added a one-click deploy button for anyone interested in running it on their own Heroku account for whatever purpose they like.
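To make the Referer check a bit more concrete, here is a minimal sketch in Python. It is not the code my service runs; the function names and the blocked host are made up for illustration, and it only shows the idea of rejecting requests that carry no usable Referer or that come from a blacklisted site.

```python
from urllib.parse import urlparse

# Hypothetical blocklist; the host below is a placeholder, not a real entry.
BLOCKED_HOSTS = {"blocked.example"}


def referer_host(referer):
    """Return the hostname of a Referer value, or None if it is unusable."""
    if not referer:
        return None
    try:
        parsed = urlparse(referer)
    except ValueError:
        # urlparse raises ValueError on some malformed URLs.
        return None
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return None
    return parsed.hostname.lower()


def is_allowed(referer, blocked=BLOCKED_HOSTS):
    """Allow a request only if it comes from a web page that is not blocked."""
    host = referer_host(referer)
    if host is None:
        # No usable Referer: most likely a script, not a browser on a page.
        return False
    # Block the listed host itself and any of its subdomains.
    return not any(host == b or host.endswith("." + b) for b in blocked)
```

Keep in mind that a client can send any Referer value it likes, which is exactly why this kind of check is not foolproof.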
At first it was a little inconvenient: I would browse the logs once a day and send out an email notice to potential violators. 24 hours later, I would blacklist their websites from referring requests to my service. Then I lost interest in reading logs for a couple of weeks as I got busy with life. Yesterday I was curious to see how everything was working, so I scanned through the logs, only to realize that the numbers were through the roof. I was getting requests from a couple of websites, rather frequently.
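Reading logs by hand gets old quickly, so here is a rough sketch of how the busiest referring sites could be counted. The log format is an assumption: I am pretending each line contains a referer="..." field, which may not match what your logs look like, and the function name is made up for this example.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed log format: each line contains referer="<url>" (quotes optional).
REFERER_RE = re.compile(r'referer="?([^"\s]+)"?')


def top_referers(lines, n=3):
    """Count requests per referring host and return the n busiest hosts."""
    counts = Counter()
    for line in lines:
        match = REFERER_RE.search(line)
        if not match:
            continue
        try:
            host = urlparse(match.group(1)).hostname
        except ValueError:
            continue  # skip malformed URLs
        if host:
            counts[host.lower()] += 1
    return counts.most_common(n)
```

Feeding it the lines of a log file, for example with open("app.log") as f: top_referers(f), returns a list of (host, count) pairs with the busiest host first.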
Three of these websites appeared frequently: ace-tee.com, selig.eu and danyiom.com. The first looked like the personal page of a music artist. Then I noticed that she was doing concerts and had, like, thousands of Instagram followers. I kept scrolling down, only to discover "Copyright © 2018, sonymusic.de, Sony Music Corporation.". At first, I was elated that a big company relied on my service for images. Then I looked at the second website and found the same thing, "Copyright © 2018, sonymusic.de, Sony Music Corporation.", and then again on the third. I was overwhelmed, feeling happy and sad at the same time.
Imagine you're a web developer working for one of the largest music corporations in the world. You've been asked to develop webpages for the artists your corporation manages. The requirements clearly state that the pages will receive a huge amount of traffic, as these artists are socially active and people love to check them out. How would you approach this task? If it were me, I would vet all the resources my page loads data from. I would verify that the services my page relies on can even serve the number of requests my traffic requires (I'd even refrain from loading from services like rawgit, as I am not sure they would meet the huge demand). Isn't this amateur behaviour? The worst part is, I could supply malicious JS code in response to these requests and redirect all traffic to a website of my choice (unethical, but possible!).
Looks like whoever was developing the code did not give a rat's ass about ethics or requirements. Probably Sony cheaped out and got these pages outsourced to an "economical" entity based somewhere in my homeland, India. It was troubling to learn that this happens more often than it should. I posted about this on Reddit and got some interesting advice. I did the same thing I do with other non-compliant websites and sent out a 24-hour notice.
They did not respond. What did I expect? But I saw them take action: in less than 24 hours, they set up their own instance at https://sme-instagram-api.herokuapp.com/ and started using that to serve Instagram traffic for their artists' pages. It was redundant, but I banned them anyway. I am happy that some important entity thought my service was worthy of being used on their pages. I am also happy they gave me the chance to ban them from using my service. I am sad that Sony Corp did not even bother to send out an apology for doing this. A simple acknowledgement would have done, but they just sneakily changed their code as if nothing had happened.
I am now working on a whitelist-only model, as I am tired of maintaining a blacklist. Users will submit their site for approval on the project page, and once approved, they will be able to use this API on their websites.
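Since the whitelist is still a work in progress, the following Python sketch only illustrates the idea: sites are submitted, wait for approval, and only approved hosts get through. The class and method names are invented for this example and are not part of the project.

```python
from urllib.parse import urlparse


class SiteRegistry:
    """Hypothetical registry: hosts are submitted, then approved by hand."""

    def __init__(self):
        self.pending = set()
        self.approved = set()

    def submit(self, host):
        """Queue a host for review unless it is already approved."""
        host = host.strip().lower()
        if host not in self.approved:
            self.pending.add(host)

    def approve(self, host):
        """Move a host from the pending queue to the approved set."""
        host = host.strip().lower()
        self.pending.discard(host)
        self.approved.add(host)

    def is_allowed(self, referer):
        """Allow a request only if its Referer host has been approved."""
        try:
            host = urlparse(referer or "").hostname
        except ValueError:
            return False  # malformed Referer
        return host is not None and host.lower() in self.approved
```

The difference from a blacklist is the default: an unknown site is rejected until someone approves it, so there are no more logs to chase after the fact.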
The last 24 hours were fun. I hope you enjoyed reading about them :)