Sanitize Your Data

I was reading an article today about Genius filing a lawsuit against Google for violating Genius's Terms of Use.

The article is a good reminder of why you should sanitize the data you scrape. Genius did something simple to find out who was stealing their data: the apostrophes in their lyrics were a deliberate mix of straight and curly. Had Google normalized all the quotes to one style or the other, Genius would have been none the wiser. But I guess with the lawsuit being dismissed, it doesn't really matter.
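
For what it's worth, normalizing the quotes takes only a few lines. Here's a minimal sketch in Python, assuming the scraped text is already a decoded string. The name normalize_quotes, the choice to fold everything to straight ASCII quotes, and the exact set of characters covered are my own assumptions, not anything Google or Genius is known to have used.

```python
# Map typographic ("curly") quote characters to straight ASCII quotes.
# This list is an assumption about which characters matter; extend as needed.
QUOTE_MAP = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (the usual curly apostrophe)
    "\u201A": "'",   # single low-9 quotation mark
    "\u201B": "'",   # single high-reversed-9 quotation mark
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
    "\u201E": '"',   # double low-9 quotation mark
    "\u201F": '"',   # double high-reversed-9 quotation mark
})


def normalize_quotes(text: str) -> str:
    """Return text with every curly quote replaced by its straight form."""
    return text.translate(QUOTE_MAP)


if __name__ == "__main__":
    scraped = "I\u2019m sure it's what you\u2019re after"
    print(normalize_quotes(scraped))
```

Once every apostrophe is the same character, any pattern hidden in the choice between straight and curly is gone.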

A state court has dismissed a high-profile case showing unsportsmanlike conduct by Google, which was caught red-handed using lyrics obviously scraped from Genius. Unfortunately for the latter, the complaints amount to a copyright violation — which wasn’t what the plaintiffs alleged, sinking the case.

The lawsuit, filed in December, accused Google of violating Genius’s terms of use and unjustly enriching itself by scraping lyrics on the site to be displayed on searches for songs. So, for instance, someone searching for “Your Love is Killing Me lyrics” would be shown the lyrics immediately instead of being sent to a site like Genius that hosted them.

That’s fair play, except when the lyrics are taken directly from those sites (directly or via an accomplice) without permission or attribution — and Genius proved that Google was doing this by cleverly hiding “RED HANDED” inside lyrics, using Morse code formed from curly and straight apostrophes. Devious!
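
To see how a watermark like that can be read back out, here's a rough sketch in Python. The mapping (straight apostrophe as a dot, curly apostrophe as a dash), the function name apostrophe_pattern, and the sample sentence are all assumptions for illustration; the article doesn't spell out the exact encoding Genius used.

```python
STRAIGHT = "'"        # ASCII apostrophe
CURLY = "\u2019"      # right single quotation mark

# Assumed mapping for illustration only: straight = dot, curly = dash.
SYMBOLS = {STRAIGHT: ".", CURLY: "-"}


def apostrophe_pattern(text: str) -> str:
    """Read the apostrophes in order of appearance as dots and dashes."""
    return "".join(SYMBOLS[ch] for ch in text if ch in SYMBOLS)


if __name__ == "__main__":
    # Made-up sample line: curly, straight, curly, straight.
    sample = "I\u2019m sure it's what you\u2019re after, isn't it"
    print(apostrophe_pattern(sample))
    # After folding every apostrophe to straight, the pattern is all dots.
    print(apostrophe_pattern(sample.replace(CURLY, STRAIGHT)))
```

The text looks identical to a casual reader either way, which is why the mark survives a copy and paste but not a pass that rewrites every apostrophe to one style.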

Caught thus, Google said it would mend its ways, and soon was caught again, doing the same thing using the same method. It’s certainly enough to make you want to see the big G take some licks, and Genius filed a lawsuit hoping to achieve just that.

The problem is this: Genius isn’t the copyright holder for these lyrics, it just licenses them itself. Its allegations against Google, Judge Margo Brodie of the Eastern District of New York determined, amount to copyright violations, in nature if not in name, and copyright is outside Brodie’s jurisdiction.

Plaintiff’s allegations that Defendants “scraped” and used their lyrics for profit amount to allegations that Defendants made unauthorized reproductions of Plaintiff’s lyric transcriptions and profited off of those unauthorized reproductions, which is behavior that falls under federal copyright law.

As to allegations of unfair business conduct, Brodie says those too are copyright disputes:

Plaintiff has not alleged that Defendants breached any fiduciary duty or confidential relationship, or that Defendants misappropriated Plaintiff’s trade secrets. Instead, Plaintiff’s claims are precisely the type of misappropriation claims that courts have consistently held are preempted by the Copyright Act.

Because all the causes for complaint are preempted by federal law, Brodie really has no choice but to kick the case out:

Given that the Court finds that all of Plaintiff’s state law claims are preempted by the Copyright Act, and Plaintiff has not asserted any federal law claims, the Court dismisses the Complaint for failure to state a claim.

It’s a bit disappointing, of course, to see a company like Google engage in shenanigans and get away with it (though let us not forget that Genius has engaged in some shenanigans of its own). But the legal system is all about crossing your t’s and dotting your i’s. If someone steals your wallet, you don’t accuse them of embezzlement, even though they’re kind of the same thing.

In this case Genius’s legal team needed to bring a copyright complaint, but possibly were unable to due to not being the copyright owners themselves. (Copyright law is notoriously obtuse, especially in questions of digital copies and licensing.)

Genius could file a new lawsuit or just cut their losses, having given Google a very public black eye; the scraping practice even got some play during the recent tech antitrust hearings in Congress. Certainly Google is on notice — but make no mistake, they’re popping champagne in Mountain View tonight.