Let's Talk Behavior Analysis
These days more people are using behavior analysis (and plenty of other buzzwords) to understand how visitors interact with their site. While probably not the first, Hotjar is a common and well-known tool for tracking user behavior. Typical features include heatmaps that track users' mouse movements, as well as conversion funnels that show the common flow of users before they either convert or drop off.
What's this have to do with automation? Many companies also use behavior analysis to find cheaters, bots, and other automation tools. For instance, suppose a video game's anti-cheat software has been beaten and a user is running a speedhack. By recording each player's x,y,z location at regular intervals, you can find players who have exceeded a reasonable speed or are outside of a reasonable boundary. You can isolate those who stand out from the norm and ban them. The same data can also be used to detect bots that follow a very strict path. More information here.
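To make the speed check concrete, here is a minimal sketch of what such a detector might look like. The sample format, the `findSpeedViolations` name, and the speed limit are all assumptions for illustration, not any particular game's implementation.

```javascript
// Hypothetical position samples: { t: seconds, x, y, z } in game units.
// maxSpeed is an assumed limit in units per second, not a real game's value.
function findSpeedViolations(samples, maxSpeed) {
  const violations = [];
  for (let i = 1; i < samples.length; i++) {
    const a = samples[i - 1];
    const b = samples[i];
    const dt = b.t - a.t;
    if (dt <= 0) continue; // skip out-of-order or duplicate timestamps
    // Straight-line distance covered between the two samples.
    const distance = Math.hypot(b.x - a.x, b.y - a.y, b.z - a.z);
    const speed = distance / dt;
    if (speed > maxSpeed) {
      violations.push({ from: a, to: b, speed });
    }
  }
  return violations;
}

// Made-up track: the middle hop covers far more ground than the others.
const track = [
  { t: 0, x: 0, y: 0, z: 0 },
  { t: 1, x: 5, y: 0, z: 0 },
  { t: 2, x: 45, y: 30, z: 0 },
  { t: 3, x: 50, y: 30, z: 0 },
];
const flagged = findSpeedViolations(track, 10);
console.log(flagged);
```

A real system would presumably also account for mounts, teleports, and lag, but the core idea is just distance over time compared against a ceiling.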
So what do you do? Well, a mesh system is harder to detect than a strict point-to-point system. Many users farming an area will stick to one specific route. If you map out the whole area and place key points, the pattern becomes harder to spot. Rather than going from point A to B to C to D every time, you can get from point A to D via three different paths. And then maybe you go back to B this time instead of C. It adds some randomness. But there are still ways to catch it, since maybe you always stop and change direction at the same x,y,z coordinate. Most of these mesh systems still use fixed key points rather than adding any randomness to the points themselves. So at the end of the day you're still doing something very repetitive.
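Here's a rough sketch of the mesh idea, with a little jitter added so the bot doesn't stop on the exact same coordinate every lap. The waypoint names, coordinates, links, and the `randomPath` and `jitter` helpers are all made up for illustration.

```javascript
// Hypothetical waypoint mesh: names, coordinates, and links are made up.
const mesh = {
  A: { pos: [0, 0, 0], links: ["B", "C"] },
  B: { pos: [10, 0, 0], links: ["A", "C", "D"] },
  C: { pos: [0, 10, 0], links: ["A", "B", "D"] },
  D: { pos: [10, 10, 0], links: ["B", "C"] },
};

// Fisher-Yates shuffle on a copy, using the supplied random source.
function shuffled(items, rng) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Randomized depth-first search: returns one path from start to goal that
// never revisits a waypoint, or null if no such path exists.
function randomPath(graph, start, goal, rng = Math.random, visited = new Set()) {
  if (start === goal) return [start];
  visited.add(start);
  for (const next of shuffled(graph[start].links, rng)) {
    if (visited.has(next)) continue;
    const rest = randomPath(graph, next, goal, rng, visited);
    if (rest) return [start, ...rest];
  }
  visited.delete(start); // backtrack so other branches may pass through here
  return null;
}

// Offset a waypoint by up to `radius` units on x and y (z is left alone).
function jitter(pos, radius, rng = Math.random) {
  const [x, y, z] = pos;
  return [x + (rng() * 2 - 1) * radius, y + (rng() * 2 - 1) * radius, z];
}

const route = randomPath(mesh, "A", "D").map((name) => jitter(mesh[name].pos, 1.5));
console.log(route);
```

Even with this, the turn points still cluster around the same few spots, so it only raises the bar rather than making the route invisible.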
But these are just some of the tools available to catch botters. MMOs will deploy other things, such as checking who's using a fake keyboard, to catch users who write AutoIt or AutoHotkey scripts. This has led to false positives for people who use ADA (accessibility) software, so they have likely adjusted detection to account for that software running. I feel this is also how they're catching some of the more recent fishbots, as they use a simulated keyboard. Once again, I haven't botted in a while, but I used a modified version of a popular fishbot that faked a hardware keyboard. I avoided detection.
Behavior Analysis On The Web
Alright, who cares about video games, right? I use Selenium and wanna bot the web! There are many tools out there that do the same kind of behavior analysis for websites. And luckily they are generally open and tell you what they do and how they detect you. Let's talk about Sift for a minute. "Sift prevents fraud with industry-leading technology and expertise, an unrivaled global data network, and a commitment to building long-term partnerships with our customers." They're a group of ex-Googlers who have developed some tracking software to find bots and other bad actors. They try to defeat fraud, bot-created accounts, and a whole bunch of other things. Let's take a look at what they do.
Here's the code that loads their goodies.
<script type="text/javascript">
  var _user_id = "";
  var _session_id = "{SOME GUID}";
  var _sift = window._sift = window._sift || [];
  _sift.push(['_setAccount', "{CODE}"]);
  _sift.push(['_setUserId', _user_id]);
  _sift.push(['_setSessionId', _session_id]);
  _sift.push(['_trackPageview']);
  (function() {
    // Inject s.js once the page has finished loading.
    function ls() {
      var e = document.createElement('script');
      e.src = 'https://cdn.siftscience.com/s.js';
      document.body.appendChild(e);
    }
    if (window.attachEvent) {
      window.attachEvent('onload', ls);
    } else {
      window.addEventListener('load', ls, false);
    }
  })();
</script>
If you take a look at s.js you'll see that they "Include code from https://github.com/Valve/fingerprintjs2". That's our old friend FingerprintJS, which I've talked about in the past. So step one to staying hidden from them is to obscure your fingerprint, which is easy enough.
What else is part of their 'Behavior Analysis'? Luckily, they're kind enough to tell you. For instance, here is how they analyze emails.
Repeat fraudsters will often create an army of email addresses by only tweaking a few characters in a username. Sift Science is able to identify this behavior and determine that jonathan123@fraud.com is probably related to jonathan124@fraud.com (and that he’s likely to be fraudulent).
Does the username contain a known name? We’ll check to see if the email and the billing information share similar names.
Is the email domain a known disposable one? Is it a free email address? Signals like these increase the likelihood that someone is a fraudster. Not to worry though – we’ll take care of all these checks.
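Sift doesn't publish how the "related email" check works, but a plain edit-distance comparison on the username is one plausible way to catch the jonathan123 / jonathan124 case. The sketch below is a guess at the general idea, not their method; `looksRelated` and the two-edit threshold are assumptions.

```javascript
// Classic dynamic-programming Levenshtein distance
// (counts inserts, deletes, and substitutions).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Treat two addresses as "probably related" when they share a domain and the
// usernames differ by at most maxEdits characters (an assumed threshold).
function looksRelated(emailA, emailB, maxEdits = 2) {
  const [userA, domainA] = emailA.toLowerCase().split("@");
  const [userB, domainB] = emailB.toLowerCase().split("@");
  if (domainA !== domainB) return false;
  return editDistance(userA, userB) <= maxEdits;
}

console.log(looksRelated("jonathan123@fraud.com", "jonathan124@fraud.com"));
console.log(looksRelated("jonathan123@fraud.com", "maria.lopez@fraud.com"));
```

The takeaway for a bot author is that usernames generated from one template with a counter on the end are exactly what a check like this would group together.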
They also have an eBook that goes into a few more details on the various things they check. Finally, you can always request a demo and they'll tell you directly what they do. You can also ask questions about how they'd stop a common problem you have, and they'll explain how they do it. Then you just modify your bot accordingly.
A lot of these software products also have documentation, which gives further hints into what they do. Look at the key events they track, for instance: creating an account, logging in, signing up, and so on. Just look at what data they want for each event and you can figure out what they're using for analysis.
I'm not going to highlight everything, but many companies do the same. Their "marketing material" is also a glimpse into what they do. And Sift isn't the only one who does this.
Appear Human
Even if you satisfy the above with reasonable data, and even if you're hiding that you're a bot, you still run the risk of getting flagged if you're not behaving like a human. Does it take you 2 seconds from page load to hit sign up, fill out your username and password, and hit submit? Yeah, you're going to get blocked.
Do you land on the site, slowly scroll down and hit sign up on the button below the fold? Seems reasonable. Do you actually mouse over to buttons when you click? Do you use a real and common user agent? Do you throttle your input? The average person types around 40 words per minute. Selenium's SendKeys is extremely fast. Slow it down.
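As a rough sketch of throttled input: at 40 words per minute, using the usual convention of five characters per word, you get about 200 keystrokes a minute, or roughly 300 ms per key. The helpers below are hypothetical, and `sendKey` is an assumed callback you would supply yourself (for example, something that forwards a single character to your Selenium element).

```javascript
// Average pause per keystroke for a given typing speed, using the common
// convention that one "word" is five characters.
function meanDelayMs(wordsPerMinute) {
  return 60000 / (wordsPerMinute * 5);
}

// One randomized pause per character, between 50% and 150% of the mean.
// A uniform spread is a simplification; real typing rhythm is messier.
function keystrokeDelays(text, wordsPerMinute = 40, rng = Math.random) {
  const mean = meanDelayMs(wordsPerMinute);
  return Array.from(text, () => mean * (0.5 + rng()));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sends the text one character at a time with a pause before each key.
// sendKey is a hypothetical callback provided by the caller.
async function typeLikeHuman(text, sendKey, wordsPerMinute = 40) {
  const delays = keystrokeDelays(text, wordsPerMinute);
  let i = 0;
  for (const ch of text) {
    await sleep(delays[i++]);
    await sendKey(ch);
  }
}

// Usage sketch (not executed here):
// await typeLikeHuman("my_username", (ch) => element.sendKeys(ch));
console.log(meanDelayMs(40), keystrokeDelays("hello").length);
```

Sending one character per call is slower than a single bulk send, which is the whole point here.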