<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Bots.lol]]></title><description><![CDATA[Short Reads About Bots, Scraping, Automation, Selenium, And Avoidance]]></description><link>https://bots.lol/</link><image><url>https://bots.lol/favicon.png</url><title>Bots.lol</title><link>https://bots.lol/</link></image><generator>Ghost 3.26</generator><lastBuildDate>Thu, 11 Sep 2025 14:04:43 GMT</lastBuildDate><atom:link href="https://bots.lol/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[How To Scrape Google Search Results Data In Python Easily]]></title><description><![CDATA[<p><strong>Google search engine results pages (SERPs)</strong> can provide a lot of important data for you and your business, but you most likely wouldn't want to scrape it manually. After all, there might be multiple queries you're interested in, and the corresponding results should be monitored on a regular basis. This is</p>]]></description><link>https://bots.lol/how-to-scrape-google-search-results-data-in-python-easily/</link><guid isPermaLink="false">65b272318d73d565bdfa7e18</guid><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Thu, 25 Jan 2024 14:38:35 GMT</pubDate><content:encoded><![CDATA[<p><strong>Google search engine results pages (SERPs)</strong> can provide a lot of important data for you and your business, but you most likely wouldn't want to scrape it manually. After all, there might be multiple queries you're interested in, and the corresponding results should be monitored on a regular basis. 
This is where automated scraping comes into play: you write a script that processes the results for you or use a dedicated tool to do all the heavy lifting.</p><p>In this article you'll learn <strong>how to scrape Google search results with Python</strong>. We will discuss three main approaches:</p><ul><li>Using the ScrapingBee API to simplify the process and overcome anti-bot hurdles (hassle-free)</li><li>Using a graphical interface to construct a scraping request (that is, without any coding)</li><li>Writing a custom script to do the job</li></ul><p>You'll see multiple code samples to help you get started as fast as possible.</p><p>Shall we begin?</p><p><em>You can find the <a href="https://github.com/bodrovis-learning/Python-Google-scraping">source code for this tutorial on GitHub</a>.</em></p><h2 id="why-scrape-search-results"><strong>Why scrape search results?</strong></h2><p>The first question that might arise is "why in the world do I need to scrape anything?". That's a fair question, actually.</p><ul><li>You might be an SEO tool provider and need to track positions for billions of keywords.</li><li>You might be a website owner and want to check your rankings for a list of keywords regularly.</li><li>You might want to perform competitor analysis. The simplest thing to do is to understand how your website ranks versus that other guy's website: in other words, you'll want to assess your competitor's positions for various keywords.</li><li>Also, it might be important to understand what customers are into these days. What are they searching for? What are the current trends?</li><li>If you're a content creator, it will be important for you to analyze potential topics to cover. What would your audience like to read about?</li><li>Perhaps you need to generate leads, monitor certain news or prices, or research and analyze a given field.</li></ul><p>As you can see, there are many reasons to scrape search results. 
Now that we understand the "why", the more important question is "how", which is closely tied to "what are the potential issues?". Let's talk about that.</p><h2 id="challenges-of-scraping-google-search-results"><strong>Challenges of scraping Google search results</strong></h2><p>Unfortunately, scraping Google search results is not as straightforward as one might think. Here are some typical issues you'll probably encounter:</p><h3 id="aren-t-you-a-robot-by-chance"><strong>Aren't you a robot, by chance?</strong></h3><p>I'm pretty sure I'm not a robot (mostly), but for some reason Google has been asking me this question for years now. It seems it's never satisfied with my answer. If you've seen those nasty "I'm not a robot" checkboxes, also known as CAPTCHAs, you know what I mean.</p><p>So-called "real humans" can pass these checks fairly easily, but for scraping scripts things become much harder. Yes, you can find a way to solve CAPTCHAs, but this is definitely not a trivial task. Moreover, if you fail the check multiple times, your IP address might get blocked for a few hours, which is even worse. Luckily, there's a way to overcome this problem, as we'll see later.</p><h3 id="do-you-want-some-cookies"><strong>Do you want some cookies?</strong></h3><p>If you open the Google search home page in your browser's incognito mode, chances are you'll see a "consent" page asking whether you are willing to accept some cookies (no milk, though). Until you click one of the buttons, you won't be able to perform any searches. As you can guess, the same thing might happen when running your scraping script. We will discuss this problem later in this article.</p><h3 id="don-t-request-so-much-from-me-"><strong>Don't request so much from me!</strong></h3><p>Another problem arises when you request too much data from Google and it becomes really angry with you. 
This typically happens when your script sends too many requests too fast, and the service blocks you for a period of time. The simplest solutions are to wait, to use multiple IP addresses, or to limit the number of requests, or... perhaps there's some other way? We're going to find out soon enough!</p><h3 id="lost-in-data"><strong>Lost in data</strong></h3><p>Even if you manage to get a reasonable response from Google, don't celebrate yet. The problem is that the returned HTML contains lots and lots of stuff you are not really interested in: all kinds of scripts, headers, footers, extra markup, and so on. Your job is to extract the relevant information from all this gibberish, which can be a fairly complex task on its own.</p><p>To make matters worse, Google tends to use not-so-meaningful tag IDs that change over time, so you can't even create reliable rules for locating content on the page. Yesterday the tag ID you needed was <code>yhKl7D</code> (whatever that means), but today it's <code>klO98bn</code>. Go figure.</p>]]></content:encoded></item><item><title><![CDATA[Beating Google ReCaptcha and the funCaptcha using AWS Rekognition]]></title><description><![CDATA[<h1 id="project-voight-kampff">Project Voight-Kampff</h1><p>Originally found <a href="https://news.ycombinator.com/item?id=24272858">HERE</a>.</p><p>Beating Google's reCAPTCHA using AWS Rekognition. Part of project Touch-Captcha (두 터치). I did this because I cannot promote a better CAPTCHA without first beating the industry standard.</p><p>Nothing special here. 
Credit goes to the ML researchers who developed the image classification technologies readily available</p>]]></description><link>https://bots.lol/beating-google-recaptcha-and-the-funcaptcha-using-aws-rekognition/</link><guid isPermaLink="false">5f4545fd8d73d565bdfa7dcf</guid><category><![CDATA[Article]]></category><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Tue, 25 Aug 2020 17:11:10 GMT</pubDate><content:encoded><![CDATA[<h1 id="project-voight-kampff">Project Voight-Kampff</h1><p>Originally found <a href="https://news.ycombinator.com/item?id=24272858">HERE</a>.</p><p>Beating Google's reCAPTCHA using AWS Rekognition. Part of project Touch-Captcha (두 터치). I did this because I cannot promote a better CAPTCHA without first beating the industry standard.</p><p>Nothing special here. Credit goes to the ML researchers who developed the image classification technologies readily available today, either via the Google Vision API or AWS Rekognition.</p><p>Voight-Kampff comes from the movie Blade Runner (1982). It is the test used by Blade Runners to tell a Replicant (synthetic human/android) from a human being.</p><p>I am doing this because I like research and I want to get a PhD in Machine Thinking (the inverse of Machine Learning). Your contributions will help me focus solely on this work with minimal distractions from the outside world.</p><h3 id="you-will-need-">You will need:</h3><ul><li>A Google Cloud (GCP) account (the virtual machines I use are hosted in GCP)</li><li>An AWS account (for proxies)</li></ul><h3 id="pull-with">Pull with</h3><pre><code>curl "<a href="https://raw.githubusercontent.com/pirates-of-silicon-hills/test/master/setup.sh" rel="nofollow">https://raw.githubusercontent.com/pirates-of-silicon-hills/test/master/setup.sh</a>" --output setup.sh
chmod u+r+x setup.sh
./setup.sh</code></pre><h3 id="past-puzzles">Past Puzzles</h3><p>Every puzzle you see in my demonstration videos has been saved as an image with a unique name as its identifier. 
You can download all the images here: <a href="https://drive.google.com/open?id=18b0HxyOsLP6AZMpF1-DNITrGvFBGkYND" rel="nofollow">https://drive.google.com/open?id=18b0HxyOsLP6AZMpF1-DNITrGvFBGkYND</a></p>]]></content:encoded></item><item><title><![CDATA[Good IP vs Bad IP?]]></title><description><![CDATA[<p>In the past I've mentioned "Good IPs" and "Bad IPs". So what makes an IP bad? Well, it comes down to what <em>other people</em> are doing on that IP. If you're using a cheap/free/crappy VPN or proxy, chances are you're sharing that IP with bad actors. </p><h2 id="really-bad">Really Bad</h2><p>This</p>]]></description><link>https://bots.lol/good-ip-vs-bad-ip/</link><guid isPermaLink="false">5f28667d8d73d565bdfa7d1f</guid><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Fri, 21 Aug 2020 16:00:00 GMT</pubDate><content:encoded><![CDATA[<p>In the past I've mentioned "Good IPs" and "Bad IPs". So what makes an IP bad? Well, it comes down to what <em>other people</em> are doing on that IP. If you're using a cheap/free/crappy VPN or proxy, chances are you're sharing that IP with bad actors.</p><h2 id="really-bad">Really Bad</h2><p>This IP shows up on a number of blacklists. You're not able to perform a Google search without getting a CAPTCHA from Google. Your "United States" IP address actually shows a non-US country in an IP whois lookup. Some sites will block this IP outright and prevent any pages from loading.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/08/google-recaptcha.png" class="kg-image" alt><figcaption>Throw it away. Never use that IP again.</figcaption></figure><h2 id="bad">Bad</h2><p>You're not blacklisted, but the geolocation is wonky. Sometimes an IP whois will return US results, but Google and various websites will treat you as being somewhere else. One easy way to check: do you get ads in a Google search? 
Are your Google search results in Vietnamese or some language other than the one you'd expect?</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/08/bad2.PNG" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/08/bad2.PNG 600w, https://bots.lol/content/images/2020/08/bad2.PNG 629w"><figcaption>Results for a "US"-based IP.</figcaption></figure><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/08/US.PNG" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/08/US.PNG 600w, https://bots.lol/content/images/2020/08/US.PNG 713w"><figcaption>I don't think we're actually in Kansas.</figcaption></figure><h2 id="good">Good</h2><p>Your IP has no history of bad traffic and it returns US results. Probably a good IP. Now, if it's a 'datacenter' IP, you might still get extra attention. Most VPNs and cheap proxies use datacenter IPs because they're cheap.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/08/Good.PNG" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/08/Good.PNG 600w, https://bots.lol/content/images/2020/08/Good.PNG 634w"><figcaption>Ads! Yay.</figcaption></figure><h2 id="when-good-ips-go-bad">When Good IPs Go Bad</h2><p>With a VPN, IPs rotate and you may never get the same one again, which is kind of the point. If you use a proxy that leases an IP to multiple people, you could end up sharing it with bad actors. Your good IP may not be good after a few days: it could deteriorate and eventually land on a blacklist.</p><p>A note on datacenters: they're not <em>bad</em>, but they're also not <em>good</em>. Some sites will straight up ban or flag datacenter traffic. Other places don't care much and treat IPs as good until they've done something bad. 
This can work to your advantage, as some datacenters don't tolerate bad actors. A cheap VPS that you spin up and route traffic through could provide enough cover for your tasks, especially because you know all the traffic coming from that IP. Yes, someone could have done something 'bad' with it before it was released back to the pool, but that's less likely than with a free or cheap proxy. Additionally, some studies have reported that roughly half of bots come from datacenter IPs, with the rest coming from residential or organizational IPs.</p><h2 id="one-final-word">One Final Word</h2><p>A 'Good IP' on one site may be a 'Bad IP' on another. You could share an IP with a bad actor who's scraping Amazon prices. No one but Amazon will care about that IP. Much of this depends on the organization's size. Amazon generally does most of its stuff in-house, so if {bad actor} has been flagged on Amazon, your actions on Amazon could be flagged as well. Smaller sites, on the other hand, generally share tools. If ExampleA.com and ExampleB.com both use Anti-Scraping.com's services, and {bad actor} gets flagged trying to scrape ExampleA.com, you'll get blocked as well when you wander over to ExampleB.com. But that IP <em>may be fine elsewhere</em>.</p>]]></content:encoded></item><item><title><![CDATA[Let's Talk Behavior Analysis]]></title><description><![CDATA[<p>These days, more and more sites use behavior analysis and other buzzwords to analyze how people interact with them. While probably not the first, <a href="https://www.hotjar.com/">Hotjar</a> is a common and well-known tool for tracking user behavior. Common features include heatmaps that track users' mouse movements as well as</p>]]></description><link>https://bots.lol/lets-talk-hueristics/</link><guid isPermaLink="false">5f27f4198d73d565bdfa7bc9</guid><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Fri, 14 Aug 2020 16:00:00 GMT</pubDate><content:encoded><![CDATA[<p>These days, more and more sites use behavior analysis and other buzzwords to analyze how people interact with them. 
While probably not the first, <a href="https://www.hotjar.com/">Hotjar</a> is a common and well-known tool for tracking user behavior. Common features include heatmaps that track users' mouse movements, as well as conversion funnels that show the common flow of users before they either convert or drop off.</p><p>What does this have to do with automation? Many companies also use behavior analysis to find cheaters, bots, and other automation tools. For instance, in video games, even if a speedhack gets past your anti-cheat software, you can record players' x,y,z locations at regular intervals and find those who have exceeded a reasonable speed or wandered outside a reasonable boundary. You can isolate those who stand out from the norm and ban them. This can also be used to detect bots that follow a very strict path. <a href="https://sites.cs.ucsb.edu/~chris/research/doc/spmagazine09_gamebots.pdf">More information here</a>.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/08/line.jpg" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/08/line.jpg 600w, https://bots.lol/content/images/size/w1000/2020/08/line.jpg 1000w, https://bots.lol/content/images/2020/08/line.jpg 1274w" sizes="(min-width: 720px) 720px"><figcaption>Easy to follow, easy to track.</figcaption></figure><p>So what do you do? Well, a mesh system is harder to detect than a strict point-to-point path. Many users farming an area will stick to one fixed path. If you map out the whole area and place key points, your movement becomes harder to detect. Rather than going from point A to B to C to D..., you can get from point A to D via three different paths, and then maybe go back to B this time instead of C. It adds some randomness. But there are still ways to catch it: maybe you <em>always</em> stop and change direction at coordinate x,y,z. 
Most of these mesh systems still use fixed key coordinates rather than adding any randomness, so at the end of the day you're still doing something very repetitive.</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/08/mesh.jpg" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/08/mesh.jpg 600w, https://bots.lol/content/images/size/w1000/2020/08/mesh.jpg 1000w, https://bots.lol/content/images/2020/08/mesh.jpg 1280w" sizes="(min-width: 720px) 720px"><figcaption>An example of a mesh system. Lots of paths!</figcaption></figure><p>But these are just some of the tools available to catch botters. MMOs deploy other measures, such as <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getcurrentinputmessagesource">checking whether input comes from a fake keyboard</a>, to catch users who write AutoIt or AutoHotkey scripts. This has led to <a href="https://www.reddit.com/r/wow/comments/czterb/i_was_wrongfully_banned_from_world_of_warcraft/">false positives for people who use ADA (accessibility) software</a>, so they've likely adjusted detection for when that software is running. I suspect this is also how they're catching some of the more recent fishbots, since those use a simulated keyboard. Once again, I haven't botted in a while, but I used a modified version of a popular fishbot that faked a hardware keyboard, and I avoided detection.</p><h2 id="behavior-analysis-on-the-web">Behavior Analysis On The Web</h2><p>Alright, who cares about video games, right? I use Selenium and wanna bot the web! There are many tools out there that do the same thing for websites. Luckily, they are generally open about what they do and how they detect you. Let's talk about <a href="https://sift.com/">Sift</a> for a minute. "Sift prevents fraud with industry-leading technology and expertise, an unrivaled global data network, and a commitment to building long-term partnerships with our customers." 
They're a group of ex-Googlers who have developed tracking software to find bots and other bad actors. They try to defeat fraud, bot-created accounts, and a <a href="https://sift.com/products/digital-trust-safety-suite">whole bunch of other things</a>. Let's take a look at what they do.</p><!--kg-card-begin: markdown--><p>Here's the snippet that loads their goodies:</p>
<pre><code>&lt;script type=&quot;text/javascript&quot;&gt;
  var _user_id = &quot;&quot;;                 // user ID; empty for anonymous visitors
  var _session_id = &quot;{SOME GUID}&quot;;   // per-visit session identifier

  // Command queue: calls are buffered here until s.js loads and reads window._sift
  var _sift = window._sift = window._sift || [];
  _sift.push(['_setAccount', &quot;{CODE}&quot;]);
  _sift.push(['_setUserId', _user_id]);
  _sift.push(['_setSessionId', _session_id]);
  _sift.push(['_trackPageview']);

  // Inject s.js only after the page's load event fires
  (function() {
    function ls() {
      var e = document.createElement('script');
      e.src = 'https://cdn.siftscience.com/s.js';
      document.body.appendChild(e);
    }
    if (window.attachEvent) {
      window.attachEvent('onload', ls);            // legacy Internet Explorer
    } else {
      window.addEventListener('load', ls, false);  // standard browsers
    }
  })();
&lt;/script&gt;
</code></pre>
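<p>Why push arrays onto <code>_sift</code> instead of calling functions directly? The snippet follows a common asynchronous "command queue" pattern: the page records commands in a plain array before s.js has arrived, and the library works through them once it loads. The <code>window._sift || []</code> guard means an existing <code>_sift</code> is reused rather than overwritten. Below is a minimal, hypothetical sketch of that pattern, assuming Sift's loader behaves roughly this way; <code>installTracker</code>, the handler table, and the state fields are invented for illustration and are not taken from s.js.</p>

```javascript
// Hypothetical sketch of an async "command queue" loader pattern.
// This is NOT Sift's s.js; every name below is illustrative.

// 1. Before the library loads, the page buffers commands in a plain array.
const queue = [];
queue.push(["_setAccount", "ACCOUNT_CODE"]);
queue.push(["_setUserId", ""]);
queue.push(["_trackPageview"]);

// 2. When the library loads, it replays the backlog and swaps in a push()
//    that executes each new command immediately.
function installTracker(pending) {
  const state = { account: null, userId: null, pageviews: 0 };
  const handlers = {
    _setAccount: (code) => { state.account = code; },
    _setUserId: (id) => { state.userId = id; },
    _trackPageview: () => { state.pageviews += 1; },
  };
  // Run one [commandName, ...args] entry; unknown commands are ignored.
  const run = ([name, ...args]) => {
    if (Object.prototype.hasOwnProperty.call(handlers, name)) {
      handlers[name](...args);
    }
  };
  pending.forEach((cmd) => run(cmd)); // replay everything queued before load
  pending.length = 0;                 // drop the processed backlog
  pending.push = (cmd) => {           // later pushes run right away
    run(cmd);
    return pending.length;
  };
  return state;
}

const tracker = installTracker(queue);
queue.push(["_trackPageview"]); // executed immediately, not buffered
console.log(tracker); // { account: 'ACCOUNT_CODE', userId: '', pageviews: 2 }
```

<p>Swapping out <code>push</code> after the backlog is drained is what makes the pattern convenient: site code calls <code>_sift.push(...)</code> the same way before and after the library arrives, without ever checking whether it has loaded.</p>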
<!--kg-card-end: markdown--><p>If you take a look at s.js, you'll see the comment "Include code from https://github.com/Valve/fingerprintjs2". That's our old buddy <a href="https://fingerprintjs.com/demo">FingerprintJS</a>, which I've talked about in the past. So step one to remaining hidden from them is to obscure your fingerprint, which is easy enough.</p><p>What else is part of their 'Behavior Analysis'? Luckily, they're kind enough to tell you. For instance, here is how they <a href="https://blog.sift.com/2015/feature-spotlight-the-hidden-clues-in-an-email-address/">analyze emails</a>:</p><blockquote>Repeat fraudsters will often create an army of email addresses by only tweaking a few characters in a username. Sift Science is able to identify this behavior and determine that <a href="mailto:jonathan123@fraud.com">jonathan123@fraud.com</a> is probably related to <a href="mailto:jonathan124@fraud.com">jonathan124@fraud.com</a> (and that he’s likely to be fraudulent).</blockquote><blockquote>Does the username contain a known name? We’ll check to see if the email and the billing information share similar names.</blockquote><blockquote>Is the email domain a known disposable one? Is it a free email address? Signals like these increase the likelihood that someone is a fraudster. Not to worry though – we’ll take care of all these checks.</blockquote><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/08/email.PNG" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/08/email.PNG 600w, https://bots.lol/content/images/size/w1000/2020/08/email.PNG 1000w, https://bots.lol/content/images/2020/08/email.PNG 1391w" sizes="(min-width: 720px) 720px"><figcaption>More hints as to what they check.</figcaption></figure><p>They <a href="https://pages.sift.com/rs/526-PCC-974/images/ebook-how-sift-works.pdf">also have an eBook</a> that goes into more detail on the various things they check. 
Finally, you can always request a demo and they'll tell you directly what they do. You can also ask a variety of questions about <em>how they'll stop a common problem you have</em>, and they'll explain <em>how they do it</em>. Then you just adjust.</p><p>A lot of these software products also have documentation, which gives further hints about what they do. Take their <a href="https://sift.com/developers/docs/curl/events-api/reserved-events/add-item-to-cart">key events</a>, for instance: creating an account, logging in, signing up, etc. Just look at what data they want, and you can figure out what they're using for analysis.</p><p>I'm not going to highlight everything, but many companies do the same: their "marketing material" is also a glimpse into what they do. Sift isn't the only one who does this.</p><h2 id="appear-human">Appear Human</h2><p>Even if you satisfy the above with <em>reasonable data</em>, and even if you're hiding that you're a bot, you still run the risk of getting flagged if you don't behave like a human. Does it take you 2 seconds from page load to hit sign up, fill out your username and password, and hit submit? Yeah, you're going to get blocked.</p><p>Do you land on the site, slowly scroll down, and hit sign up on the button <em>below the fold</em>? Seems reasonable. Do you actually move the mouse over to buttons before you click? Do you use a real and common user agent? Do you throttle your input? The average person types around 40 words per minute, while Selenium's SendKeys is extremely fast. Slow it down.</p>]]></content:encoded></item><item><title><![CDATA[Sanitize Your Data]]></title><description><![CDATA[<p>I was reading <a href="https://techcrunch.com/2020/08/11/court-dismisses-genius-lawsuit-over-lyrics-scraping-by-google/">this article</a> today about Genius filing a lawsuit against Google for violating Genius's Terms of Use.</p><p>The article served as a reminder of why you should sanitize the data you scrape. 
Genius used a simple trick to determine who was stealing their data. Apostrophes were either</p>]]></description><link>https://bots.lol/sanitize-your-data/</link><guid isPermaLink="false">5f34150f8d73d565bdfa7daf</guid><category><![CDATA[Article]]></category><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Wed, 12 Aug 2020 16:16:48 GMT</pubDate><content:encoded><![CDATA[<p>I was reading <a href="https://techcrunch.com/2020/08/11/court-dismisses-genius-lawsuit-over-lyrics-scraping-by-google/">this article</a> today about Genius filing a lawsuit against Google for violating Genius's Terms of Use.</p><p>The article served as a reminder of why you should sanitize the data you scrape. Genius used a simple trick to determine who was stealing their data: apostrophes were either straight or curly. Had Google normalized all the quotes to one or the other, Genius would have been none the wiser. But I guess, with the lawsuit being dismissed, it doesn't really matter.</p><figure class="kg-card kg-image-card"><img src="https://bots.lol/content/images/2020/08/red-handed.png" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/08/red-handed.png 600w, https://bots.lol/content/images/2020/08/red-handed.png 815w" sizes="(min-width: 720px) 720px"></figure><p>A state court has dismissed a high-profile case showing unsportsmanlike conduct by <strong>Google</strong>, which was caught red-handed using lyrics obviously scraped from Genius. Unfortunately for the latter, the complaints amount to a copyright violation — which wasn’t what the plaintiffs alleged, sinking the case.</p><p>The lawsuit, filed in December, accused Google of violating Genius’s terms of use and unjustly enriching itself by scraping lyrics on the site to be displayed on searches for songs. 
So, for instance, someone searching for “Your Love is Killing Me lyrics” would be shown the lyrics immediately instead of being sent to a site like <strong>Genius</strong> that hosted them.</p><p>That’s fair play, except when the lyrics are taken directly from those sites (directly or via an accomplice) without permission or attribution — and Genius proved that Google was doing this by cleverly hiding “RED HANDED” inside lyrics, using Morse code formed from curly and straight apostrophes. Devious!</p><p>Caught thus, Google said it would mend its ways, and soon was caught again, doing the same thing using the same method. It’s certainly enough to make you want to see the big G take some licks, and Genius filed a lawsuit hoping to achieve just that.</p><p>The problem is this: Genius isn’t the copyright holder for these lyrics, it just licenses them itself. Its allegations against Google, Judge Margo Brodie of the Eastern District of New York determined, amount to copyright violations, in nature if not in name, and copyright is outside Brodie’s jurisdiction.</p><blockquote>Plaintiff’s allegations that Defendants “scraped” and used their lyrics for profit amount to allegations that Defendants made unauthorized reproductions of Plaintiff’s lyric transcriptions and profited off of those unauthorized reproductions, which is behavior that falls under federal copyright law.</blockquote><p>As to allegations of unfair business conduct, Brodie says those too are copyright disputes:</p><blockquote>Plaintiff has not alleged that Defendants breached any fiduciary duty or confidential relationship, or that Defendants misappropriated Plaintiff’s trade secrets. 
Instead, Plaintiff’s claims are precisely the type of misappropriation claims that courts have consistently held are preempted by the Copyright Act.</blockquote><p>Because all the causes for complaint are preempted by federal law, Brodie really has no choice but to kick the case out:</p><blockquote>Given that the Court finds that all of Plaintiff’s state law claims are preempted by the Copyright Act, and Plaintiff has not asserted any federal law claims, the Court dismisses the Complaint for failure to state a claim.</blockquote><p>It’s a bit disappointing, of course, to see a company like Google engage in shenanigans and get away with it (though let us not forget that Genius has engaged in some shenanigans of its own). But the legal system is all about crossing your t’s and dotting your i’s. If someone steals your wallet, you don’t accuse them of embezzlement, even though they’re <em>kind of</em> the same thing.</p><p>In this case Genius’s legal team needed to bring a copyright complaint, but possibly were unable to due to not being the copyright owners themselves. (Copyright law is notoriously obtuse, especially in questions of digital copies and licensing.)</p><p>Genius could file a new lawsuit or just cut their losses, having given Google a very public black eye; the scraping practice even got some play during the recent tech antitrust hearings in Congress. Certainly Google is on notice — but make no mistake, they’re popping champagne in Mountain View tonight.</p>]]></content:encoded></item><item><title><![CDATA[But It's Dumb]]></title><description><![CDATA[<p>Some bots are dumb bots, and that's okay! I no longer play Rainbow 6 Siege, but when I did, I wanted the cool skins, guns, and all the operators. As you played, you earned an in-game currency called 'Renown'. 
You either had to play to earn this currency or</p>]]></description><link>https://bots.lol/but-its-dumb/</link><guid isPermaLink="false">5f1b3c608d73d565bdfa7b98</guid><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Fri, 07 Aug 2020 16:00:00 GMT</pubDate><content:encoded><![CDATA[<p>Some bots are dumb bots, and that's okay! I no longer play Rainbow 6 Siege, but when I did, I wanted the cool skins, guns, and all the operators. As you played, you earned an in-game currency called 'Renown'. You either had to play to earn this currency or pay money for 'R6 Credits'. As a busy individual, I didn't get a chance to play that much. I was also pretty shit at the game. So what did I do? Automation!</p><p>You could start a game against bots called Terrorist Hunt (the Protect Asset mode, I believe). If you sit there and lose? Well, you still get the currency. Not much, but you get some. Do it for a few hours and you could buy whatever you wanted. So I created a simple AutoIt script to create a game, wait for death, and close the game. Repeat. I'd leave it running overnight and wake up to a bunch of currency.</p><p>"But you lose." "Your stats suck." Who cares? I got to unlock everything I wanted much faster than those who 'Got Gud'.</p><p>You need to remember that not every bot needs to be awesome. 
It just needs to get the <em>job done</em>.</p>]]></content:encoded></item><item><title><![CDATA[Instacart shoppers besieged by bots that snatch lucrative orders]]></title><description><![CDATA[<figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.seattletimes.com/business/instacart-shoppers-besieged-by-bots-that-snatch-lucrative-orders/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Instacart shoppers besieged by bots that snatch lucrative orders</div><div class="kg-bookmark-description">While bots aren’t a new problem for Instacart, the recent deluge is different because it comes at a time of white-knuckled expansion for the startup.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.seattletimes.com/apple-touch-icon.png?v=7kovnr5xE4"><span class="kg-bookmark-author">Kartikay Mehrotra</span><span class="kg-bookmark-publisher">The Seattle Times</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://static.seattletimes.com/wp-content/uploads/2020/07/07312020_TZR-Instacart_tzr_203021-375x241.jpg"></div></a></figure><p>Lisa Marsh’s job shopping and delivering groceries for Instacart</p>]]></description><link>https://bots.lol/instacart-shoppers-besieged-by-bots-that-snatch-lucrative-orders/</link><guid isPermaLink="false">5f2ab5108d73d565bdfa7da0</guid><category><![CDATA[Article]]></category><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Wed, 05 Aug 2020 13:34:21 GMT</pubDate><content:encoded><![CDATA[<figure class="kg-card kg-bookmark-card"><a class="kg-bookmark-container" href="https://www.seattletimes.com/business/instacart-shoppers-besieged-by-bots-that-snatch-lucrative-orders/"><div class="kg-bookmark-content"><div class="kg-bookmark-title">Instacart shoppers besieged by bots that snatch lucrative orders</div><div class="kg-bookmark-description">While bots aren’t a new problem for Instacart, the recent deluge is different because it comes at a time of 
white-knuckled expansion for the startup.</div><div class="kg-bookmark-metadata"><img class="kg-bookmark-icon" src="https://www.seattletimes.com/apple-touch-icon.png?v=7kovnr5xE4"><span class="kg-bookmark-author">Kartikay Mehrotra</span><span class="kg-bookmark-publisher">The Seattle Times</span></div></div><div class="kg-bookmark-thumbnail"><img src="https://static.seattletimes.com/wp-content/uploads/2020/07/07312020_TZR-Instacart_tzr_203021-375x241.jpg"></div></a></figure><p>Lisa Marsh’s job shopping and delivering groceries for Instacart during the past three years has been unforgiving. Company tipping policies cut into earnings while boycotts and other labor strife created confusion, she said.</p><p>Then the global pandemic hit, transforming once mundane trips to Los Angeles grocery stores where she lives into a palpable health risk.</p><p>In recent weeks, another problem has emerged: bots that snatch the largest, most lucrative orders out of the hands of other shoppers.</p><p>Here’s how it works. Instacart pays contract workers to shop for groceries and deliver them to customers. Normally, the shoppers open the Instacart shopping app and, as orders flash by, click on the ones they want to fulfill. But in order to gain an edge, some shoppers are paying software developers who have created bots — in the form of third-party apps — that run alongside the legitimate Instacart app and claim the best orders for clients.</p><p>In this way, the app tilts competition between shoppers but is invisible to customers and doesn’t take business away from Instacart either. The cost of the third-party apps ranges from $250 to $600 in cryptocurrency or bank deposits, according to the darkweb research firm, DarkOwl.</p><p>When Marsh opens her Instacart shopping app, she sees promising orders disappear before she can act. “No human can click that fast,” she said. “Instacart needs to fix this. 
These bots are literally taking the food off my kids’ table.”</p><p>While bots aren’t a new problem for Instacart, the recent deluge is different because it comes at a time of white-knuckled expansion for the San Francisco-based startup. The company said customer demand for grocery delivery has surged more than 500% during the pandemic, notching growth its investors didn’t expect until 2025. This makes the platform, which hasn’t expanded its team as fast as its revenue, an attractive target for hustlers.</p><p>A spokeswoman for Instacart said the bots affect just a sliver of its more than 500,000 shoppers and that the company has already taken measures to address the issue.</p><p>“We take the integrity of the Instacart platform very seriously and have a trust and security team dedicated to monitoring the unauthorized use of the platform which includes all efforts to prevent illicit and fraudulent third-party apps from violating our terms of service,” said Natalia Montalvo, Instacart’s director of shopper engagement and communications.</p><p>Instacart said it’s combating bots by cranking up pressure against app makers and banning violators when they find them. The company said it deactivated 150 shoppers found to be misusing the platform and shut down half a dozen sites claiming to sell batches to Instacart shoppers including Instashopper.app, Sushopper, Ninja Hours and Acrobatshopper.</p><p>The developers of those apps couldn’t be located for comment.</p><p>Instacart also recently introduced new procedures such as prompting shoppers to verify their identity with a selfie and not permitting shoppers to switch devices in the middle of an order. 
Shoppers using the updated app can also choose to review a single order for 30 seconds before claiming it or passing it to another shopper.</p><p>“As a result of these measures, we’ve seen a dramatic reduction in the use of unauthorized third-party apps because of the hard work and dedication by our security and legal teams to protect the shopper experience,” Montalvo said. Instacart also last month enlisted the help of security platform HackerOne to battle bots by offering a bounty program, she said.</p><p>But as security experts at Amazon.com and other sites have discovered, battling rogue apps is a lot like playing whack-a-mole. As soon as a company thwarts one bot program, a new version of it emerges, usually with a new name.</p><p>“If Instacart cared — if it was losing money — they could devote resources to make the jobs of these automatic snipers much harder,” said Bruce Schneier, a cybersecurity expert, author and lecturer at Harvard University, adding that there are ways for companies to detect such bots. “This is a problem that any company that makes money from automation is likely being forced to deal with. Some handle it well. Others don’t.”</p><p>In recent months, different Instacart shopper-related apps have come and gone, sometimes using slightly varied titles, such as Ninja Hours, Ninja Shoppers and Ninja Shopper. DarkOwl discovered nearly a dozen active platforms in mid-May advertising openly on YouTube and social media platforms, including Reddit.</p><p>Digital breadcrumbs linked these sites back to users spanning the U.S., including New York; Savannah, Georgia; and Northern California’s wine country, according to DarkOwl. Others linked to an apparent Brazilian app developer syndicate that leans heavily on YouTube ads narrated in Portuguese, the research firm concluded.</p><p>The developer of those apps couldn’t be located for comment.</p><p>Some of the apps work, others are scams, according to DarkOwl. 
The Bitcoin wallet linked to the site of Ninja Shoppers indicates its owners have received 76 deposits — about $20,000 — including many from Instacart shoppers desperate to jumpstart their stalled shopping careers.</p><p>The apps are typically available on websites published by their developers. In the case of Ninja Shoppers, the app is free to download, but users must be “activated in a private group” in order to be granted permission to pay for a user authentication token, according to their website, which is published in English and Portuguese. Once logged-in, the program prompts the user to find Instacart sales available near their location, according to a YouTube video viewed more than 13,000 times since May 9.</p><p>Despite Instacart’s efforts to crack down, finding a permanent solution may be difficult. Last month, one man using the Instacart shopping app, who said he’s been using a bot since March, offered to install it on another shopper’s phone for $250, plus a $130 weekly recurring fee, according to screen shots of a conversation in late July seen by Bloomberg.</p><p>When reached by phone, the man spoke first in Portuguese and then in English, confirming to Bloomberg he was selling a bot for those amounts. He declined to answer additional questions after learning that the information would likely be publicized.</p><p>Fear of getting deactivated or scammed out of money has stopped some shoppers from spending money on the apps. Others like Santa Cruz-area grandmother Ginger Colgate said she refuses to do so on moral grounds.</p><p>“It’s just not right. It’s against the rules,” said Colgate, complaining that her earnings dropped from $1,800 a week to $300 because the bots have siphoned the best work. Colgate said she still sometimes drives to Costco and opens the Instacart app, hoping for work.</p><p>“So many times I sit with tears in my eyes in the parking lot just waiting and hoping to get an order,” she said. 
“I’ve basically given up.”</p>]]></content:encoded></item><item><title><![CDATA[Are You a Bot?]]></title><description><![CDATA[<p>Let's talk Captchas for a second. They suck even when you're human. Use a VPN? Captcha. Privacy plugins? Captcha. Botting, scraping, etc.? Oh yeah. So when do you get a captcha? Well, I recently came across this <a href="https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php">handy dandy website</a>, which shows your reCAPTCHA v3 score. Normally, for an upstanding</p>]]></description><link>https://bots.lol/are-you-a-bot/</link><guid isPermaLink="false">5f1b23f78d73d565bdfa7a83</guid><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Fri, 31 Jul 2020 16:00:00 GMT</pubDate><content:encoded><![CDATA[<p>Let's talk Captchas for a second. They suck even when you're human. Use a VPN? Captcha. Privacy plugins? Captcha. Botting, scraping, etc.? Oh yeah. So when do you get a captcha? Well, I recently came across this <a href="https://recaptcha-demo.appspot.com/recaptcha-v3-request-scores.php">handy dandy website</a>, which shows your reCAPTCHA v3 score. Normally, for an upstanding citizen, you're looking at a score between 0.7 and 1.0. But as long as you aren't under the website's threshold, it doesn't really matter.</p><!--kg-card-begin: markdown--><pre><code class="language-json">{
  &quot;success&quot;: true,
  &quot;hostname&quot;: &quot;recaptcha-demo.appspot.com&quot;,    
  &quot;challenge_ts&quot;: &quot;2020-07-24T__:__:__Z&quot;,    
  &quot;apk_package_name&quot;: null,    
  &quot;score&quot;: 0.7,    
  &quot;action&quot;: &quot;examples/v3scores&quot;,    
  &quot;error-codes&quot;: []    
}</code></pre>
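<p>Where does the threshold come in? The site's backend sends the token from the page to Google's <code>https://www.google.com/recaptcha/api/siteverify</code> endpoint and gets back JSON like the response above; after that, whether the score is good enough is entirely the site's call. Here's a minimal C# sketch of that decision using <code>System.Text.Json</code>. <code>RecaptchaCheck.Passes</code> is a hypothetical helper, and the 0.5 default is the starting point Google's docs suggest, not what any particular site actually uses.</p>
<pre><code class="language-csharp">using System;
using System.Text.Json;

public static class RecaptchaCheck
{
    // Hypothetical helper: decides whether a siteverify JSON response clears a site's threshold.
    // 0.5 is the starting threshold Google's reCAPTCHA v3 docs suggest; each site picks its own.
    public static bool Passes(string siteverifyJson, double threshold = 0.5)
    {
        using JsonDocument doc = JsonDocument.Parse(siteverifyJson);
        JsonElement root = doc.RootElement;

        // An invalid or expired token fails no matter what the score says.
        if (!root.TryGetProperty("success", out JsonElement success) || !success.GetBoolean())
        {
            return false;
        }

        // v3 scores run from 0.0 (very likely a bot) to 1.0 (very likely a human).
        if (!root.TryGetProperty("score", out JsonElement score))
        {
            return false;
        }

        return score.GetDouble() >= threshold;
    }
}
</code></pre>
<p>With the default threshold, the 0.7 response above passes, while the 0.1 that stock Selenium gets would fail.</p>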
<!--kg-card-end: markdown--><h2 id="let-s-load-selenium">Let's Load Selenium</h2><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/07/bad-selenium.PNG" class="kg-image" alt><figcaption>Ouch!</figcaption></figure><p>OUCH. As you can see, a 0.1 is pretty bad and doesn't meet the threshold. You'll definitely get hit by a captcha for whatever you're doing. Let's do a few things to make Selenium seem a bit friendlier.</p><!--kg-card-begin: markdown--><p>First of all, let's tell WebDriver to stop fucking telling the world you're a bot. A browser under WebDriver control reports <code>navigator.webdriver</code> as <code>true</code>, so we inject a script that hides it before any of the page's own scripts run.</p>
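<pre><code class="language-csharp">// driver is a ChromeDriver; this sends a raw Chrome DevTools Protocol command.
// Page.addScriptToEvaluateOnNewDocument runs the script in every new document
// before the page's own scripts, so sites never see navigator.webdriver == true.
driver.ExecuteChromeCommand(&quot;Page.addScriptToEvaluateOnNewDocument&quot;,
    new System.Collections.Generic.Dictionary&lt;string, object&gt;
    {
        {
            &quot;source&quot;, @&quot;
                Object.defineProperty(navigator, 'webdriver', {
                    get: () =&gt; undefined
                })&quot;
        }
    });
</code></pre>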
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://bots.lol/content/images/2020/07/42069-1.png" class="kg-image" alt></figure><!--kg-card-begin: markdown--><p>Thanks WebDriver! 🙄</p>
<p>Next, act like a human before you hit the target page:</p>
<ol>
<li>Browse a friendly site for a minute or so and pick up a couple of cookies.</li>
<li>Search from a 'good' IP, aka don't get flagged because of your shitty free proxy/VPN.</li>
<li>Don't go directly to the target page. Follow the website's flow: Site Home -&gt; Link. Better yet: Google -&gt; Site Home -&gt; Link. Bonus points if they reliably have an ad!</li>
</ol>
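<p>The 'action routine' boils down to a list of steps with human-ish pauses in between. A minimal C# sketch, assuming each step is a hypothetical <code>BrowseStep</code> callback that wraps one Selenium action; the names and dwell times are illustrative, not the original routine:</p>
<pre><code class="language-csharp">using System;
using System.Threading;

// Hypothetical: one browsing action, such as loading a page, running a search, or clicking a link.
public delegate void BrowseStep();

public static class WarmUpRoutine
{
    // Runs the steps in order and idles for a random "reading" time after each one,
    // so the session collects cookies and history at something like a human pace.
    // Returns the total time spent idling, in milliseconds.
    public static int Run(BrowseStep[] steps, Random rng, int minDwellMs, int maxDwellMs)
    {
        int totalWaitMs = 0;
        foreach (BrowseStep step in steps)
        {
            step();
            // Random.Next's upper bound is exclusive, so add 1 to make maxDwellMs reachable.
            int dwellMs = rng.Next(minDwellMs, maxDwellMs + 1);
            Thread.Sleep(dwellMs);
            totalWaitMs += dwellMs;
        }
        return totalWaitMs;
    }
}
</code></pre>
<p>With Selenium, the first step might be <code>() =&gt; driver.Navigate().GoToUrl("https://www.google.com/")</code>, followed by a search and then <code>driver.FindElement(By.LinkText(...)).Click()</code> for each hop, with your target's link text filled in. Clicking through means the site sees Google -&gt; Site Home -&gt; Link rather than a cold direct hit.</p>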
<p>Let's see how we did.</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/07/selenium-yay.PNG" class="kg-image" alt><figcaption>2 minutes of human browsing!</figcaption></figure><p>Look at that. A respectable 0.9; no one is flagging our actions! And as you can see from the timestamp, it took me roughly 10 minutes to code a quick 'action routine' to be a good, upstanding internet citizen, run the code, and hit the target page.</p>]]></content:encoded></item><item><title><![CDATA[Privacy? Not Here.]]></title><description><![CDATA[<p>For a while I've been using a browser extension that returns a fake canvas value to websites that request it. This ensures fewer sites are able to track me, especially ones that use something like <a href="https://fingerprintjs.com/demo">FingerprintJS</a>, which is able to track the same browser whether you're in incognito or if</p>]]></description><link>https://bots.lol/privacy-not-here/</link><guid isPermaLink="false">5f1b17648d73d565bdfa7a3f</guid><dc:creator><![CDATA[Y]]></dc:creator><pubDate>Fri, 24 Jul 2020 18:06:09 GMT</pubDate><content:encoded><![CDATA[<p>For a while I've been using a browser extension that returns a fake canvas value to websites that request it. This ensures fewer sites are able to track me, especially ones that use something like <a href="https://fingerprintjs.com/demo">FingerprintJS</a>, which is able to track the same browser whether you're in incognito or if you're using a VPN. It's also less disruptive than something like NoScript, since it still allows some JavaScript. Unfortunately, a few 'anti-scraping' tools, such as Distil Networks, now flag me as a bot.
For instance, on the review website <a href="https://g2.com/">G2</a>.</p><figure class="kg-card kg-image-card"><img src="https://bots.lol/content/images/2020/07/Distil.PNG" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/07/Distil.PNG 600w, https://bots.lol/content/images/size/w1000/2020/07/Distil.PNG 1000w, https://bots.lol/content/images/2020/07/Distil.PNG 1177w" sizes="(min-width: 720px) 720px"></figure><!--kg-card-begin: markdown--><ol>
<li>Nope, I landed on that page from a Google search.</li>
<li>Nope, I still allow JS and cookies.</li>
<li>Nope, not using Ghostery or NoScript.</li>
</ol>
<p>But I am returning a fake value for their canvas fingerprint requests. My guess is they check whether it looks like a reasonable fingerprint. Turn the extension off and the site loads.</p>
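<p>For context on what the extension is faking: a canvas fingerprint script draws text and shapes onto a hidden <code>&lt;canvas&gt;</code>, reads the pixels back with <code>toDataURL()</code>, and hashes that string into an ID. Distil doesn't publish how it decides a fingerprint is 'reasonable', so what follows is an assumption, not their method. One plausible check is to read the same canvas twice and compare: a real browser renders identical output both times, while an extension that adds fresh random noise on every read won't. A minimal C# sketch of that comparison, with the hypothetical <code>CanvasCheck</code> class standing in for a site's server-side code (needs .NET 5 or later):</p>
<pre><code class="language-csharp">using System;
using System.Security.Cryptography;
using System.Text;

public static class CanvasCheck
{
    // Turns a canvas.toDataURL() string into a compact ID: uppercase hex SHA-256 (64 chars).
    public static string Fingerprint(string dataUrl)
    {
        byte[] hash = SHA256.HashData(Encoding.UTF8.GetBytes(dataUrl));
        return Convert.ToHexString(hash);
    }

    // Hypothetical plausibility check: two reads of the same drawing should hash identically.
    // Per-read random noise (a common anti-fingerprinting trick) makes them differ.
    public static bool LooksConsistent(string firstRead, string secondRead)
    {
        return Fingerprint(firstRead) == Fingerprint(secondRead);
    }
}
</code></pre>
<p>Hashing isn't needed just to compare two strings, but it's what turns a multi-kilobyte data URL into the compact ID that actually gets stored and tracked. An extension that returns the same fake value every time would pass this particular check, so if that's what yours does, whatever trips Distil is presumably something else.</p>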
<!--kg-card-end: markdown--><p>TODO: Create an extension that does a better job faking the canvas fingerprint.</p><p>"Why not just use Brave?! They block fingerprints!"</p><figure class="kg-card kg-image-card kg-card-hascaption"><img src="https://bots.lol/content/images/2020/07/Fingerprint.PNG" class="kg-image" alt srcset="https://bots.lol/content/images/size/w600/2020/07/Fingerprint.PNG 600w, https://bots.lol/content/images/size/w1000/2020/07/Fingerprint.PNG 1000w, https://bots.lol/content/images/2020/07/Fingerprint.PNG 1231w" sizes="(min-width: 720px) 720px"><figcaption>With 'strict' turned on, closing and reopening private browsing still leaves you tracked.</figcaption></figure><p>Distil is used by a fair number of websites, so if you want to browse them, you'll need to let them track you. There are a few other things you can do to avoid getting blocked, but you still have to let them track you. That's a later post, though.</p><p>PS. If you've got an extension that obfuscates canvas but still passes Distil, lemme know! Author Name @ Domain. 😘</p>]]></content:encoded></item></channel></rss>