Meanwhile, their robots.txt doesn’t disallow GPTBot or Google Bard. So apparently they’re okay with content being stolen by for-profit companies.
you think either of those companies pays attention to robots.txt? it's not legally binding or anything
They generally do, because very few people bother to block them anyway. Complying with robots.txt is a good way to show in court that you did put effort into complying with the usual standards, without it ever impacting the useful information you’re scraping.
At the executive level, no, I don’t think they care or pay attention, but considering both have said “here’s how to block our crawler,” I do hope that some mistreated developer did actually program a check into the crawler. I still think it’s worth doing, even though I don’t fully trust them.
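For anyone wondering what that kind of check might look like on the crawler side, here's a rough sketch using Python's standard `urllib.robotparser`. To be clear, this is only an illustration: the robots.txt contents, the `may_fetch` helper, the `SomeOtherBot` name, and the example.com URL are all made up, and I have no idea how any real crawler implements this.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks GPTBot but allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A well-behaved crawler would run a check like this before every request.
gptbot_allowed = may_fetch(ROBOTS_TXT, "GPTBot", "https://example.com/post/1")
other_allowed = may_fetch(ROBOTS_TXT, "SomeOtherBot", "https://example.com/post/1")
print(gptbot_allowed, other_allowed)
```

Nothing forces a crawler to call something like this, which is the whole point: robots.txt only works if the bot's developers choose to honor it.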
I feel sorry for the guy now. He’s in over his head and trying to defend himself ineffectively. And now a bunch of lemmings are mocking him too, which I get, but it’s still fucked up. Humans suck.
i don’t feel bad dunking on this guy’s site. doing this is a dick move for accessibility reasons
Yeah, based on his robots.txt it seems to be a WordPress site, so he’s probably just installed an ineffective plugin to prevent copying. At least he can take solace in the fact that most of us probably aren’t any more relevant than he is.