antlion
v1.0.3
Express.js middleware that turns your website into an infinite sinkhole for unethical webscraping bots.
About The Project
For too long, AI companies have been flagrantly disrespecting website owners by ignoring their robots.txt and scraping everything on their site without permission. With Antlion, you can fight back.
Antlion is Express.js middleware that turns dedicated routes on your site into infinitely recursive tar pits, designed to trap webscrapers that ignore your robots.txt file.
Features
Bots that ignore your site's `robots.txt` and enter Antlion's pit are locked in an infinitely deep site full of nonsensical garbled text which loads at the speed of a '90s dial-up connection.
Once bots wait upwards of 20 seconds for a page to finally load, they are presented with several links, each of which leads deeper into Antlion's pit.
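To make the mechanism above concrete, here is a minimal sketch of a pit page and a slow "dial-up" delivery. It is not Antlion's actual implementation: `randomSlug`, `renderPitPage`, `planTrickle` and `trickle` are hypothetical names, and the chunk size and 20-second total are assumptions chosen to mirror the description.

```javascript
// Illustrative only: hypothetical helpers, not Antlion's internals.

// Random path segment so each link leads to a page the bot has not seen.
function randomSlug(length = 8) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  let slug = '';
  for (let i = 0; i < length; i++) {
    slug += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  return slug;
}

// Build one pit page: filler text plus links that each go one level deeper.
function renderPitPage(currentPath, fillerText, linkCount = 3) {
  const base = currentPath.endsWith('/') ? currentPath : currentPath + '/';
  const links = [];
  for (let i = 0; i < linkCount; i++) {
    const slug = randomSlug();
    links.push(`<a href="${base}${slug}/">${slug}</a>`);
  }
  return `<html><body><p>${fillerText}</p>${links.join(' ')}</body></html>`;
}

// Split a page into small chunks and compute the pause between chunks so
// that the whole page takes roughly totalMs to arrive.
function planTrickle(html, chunkSize = 64, totalMs = 20000) {
  const chunks = [];
  for (let i = 0; i < html.length; i += chunkSize) {
    chunks.push(html.slice(i, i + chunkSize));
  }
  const delayMs = chunks.length > 1 ? Math.floor(totalMs / (chunks.length - 1)) : 0;
  return { chunks, delayMs };
}

// Write the chunks to any writable response (for example an Express `res`),
// sleeping between writes.
async function trickle(res, html, options = {}) {
  const { chunks, delayMs } = planTrickle(html, options.chunkSize, options.totalMs);
  for (let i = 0; i < chunks.length; i++) {
    if (i > 0) await new Promise((resolve) => setTimeout(resolve, delayMs));
    res.write(chunks[i]);
  }
  res.end();
}
```

Inside an Express route handler this could be called as `trickle(res, renderPitPage(req.path, someText))`, where `someText` is whatever filler the page should show.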
Antlion also automatically handles your robots.txt, adding disallow entries for all trapped routes to ensure ethical bots and search engine indexers are not affected without any additional overhead.
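As a rough illustration of what adding disallow entries could look like, the sketch below appends a wildcard group to existing `robots.txt` content. The function name `addDisallowEntries` is hypothetical and this is an assumption about the general approach, not a description of how Antlion does it.

```javascript
// Illustrative only: a hypothetical helper, not Antlion's internals.
// Appends a `User-agent: *` group that disallows every trapped route.
// Simplifications: calling it twice duplicates the group, and crawlers
// that match a more specific User-agent group elsewhere in the file
// will not apply this wildcard group.
function addDisallowEntries(robotsTxt, trappedRoutes) {
  const rules = trappedRoutes.map((route) => `Disallow: ${route}`);
  if (rules.length === 0) return robotsTxt;
  const body = robotsTxt.trimEnd();
  const separator = body.length > 0 ? '\n\n' : '';
  return `${body}${separator}User-agent: *\n${rules.join('\n')}\n`;
}
```

Given an existing file that only disallows `/admin/`, calling `addDisallowEntries(existing, ['/example/', '/trap/'])` keeps the original rules and adds a second group listing both trapped routes.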
Any malicious webscrapers gathering data to compile datasets for training LLMs will inadvertently digest millions of lines of useless text, ruining the output of models trained with this data, ideally making bot owners think twice before ignoring the rules in your sacred `robots.txt`.
Adding Antlion to your site is incredibly easy: just install the npm package, give it some unused routes, point it to your existing `robots.txt`, copy and paste a bunch of random text into a file, and add a single hidden link somewhere on your site that leads into the pit. Antlion will take care of the rest.
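This README does not say how the pasted text is turned into garbled output. One common technique for producing plausible-looking nonsense from a source file is a word-level Markov chain, sketched below. Treat it as an assumption for illustration: `buildChain` and `generateGibberish` are hypothetical names, not part of Antlion's API.

```javascript
// Illustrative only: a word-level Markov chain, assumed for the example.

// Map each word to the list of words that follow it in the source text.
function buildChain(sourceText) {
  const words = sourceText.split(/\s+/).filter(Boolean);
  const chain = new Map();
  for (let i = 0; i < words.length - 1; i++) {
    if (!chain.has(words[i])) chain.set(words[i], []);
    chain.get(words[i]).push(words[i + 1]);
  }
  return { words, chain };
}

// Walk the chain to emit wordCount words. `random` is injectable so the
// output can be made deterministic in tests.
function generateGibberish({ words, chain }, wordCount, random = Math.random) {
  if (words.length === 0 || wordCount <= 0) return '';
  const pick = (list) => list[Math.floor(random() * list.length)];
  let current = pick(words);
  const output = [current];
  while (output.length < wordCount) {
    const followers = chain.get(current);
    // Dead end (a word with no recorded follower): restart from a random word.
    current = followers ? pick(followers) : pick(words);
    output.push(current);
  }
  return output.join(' ');
}
```

In practice the source string would be the contents of the training data file, read once at startup, and `generateGibberish` would be called per page.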
Installation
This is a Node.js module available through the npm registry.
Before installing, download and install Node.js. Node.js 18 or higher is required.
If this is a brand new project, make sure to create a `package.json` first with the `npm init` command.
Installation is done using the `npm install` command:

    npm install antlion

Usage
Create a file `training-data.txt`, and fill it with as much text as you can. This can be Wikipedia articles, blog posts, textbooks, or even Shakespeare. Do not worry about formatting or special characters.
Choose a couple of routes that you are not using and do not plan to use, such as `/blog/`, `/docs/installation/` or `/aboutus/detailed/`. These can be anything, but the more realistic they are, the better.
Remove any existing handlers for `/robots.txt`.
Import Antlion and add it to your server middleware:

    import express from 'express'
    import antlion from 'antlion'

    const app = express()

    antlion(app, {
      robotsPath: 'robots.txt', // path to your existing robots.txt from your project root
      trainingDataPath: 'training-data.txt', // path to your training data file from project root
      trappedRoutes: ['/example/', '/trap/'] // array of the routes to trap
    })

    // -- rest of your code --

Add a link into Antlion's pit somewhere on your site, ideally hidden so regular users will not notice it.
- This trapped link should be one of the trapped routes, optionally followed by random text.
- Ex: `/trap/abcdef`, or just `/trap`
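The link markup is left up to you. As one possible sketch, the helper below builds an anchor that points at a trapped route followed by random text and is hidden from regular visitors. `buildTrapLink` is a hypothetical name, and the hiding attributes are a suggestion rather than something Antlion requires.

```javascript
// Illustrative only: a hypothetical helper for generating the hidden link.
function buildTrapLink(trappedRoute, suffixLength = 6) {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz';
  let suffix = '';
  for (let i = 0; i < suffixLength; i++) {
    suffix += alphabet[Math.floor(Math.random() * alphabet.length)];
  }
  const base = trappedRoute.endsWith('/') ? trappedRoute : trappedRoute + '/';
  // display:none hides the link visually; aria-hidden and tabindex="-1"
  // keep it away from screen readers and keyboard navigation;
  // rel="nofollow" is a hint for well-behaved crawlers.
  return `<a href="${base}${suffix}" style="display:none" aria-hidden="true" tabindex="-1" rel="nofollow">archive</a>`;
}
```

The returned string can be dropped into any page template, for example near the footer. The route argument may be any of the values passed in `trappedRoutes`.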
NOTE: To avoid strain on your server, Antlion can be installed on another webserver, and linked to from your main site.
Roadmap
- [ ] Dynamic HTML to evade detection
- [ ] Bot IP address tracking/logging
- [ ] Text generation model caching for faster startup
See the open issues for a full list of proposed features (and known issues).
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
Setup
Clone the repository:

    git clone https://github.com/shsiena/antlion.git

Install dependencies:

    cd antlion
    npm install

Run test server:

    npm run dev

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
License
Distributed under the MIT License. See LICENSE for more information.
Contact
Simon Siena - [email protected]
Project Link: https://github.com/shsiena/antlion
Acknowledgments
- Inspired by:
  - Nepenthes - "Aaron B." (pseudonym)
  - Nightshade - @TheGlazeProject
