Creating a config.json file

In the repository you will run the spider from, create a new file named config.json. This file holds the spider's configuration. For a full reference of the available options, see the API reference section. This Getting Started tutorial uses a simplified config for demonstration purposes.

Populate your config.json with the following content:

{
  "maxConcurrency": 1,
  "startUrls": [
    "https://your-site-url.top-level-domain"
  ],
  "allowedDomains": [
    "your.domain"
  ],
  "scraperSettings": {
    "default": {
      "hierarchySelectors": {
        "l0": "title",
        "l1": "main h1",
        "l2": "main h2",
        "l3": "main h3",
        "l4": "main h4",
        "content": "main p"
      }
    },
    "shared": {
      "onlyContentLevel": true
    }
  }
}
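
Before running the spider, it can help to sanity-check the file. The following is a minimal sketch, not part of the spider itself: check_config and load_and_check are hypothetical helper names, the list of expected keys is simply taken from the example above (the spider may not require all of them), and the rule that a start URL's host must equal or be a subdomain of an entry in allowedDomains is an assumption about how domain matching works.

```python
import json
from urllib.parse import urlparse


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Keys taken from the example config; the spider may not require all of them.
    for key in ("startUrls", "allowedDomains", "scraperSettings"):
        if key not in config:
            problems.append(f"missing key: {key}")
    allowed = config.get("allowedDomains", [])
    for url in config.get("startUrls", []):
        host = urlparse(url).hostname or ""
        # ASSUMPTION: a host is allowed if it equals, or is a subdomain of,
        # one of the allowed domains. The spider's real rule may differ.
        if not any(host == d or host.endswith("." + d) for d in allowed):
            problems.append(f"start URL host not in allowedDomains: {host}")
    return problems


def load_and_check(path: str = "config.json") -> list[str]:
    """Parse the file (json.load raises on invalid JSON) and run the checks."""
    with open(path, encoding="utf-8") as f:
        return check_config(json.load(f))
```

Calling load_and_check() from the directory that contains config.json returns an empty list when none of these checks fail. With the placeholder values shown above it would report the start URL, because the placeholder host is not under your.domain.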

If your site has basic auth enabled, add the basicAuth config option to the shared settings group:

    "shared": {
      "onlyContentLevel": true,
      "basicAuth": {
        "user": "myuser",
        "password": "mypass"
      }
    }
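
If you would rather not commit credentials to the repository, one option is to add the basicAuth block at run time. The sketch below assumes that approach suits your setup. The function names, the environment variables SITE_AUTH_USER and SITE_AUTH_PASSWORD, and the output file name config.local.json are all hypothetical, and whether the spider can be pointed at a file other than config.json is not covered here.

```python
import copy
import json
import os


def with_basic_auth(config: dict, user: str, password: str) -> dict:
    """Return a copy of config with basicAuth set in the shared settings group."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    shared = merged.setdefault("scraperSettings", {}).setdefault("shared", {})
    shared["basicAuth"] = {"user": user, "password": password}
    return merged


def write_config_with_auth(src: str = "config.json",
                           dst: str = "config.local.json") -> None:
    """Read src, add credentials from the environment, and write the result to dst."""
    # Hypothetical environment variable names; a missing one raises KeyError.
    user = os.environ["SITE_AUTH_USER"]
    password = os.environ["SITE_AUTH_PASSWORD"]
    with open(src, encoding="utf-8") as f:
        config = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(with_basic_auth(config, user, password), f, indent=2)
```

The merge is kept separate from the file handling so it can be checked on its own, and the generated file would normally be excluded from version control.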