Qualys Web Application Scanning (WAS) stands out as the industry's leading Dynamic Application Security Testing (DAST) solution. Delving deeper into its configuration settings is crucial for effectively harnessing its potential to uncover vulnerabilities. Scan coverage is greatly influenced by the crawl settings, making it imperative to grasp the intricacies of these configurations.
Successful, efficient web application scanning depends on accurate crawling, and the ability to customize that crawling is critical. Enter the world of crawl settings: a set of features that let you control the path the scanner takes through your website. In this blog, we'll explore the crawl settings available when adding or updating your web application configuration in WAS, what each option entails, and how it impacts the scanning of your website.
Imagine your website is a vast city, and the URL hostname is its main address. Choosing the "Limit at or below URL hostname" option means your scanning process will stick to exploring within the walls of this primary domain. For instance, if your starting URL is http://www.test.com, the crawler will diligently crawl all links within the www.test.com domain. This includes all subdirectories, such as http://www.test.com/support and https://www.test.com:8080/logout. All links found over HTTP, HTTPS, or any non-standard port will be crawled.
Links That Will Be Crawled:
- http://www.test.com/support
- https://www.test.com:8080/logout
- Any other link on the www.test.com hostname, over any scheme or port
Links That Will Not Be Crawled:
- Links on sub-domains, such as http://sub1.test.com
- Links on other domains, such as http://www.anothersite.com
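To make the rule concrete, here is a minimal Python sketch of the hostname-matching behavior described above. This is our own illustration, not WAS internals; the helper name `in_hostname_scope` is hypothetical.

```python
from urllib.parse import urlsplit

def in_hostname_scope(url: str, seed_url: str) -> bool:
    """True if `url` is on the same hostname as `seed_url`.

    Scheme (http/https) and port are ignored, mirroring the
    "Limit at or below URL hostname" behavior: any path on the
    seed hostname is in scope; other hosts are not.
    """
    return urlsplit(url).hostname == urlsplit(seed_url).hostname

seed = "http://www.test.com"
print(in_hostname_scope("http://www.test.com/support", seed))       # True  (crawled)
print(in_hostname_scope("https://www.test.com:8080/logout", seed))  # True  (crawled)
print(in_hostname_scope("http://sub1.test.com/images/", seed))      # False (not crawled)
```

Note that `urlsplit(...).hostname` already strips the port and lowercases the host, which is why the non-standard-port example still matches.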
Now, imagine your website is like a library with different sections. Choosing the "Limit to content located at or below URL subdirectory" option means your scanning process will focus on a specific section of the library. If your starting URL is http://www.test.com/news/, the crawler will navigate through all links starting with http://www.test.com/news. Even if the paths lead to deeper subdirectories, like http://www.test.com/news/headlines or https://www.test.com:8080/news/, the crawler will discover them. Please note that a trailing forward slash "/" is needed at the end of your target subdirectory.
Links That Will Be Crawled:
- http://www.test.com/news/headlines
- https://www.test.com:8080/news/
Links That Will Not Be Crawled:
- Links outside the /news/ subdirectory, such as http://www.test.com/support
- Links on other hostnames, such as http://sub1.test.com/news/
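The subdirectory rule adds a path check on top of the hostname check. Below is a minimal sketch of that logic in Python; `in_subdirectory_scope` is an illustrative helper of ours, and it also shows why the trailing slash on the seed matters.

```python
from urllib.parse import urlsplit

def in_subdirectory_scope(url: str, seed_url: str) -> bool:
    """True if `url` shares the seed's hostname AND its path sits at
    or below the seed's subdirectory. The seed should end with "/"
    (e.g. /news/), otherwise /newsletter/ would also match /news.
    """
    u, s = urlsplit(url), urlsplit(seed_url)
    return u.hostname == s.hostname and u.path.startswith(s.path)

seed = "http://www.test.com/news/"
print(in_subdirectory_scope("http://www.test.com/news/headlines", seed))  # True  (crawled)
print(in_subdirectory_scope("https://www.test.com:8080/news/", seed))     # True  (crawled)
print(in_subdirectory_scope("http://www.test.com/support", seed))         # False (not crawled)
```

As with the hostname scope, the scheme and port are ignored; only the host and the path prefix decide membership in this sketch.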
Continuing our story, picture your website as a network of interconnected hubs. Choosing the "Limit to URL hostname and specified sub-domain" option allows your scanning process to traverse the primary domain and a designated sub-domain. Suppose your starting URL is http://www.test.com/news/ and the sub-domain is sub1.test.com. In that case, the scanner will crawl both www.test.com and sub1.test.com, along with the sub-domain's own sub-domains. This encompasses links like http://www.test.com/support, https://www.test.com:8080/logout, http://sub1.test.com/images/, and even http://videos.sub1.test.com.
Links That Will Be Crawled:
- http://www.test.com/support
- https://www.test.com:8080/logout
- http://sub1.test.com/images/
- http://videos.sub1.test.com
Links That Will Not Be Crawled:
- Links on domains other than www.test.com and sub1.test.com (and its sub-domains), such as http://www.anothersite.com
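This scope accepts two families of hosts: the seed hostname itself, and the named sub-domain plus anything beneath it (which is how http://videos.sub1.test.com gets included). A minimal sketch, with our own hypothetical helper name:

```python
from urllib.parse import urlsplit

def in_subdomain_scope(url: str, seed_url: str, sub_domain: str) -> bool:
    """True if `url` is on the seed hostname, on the named sub-domain,
    or on any host below that sub-domain (e.g. videos.sub1.test.com
    when the sub-domain is sub1.test.com)."""
    host = urlsplit(url).hostname or ""
    seed_host = urlsplit(seed_url).hostname
    return (host == seed_host
            or host == sub_domain
            or host.endswith("." + sub_domain))

seed = "http://www.test.com/news/"
print(in_subdomain_scope("http://sub1.test.com/images/", seed, "sub1.test.com"))  # True  (crawled)
print(in_subdomain_scope("http://videos.sub1.test.com", seed, "sub1.test.com"))   # True  (crawled)
print(in_subdomain_scope("http://www.anothersite.com", seed, "sub1.test.com"))    # False (not crawled)
```

The leading dot in the `endswith` check prevents accidental matches such as `notsub1.test.com`, a subtlety worth keeping in mind for any allow-list of hostnames.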
Finally, think of your website as a garden with carefully tended patches. Choosing the "Limit to URL hostname and specified domains" option allows your scanning process to explore only the main domain and specific patches (domains), even if those other “patches” are in different domains. If your starting URL is http://www.test.com/news/, and the specified domains are www.anothersite.com and moresites.com, the crawler will roam through www.test.com, www.anothersite.com, and moresites.com, along with their respective links.
Links That Will Be Crawled:
- Links within www.test.com, such as http://www.test.com/support
- Links within www.anothersite.com
- Links within moresites.com
Links That Will Not Be Crawled:
- Links on domains not listed in the scope, such as http://sub1.test.com
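The specified-domains scope can be modeled as an allow-list: the seed hostname plus each listed domain. A minimal sketch follows; the helper name `in_domains_scope` is ours, and treating hosts *within* a listed domain (e.g. a host under moresites.com) as in scope is an assumption of this sketch, not a documented guarantee.

```python
from urllib.parse import urlsplit

def in_domains_scope(url: str, seed_url: str, domains: set[str]) -> bool:
    """True if `url`'s hostname is the seed hostname, one of the
    specified domains, or a host within one of those domains."""
    host = urlsplit(url).hostname or ""
    allowed = {urlsplit(seed_url).hostname} | domains
    return any(host == d or host.endswith("." + d) for d in allowed)

seed = "http://www.test.com/news/"
extra = {"www.anothersite.com", "moresites.com"}
print(in_domains_scope("http://www.anothersite.com/page", seed, extra))  # True  (crawled)
print(in_domains_scope("http://sub1.test.com/", seed, extra))            # False (not crawled)
```

Note that sub1.test.com is rejected here even though it belongs to test.com: only the exact seed hostname and the explicitly listed domains are allowed under this scope.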
When it comes to web application scanning, there are scenarios where you might want to specifically target certain URLs that are not directly linked within the application. Enter Explicit URLs, a powerful tool that enables you to guide the scanning process exactly where you want it to go. This feature is especially handy for cases like email-based registration, where a user clicks a link in an email to access a particular page within your application.
To get started with Explicit URLs, add the target URLs to your web application configuration, and keep in mind that each explicit URL should fall within the crawl scope you configured above so that it is actually tested during the scan.
Best Practice Tip! Not sure what crawl scope to use? Start with the default “Limit at or below URL hostname.” Any subdomains or external domains that should be considered for your scope refinement will be reported in your scan report and the WAS Catalog.
Another powerful feature that contributes to the accuracy and comprehensiveness of your web application scanning is the ability to configure how your scan interacts with various links. Optional configurations such as Selenium crawl scripts, which you can record with the Qualys Browser Recorder, let you script complex link interactions and help ensure full scan coverage.
Best Practice Tip! Qualys Browser Recorder can be used on both Chrome and Microsoft Edge browsers for widespread compatibility.
At this point, you can click "Next" to be taken to Step 3 in manually configuring your web application and APIs: Default Scan Settings. Please join us for our next blog in this series, where we will cover Default Scan Settings in detail and provide even more best practices for getting the most out of Qualys WAS.