
How to audit sites inside corporate networks

  • September 28, 2020
  • General
  • Agency

There is a common problem when auditing staging enterprise sites inside corporate networks. 

If you work in-house, you first connect to the corporate network using a VPN client. Then, you need to run auditing tools to review the pages. 

The only tools that work are the ones you can run directly from your computer, such as the Screaming Frog SEO Spider, which is a downloadable program.

However, many enterprise sites have millions of pages, which makes crawling from your computer impractical because of time constraints or limited machine resources.

Enterprise cloud-based crawlers like DeepCrawl, Ryte, Oncrawl, etc. are better suited for this type of work, but they cannot audit sites inside private networks.

The same limitation rules out other valuable tools, like the URL Inspection tools from Google and Bing, which are critical for auditing JavaScript-driven content.

If you work agency-side, you have the extra complication that security and privacy compliance is now a requirement to work with enterprises. It is common to have to complete extensive security questionnaires before you are even considered as a vendor.

On top of that, the content on the staging site inside the private network might not be ready to be made public.

Introducing network admin tools for SEO

In previous articles, I’ve mentioned the importance of being aware of tools and techniques used in the development and IT industries. In this article, I’m going to continue making that case.

Let me introduce a couple of tools that are familiar to network and system administrators: ngrok and mitmproxy.

We can use ngrok to turn private (VPN required) URLs into temporary and public ones. We can use mitmproxy to make changes to the pages and hide and/or obfuscate the content and preserve its privacy. This requires writing simple Python scripts.

Proxies and HTTP Tunnels

Before I dive in and play with the tools, let me go over their underlying concepts. 

https://developer.mozilla.org/en-US/docs/Web/HTTP/Proxy_servers_and_tunneling

“When navigating through different networks of the Internet, proxy servers and HTTP tunnels are facilitating access to content on the World Wide Web. A proxy can be on the user’s local computer, or anywhere between the user’s computer and a destination server on the Internet. This page outlines some basics about proxies and introduces a few configuration options.”

Proxies and HTTP tunnels are standard approaches to relay requests and pages from one site to another. Please review the linked article to learn more about the topic.

Ngrok creates HTTP tunnels, and mitmproxy can run as a reverse proxy.

These are two different use cases, and together they are a good fit for the problems I mentioned at the start.

Using Ngrok

Ngrok creates HTTP tunnels and is super simple to set up and use.

Let’s say your staging site is https://staging.internal-network.net:8080 and you are only able to open the page after you connect using the VPN client. 

You could expose this site temporarily so you could verify Google Search Console and Bing Webmaster Tools, and run the URL inspection tools (or enterprise crawlers) on the exposed URLs.

Here is how you do that:

  1. Download and install ngrok for your Mac or Windows PC. 
  2. Open a terminal window and launch ngrok. 

Ngrok is a command line tool, so you need to run it in a shell and pass parameters to make it work.

Now let’s create the HTTP tunnel and temporary URL.

./ngrok http staging.internal-network.net:8080 > ngrok.log 2>&1 &

Here I am asking ngrok to expose the web server at port 8080, which is only reachable from my computer while it is connected to the VPN. The shell redirection at the end sends ngrok’s output and any errors to ngrok.log, and the trailing & runs the process in the background so I can keep typing commands.

tail ngrok.log

I check that the log is empty, which means the tunnel should be working fine. Next, I need to get the public URL that ngrok generated.

I need to make an API call to the service, which returns a JSON response that I need to parse. We are going to simplify this part by downloading another handy command line tool, jq. 

Assuming you also have curl, you can get the temporary URL with this command.

curl -s http://localhost:4040/api/tunnels | jq ".tunnels[0].public_url"

You should get a URL that you can open in your web browser like this:

"https://f8139ca0f3b9.ngrok.io"

After you open it, you will see the internal site. Try running Google’s Rich Results Test on it (the URL you get, not this example) and it should work. How cool is that?
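If you would rather not install jq, the same lookup can be sketched in Python with the standard library. This assumes the local ngrok API is listening at http://localhost:4040/api/tunnels, as in the curl command above. The function names are hypothetical, and the sketch returns every tunnel URL it finds rather than only the first, in case more than one tunnel is listed.

```python
import json
import urllib.request

# Local ngrok inspection API, same address as in the curl command above.
NGROK_API = "http://localhost:4040/api/tunnels"


def public_urls(payload):
    """Pull the public_url of every tunnel out of the decoded API response."""
    return [
        tunnel["public_url"]
        for tunnel in payload.get("tunnels", [])
        if "public_url" in tunnel
    ]


def fetch_public_urls(api_url=NGROK_API):
    """Ask the running ngrok process for its tunnels and return their URLs."""
    with urllib.request.urlopen(api_url, timeout=5) as resp:
        return public_urls(json.load(resp))
```

With ngrok running in the background, print(fetch_public_urls()) should list the temporary URLs you can paste into the browser or the testing tools.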

As you don’t own the ngrok.io domain, you need to take an extra step in order to register with Google Search Console and Bing Webmaster Tools. 

You need to create an ngrok account and register a custom domain that you control.

Before you create the tunnel, you need to authenticate.

./ngrok authtoken <token>

Then, you add another parameter to specify the custom domain while you create the tunnel.

./ngrok http -hostname=dev.yourdomain.com staging.internal-network.net:8080 > ngrok.log 2>&1 &

You will be able to register this subdomain and run the URL inspection tools (or your favorite enterprise crawler).
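If you start tunnels often, the two ngrok commands above can be wrapped in a small Python helper. This is only a sketch: build_ngrok_command and start_tunnel are hypothetical names, the ./ngrok path and the -hostname flag are copied from the commands in this article, and the log file plays the same role as the shell redirection.

```python
import subprocess


def build_ngrok_command(target, hostname=None, binary="./ngrok"):
    """Return the argument list for an ngrok HTTP tunnel to target."""
    command = [binary, "http"]
    if hostname:
        # Custom domain, as in the second command shown in the article.
        command.append("-hostname=" + hostname)
    command.append(target)
    return command


def start_tunnel(target, hostname=None, log_path="ngrok.log"):
    """Launch ngrok in the background, sending its output and errors to a log."""
    log_file = open(log_path, "w")
    return subprocess.Popen(
        build_ngrok_command(target, hostname),
        stdout=log_file,
        stderr=subprocess.STDOUT,  # same effect as 2>&1 in the shell
    )
```

For example, start_tunnel("staging.internal-network.net:8080", hostname="dev.yourdomain.com") returns a process handle, and calling terminate() on it closes the tunnel when the audit is done.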

Using Mitmproxy

So, we learned to expose staging sites inside the corporate network using temporary public URLs. But what if we couldn’t risk making the content public and inadvertently revealing unannounced news that could hurt a publicly listed company?

One option is to layer in a reverse proxy and use it to hide or obfuscate any private information in the HTML and/or images to preserve the company’s privacy.

Mitmproxy is an awesome HTTPS proxy that, among many things, allows you to modify the HTTP traffic going through it on the fly, even HTTPS traffic, which is encrypted!

You can make simple text replacements in the command line or any arbitrary modifications by writing simple Python scripts. 

Mitmproxy can operate in several modes; we are interested in its reverse proxy mode.

It is a Python package, so you can install it using:

pip install mitmproxy

Then call it using:

mitmproxy -p 8081 --mode reverse:https://staging.internal-network.net:8080

Let me illustrate this powerful technique with one example. 

I’m going to reverse-proxy StackOverflow and change the text in their H1 from “People” to “SEOs” 

mitmproxy -p 8081 --mode reverse:https://stackoverflow.com/ --modify-body '/ people who code/ SEOs who code'

Let’s open the browser on http://localhost:8081 and see if it works.

Kaboom! Now tell me this isn’t exciting stuff 🙂

The idea is to replace any text or images that shouldn’t be exposed publicly.
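For anything beyond a single replacement, a short mitmproxy script is easier to maintain. The sketch below assumes mitmproxy calls a module-level response(flow) function for each response and that flow.response exposes headers and text. The codename "Project Falcon" and the replacement values are invented placeholders, so swap in whatever must stay private on your site.

```python
import re

# Hypothetical patterns: replace with the strings that must not leak.
REDACTIONS = [
    # An unannounced product codename (made up for this example).
    (re.compile(r"Project Falcon", re.IGNORECASE), "[redacted]"),
    # The internal hostname, swapped for the public custom domain.
    (re.compile(r"staging\.internal-network\.net(:\d+)?"), "dev.yourdomain.com"),
]


def redact(html):
    """Apply every redaction pattern to the page source."""
    for pattern, replacement in REDACTIONS:
        html = pattern.sub(replacement, html)
    return html


def response(flow):
    """mitmproxy hook: rewrite HTML responses before they leave the proxy."""
    content_type = flow.response.headers.get("content-type", "")
    if "text/html" in content_type:
        flow.response.text = redact(flow.response.text)
```

Saved as, say, redact.py, the script can be loaded by adding -s redact.py to the reverse proxy command in place of --modify-body.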

You would need to run ngrok afterwards instructing it to connect to this reverse proxy at port 8081 instead of directly to the source server.

./ngrok http -hostname=dev.yourdomain.com localhost:8081 > ngrok.log 2>&1 &

MITM stands for man-in-the-middle, an information security concept that describes an intercepting device or element sitting in the middle of a two-way conversation. This device can sniff or tamper with the information transmitted.

As you can imagine, this could be used for nefarious purposes. Fortunately, in our case, we want to use it for good. We want to hide/obfuscate sensitive information from internal pages before exposing them publicly with ngrok.

The post How to audit sites inside corporate networks appeared first on Search Engine Land.


Source: IAB
