Thousands of search result pages being generated

This is the technical support forum for Toolset - a suite of plugins for developing WordPress sites without writing PHP.

This topic contains 7 replies, has 2 voices.

Last updated by Nigel 1 year, 9 months ago.

Assisted by: Nigel.

May 12, 2024 at 3:26 am #2697210

kaleeR-3

Link to a page where the issue can be seen:
hidden link
hidden link

I expected to see:
Site search normally looks like this
hidden link

Instead, I got:
Not even sure how these are coming up, but they're being crawled by Google and there are over 100,000 of them.

May 13, 2024 at 7:07 am #2697345

Nigel

Supporter

Languages: English (English), Spanish (Español)

Timezone: Europe/London (GMT+00:00)

Hi there

The hidden link part of the links just indicates that the paginated search results pages are being indexed. For particular search terms (like ?s=Ui) to be indexed, though, suggests that something somewhere is linking to the search result for that search term.

It could be a visible link on the site (e.g. somewhere in the content, or in comments posted by users) or an invisible link buried in the markup, or it could even be a public link somewhere else online.

The URL parameters which come from Views (e.g. wpv_view_count) point towards a page that includes a View with search filters and/or pagination.

From the links you shared the Views with IDs 5022 and 2766 appear to be involved.
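
To get a quick picture of which Views are involved across a larger set of URLs, the crawled URLs could be grouped by their wpv_view_count value. This is only a rough sketch in Python: the example.com URLs are made up for illustration, and it treats the wpv_view_count value as an opaque string rather than assuming anything about its format.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs


def group_by_view_count(urls):
    """Group URLs by the value of their wpv_view_count query parameter."""
    groups = defaultdict(list)
    for url in urls:
        query = parse_qs(urlsplit(url).query)
        # parse_qs returns a list of values per key; URLs without the
        # parameter are simply skipped.
        for value in query.get("wpv_view_count", []):
            groups[value].append(url)
    return dict(groups)


# Hypothetical URLs standing in for an export of crawled pages.
crawled = [
    "https://example.com/page/2/?s=Ui&wpv_view_count=5022&wpv_paged=2",
    "https://example.com/page/3/?s=Ui&wpv_view_count=5022&wpv_paged=3",
    "https://example.com/reviews/?wpv_view_count=2766",
    "https://example.com/about/",
]

for view, hits in sorted(group_by_view_count(crawled).items()):
    print(view, len(hits))
```

The values with the most URLs are probably the first Views worth locating.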

As a starting point I would see where those Views are located and what they output, then search for whether those pages are linked to from elsewhere on the site, including in comments.

May 13, 2024 at 7:59 pm #2697467

kaleeR-3

Those were just a couple of examples; it seems to apply to every view we have on the site (and is also the case on a second site where we use Toolset, but with a different theme).

The only pages I can think of (on the site the example links came from) that use /page/2/, /page/3/, etc. are taxonomy archive pages. The Toolset archive view for some reason defaults to that format, but when you use the pagination it uses both /page/2/ and the wpv_view URL parameter. E.g.
hidden link
hidden link

(Edit: can you hide the images from being public please)

May 14, 2024 at 10:41 am #2697589

Nigel

Any archive page that has more results than fit on one page (10 by default) will be paginated, which implies links like site.com/slug/page/2/.

If that archive is customised by Toolset, which then provides the pagination links, those links will use the same format but also add URL parameters relating to the custom archive (e.g. the wpv_view_count parameter).

So links such as hidden link are expected, and are likely to be indexed by search engines (assuming they find the first page via a link or sitemap, and that page contains a link to the second page, which contains a link to the third page, etc.).

The search result page (where you have the ?s=... parameter) is also an archive, which operates the same way if the archive is customised by Toolset.

The puzzling part is just why certain searches are included in the indexed pages (the paginated variants of them are expected, once the first page has been indexed).

The first link you shared, an empty search (the search archive with no search term), I can understand: your pages include a search form, and while I don't know enough about search crawlers to say for sure, it is feasible that a crawler constructs a link to the search archive from it. Once it arrives at the first page, reaching all of the paginated pages is just a question of following the pagination links.

The links I don't understand are those like hidden link.

That links to the search archive for the search term "Ui". A link to that must exist somewhere, on your site, in comments, the sitemap, or possibly an external link to your site, and once the crawler arrives at the page it will follow all of the pagination links.

The only Toolset part of this as far as I can see is that the links include URL parameters such as wpv_view_count, and those reflect the fact that the archives are customised using Toolset.

May 17, 2024 at 3:30 pm #2698366

kaleeR-3

> The only Toolset part of this as far as I can see is that the links include URL parameters such as wpv_view_count, and those reflect the fact that the archives are customised using Toolset.

I'm not sure I agree here. For example, on the other site I mentioned we have pagination using the default WordPress logic, yet we're seeing thousands of Toolset URLs.

These two are duplicate pages, and you can see from the main /reviews/ page that the pagination logic is using /page/2/, /page/3/, etc. Where is the wpv URL coming from and why is it even loading?
hidden link
hidden link

For the ones like this it doesn't make any sense that they would even load, they should be going to a 404 error, not loading a view with the weird pagination at the bottom (you can see the styling on the pagination is broken too).
hidden link

Let's look at the Toolset site, for example.
Search page URL format: https://toolset.com/?s=Ui
There is a view on this page: https://toolset.com/showcase/
Using the filters I can see the wpv_view_count is 1493549
It also uses the regular WordPress pagination /page/2/, /page/3/, etc.
All that being the case, why does this URL go to a 404 on the Toolset site instead of loading a view, like it does on my site? Where is the difference coming from?
https://toolset.com/page/45/?s=Ui&wpv_view_count=1493549&wpv_paged=27

EDIT TO ADD:
One new thing I noticed: along with the URL, there are multiple views being added on this page.
hidden link
It looks like 5 views in total, but in the page itself (in admin) the only thing added is a view block with view ID 56632.

May 20, 2024 at 8:10 am #2698601

Nigel

OK, looking at this page: hidden link

I see, checking the source code, that it is not using a custom Toolset archive, and as far as I can see it is not using any Views on the page at all (e.g. in widgets).

So there would be no reason to expect a URL like hidden link which includes parameters from Toolset Views (or custom archives).

"For the ones like this it doesn't make any sense that they would even load, they should be going to a 404 error."

That's not the case, any more than hidden link should. Unknown parameters are just ignored.
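
Since those extra parameters are ignored on pages that don't contain a View, one way to see how many crawled URLs are really the same page is to drop the Views parameters before comparing them. A minimal sketch in Python, assuming every parameter starting with wpv_ comes from Views (that prefix rule and the example.com URLs are assumptions for illustration, not something Toolset provides):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def strip_views_params(url, prefix="wpv_"):
    """Return the URL without query parameters whose name starts with prefix."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if not key.startswith(prefix)
    ]
    # Rebuild the URL with only the remaining parameters.
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical URLs: both collapse to the same underlying page.
a = "https://example.com/reviews/page/2/"
b = "https://example.com/reviews/page/2/?wpv_view_count=884&wpv_paged=2"
print(strip_views_params(a) == strip_views_params(b))
```

This would not be a safe comparison for pages that really do contain a View, where a parameter like wpv_paged changes what is shown.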

For a URL like hidden link to be crawled by Google, its robot has to have found that same link and followed it.

On pages which *are* custom Toolset archives, the links come from the archive itself, for pagination.

But for pages which are not, the link could be from some other page on the same site (including an archive), or it could be from the sitemap, or it could even be from an external site. (Failing that, could it be some strange caching issue? Not sure.)

If you haven't already, I'd check that your sitemap looks sensible, if you have one.
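
If the sitemap is large, a small script could flag any entries that carry a search term or Views parameters. The following is just a sketch in Python, assuming a standard sitemaps.org urlset file; the sample XML and example.com URLs are invented, and treating "s" and any wpv_ parameter as suspicious is my assumption about what would not belong in a sitemap.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qsl

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def suspicious_sitemap_urls(sitemap_xml):
    """Return sitemap URLs that carry a search (s) or wpv_ query parameter."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall(".//sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        keys = [k for k, _ in parse_qsl(urlsplit(url).query, keep_blank_values=True)]
        if any(k == "s" or k.startswith("wpv_") for k in keys):
            flagged.append(url)
    return flagged


# Hypothetical sitemap content for illustration.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/reviews/</loc></url>
  <url><loc>https://example.com/?s=Ui&amp;wpv_view_count=5022</loc></url>
  <url><loc>https://example.com/?s=</loc></url>
</urlset>"""

for url in suspicious_sitemap_urls(sample):
    print(url)
```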

Then, for this particular link, if it came from another page on the same site, it makes sense to think it came from whatever page contains the View with ID 884 (or it could be a custom archive with ID 884).

Can you locate that? Does it contain any links to hidden link? (I'm wondering if the links are malformed in some way, such that the URL parameters for the current page, which include URL parameters for View ID 884, are somehow getting added to the link to the hidden link page.)

It's quite a while since I've used Google Analytics or Search Console, but I think it may be possible, for the URLs it reports as crawled by its robot, to determine the source of the link, i.e. where was the robot when it followed the link in question. If that were possible it should help narrow things down.

May 20, 2024 at 8:21 am #2698602

Nigel

Separately, your last point about multiple Views on the page.

Inspecting the page markup, it shows that the page inserts other pages, and each of the pages includes a View.

So we have

page ID 56628
  which includes View 56632
page ID 12122
  which includes View 12127
page ID 5798
  which includes View 5800
page ID 2801
  which includes View 2805
page ID 2771
  which includes View 2776

I can't see the content of the page itself to know why it is inserting other pages into itself.

May 22, 2024 at 4:24 pm #2698995

kaleeR-3

> It's quite a while since I've used Google Analytics or Search Console, but I think it may be possible, for the URLs it reports as crawled by its robot, to determine the source of the link, i.e. where was the robot when it followed the link in question. If that were possible it should help narrow things down.

That's part of the problem with there being so many: it's pretty much impossible to follow a chain of links that is thousands of links long. In Search Console I can inspect a URL and see the referring URL, but that could be 10k or more links back. All it takes is a couple of links and then it spirals from there. It's terrible for crawl budgets on large sites. :/
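
If the referring URLs could somehow be collected into a lookup table (I don't know of an easy way to get that out of Search Console in bulk, so that part is an assumption), walking the chain back would at least be mechanical. A rough Python sketch, with a made-up referrer_of mapping and example.com URLs, that follows referrers until it reaches a URL with no wpv_ parameters:

```python
from urllib.parse import urlsplit, parse_qsl


def has_views_params(url):
    """True if the URL carries any query parameter starting with wpv_."""
    pairs = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    return any(key.startswith("wpv_") for key, _ in pairs)


def trace_to_origin(url, referrer_of):
    """Follow referring URLs back until one has no wpv_ parameters."""
    chain = [url]
    seen = {url}
    while has_views_params(chain[-1]):
        previous = referrer_of.get(chain[-1])
        # Stop if the referrer is unknown or we are going round in a loop.
        if previous is None or previous in seen:
            break
        seen.add(previous)
        chain.append(previous)
    return chain


# Hypothetical data: each crawled URL mapped to the URL that referred to it.
referrer_of = {
    "https://example.com/reviews/page/3/?wpv_view_count=884&wpv_paged=3":
        "https://example.com/reviews/page/2/?wpv_view_count=884&wpv_paged=2",
    "https://example.com/reviews/page/2/?wpv_view_count=884&wpv_paged=2":
        "https://example.com/some-archive/",
}

start = "https://example.com/reviews/page/3/?wpv_view_count=884&wpv_paged=3"
print(trace_to_origin(start, referrer_of)[-1])
```

The last URL in the chain would be the first page without Views parameters, which is where the odd links presumably start.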

> Separately, your last point about multiple Views on the page.
> Inspecting the page markup, it shows that the page inserts other pages, and each of the pages includes a View.
> I can't see the content of the page itself to know why it is inserting other pages into itself.

The page doesn't actually exist. You can see from the canonical that it is actually a taxonomy page, which loads fine with no issues here:
hidden link
but for some reason it's also loading at this URL with multiple views?
hidden link

The topic ‘[Closed] Thousands of search result pages being generated’ is closed to new replies.