Skip Navigation

[Gelöst] Performance problem with large tables (legacy post relationships)

This support ticket is created vor 3 Jahren, 5 Monaten. There's a good chance that you are reading advice that it now obsolete.

This is the technical support forum for Toolset - a suite of plugins for developing WordPress sites without writing PHP.

Everyone can read this forum, but only Toolset clients can post in it. Toolset support works 6 days per week, 19 hours per day.

Heute stehen keine Supporter zur Arbeit im Werkzeugsatz-Forum zur Verfügung. Sie können gern Tickets erstellen, die wir bearbeiten werden, sobald wir online sind. Vielen Dank für Ihr Verständnis.

Sun Mon Tue Wed Thu Fri Sat
8:00 – 12:00 8:00 – 12:00 8:00 – 12:00 8:00 – 12:00 8:00 – 12:00 - -
13:00 – 17:00 13:00 – 17:00 13:00 – 17:00 13:00 – 17:00 13:00 – 17:00 - -

Supporter timezone: America/New_York (GMT-04:00)

Dieses Thema enthält 11 Antworten, hat 2 Stimmen.

Zuletzt aktualisiert von michaelB-31 vor 3 Jahren, 3 Monaten.

Assistiert von: Christian Cox.

Author
Artikel
#2096581
select-group-by-problem-1-28-seconds.jpg

Hi support team,

First some context: I have a database with 320'000 posts (wp_posts), 250'000 term relationships (wp_term_relationships) and 2'800'000 post meta entries (wp_postmeta). All other tables are of normal size.

I have 2 custom post types with a simple one-to-many parent-child relationship. Now I have 2 views: The first view gets typically 1 post of type A, the second view then gets all children (which are of type B), normally not more than 200.

The problem: It takes at least 25 to 30 seconds to load the page with those views.

I already used the "Query Monitor" plugin to pinpoint the problem, and found 1 specific SQL query, which takes the 25-30 seconds to run. Here the example SQL query, which only has to get 110 posts (see also screenshot):

SELECT wp_posts.ID
FROM wp_posts
INNER JOIN wp_postmeta
ON ( wp_posts.ID = wp_postmeta.post_id )
WHERE 1=1
AND ( ( wp_postmeta.meta_key = '_wpcf_belongs_geschaeftsjahr_id'
AND wp_postmeta.meta_value IN ('147647') ) )
AND wp_posts.post_type = 'buchung'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC

I'm no database expert, but as far as I can tell, even though there is a quite big INNER JOIN with the 320'000 wp_posts table on the 2'800'000 wp_term_relationships table, it shouldn't bring the database to the knees.

Point in case: On a different view which basically outputs the same stuff and has to make the same INNER JOIN, the SQL query only needs 0.07 seconds to run:

SELECT wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta
ON ( wp_posts.ID = wp_postmeta.post_id )
INNER JOIN wp_postmeta AS mt1
ON ( wp_posts.ID = mt1.post_id )
WHERE 1=1
AND wp_posts.post_author IN (2)
AND ( wp_postmeta.meta_key = 'wpcf-datum'
AND ( ( ( mt1.meta_key = 'wpcf-datum'
AND CAST(mt1.meta_value AS SIGNED) BETWEEN '1609459200'
AND '1640995199' ) ) ) )
AND wp_posts.post_type = 'buchung'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 DESC, wp_posts.ID DESC

So I see only 2 differences and potential problems between those 2 SQL queries:
1. The legacy post relationship, which needs the "IN" SQL statement. Would the new post relationships have significantly better performance in this case?
2. Simply the way how the SQL INNER JOIN is generated by the views. However, I'm not sure if and how I could optimize the SQL for Toolset views.

Any tipp would be really appreciated. And let me know what further information you need.

Thanks in advance and best regards,
Michael

#2097169

Hello, it's hard to say for sure offhand. It is worth exploring in a test site where you can migrate to the new post relationships system, recreate the View query filters if necessary, and adjust any custom code that references the old postmeta-based relationship system. The new system utilizes proprietary tables for storing relationship information outside of postmeta, so the query structure and data set will be completely different. Do you have a staging environment available for testing the queries after relationship migration?

You also mentioned the wp_term_relationships table, but I don't see information about that table in the SQL queries here so I'm not sure the impact that has here. Again, it's worth exploring in a test environment for a real-world comparison rather than a hypothetical guess.

#2097771

Hi Christian,

OK, I already thought that I will have to test it. Just wanted to be sure, or at least exclude, that there isn't some other obvious thing I could try first.

Just for reference: In this announcement (https://toolset.com/2017/04/first-preview-of-many-to-many-relationship-in-toolset/) Amir states:
"Until now, relationships used custom fields. This is fine for smaller sites, but causes a performance problem for big sites. The new implementation uses a custom table for relationships. This means super-fast-performance, even for the largest WordPress sites."

That would exactly explain my problem. Althoug in the comments he then said:
"Sites implementing many-to-many relationship will have a very significant performance gain. Sites with one-to-many relationships are already fine. They may be slightly faster, but the current relationship model isn’t causing them any performance issue."

I hope this second comment was only regarding small sites. Because if it isn't, I'm quite lost what else the problem could be. And honestly I'm not so sure if the new relationships will fix it.

See: I just mentioned the wp_term_relationships for context. But there runs for example (on the same page for the same view) another SQL query, which also JOINS the wp_posts and wp_postmeta (and additionally the wp_term_relationships table), and also uses the legacy relationships. So you would expect an even longer execution time. But it only takes ~0,8 seconds:

SELECT wp_posts.ID
FROM wp_posts
LEFT JOIN wp_term_relationships
ON (wp_posts.ID = wp_term_relationships.object_id)
INNER JOIN wp_postmeta
ON ( wp_posts.ID = wp_postmeta.post_id )
INNER JOIN wp_postmeta AS mt1
ON ( wp_posts.ID = mt1.post_id )
WHERE 1=1
AND ( wp_term_relationships.term_taxonomy_id IN (8) )
AND wp_posts.post_author IN (2)
AND ( wp_postmeta.meta_key = 'wpcf-beleg-id'
AND ( ( mt1.meta_key = '_wpcf_belongs_geschaeftsjahr_id'
AND mt1.meta_value = '147647' ) ) )
AND wp_posts.post_type = 'buchung'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 DESC
LIMIT 0, 1

Anyway, looks like I have to migrate the relationships and find out. I'll update this thread as soon as I have tested it.

Best regards,
Michael

#2098183

I'm not a SQL expert myself, but I suppose...

AND wp_posts.post_author IN (2)

...in the second query you mentioned may be contributing to the performance difference here, since the number of posts for which a postmeta query must be considered could be significantly impacted by the limitation by post author?

You could temporarily add a similar post author limitation to the first query and run the query manually in phpMyAdmin or another direct SQL query tool to see if the performance is noticeably improved. If not, then you could try also changing the first query's IN to use BETWEEN similar to the second query, i.e. from...

AND wp_postmeta.meta_value IN ('147647') 

...to...

AND CAST(wp_postmeta.meta_value AS SIGNED) BETWEEN '147647'
AND '147647' 

...then run the query again and compare results. Again, I'm no expert but it seems that one or both of those differences in query criteria must be responsible for the massive difference in performance here.

I'll stand by for your update.

#2099075

OK, the "AND wp_posts.post_author IN (2)" actually really helped, now the query only took about 1.16 seconds. Not perfect, but a massive improvement. And maybe the new post relationships would make it even faster.

Now 2 things:
1. Even though this change helped a lot: I don't know how I could influence the SQL queries via the views. Because in both views I already have a "Post author filter" with "Select posts with the author the same as the current logged in user."

2. I still have to do the relationships update. Just for your information: I created a new topic, because I had problems to start the update process (and even other problems with the whole Layouts module): https://toolset.com/forums/topic/whole-layouts-module-not-working-after-update-really-slow-timeouts/

I think I have to fix those other problems first. So I'll come back to this later.

#2100691

1. Even though this change helped a lot: I don't know how I could influence the SQL queries via the views. Because in both views I already have a "Post author filter" with "Select posts with the author the same as the current logged in user."
Hmm, that doesn't really make sense. I must be missing something. The first query I see here has no post author limitation, maybe that is applied in a different query somehow? I'd need to examine the full query set. It could be that this query is for something other than the results, perhaps if the View has front-end search filters and is set up to use the feature to only show options that would produce results? That feature can have major performance impacts on large sites. It's hard for me to say offhand without seeing the site. If you'd like to provide login credentials, I can log in and take a look with Views debug mode and Query Monitor active.

I think I have to fix those other problems first. So I'll come back to this later.
Understood, I can stand by here, or I can take a look at #1 while you work through the other ticket.

#2102329

I'm not sure I understand which queries I should be comparing here. In the Query Monitor results for versteckter Link, select the Abfragen tab and you can see the database queries required to load each page. Sort by the time (Zeit) column and see the following queries near the top of the list. If you click "+" in the Abruffunktion column you can see the relevant call stacks for each query. These are the slowest queries (query numbers may not be identical each time the page loads, for technical reasons):

Query #97

SELECT wp_posts.ID
FROM wp_posts
INNER JOIN wp_postmeta
ON ( wp_posts.ID = wp_postmeta.post_id )
WHERE 1=1
AND ( ( wp_postmeta.meta_key = '_wpcf_belongs_geschaeftsjahr_id'
AND wp_postmeta.meta_value IN ('147647') ) )
AND wp_posts.post_type = 'buchung'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_posts.post_date DESC

Relevant call stack:

WP_Query->get_posts
WP_Query->query
WP_Query->__construct
WPV_Filter_Post_Relationship_Query->get_settings_post_in_without_m2m
WPV_Filter_Post_Relationship_Query->get_settings
WPV_Filter_Post_Relationship_Query->filter_query
WP_Hook->apply_filters
apply_filters('wpv_filter_query')
(...etc...)

Query #70

SELECT wp_posts.ID
FROM wp_posts
LEFT JOIN wp_term_relationships
ON (wp_posts.ID = wp_term_relationships.object_id)
INNER JOIN wp_postmeta
ON ( wp_posts.ID = wp_postmeta.post_id )
INNER JOIN wp_postmeta AS mt1
ON ( wp_posts.ID = mt1.post_id )
WHERE 1=1
AND ( wp_term_relationships.term_taxonomy_id IN (8) )
AND wp_posts.post_author IN (2)
AND ( wp_postmeta.meta_key = 'wpcf-beleg-id'
AND ( ( mt1.meta_key = '_wpcf_belongs_geschaeftsjahr_id'
AND mt1.meta_value = '147647' ) ) )
AND wp_posts.post_type = 'buchung'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 DESC
LIMIT 0, 1

Relevant call stack:

WP_Query->get_posts
WP_Query->query
get_posts
custom_einnahmen_beleg_nr
(...etc...)

Query #98

SELECT wp_posts.*
FROM wp_posts
INNER JOIN wp_postmeta
ON ( wp_posts.ID = wp_postmeta.post_id )
WHERE 1=1
AND wp_posts.ID IN (335387,334597,334596,334589,334588,334587,334581,334580,334579,334577,334575,334574,334571,334570,334569,334567,334565,334563,332141,320269,320268,320191,320188,320187,320186,320185,320184,320183,320182,320181,320180,320179,320177,320061,320059,320056,320053,320051,320049,319991,310143,310142,309927,306911,306840,306814,306813,306811,306773,306771,306770,306769,306763,306762,306760,306747,306746,306735,306730,306725,306723,306720,306717,306187,289735,284476,284475,284433,284432,283263,283262,283206,283205,283204,283166,283164,283162,283155,283154,283153,283152,283151,283055,278372,273916,270824,265405,265404,265403,265352,265351,265346,265345,265344,265343,265342,265341,265340,265339,265338,265328,262263,262255,262254,262253,262246,259340,247935,247839,228338)
AND wp_posts.post_author IN (2)
AND ( wp_postmeta.meta_key = 'wpcf-datum' )
AND wp_posts.post_type = 'buchung'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 DESC, wp_posts.ID DESC

Relevant call stack:

WP_Query->query
WP_Query->__construct
wpv_filter_get_posts
WP_Views->render_view
WP_Views->render_view_ex
WP_Views->short_tag_wpv_view
do_shortcode_tag
preg_replace_callback
do_shortcode
wpv_do_shortcode
WP_Views->render_view
WP_Views->render_view_ex
WP_Views->short_tag_wpv_view
(...etc...)

First, let's eliminate the second query from discussion here. This query is triggered by a custom shortcode custom_einnahmen_beleg_nr that has a get_posts query with post author filter. It is not really useful to compare this with Views, since it is a fully custom query.

The first query is the slowest query, and I think it must be used internally by Views to attempt to limit the scope of the third query. It gets all the Buchung posts related to the current Geschäftsjahre post in the Geschäftsjahre View loop. I think that group of related post IDs is then passed into the third query's post__in argument, i.e.:

AND wp_posts.ID IN (335387,334597,334596,334589,334588,334587,334581,334580,334579,334577,334575,334574,334571,334570,334569,334567,334565,334563,332141,320269,320268,320191,320188,320187,320186,320185,320184,320183,320182,320181,320180,320179,320177,320061,320059,320056,320053,320051,320049,319991,310143,310142,309927,306911,306840,306814,306813,306811,306773,306771,306770,306769,306763,306762,306760,306747,306746,306735,306730,306725,306723,306720,306717,306187,289735,284476,284475,284433,284432,283263,283262,283206,283205,283204,283166,283164,283162,283155,283154,283153,283152,283151,283055,278372,273916,270824,265405,265404,265403,265352,265351,265346,265345,265344,265343,265342,265341,265340,265339,265338,265328,262263,262255,262254,262253,262246,259340,247935,247839,228338)
AND wp_posts.post_author IN (2)
AND ( wp_postmeta.meta_key = 'wpcf-datum' )
AND wp_posts.post_type = 'buchung'
AND ((wp_posts.post_status = 'publish'))
GROUP BY wp_posts.ID
ORDER BY wp_postmeta.meta_value+0 DESC, wp_posts.ID DESC

The third query then applies the post author filter and sorts by the date field, but only in the set of post IDs produced by the first query. The third query is responsible for producing the results of the nested Buchung View. Obviously this is going to be much faster than the first query, because of the post__in limitation. Depending on how the new relationships system implements this type of query, you may or may not see significant improvements. It depends on whether the new system gets all the related posts first, then filters through that limited result set, or if the new system creates a single query that combines the post relationship and post author filters in its first attempt somehow. A quick test in a migrated site should tell you pretty quickly if that is the case or not. If not, it might make sense to run a few more tests using the wpv_filter_query API to apply this post relationship filter programmatically by adding a meta_query clause, and remove the post relationship filter from the Buchungen View's Query Filter. That would effectively combine the two queries, giving you the added performance of a post author limitation that does not require a table join. On the other hand, it might just move that expense into the second query, it's hard for me to say. Documentation for wpv_filter_query with examples:
https://toolset.com/documentation/programmer-reference/views-filters/

#2102885

Thanks a lot for the detailed answer.

Yes, the Query #70 for my custom shortcode can be ignored. I could probably optimize it. But the Query #97 is the only real problem, all other queries are like 0,1 or 0,2 seconds max.

So now if I understand it correctly:
1. Maybe/hopefully with the new relationships system Toolset creates the queries in a "better" way that is much faster. By the way: I'm aware that those automatically generated queries can't replace a human in every case. So if it does not fix this:
2. As you wrote "using the wpv_filter_query API to apply this post relationship filter programmatically by adding a meta_query clause, and remove the post relationship filter from the Buchungen View's Query Filter". I already used the wpv_filter_query for other views, so I should be able to do that.

I will come back to this as soon as we worked out the other problem/ticket and I actually can run the relationships upgrade and test those 2 steps.

#2103041

Okay great, I'll stand by for your update.

#2108291

I just reply, so the support robot doesn't close this thread. Still waiting for an answer in my other ticket 🙂

#2131801

Hello, just checking in to see if there is any update to share.

#2140221

Hi Christian,

Sorry for the late reply, here an update on this issue:

First of all, I had another problem which caused some distraction. The live site runs on a simple shared webhosting plan (not the cheapest, but still). For the test site I bought a virtual Linode.com server, precisely to exclude any obvious performance bottlenecks. BUT: I don't know why, but for some reason on the database on the Linode server the exact same discussed SQL queries (before the relationships upgrade) are at least 10x slower. And after the relationships upgrade are also like 40% slower. So yeah, that completely blindsided me 😉

Anyway, this is obviously not a problem with Toolset, I just wanted to explain it, and maybe it helps somebody else reading this.

Now, after creating a test site on the shared webhosting, and running the relationships upgrade there, it looks like a big performance improvement. From before ~9 seconds (for the 2 slowest queries) to now only one noticeable slow query which runs for ~1 second (see the before and after screenshots, one for the Linode server, one for the shared hosting).

While not perfect, I think this is perfectly reasonable for my big database. For further improvements I probably have to get my hands dirty (like with the wpv_filter_query API as you suggested). So yes, my issue seems to be resolved.

Thanks a lot for your help and best regards,
Michael