A talk on SEO for aggregation websites such as comparison search engines, marketplaces, and classifieds platforms, covering the Panda Diet, internal linking, and more.
5. Most Behemoths Are Aggregation Websites with 1M+ Pages
Vertical Search Engines
• i.e. Comparison Shopping Engines (CSEs) and meta-search engines
• Scraping and aggregating price/fare and product information
• Partly relying on affiliate data and feeds
Classifieds
• Real estate, cars, jobs, holiday rentals, general classifieds
• Aggregating user-generated or previously published offers/ads
• Content usually expires after a certain timeframe
Marketplaces
• Aggregating supply (product/service feeds) and demand at the same time
• Suppliers often have several points of sale and syndicate data
Social Networks & Forums
• Vast amounts of user-generated content
• Insufficient control over quality and information architecture
Most of these are “intermediaries” doing “search”, and implicitly violate Google's guidelines.
6. Advantages & Challenges of Aggregators
Advantages
• Aggregation attracts demand (users) through superior availability, assortment (choice) and competition (price)
• High degree of automation
• Both market sides may create lots of content, data and value
• Extremely scalable and capital efficient
• Consequently build network effects and moats over time…
• …and become hyper-profitable and well defendable
Challenges
• Automation potentially creates billions of documents
• Quality of content/inventory is extremely diverse
• The Panda/Core algorithm sparked a structural decline of the whole sector
• Google positions its own verticals above organic SERPs
• Aggregators may potentially violate different Google guidelines:
  • Duplicate content (internal/external)
  • Thin content
  • Affiliate content
  • Indexable search
10. But It Has Gotten A Lot Better Recently…
“…there’s some really good stuff here. But there’s
also some really shady or iffy stuff here as well…
and we don’t know like how we should treat things
over all. That might be the case.” @JohnMu
23. Focus Areas of Concern for Huge Websites
SEO
Content
• Inventory
• Text
• Rich Media
• Video
• Advice
• Structured Data
• Tools & Apps
• Interactive Content
• …
Popularity
• Links
• Mentions
• Brand Search
• Comp. Brand Search
• Direct Type-Ins
• Sharing
• All available signals
Technical SEO
• Internal Linking
• URL Design
• Indexing
• Heading Tags
• Hreflang Setup
• Structured Data
• HTTPS/HTTP2
User Experience
• Bounce Rate
• Back To SERP
• Dwell Time
• Retention
• Trust
• Search Journey
• Satisfaction of Intent
PageSpeed
* Panda: the major 2011 Google update, named after Google engineer Navneet Panda
24. Today we‘ll learn:
1. Index Management
2. Crawl Budget Optimisation with Internal Linking
3. Making Users Happy!
4. Practice with Case Studies
25. Theory: Typical Page Quality (Qp) over Number of Pages (np)
[Chart: Page Quality (Qp) over Number of Pages (np), from 100,000 to 400,000 pages. Quality ranges from highest through useful and mediocre down to lowest: Homepage, Category, Category+Brand, Faceted Search, Thin Catalogue (low inventory), duplicate-content pages, “no results” pages.]
Page Quality (Qp) can be defined as content richness, engagement, and ultimately how useful the page is to the user. But also its revenue potential.
PROBLEM: Since Panda (2011) this structure has become toxic.
27. Theory: Typical Page Quality (Qp) over Number of Pages (np)
[Chart: the same Qp-over-np curve with a quality threshold (mediocre and better). Pages above the threshold stay in the INDEX (80,000); pages below it go NOINDEX (320,000). The new average quality jumps well above the old one, and rankings increase.]
Panda Diet: Let's cut some crap!
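A minimal sketch of the Panda Diet arithmetic above. The quality scores, the 0.5 threshold and the page counts are illustrative assumptions, not measured values:

```python
# Panda Diet sketch: split an inventory of pages into INDEX and NOINDEX
# buckets at a quality threshold, then compare average quality before/after.

def panda_diet(pages, threshold):
    """Return (index, noindex) lists given (url, quality) pairs."""
    index = [p for p in pages if p[1] >= threshold]
    noindex = [p for p in pages if p[1] < threshold]
    return index, noindex

def avg_quality(pages):
    return sum(q for _, q in pages) / len(pages)

# Hypothetical inventory: 400,000 pages, most of them thin or duplicated.
pages = (
    [("/category/%d" % i, 0.9) for i in range(20_000)]    # useful
    + [("/facet/%d" % i, 0.6) for i in range(60_000)]     # mediocre
    + [("/thin/%d" % i, 0.2) for i in range(320_000)]     # thin/dupe
)

index, noindex = panda_diet(pages, threshold=0.5)
print(len(index), len(noindex))   # 80000 320000
print(avg_quality(index) > avg_quality(pages))   # True: average quality rises
```

Cutting the 320,000 sub-threshold pages leaves the indexed set smaller but with a much higher average quality, which is the whole point of the diet.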
28. Identifying Low Quality Pages by Page-Type
Easy NOINDEX Targets
• “no results” pages
• Few-results pages (set an item threshold)
• Single review pages and other low-quality UGC
• Bulk product pages
• Any duplicate pages
• Faceted search without search demand
• Out-of-stock pages
• Expired offers/ads
• Parameters, etc.
If your site has more indexed pages than things on sale, you're doing it wrong!
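The page-type rules above can be sketched as a simple classifier. The `Page` fields and the item threshold are assumptions for illustration, not a fixed schema:

```python
# Rule-based sketch of the "easy NOINDEX targets": duplicates, expired
# offers, out-of-stock products, and thin/no-demand search pages.
from dataclasses import dataclass

MIN_RESULTS = 3  # assumed item threshold for a useful listing page

@dataclass
class Page:
    page_type: str          # e.g. "search", "product", "offer"
    result_count: int = 0
    is_duplicate: bool = False
    in_stock: bool = True
    expired: bool = False
    has_search_demand: bool = True

def should_noindex(p: Page) -> bool:
    if p.is_duplicate or p.expired:
        return True
    if p.page_type == "search":
        # "no results"/few-results pages and facets without demand
        return p.result_count < MIN_RESULTS or not p.has_search_demand
    if p.page_type == "product":
        return not p.in_stock
    return False

print(should_noindex(Page("search", result_count=0)))   # True
print(should_noindex(Page("product", in_stock=False)))  # True
print(should_noindex(Page("search", result_count=40)))  # False
```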
30. Identifying Low Quality Pages: Data Driven Approach
Data to support page quality decisions
• Revenue distribution on landing pages (Google Analytics)
• Engagement and commercial metrics per page type
• Conversion rate related to inventory count
• Demand data (search volume, PPC traffic, navigational traffic)
• “Indexation gap” (sitemaps: submitted vs. indexed)
• Crawling activity (server logs)
• Hint: Consider using de-indexing sitemaps to accelerate the Panda Diet
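The de-indexing sitemap hint can be sketched with the standard sitemap protocol: a temporary sitemap that lists the URLs you just set to noindex (or removed), each with a fresh lastmod, so crawlers revisit them quickly and pick up the directive. The URLs are hypothetical:

```python
# Sketch of a "de-indexing sitemap" generator using only the stdlib.
import datetime
from xml.etree import ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def deindex_sitemap(urls):
    """Build sitemap XML for URLs that should be recrawled and dropped."""
    root = ET.Element("urlset", xmlns=NS)
    today = datetime.date.today().isoformat()
    for loc in urls:
        url = ET.SubElement(root, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = today  # nudge a fresh recrawl
    return ET.tostring(root, encoding="unicode")

xml = deindex_sitemap([
    "https://example.com/thin/1",
    "https://example.com/no-results?q=foo",
])
print(xml)
```

Once Search Console reports the URLs as deindexed, the temporary sitemap can be removed again.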
31. Theory: Typical Page Quality (Qp) over Number of Pages (np)
[Chart: the same Qp-over-np curve as before.]
Truth is: this curve doesn't look like this…
32. Theory: Typical Page Quality (Qp) over Number of Pages (np)
[Chart: the curve redrawn with a far steeper drop in quality.]
Truth is: this curve doesn't look like this… BUT: more like THIS!
33. Theory: ACTUAL Page Quality (Qp) over Number of Pages (np)
[Chart: the curve redrawn once more; quality collapses almost immediately.]
Truth is: this curve doesn't look like this… BUT: more like THIS! ACTUALLY… like THIS!
34. Theory: ACTUAL Page Quality (Qp) over Number of Pages (np)
[Chart: the actual curve: the vast majority of pages sit at the lowest quality level.]
These pages typically…
• Never saw a visit, nor any conversions (GA organic landing pages)
• Aren't crawled any longer, as Google won't rank them anyway (server logs)
• Are not being considered for indexation (GSC Sitemaps monitor)
While 100% of your revenue is here!
42. If Noindex: Consequently „Orphanize“ Pages
[Diagram: Home links to pages One, Two and Three; the NOINDEX page gets its incoming internal links removed.]
Viable solutions for link removal
• Nofollow
• Dynamic serving (“cloaking”)
• Client-side JS
• PRG pattern
• Forms/buttons
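The PRG (POST/Redirect/GET) and forms/buttons options above can be sketched as a link renderer: crawlable targets get a normal anchor, noindexed targets get a POST form that the server answers with a 302 redirect, so bots never see a followable link. The noindex set, URLs and `/go` endpoint are hypothetical:

```python
# Sketch of "orphanizing" noindexed pages at render time: emit either a
# crawlable <a href> or a PRG-style POST form (POST -> 302 -> GET), which
# search engine bots do not follow as a link.

NOINDEX = {"/search?color=red&size=xl", "/offers/expired/123"}

def render_link(href, label):
    if href in NOINDEX:
        # PRG pattern: the server answers this POST with a 302 to `href`.
        return (
            f'<form method="post" action="/go">'
            f'<input type="hidden" name="to" value="{href}">'
            f'<button type="submit">{label}</button></form>'
        )
    return f'<a href="{href}">{label}</a>'

print(render_link("/category/shoes", "Shoes"))
print(render_link("/offers/expired/123", "Old offer"))
```

Users still reach the page with one click, which is why this is generally argued not to be cloaking: the experience is unchanged, only the crawlable link graph shrinks.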
43. Get Rid Of Pagination (Entirely)
Pagination Best Practice
• Pagination is a stupid offline concept
• More items, fewer pages, fewer problems
• Users like comprehensive pages (A/B test it)
• NOINDEX pagination if possible
• Remove links to those pages
• No pagination pages, no problems
• Make sure discovery remains intact
[Image caption: “No one, ever…”]
44. This useless shit… Gone (for Bots at least)
Social Profile Links
Locale Selector
Keep these on your Homepage or About Us page, but not on every page.
(If they are important for the user, why are they in the footer?)
47. Case Study: How to identify the least valuable pages?
1. Out-of-Stock Handling (OoS pages generate lots of HTML pages and poor UX)
   1. If OoS for good: 301 to the most similar page (parent category), or 410 if there's no alternative
   2. If potentially restocked: keep the page alive (200), offer a restock alert and/or alternatives
2. Faceted Search (Filters) & Indexable Site Search
   1. Set a minimum item threshold to define a “good” search result page that doesn't look like a SERP
   2. Build clusters where possible (typos, plurals, refined queries, entities)
   3. Apply quality thresholds (dwell time, bounce rate, conversion) to SERP-in-SERP pages (indexed internal search)
3. Pagination
   1. Show more items per page (3x more items = 1/3 of the pages)
   2. Best solution for pagination: no pagination
4. PDP (Product Detail Page) Reduction
   1. Get better at understanding shelf huggers and bestsellers using your data
   2. Advanced: predict page performance with machine learning (OEM, price, category, attributes, etc.)
   3. Merge variants into master products (sizes, patterns, colors, etc.)
5. Reviews & FAQ: use overview pages for reviews & questions; don't index single pieces of content
6. Don't build a self-fulfilling prophecy
   1. Allow triggers for re-indexation (PPC traffic, navigational demand, etc.)
60. Case Study: How to identify the least valuable pages?
1. Facebook Index Coverage: accessibility vs. page quality
2. Inactive/empty Groups, Pages, Users, Places
3. Privacy-aware users (or create an incentive to share publicly to improve landing page value)
4. Use engagement as a quality metric for post URLs (it doesn't get much better than this)
5. Marketplace (see Advanced Panda Diet)
6. Expired Events
7. …
65. Balance: Algorithmic Internal Linking for 1.000 Pages
1. New York
2. London
3. Paris
4. Rome
5. Amsterdam
6. Milan
7. Barcelona
8. Prague
9. Dublin
10. Berlin
1. Munich
2. Warsaw
3. Madrid
4. Copenhagen
5. Stockholm
6. San Francisco
7. Toronto
8. Hamburg
9. Rio de Janeiro
10. Cairo
1. Seattle
2. Marrakesh
3. Sofia
4. Wroclaw
5. Helsinki
6. Vancouver
7. Hanover
8. Marseille
9. Alicante
10. Edinburgh
First Tier: Top 10 (this class of pages gets 1,000 links each)
Second Tier: Random 10 out of Top 100 (this class of pages gets 100 links each)
Third Tier: Random 10 out of Top 1,000 (this class of pages gets 10 links each)
• Shuffle containers 2 and 3, but keep them static per page
• Include relevance scores/silos/topical proximity to improve UX
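The tiering above, including the "shuffle but keep static per page" rule, can be sketched by seeding the random selection with the page's own URL. Function names and URLs are illustrative assumptions:

```python
# Sketch of the three-tier link module: every page links the global Top 10,
# plus a random 10 out of the Top 100 and a random 10 out of the Top 1,000.
# Seeding the RNG with the page's own URL keeps the selection static per
# page (stable for crawlers) while still rotating across pages.
import random

def link_module(page_url, ranked_pages):
    """ranked_pages: list of URLs sorted by popularity, best first."""
    rng = random.Random(page_url)  # deterministic per page
    tier1 = ranked_pages[:10]                       # Top 10, always
    tier2 = rng.sample(ranked_pages[10:100], 10)    # 10 of Top 100
    tier3 = rng.sample(ranked_pages[100:1000], 10)  # 10 of Top 1,000
    return tier1 + tier2 + tier3

cities = [f"/city/{i}" for i in range(1000)]
links = link_module("/city/42", cities)
print(len(links))                                 # 30
print(links == link_module("/city/42", cities))   # True: static per page
```

Because each tier samples from a disjoint slice of the ranking, the module never links the same page twice; a relevance score or silo filter could replace the pure popularity ranking to improve topical proximity.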
66. Fix Internal Linking Using Bestseller Lists
1. Standard sorting: popularity
2. Dynamic bestseller lists for prioritisation
3. “New Arrivals” for discovery
4. Related products for completeness
5. Breadcrumb for bottom-up prioritisation
6. Prioritisation via sitemap: ask Santa about it!
69. Frequently Asked Questions
Q: How isn't this cloaking?
A: 1. It doesn't alter the user experience. 2. It only makes Google's job easier. 3. Take a look at Amazon, bro.
Q: I'm afraid I could lose all my long-tail revenue. *mimimi*
A: 1. There's usually no data confirming the long tail. 2. Rankings are usually not lost but substituted by superior pages. 3. Google actually prefers pages with good UX over the most specific result (Hummingbird, RankBrain instead of a perfect title string match).
Q: Should I remove all those pages in one drastic move? Wouldn't Google see that as a weakness?
A: It's always a good time to do the right thing!
Q: Should I really dynamically switch/flap index directives?
A: I think you should. See above.
Q: How does GoogleBot discover my content without pagination?
A: If you need pagination for discovery, you've got bigger fish to fry. Seriously…
70. What to remember…
1. We've been doing this for 10 years now (pre-Panda) and it has never backfired
2. This is most important if your website has more than 100,000 pages
3. Index bloat: millions of indexed HTML documents are not an asset but a liability. Indexing everything is inefficient by definition.
4. 80% (actually 95%) of your website usually is dead weight. And it's pulling down your best pages.
5. Analyse your potential with an organic landing page report
6. There's no black and white, but a reasonable amount of grey, which should be defined by data
7. Non-transactional content is (most likely) overrated. (Inventory = Content)