Christians Tagebuch: typo3

The latest posts in full-text for feed readers.


Running TYPO3 in a subdirectory

At work we're running a couple of composer-based TYPO3 v11 instances in a subdirectory instead of the root path of domains, e.g. https://example.org/info/ instead of https://example.org/. We do this because the main web application runs on /, while /info/ serves editor-editable content.

The setup consists of 2 servers/containers. One serves the main web application on https://example.org/ and proxies all /info/ requests to the second web server. The second web server delivers a TYPO3 instance.

It's more complicated than the average reverse proxy setup. In production, the initial proxying is done by AWS CloudFront - and that does not support stripping paths when proxying.

Main webserver

Relevant Apache 2.4 site configuration:


    ServerName example.org
    [...]
    ProxyPass        "/info/" "http://typo3.example.org/info/"
    ProxyPassReverse "/info/" "http://typo3.example.org/info/"


TYPO3 webserver

Apache site configuration:


    ServerName example.org
    ServerAlias typo3.example.org
    UseCanonicalName On

    [standard typo3 configuration follows]

    
        # Reverse Proxy configuration in Terraform does not allow to
        # proxy "/info/" to "/" - it maps "/info/" to "/info/"
        # So we need to remove that.
        # Also requires REQUEST_URI modification in AdditionalConfiguration.php
        RewriteRule "^info/(.*)" "$1"
    

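To make the effect of that RewriteRule tangible, here is a minimal Python sketch of the same substitution. It assumes the rule runs in a per-directory context, where the path is matched without its leading slash; the function name strip_info_prefix is made up for this illustration and is not part of Apache or TYPO3.

```python
import re


def strip_info_prefix(relative_path: str) -> str:
    """Mimic RewriteRule "^info/(.*)" "$1" on a path without leading slash."""
    # Only a leading "info/" is removed; everything else passes through unchanged.
    return re.sub(r"^info/(.*)", r"\1", relative_path, count=1)


print(strip_info_prefix("info/typo3/"))      # typo3/
print(strip_info_prefix("fileadmin/a.png"))  # fileadmin/a.png
```

Paths that do not start with info/ are left alone, which is why direct requests to the TYPO3 host without the prefix keep working.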

TYPO3's site configuration is configured to have /info/ in the base path:

config/sites/example/config.yaml
base: 'https://example.org/info/'

TYPO3 also needs to know that it's running in a reverse proxy setup:

public/typo3conf/LocalConfiguration.php
<?php
return [
    'SYS' => [
        //[...]
        'reverseProxyHeaderMultiValue' => 'first',
        'reverseProxyIP' => '172.17.190.*,127.0.0.1',
        'reverseProxyPrefix' => '/info',
        'reverseProxySSL' => '172.17.190.',
    ]
];

Finally, we rewrite the request URI variable used by TYPO3:

public/typo3conf/AdditionalConfiguration.php

Local setup

The setup above requires two servers/containers, which is fine for production. Local development setup is a bit simpler; we use one container with two different domains:

example-proxy.test

Used for the main webserver.

We access TYPO3 frontend at http://example-proxy.test/info/, and the backend at http://example-proxy.test/info/typo3/.

It proxies to http://example.test/info/.

example.test

Runs TYPO3

Published on 2025-04-10


TYPO3 cache clearing blocked by CloudFront

One of our customers at work uses AWS CloudFront in front of a TYPO3 v11 installation. Clearing the frontend and backend cache in the TYPO3 backend fails:

403 ERROR

The request could not be satisfied.

Request blocked. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.

If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.

Generated by cloudfront (CloudFront)

Request ID: 2342xxx

The problem occurred only in Firefox, not in Chromium.

Analyzing the problem

Clearing the caches from within the backend sends HTTP POST requests to

https://example.org/typo3/record/commit?token=xxx&cacheCmd=pages

and

https://example.org/typo3/record/commit?token=xxx&cacheCmd=all

I copied the request as a curl command and, through trial and error, narrowed it down to the following minimal example:

$ curl -i 'https://example.org/typo3/record/commit?token=xxx&cacheCmd=pages'\
 -X POST\
 -H 'Content-Type: multipart/form-data; boundary=2'\
 --data-binary $'2--'
HTTP/2 403

... but it worked as soon as there was a letter in the form boundary:

$ curl -i 'https://example.org/typo3/record/commit?token=xxx&cacheCmd=pages'\
 -X POST\
 -H 'Content-Type: multipart/form-data; boundary=2a'\
 --data-binary $'2a--'
HTTP/2 302 

CloudFront

The customer's administrators told me that a web application firewall (WAF) was activated, and that the rule AWS#AWSManagedRulesSQLiRuleSet#SQLi_BODY is the one blocking the request.

AWS support told me that since the requests with a numbers-only form boundary appear as --2342 on the wire, it looks like an SQL injection where the rest of the SQL was commented out with two dashes. This is something they want to block, and thus the WAF rule would stay as it is.

They will not fix their rule and advised us to build our own rule with a higher priority that lets such requests pass.

Chromium vs. Firefox

Cache clear requests with Chromium always worked because it uses multi-part form boundaries that have the "WebKitFormBoundary" prefix, e.g. ------WebKitFormBoundarynSAzt2srqKsb9dvj--. Firefox has no such prefix and will sometimes generate boundaries with numbers only - especially when there are no POST data, like here with the cache clear requests.
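The pattern can be illustrated with a small Python sketch. The actual matching logic of the AWS rule is not public, so looks_like_sql_comment only reproduces the behaviour observed above (two dashes followed by digits only), and safe_boundary is a hypothetical helper showing how a fixed alphabetic prefix avoids the problem, similar in spirit to Chromium's prefix.

```python
import re
import secrets


def looks_like_sql_comment(boundary: str) -> bool:
    """True if the delimiter line "--<boundary>" is two dashes plus digits only."""
    # Assumption: this mirrors the observed blocking, not the real WAF rule.
    return re.fullmatch(r"--[0-9]+", "--" + boundary) is not None


def safe_boundary() -> str:
    """Build a boundary that always contains letters (hypothetical helper)."""
    return "FormBoundary" + secrets.token_hex(8)


for boundary in ("2", "2342", "2a", safe_boundary()):
    print(boundary, looks_like_sql_comment(boundary))
```

Boundaries "2" and "2342" are flagged, while "2a" and anything with the alphabetic prefix are not.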

Published on 2024-08-21


TYPO3: Change page title from plugin

TYPO3 v9 introduced the Page Title API that should be used now.
This blog post is obsolete.


At work I had a web site that showed records on a listing page, and offered more information for each record on a detail page. The task was now to change the page title on the detail page to the record's own title.

The naive solution is to simply set the page title in the central frontend output object:

$GLOBALS['TSFE']->page['title'] = 'foo';

But this does not work on uncached plugins.

TYPO3 page rendering process

To understand why, we need to look at how TYPO3's cache works together with uncached plugins:

  1. Page <head> and <body> are generated and combined into a single string of HTML.

    Uncacheable plugins are not executed yet; a placeholder is added instead:

    some html..<!--INT_SCRIPT.abcdef-->more html

    Additional placeholders are added for additionalHeaderData and additionalFooterData.

  2. This generated HTML is stored in the page cache.
  3. In TypoScriptFrontendController::INTincScript(), TYPO3 iterates over all plugin placeholders, executes the respective plugin code and replaces the placeholder with the plugin output.

    It also replaces additional*Data placeholders with their values from $GLOBALS['TSFE'].

  4. This final HTML is sent to the user.

When the user requests a cached page, only the last two steps 3 and 4 are executed. Thus there is no way to change the page title generated with TypoScript.
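The four steps can be modelled in a few lines of Python. This is only a toy model of the process described above, not TYPO3 code; render_page, uncached_plugin and the state dictionary are invented for the illustration.

```python
page_cache = {}


def uncached_plugin(state):
    """Hypothetical uncached plugin: runs on every request, tries to set the title."""
    state["title"] = "Record title"
    return "<p>record details</p>"


def render_page(page_id):
    state = {"title": "Page title"}
    if page_id not in page_cache:
        # Steps 1 and 2: build the HTML with a plugin placeholder and cache it.
        page_cache[page_id] = (
            "<head><title>" + state["title"] + "</title></head>"
            "<body><!--INT_SCRIPT.abcdef--></body>"
        )
    html = page_cache[page_id]
    # Step 3: execute the plugin and replace its placeholder.
    html = html.replace("<!--INT_SCRIPT.abcdef-->", uncached_plugin(state))
    # Step 4: the final HTML goes to the user.
    return html


print(render_page(1))
```

The title is already baked into the cached string when the plugin runs, so the plugin's change to the title never reaches the output.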

Solutions

There are three possible solutions to set the page title from an uncached plugin:

  1. Disable normal page title and insert it with additionalHeaderData
  2. Replace already generated <title> tag during plugin processing
  3. Replace already generated <title> tag in contentPostProc-output hook

I suggest option 1.

Disable title, add it with additionalHeaderData

This is the option I recommend: It works with both cached and uncached plugins, and it keeps your code in one place.

First, disable the creation of the normal title tag via config.noPageTitle for the pages that contain the plugin:

[globalVar = TSFE:id = 23|42]
config.noPageTitle = 2
[global]

In your plugin's logic, add the page title to TSFE's additionalHeaderData:

$GLOBALS['TSFE']->additionalHeaderData['myCustomUserIntTitle']
    = '<title>' . $this->getTitle($newTitle) . '</title>';

That's all that is needed.


Content replacement during plugin processing

When an uncached plugin is processed, the cached HTML code is available in $GLOBALS['TSFE']->content. You might be tempted to simply modify it during plugin processing.

This works for uncached plugins only. In cached plugins, $content is not filled and changing it does not do anything since it gets overwritten later.

$GLOBALS['TSFE']->content = preg_replace(
     '#<title>.*<\/title>#',
     '<title>' . htmlspecialchars($newTitle) . '</title>',
     $GLOBALS['TSFE']->content
);


Title replacement in post processing hook

TYPO3 allows you to register a hook that gets executed just before the content is sent to the user. Just as in option #2 you can search and replace on the HTML:

ext_localconf.php
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['tslib/class.tslib_fe.php']['contentPostProc-output']['robots'] = \Vnd\Class::class . '::contentPostProc';

And now you can preg_replace your new title into the HTML:

function contentPostProc(&$params, $pObj)
{
    $pObj->content = preg_replace(
        '#<title>.*<\/title>#',
        '<title>' . htmlspecialchars($newTitle) . '</title>',
        $pObj->content
    );
}

This does work for cached and uncached plugins.

The downside is that your title creation and title insertion code are in separate places now (plugin rendering vs. postproc-hook).

Additional notes

vhs has a <v:page.header.title> view helper that only works for cached plugins.

Published on 2016-07-13


TYPO3: Get a page title in fluidcontent's backend preview

At work we're mainly using fluidcontent to create custom TYPO3 content elements. It allows us to define configuration, preview and rendering in a single XML file.

One of our content elements allowed the selection of a page record, which would be linked to in the frontend rendering. In the backend's page preview, I wanted to show the ID and the title of that linked page.

It turns out that there are no Fluid view helpers that provide the title for a page UID, nor does VHS provide such a thing (v:resource.record is only usable for FAL relations, and v:page.info does not work in the backend).

Instead of writing my own view helper class, I misused v:page.rootline to obtain the page title:

{v:page.rootline(pageUid: pageUid) -> v:iterator.first() -> v:variable.set(name: 'page')}
Page title: {page.title}

Content element backend preview with page title

Published on 2018-03-07


TYPO3: Link to an Atom feed on every page

If your TYPO3 instance offers an Atom feed (or RSS), you should link to it in the HTML page <head> to enable feed autodiscovery by feed readers and podcast clients.

In your extension, add a feed page ID setting to the TypoScript constants:

Configuration/TypoScript/constants.txt
#cat=project/links; type=int+; label=Podcast index page
pids.podcast = 61

In the TypoScript that generates the page, add the header as follows:

Configuration/TypoScript/setup.txt
page = PAGE
page {
    headerData {
        30 = TEXT
        30 {
            typolink {
                parameter = t3://page?uid={$pids.podcast}&type=6
                returnLast = url
                forceAbsoluteUrl = 1
            }
            wrap = <link rel="alternate" type="application/atom+xml" title="Podcast" href="|"/>
        }
    }
}

Note that we use page type 6 for Atom feed output here, which is also configured via TypoScript.

The example here works fine with TYPO3 v8.

Published on 2017-10-24


TYPO3: List of backend UI elements

At work I was recently building a TYPO3 backend module to control an export script that generates static HTML. I wanted the module to look native and sifted through the backend to find the UI elements I needed - which was cumbersome.

Thanks to the helpful people in the TYPO3 chat I was directed to the Styleguide extension.

It provides a list of all backend UI elements available in TYPO3 and was very helpful. It is important to install the git version, because the TER version was outdated.

Published on 2017-10-19


TYPO3: Limit record types on a page

System folders in TYPO3 are often used to collect data of a certain type. It often makes sense to limit the type of records that can be inserted on the page: It makes data authoring easier for editors, since there are not 100 record types to choose from. And it helps keep the folder clean of accidental rogue database records.

To enable only certain records on a page, edit the page settings and go to the "Resources" tab. In the "Page TS Config" field, add an allowedNewTables setting:

Now, only OSM markers, tracks and vectors may be created on that page:

Before

Add new record: all of them

After

Add new record: Only some

PHP configuration

Apart from PageTSconfig, you may use the global PAGES_TYPES array to limit the allowed record types on a certain page type:

$GLOBALS['PAGES_TYPES'][$categoryDoktype]['allowedTables']
    = 'pages,tt_content,sys_file_reference,sys_template';

This works in at least TYPO3 v7.

Published on 2013-09-03


TYPO3: Allow records on the root page

Writing a TYPO3 extension, I needed to allow records of a certain table to be created on TYPO3's root page (pid=0). While the solution is easy, it took me a while to find it.

In ext_tables.php, add a rootLevel setting to your table's ctrl array:

$TCA['static_countries'] = array(
    'ctrl' => array(
        'title'     => 'Countries',
        'label'     => 'cn_short_en',
        'rootLevel' => 1,
    ...

TYPO3 allows three settings here: 0, 1 and -1, which are described in the Core API documentation:

rootLevel = 0
Records may only exist in the page tree, not in the root.
rootLevel = 1
Records may only exist in the root, not on the page tree. Note that only admins may edit records in the root page.
rootLevel = -1
Records may exist in both the root and the page tree.
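The three values boil down to a simple decision, sketched here in Python. The function may_create_record is invented for this illustration and is not a TYPO3 API; it only encodes the descriptions above, with pid 0 standing for the root page.

```python
def may_create_record(root_level: int, pid: int) -> bool:
    """Decide whether a record may live on page `pid` for a given rootLevel."""
    is_root = pid == 0
    if root_level == 0:
        return not is_root   # page tree only
    if root_level == 1:
        return is_root       # root only
    if root_level == -1:
        return True          # both root and page tree
    raise ValueError("rootLevel must be 0, 1 or -1")


for level in (0, 1, -1):
    print(level, may_create_record(level, 0), may_create_record(level, 23))
```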

Also see TYPO3: Limit record types on a page.

Published on 2011-12-21


Searching REST API documents with TYPO3's indexed_search

Instead of writing our own search, we managed to integrate REST API data into TYPO3's native indexed_search results. This brings us a mix of website content and REST data in one result list.


A TYPO3 v7.6 site at work consists of a normal page tree with content that is searchable with indexed_search.

A separate management interface is used by editors to administrate some domain-specific data outside of TYPO3. Those data are available via a REST API, which is utilized by one of our TYPO3 extensions to display data on the website.

Those externally managed data should now be searchable on the TYPO3 website.

Integration options

I pondered for a long time how to tackle this task. There were two approaches:

  1. Integrate API data into indexed_search, so that they appear inside the normal search result list.
  2. Have separate searches for website content and API content. The search result list would have two tabs, one for each type. An indicator would show how many results are found for each type and the user would need to switch between them.

The second option looked easier at first because it does not require one to dig into indexed_search. But after thinking long enough I found that I would be replicating all the basic features needed for search: Listing data, paging, and those tabs as well.

The customer would then also demand that we'd have an overview page showing the first 3 results from each of the types, with a "view all" button.

In the end I decided to use option #1 because it would feel most integrated and would mean less code.

How indexed_search + crawler work together

First, I have to recommend Indexed Search & Crawler - The Missing Manual because it explains many things and helps with basic setup.

URL list generation

You may create crawler configurations and indexed_search configurations in the TYPO3 page tree. Both are similar, yet different. How do they work together?

  1. The crawler scheduler task and command line script both start crawler_lib::CLI_run().
  2. cli_hooks are executed. indexed_search has registered its IS\CrawlerHook as one, and that is started.
  3. All indexing configuration records are checked for their next execution time. If one of them needs to be run, it is put into crawler queue as a callback that runs IS\CrawlerHook again.
  4. The crawler queue is processed and calls IS\CrawlerHook::crawler_execute().
  5. IS\CrawlerHook::crawler_execute_type4() gets a URL list via crawler_lib::getUrlsForPageRow().

    1. Crawler configuration records are searched in the rootline.
    2. URLs are generated from the configurations found (crawler_lib::compileUrls())
    3. URLs are queued with crawler_lib::urlListFromUrlArray()

Note that the crawler only processes entries that were in the queue when it started. Queue items added during the crawl run are not processed yet, but in a later run.

This means that it may take 6 or 7 crawler runs until it gets to your page with the indexing and crawler configuration. It's better to use the backend module Info -> Site crawler to enqueue your custom URLs during development, or have a minimal page tree with one page :)
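Why several runs are needed becomes clear with a toy model in Python. This is not the crawler's code; crawler_run, process and the three stage names are invented, and the real chain of queue entries is longer.

```python
from collections import deque


def crawler_run(queue, process):
    """Process only the entries that were queued when the run started."""
    for _ in range(len(queue)):
        item = queue.popleft()
        # Entries created while processing wait for the next run.
        queue.extend(process(item))


def process(item):
    # Hypothetical chain: each stage enqueues the next one.
    stages = {"index-config": ["page"], "page": ["detail-url"], "detail-url": []}
    return stages[item]


queue = deque(["index-config"])
runs = 0
while queue:
    crawler_run(queue, process)
    runs += 1
print(runs)  # 3
```

Every additional stage between the indexing configuration and the final URL costs one more crawler run.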

Crawler URLs

Crawler configuration records are URL generators.

Without special configuration, they return the URL for a page ID. Pretty dull.

The crawler manual shows that they can be used for more, and gives a language configuration as example: &L=[1-3|5|7]. For each page ID this will generate 5 URLs, one for each of the listed languages 1, 2, 3, 5 and 7.
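The value syntax is easy to expand by hand. The following Python sketch handles only the range and list syntax from the example; expand_values is my own name, and the crawler's real implementation supports more than this.

```python
def expand_values(spec: str) -> list:
    """Expand a value spec such as "1-3|5|7" into a list of integers."""
    values = []
    for part in spec.split("|"):
        if "-" in part:
            # "1-3" is an inclusive range.
            start, end = part.split("-", 1)
            values.extend(range(int(start), int(end) + 1))
        else:
            values.append(int(part))
    return values


print(expand_values("1-3|5|7"))  # [1, 2, 3, 5, 7]
```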

Apart from those value ranges, you may specify a _TABLE configuration:

&myparam=[_TABLE:tt_myext_items;_PID:15, _WHERE: and hidden = 0]

This is where we need to step in: We may handle those [FOO] values and expand them ourselves with a hook:

ext_localconf.php
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['crawler/class.tx_crawler_lib.php']
    ['expandParameters'][] = \Vnd\Ext\UrlGenerator::class . '->expandParameters';

The hook gets called for every bracketed URL parameter value. $params['currentValue'] contains the value without brackets.

The code in the hook method only has to expand the value to a list of IDs and set that into $params['paramArray'][$key]:


<?php
namespace Vnd\Ext;

/**
 * [...]
 *
 * @see    TYPO3\CMS\IndexedSearch\Example\CrawlerHook
 */
class UrlGenerator
{
    /**
     * Add GET parameters to crawler page.
     *
     * This method is registered as hook for crawler/class.tx_crawler_lib.php
     * and is called when crawler configuration "Configuration" fields
     * are expanded (`&L=[1-3]&bar=[FOO]`).
     *
     * @param array  $params Keys:
     *                       - &pObj
     *                       - &paramArray
     *                       - currentKey
     *                       - currentValue
     *                       - pid
     * @param object $pObj   Crawler lib instance
     *
     * @return void
     */
    public function expandParameters(&$params, $pObj)
    {
        if ($params['currentValue'] === 'FOO') {
            //replace this with your own ID generation code
            $params['paramArray'][$params['currentKey']] = [11, 23, 42];
        }
    }
}
?>

Now when the crawler processes page id 1 and finds a matching configuration record that contains the following configuration:

&tx_myparam=[FOO]

our hook will be called and expand that config to three IDs:

/index.php?id=1&tx_myparam=11
/index.php?id=1&tx_myparam=23
/index.php?id=1&tx_myparam=42

Crawler will then put three URLs into the queue and index them in the next run.
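How expanded parameter values turn into a URL list can be sketched in Python. The function compile_urls below is a simplified stand-in written for this post, not the crawler's compileUrls(); it assumes every combination of parameter values yields one URL.

```python
from itertools import product
from urllib.parse import urlencode


def compile_urls(page_id, param_values):
    """Build one URL per combination of expanded parameter values."""
    keys = list(param_values)
    urls = []
    for combo in product(*(param_values[key] for key in keys)):
        query = urlencode([("id", page_id), *zip(keys, combo)])
        urls.append("/index.php?" + query)
    return urls


for url in compile_urls(1, {"tx_myparam": [11, 23, 42]}):
    print(url)
```

With a second parameter such as L with five values, the same call would return 15 URLs.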

The page and the plugin that show the API data must be cacheable. Data are not indexed otherwise. Also make sure you set the page title for indexing.

Enable cHash generation in the crawler configuration.

Cache invalidation

When a visitor uses the website search and indexed_search generates a search result set, it checks if the page ID is still available. Deactivated and deleted pages will thus not show up in the results. This does not work for API results for obvious reasons.

TYPO3 database records integrated into search with an indexed_search configuration get removed on the next crawler run. Until then, they are still findable:

In fact, if a record is removed its indexing entry will also be removed upon next indexing - simply because the "set_id" is used to finally clear out old entries after a re-index!

This works as follows:

  • The indexing configuration for records is executed and the records are indexed. Both config ID and set_id (timestamp of index config run) are saved with the search index records.
  • When the index configuration is run the next time, new search index entries will be created for all records - with a new set_id. There are two search index records for each page now.
  • Once the index configuration is fully processed (no URLs for that config ID in the queue anymore), the search index entries with the same index config ID and old set_ids are removed.
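The set_id mechanism can be modelled with a plain list in Python. The data structure and the function names reindex and cleanup are invented for this sketch; indexed_search stores this information in its database tables instead.

```python
def reindex(index, config_id, records, set_id):
    """Add fresh entries for all current records under a new set_id."""
    for record in records:
        index.append({"config_id": config_id, "set_id": set_id, "record": record})


def cleanup(index, config_id, current_set_id):
    """Drop entries of this configuration that stem from older runs."""
    index[:] = [
        entry for entry in index
        if entry["config_id"] != config_id or entry["set_id"] == current_set_id
    ]


index = []
reindex(index, config_id=1, records=["a", "b"], set_id=1000)
reindex(index, config_id=1, records=["a"], set_id=2000)  # "b" was deleted meanwhile
print(len(index))  # 3
cleanup(index, config_id=1, current_set_id=2000)
print([entry["record"] for entry in index])  # ['a']
```

Record "b" disappears from the index without anyone deleting it explicitly: it simply never gets an entry with the new set_id.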

This also works for API data. The indexing configuration "pagetree" processes the API page ID, which in turn creates the API detail URLs through the crawler configuration. After reindexing the data, their old search index data get deleted.

The only thing to remember is not to use a "Crawler Queue" scheduler task, because then the phash records will have no index configuration ID, and thus will not be deleted on the next run.

Development

The "reset all index data" SQL script is invaluable during development:

TRUNCATE TABLE index_debug;
TRUNCATE TABLE index_fulltext;
TRUNCATE TABLE index_grlist;
TRUNCATE TABLE index_phash;
TRUNCATE TABLE index_rel;
TRUNCATE TABLE index_section;
TRUNCATE TABLE index_stat_search;
TRUNCATE TABLE index_stat_word;
TRUNCATE TABLE index_words;
TRUNCATE TABLE tx_crawler_process;
TRUNCATE TABLE tx_crawler_queue;
UPDATE index_config SET timer_next_indexing = 0;

Published on 2017-04-11


Improving TYPO3 docker cache warming speed

Warming the page cache after a production deployment took up to two minutes for certain TYPO3 pages. We got that down to mere seconds by not throwing away scaled and cropped images.


🇩🇪 A German translation of this article is available at Mogic:
Docker: Schnelleres Cache-Warming für TYPO3

Cache cleaning

At work we use docker for our TYPO3 projects. Deploying changes to the live system only requires us to push into the main extension's master branch, and Jenkins will do the rest: Build the web server image with all the PHP code, pull that onto the production server, start up the new container, clear the cache and stop the old container.

Because potentially everything could have changed code-wise during deployments, we need to clear all the TYPO3 caches. Apart from the database cache tables, all files in typo3temp/ are pruned during deployment.

Responsive sites & focuspoint

Our TYPO3 projects have a responsive layout - it can be viewed in any resolution, and it will look good. Different resolutions and screen aspect ratios often need different image sizes and ratios - and those images need to be generated automatically.

To make sure that the important part of a picture is kept regardless of the targeted width-height-ratio, we utilize the focuspoint extension. Editors select the important part of the picture within the TYPO3 backend, and this part will be kept during image cropping.

Focus point selection

Mix that with different image resolutions for normal and high-density displays, and we're up to 6 images that need to be generated for a single image on the website (2 aspect ratios + 2 resolutions each).

When clearing typo3temp/, all those cropped and scaled images are thrown away and need to be regenerated. Calling pages with many images needed up to two minutes until they had all their images regenerated, which was just too much.

Processed files folder

Our goal was to keep the processed files. Their file names are a hash of the image processing configuration options, so they are stable over time. Cleaning caches has no effect on them.
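The principle behind such stable names can be shown in Python. TYPO3's real naming scheme differs in its details; processed_file_name and the configuration keys are made up here, and the sketch assumes the original file name has an extension.

```python
import hashlib
import json


def processed_file_name(original_name, config):
    """Derive a file name from the processing configuration (illustration only)."""
    # sort_keys makes the serialisation independent of the key order.
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    checksum = hashlib.sha1(payload).hexdigest()[:10]
    stem, _, extension = original_name.rpartition(".")
    return f"{stem}_{checksum}.{extension}"


first = processed_file_name("photo.jpg", {"width": 800, "height": 600})
second = processed_file_name("photo.jpg", {"height": 600, "width": 800})
print(first == second)  # True
```

Because the name depends only on the input, a file generated before a deployment is found again afterwards, as long as the configuration has not changed.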

Information about generated files is stored in the database as well, in table sys_file_processedfile. Since the database is kept during deployments, the contents of this table are also stable.

When re-using a production database dump on the test or dev system, TYPO3 notices if files have an entry in the processed files table but are missing on disk, and recreates them.

Solution

focuspoint saved the cropped files into typo3temp/focuscrop which was thrown away on deployments, so we made a patch to make that configurable.

With that in place we created a new folder in the site's document root, processed. It was "mounted" into TYPO3 with a new file storage record (uid: 2) that has its base path set to processed/ (path type "relative").

'processed' file storage settings

Two file storages

The focuspoint extension was configured to store its generated files into processed/focuspoint.

File storage fileadmin (auto-generated) was configured to store its "manipulated and temporary images" into 2:_processed_.

fileadmin setting

With those two changes, all generated images now land in the processed directory. We configured our docker container to mount the processed folder from the host, so that it would keep its data when new CMS containers are deployed.

docker-compose.yml
typo3cms:
  image: docker.example.com/project-typo3cms:latest
  volumes_from:
    - storage
  volumes:
    - ./semitemp/processed:/var/www/site/htdocs/processed

Fetching a page with over 200 images directly after deployment with empty caches now takes mere seconds instead of minutes. Mission accomplished.

Published on 2017-03-27