TYPO3 v9 introduced the Page Title API, which should be used instead. This blog post is obsolete.
At work I had a website that showed records on a listing page and offered more information for each record on a detail page. The task was to change the page title on the detail page to the record's own title.
The naive solution is to simply set the page title in the central frontend output object:
$GLOBALS['TSFE']->page['title'] = 'foo';
But this does not work on uncached plugins.
To understand why, we need to look at how TYPO3's cache works together with uncached plugins:

1. The page <head> and <body> are generated and combined into a single string of HTML. Uncacheable plugins are not executed yet; a placeholder is inserted instead:
   some html..<!--INT_SCRIPT.abcdef-->more html
2. Additional placeholders are added for additionalHeaderData and additionalFooterData.
3. In TypoScriptFrontendController::INTincScript(), TYPO3 iterates over all plugin placeholders, executes the respective plugin code and replaces each placeholder with the plugin output.
4. It also replaces the additional*Data placeholders with their values from $GLOBALS['TSFE'].

When the user requests a cached page, only steps 3 and 4 are executed. The title tag was generated in step 1 and is part of the cached HTML, so a page title generated with TypoScript can no longer be changed at that point.
There are three possible solutions to set the page title from an uncached plugin. I suggest option 1.
Option 1: additionalHeaderData. This is the option I recommend: it works with both cached and uncached plugins, and it keeps your code in one place.
First, disable the creation of the normal title tag via config.noPageTitle for the pages that contain the plugin:
[globalVar = TSFE:id = 23|42]
config.noPageTitle = 2
[global]
In your plugin's logic, add the page title to TSFE's additionalHeaderData:

$GLOBALS['TSFE']->additionalHeaderData['myCustomUserIntTitle']
    = '<title>' . $this->getTitle($newTitle) . '</title>';
That's all that's needed. Other people recommend this solution as well.
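Put together, a detail action of such a plugin could look like the following sketch. The controller, model and method names are invented for illustration; it assumes config.noPageTitle = 2 is set for the page as shown above:

```php
<?php
// Hypothetical Extbase controller action (names invented for illustration).
class RecordController extends \TYPO3\CMS\Extbase\Mvc\Controller\ActionController
{
    public function showAction(\Vnd\Ext\Domain\Model\Record $record)
    {
        // Title tag for the HTML head; works for cached and uncached plugins
        $GLOBALS['TSFE']->additionalHeaderData['myCustomUserIntTitle']
            = '<title>' . htmlspecialchars($record->getTitle()) . '</title>';
        // Title that indexed_search stores for this page
        $GLOBALS['TSFE']->indexedDocTitle = $record->getTitle();

        $this->view->assign('record', $record);
    }
}
```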
Option 2: modifying $GLOBALS['TSFE']->content. When an uncached plugin is processed, the cached HTML code is already available in $GLOBALS['TSFE']->content, and you might be tempted to simply modify it during plugin processing.
This works for uncached plugins only: in cached plugins, $GLOBALS['TSFE']->content is not filled yet at that point, and changing it does not do anything since it gets overwritten later.
$GLOBALS['TSFE']->content = preg_replace(
    '#<title>.*<\/title>#',
    '<title>' . htmlspecialchars($newTitle) . '</title>',
    $GLOBALS['TSFE']->content
);
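Detached from the TSFE context, the search-and-replace itself can be tried standalone; the HTML string and title here are made up for illustration:

```php
<?php
// Standalone demonstration of the title replacement;
// $html stands in for $GLOBALS['TSFE']->content.
$newTitle = 'My record & its title';
$html     = '<html><head><title>Records</title></head><body></body></html>';

$html = preg_replace(
    '#<title>.*<\/title>#',
    '<title>' . htmlspecialchars($newTitle) . '</title>',
    $html
);

echo $html, "\n";
// <html><head><title>My record &amp; its title</title></head><body></body></html>
```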
Option 3: a contentPostProc hook, which some people recommend. TYPO3 allows you to register a hook that gets executed just before the content is sent to the user. Just as in option #2, you can search and replace on the HTML:
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['tslib/class.tslib_fe.php']['contentPostProc-output']['robots'] = \Vnd\Class::class . '::contentPostProc';
And now you can preg_replace your new title into the HTML:
public function contentPostProc(array $params, $pObj)
{
    // $newTitle has to be determined here somehow
    $pObj->content = preg_replace(
        '#<title>.*<\/title>#',
        '<title>' . htmlspecialchars($newTitle) . '</title>',
        $pObj->content
    );
}
This does work for cached and uncached plugins.
The downside is that your title creation and title insertion code are in separate places now (plugin rendering vs. postproc-hook).
Two more notes: vhs has a <v:page.header.title> view helper, but it only works for cached plugins. And indexed_search will still not show the correct page title in its search results; $GLOBALS['TSFE']->indexedDocTitle needs to be set to the page title, too.
At work we're mainly using fluidcontent to create custom TYPO3 content elements. It allows us to define configuration, preview and rendering in a single XML file.
One of our content elements allowed the selection of a page record, which would be linked to in the frontend rendering. In the backend's page preview, I wanted to show the ID and the title of that linked page.
It turns out that no core Fluid view helper provides the title for a page UID, and VHS does not offer one either (v:resource.record is only usable for FAL relations, and v:page.info does not work in the backend).
Instead of writing my own view helper class, I misused v:page.rootline to obtain the page title:
{v:page.rootline(pageUid: pageUid) -> v:iterator.first() -> v:variable.set(name: 'page')}
Page title: {page.title}
If your TYPO3 instance offers an Atom feed (or RSS), you should link to it in the HTML page <head> to enable feed autodiscovery by feed readers and podcast clients.
In your extension, add a feed page ID setting to the TypoScript constants:
#cat=project/links; type=int+; label=Podcast index page
pids.podcast = 61
In the TypoScript that generates the page, add the header as follows:
page = PAGE
page {
headerData {
30 = TEXT
30 {
typolink {
parameter = t3://page?uid={$pids.podcast}&type=6
returnLast = url
forceAbsoluteUrl = 1
}
wrap = <link rel="alternate" type="application/atom+xml" title="Podcast" href="http://cweiske.de/tagebuch/|"/>
}
}
}
Note that we use page type 6 for the Atom feed output here, which is also configured via TypoScript.
The example here works fine with TYPO3 v8.
At work I was recently building a TYPO3 backend module to control a static-HTML-generating export script. I wanted the module to look native and sifted through the backend to find the UI elements I needed - which was cumbersome.
Thanks to the helpful people in the TYPO3 chat I was directed to the Styleguide extension.
It provides a list of all backend UI elements available in TYPO3 and was very helpful. It is important to install the git version, because the TER version was outdated.
System folders in TYPO3 are often used to collect data of a certain type. It often makes sense to limit the type of records that can be inserted on such a page: it makes data authoring easier for editors, since there are not 100 record types to choose from, and it helps keep the folder clean of accidental rogue database records.
To enable only certain records on a page, edit the page settings, go to the tab "Resources" and add an allowedNewTables setting to the "Page TS Config" field:
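For example, with hypothetical table names for the marker, track and vector records (the actual table names depend on the OSM extension in use):

```typoscript
mod.web_list.allowedNewTables = tx_osm_marker, tx_osm_track, tx_osm_vector
```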
Now, only OSM markers, tracks and vectors may be created on that page:
Apart from PageTSconfig, you may use the global PAGES_TYPES array to limit the allowed record types on a certain page type:
$GLOBALS['PAGES_TYPES'][$categoryDoktype]['allowedTables']
= 'pages,tt_content,sys_file_reference,sys_template';
This works in at least TYPO3 v7.
Writing a TYPO3 extension, I needed to allow records of a certain table to be created on TYPO3's root page (pid=0). While the solution is easy, it took me a while to find it.
In ext_tables.php, add a rootLevel setting to your table's ctrl array:
$TCA['static_countries'] = array(
    'ctrl' => array(
        'title'     => 'Countries',
        'label'     => 'cn_short_en',
        'rootLevel' => 1,
        // ...
    ),
);
TYPO3 allows three settings here, which are described in the Core API documentation: 0 (the default; records may only be created in the page tree), 1 (records may only be created on the root page) and -1 (records may be created in both places).
Also see TYPO3: Limit record types on a page.
Instead of writing our own search, we managed to integrate REST API data into TYPO3's native indexed_search results. This brings us a mix of website content and REST data in one result list.
A TYPO3 v7.6 site at work consists of a normal page tree with content that is searchable with indexed_search.
A separate management interface is used by editors to administrate some domain-specific data outside of TYPO3. Those data are available via a REST API, which is utilized by one of our TYPO3 extensions to display data on the website.
Those externally managed data should now be searchable on the TYPO3 website.
I pondered a long time how to tackle this task. There were two approaches:
The second option looked easier at first because it does not require one to dig into indexed_search. But after thinking long enough I found that I would be replicating all the basic features needed for search: Listing data, paging, and those tabs as well.
The customer would then also demand that we'd have an overview page showing the first 3 results from each of the types, with a "view all" button.
In the end I decided to use option #1 because it would feel most integrated and would mean less code.
At first I have to recommend Indexed Search & Crawler - The Missing Manual because it explains many things and helps with basic setup.
You may create crawler configurations and indexed_search configurations in the TYPO3 page tree. Both are similar, yet different. How do they work together?
IS\CrawlerHook::crawler_execute_type4() gets a URL list via crawler_lib::getUrlsForPageRow().
Note that the crawler only processes entries that were in the queue when it started. Queue items added during the crawl run are not processed yet, but in a later run.
This means that it may take 6 or 7 crawler runs until it gets to your page with the indexing and crawler configuration. It's better to use the backend module Info -> Site crawler to enqueue your custom URLs during development, or have a minimal page tree with one page :)
Crawler configuration records are URL generators.
Without special configuration, they return the URL for a page ID. Pretty dull.
The crawler manual shows that they can be used for more, and gives a language configuration as example: &L=[1-3|5|7]. For each page ID this will generate 5 URLs, one for each of the listed languages 1, 2, 3, 5 and 7.
Apart from those value ranges, you may specify a _TABLE configuration:
&myparam=[_TABLE:tt_myext_items;_PID:15, _WHERE: and hidden = 0]
This is where we need to step in: We may handle those [FOO] values and expand them ourselves with a hook:
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['crawler/class.tx_crawler_lib.php']
['expandParameters'][] = \Vnd\Ext\UrlGenerator::class . '->expandParameters';
The hook gets called for every bracketed URL parameter value. $params['currentValue'] contains the value without brackets.
The code in the hook method only has to expand the value to a list of IDs and set that into $params['paramArray'][$key]:
<?php
/**
 * @see TYPO3\CMS\IndexedSearch\Example\CrawlerHook
 */
class UrlGenerator
{
    /**
     * Add GET parameters to crawler page.
     *
     * This method is registered as hook for crawler/class.tx_crawler_lib.php
     * and is called when crawler configuration "Configuration" fields
     * are expanded (`&L=[1-3]&bar=[FOO]`).
     *
     * @param array  $params Keys:
     *                       - pObj
     *                       - paramArray
     *                       - currentKey
     *                       - currentValue
     *                       - pid
     * @param object $pObj   Crawler lib instance
     *
     * @return void
     */
    public function expandParameters(&$params, $pObj)
    {
        if ($params['currentValue'] === 'FOO') {
            //replace this with your own ID generation code
            $params['paramArray'][$params['currentKey']] = [11, 23, 42];
        }
    }
}
Now when the crawler processes page id 1 and finds a matching configuration record that contains the following configuration:
&tx_myparam=[FOO]
our hook will be called and expand that config to three IDs:
/index.php?id=1&tx_myparam=11
/index.php?id=1&tx_myparam=23
/index.php?id=1&tx_myparam=42
The crawler will then put those three URLs into the queue and index them in the next run.
The page and the plugin that show the API data must be cacheable; otherwise the data are not indexed. Also make sure you set the page title for indexing.
Enable cHash generation in the crawler configuration.
When a visitor uses the website search and indexed_search generates a search result set, it checks if the page ID is still available. Deactivated and deleted pages will thus not show up in the results. This does not work for API results for obvious reasons.
TYPO3 database records integrated into search with an indexed_search configuration get removed on the next crawler run. Until then, they are still findable:
In fact, if a record is removed its indexing entry will also be removed upon next indexing - simply because the "set_id" is used to finally clear out old entries after a re-index!
This works via the set_id: each indexing run gets a new set_id, and after a re-index, entries that still carry an old set_id are cleared out.
This also works for API data. The indexing configuration "pagetree" processes the API page ID, which in turn creates the API detail URLs through the crawler configuration. After reindexing the data, their old search index data get deleted.
The only thing to remember is not to use a "Crawler Queue" scheduler task, because then the phash records will have no index configuration ID, and thus will not be deleted on the next run.
The "reset all index data" SQL script is invaluable during development:
TRUNCATE TABLE index_debug;
TRUNCATE TABLE index_fulltext;
TRUNCATE TABLE index_grlist;
TRUNCATE TABLE index_phash;
TRUNCATE TABLE index_rel;
TRUNCATE TABLE index_section;
TRUNCATE TABLE index_stat_search;
TRUNCATE TABLE index_stat_word;
TRUNCATE TABLE index_words;
TRUNCATE TABLE tx_crawler_process;
TRUNCATE TABLE tx_crawler_queue;
UPDATE index_config SET timer_next_indexing = 0;
Warming the page cache after a production deployment took up to two minutes for certain TYPO3 pages. We got that down to mere seconds by not throwing away scaled and cropped images.
🇩🇪 A German translation of this article is available at Mogic: "Docker: Schnelleres Cache-Warming für TYPO3".
At work we use docker for our TYPO3 projects. Deploying changes to the live system only requires us to push into the main extension's master branch, and Jenkins will do the rest: Build the web server image with all the PHP code, pull that onto the production server, start up the new container, clear the cache and stop the old container.
Because potentially everything could have changed code-wise during deployments, we need to clear all the TYPO3 caches. Apart from the database cache tables, all files in typo3temp/ are pruned during deployment.
Our TYPO3 projects have a responsive layout - it can be viewed in any resolution, and it will look good. Different resolutions and screen aspect ratios often need different images sizes and ratios - and those images need to be generated automatically.
To make sure that the important part of a picture is kept regardless of the targeted width-height-ratio, we utilize the focuspoint extension. Editors select the important part of the picture within the TYPO3 backend, and this part will be kept during image cropping.
Mix that with different image resolutions for normal and high-density displays, and we're up to six images that need to be generated for a single image on the website (three crop variants, each in two resolutions).
When clearing typo3temp/, all those cropped and scaled images are thrown away and need to be regenerated. Calling pages with many images needed up to two minutes until they had all their images regenerated, which was just too much.
Our goal was to keep the processed files. Their file names are a hash of the image processing configuration options, so they are stable over time. Cleaning caches has no effect on them.
Information about generated files is stored in the database as well, in the table sys_file_processedfile. Since the database is kept during deployments, the contents of this table are also stable.
When re-using a production database dump on the test or dev system, TYPO3 notices if files have an entry in the processed files table but are missing on disk, and recreates them.
focuspoint saved the cropped files into typo3temp/focuscrop, which was thrown away on deployments, so we made a patch to make that path configurable.
With that in place we created a new folder in the site's document root, processed. It was "mounted" into TYPO3 with a new file storage record (uid: 2) that has its base path set to processed/ (path type "relative").
The focuspoint extension was configured to store its generated files in processed/focuspoint. The auto-generated file storage "fileadmin" was configured to store its "manipulated and temporary images" in 2:_processed_.
With those two changes, all generated images now land in the processed directory. We configured our docker container to mount the processed folder from the host, so that it would keep its data when new CMS containers are deployed.
typo3cms:
  image: docker.example.com/project-typo3cms:latest
  volumes_from:
    - storage
  volumes:
    - ./semitemp/processed:/var/www/site/htdocs/processed
Fetching a page with over 200 images directly after deployment with empty caches now takes mere seconds instead of minutes. Mission accomplished.
I spent the last couple of days at work integrating REST API data into the search result list of TYPO3's indexed_search extension. Yesterday I wanted to run a last test on my development machine to see if everything worked as it should and if API data would be indexed correctly. It did not work.
Why didn't it work?
After several hours I found out that the crawler extension did indeed process the page with my special crawling configuration, but stopped in the middle.
Why did it stop?
The crawler caught an exception and stopped processing. Unfortunately, it did not tell anyone about that. The exception was "HTTP/1.1 404 Not Found", thrown by the API connector.
Why did the API connector throw an exception?
Our crawler hook thought it was running on the live (production) system and queried the production API. The new API methods had not yet been deployed to the production API system, and it returned a 404.
Why did the crawler think we were on prod?
The docker container normally has an environment variable TYPO3_CONTEXT=Development, which tells the TYPO3 instance to use the development configuration. In my shell, that variable was not set.
Why was the environment variable not set?
To make the crawler process run correctly (write access to temporary directories + files), it must be run as the same user that the nginx web server runs under, www-data. I switched to the www-data user as I always do:
$ su - www-data -s /bin/bash
The - makes su start a login shell, which resets all environment variables; TYPO3_CONTEXT was thus not set anymore.
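The effect can be reproduced without su; env -i starts a command with a cleaned environment, similar to what a login shell gets (only the variable name is taken from this setup):

```shell
export TYPO3_CONTEXT=Development

# A normal child process inherits the variable:
bash -c 'echo "context=$TYPO3_CONTEXT"'
# context=Development

# "su - user" starts a login shell with a cleaned environment;
# env -i simulates that cleaning - the variable is gone:
env -i bash -c 'echo "context=$TYPO3_CONTEXT"'
# context=
```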
After 6 hours, I removed that minus and everything worked as it should.
During development of the TYPO3-based Wohnglück project, fellow developers experienced the dreaded "white pages" when accessing a certain page in the CMS. This happened on production, test and local dev environments - but only now and then, and it was not reproducible.
Our log server only showed:
nginx stdout | 2017/02/02 11:26:55 [error] 24#24: *40 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 42.0.23.0, server: _, request: "GET /some/path/ HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "some.host", referrer: "https://some.host/other/path/"
I was totally certain that this must be PHP crashing. We were running php version 7.0.8ubuntu*, but the most recent one was 7.1.15. Upgrading would surely make the crash go away.
Another colleague did not want to go down that route and suggested collecting more error information first, which we did by setting the FPM log_level to warning (it had been error before). Then we waited until it happened again a day later. The log was more verbose now:
php7.0-fpm stderr | WARNING: [pool www] child 27, script '/var/www/site/htdocs/index.php' (request: "GET /index.php") execution timed out (152.506891 sec), terminating
php7.0-fpm stderr | WARNING: [pool www] child 27 exited on signal 15 (SIGTERM) after 240.011067 seconds from start
nginx stdout | 24#24: *40 recv() failed (104: Connection reset by peer) while reading response header from upstream, ...
So PHP did not crash, it was killed by php-fpm because the process took longer than 4 minutes to run!
The log also contained error messages about TYPO3 calling graphicsmagick but failing with strange errors like "invalid JPEG header data" and "no information read", which I could not make sense of before.
But now it actually did make sense: The page contained a whopping 219 images, which got lazy-loaded by the browser, but all had to be scaled and cropped during page generation. TYPO3's whole cache is cleared during our automated docker deployment, and the first persons accessing the page experienced that problem.
Multiple people accessing that page at the same time also explained the gm command errors: two PHP processes tried to generate the same files at once, and one of them read the partly written data of the other.
The solution to this problem is to not throw away the scaled images during deployment.