One of my colleagues uses Docker Desktop on macOS. One day, it reported that all of the 200 GiB of reserved space was full.
The docker daemon did not think so:
$ docker system df
TYPE            TOTAL   ACTIVE  SIZE      RECLAIMABLE
Images          68      18      36GB      26.9GB (74%)
Containers      20      20      111.7MB   0B (0%)
Local Volumes   56      12      7.947GB   4.992GB (62%)
Build Cache     365     0       0B        0B
That is nowhere near 200 GiB.
On Linux we can simply inspect /var/lib/docker, but Docker on macOS runs inside a virtual machine (VM). We got access to that VM with the help of justincormack/nsenter1:
$ docker run -it --rm --privileged --pid=host justincormack/nsenter1
# cd /var/lib/docker/
# du -h -d 1
44.5G ./overlay2
95.6M ./image
7.7G ./volumes
114.4G ./containers
336.0K ./network
166.7G ./var/lib/docker
So the containers directory is very full, although docker said containers should only take up 111 MiB :)
# cd /var/lib/docker/containers/
# du -h -d 1
56.0K ./e239e861c2a9097f79d3c0c0f98fce7b20ab899674ea7772356d0355d9f688f4
44.0K ./962a1795b5440e9b9900a4743d86f9c18452675a43c379a88aaac2b16b5bd275
[...]
114G ./2f9867b08a714fcfd83b85d5cf7f883be05394970e7e0058747b399f06bd269d
[...]
# cd /var/lib/docker/containers/2f9867b08a714fcfd83b85d5cf7f883be05394970e7e0058747b399f06bd269d
# ls -lah
total 114G
[...]
-rw-r----- 1 root root 114.3G Oct 11 07:01 2f9867b08a714fcfd83b85d5cf7f883be05394970e7e0058747b399f06bd269d-json.log
[...]
And there we have it: A log file of 114 GiB.
But which container is that?
$ docker inspect 2f9867b08a714fcfd83b85d5cf7f883be05394970e7e0058747b399f06bd269d | jq '.[].Config.Image'
"gitlab-docker.company.com/customer/customer-wordpress-docker/mysql:latest"
This particular MySQL container seems to be very badly configured.
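Two things help here (a sketch, assuming the default json-file logging driver; the size limits are example values). First, free the space: dockerd still holds the log file open, so it has to be truncated in place rather than deleted. Second, cap the log size when the container is recreated:

# inside the VM: empty the log file without deleting it
# truncate -s 0 2f9867b08a714fcfd83b85d5cf7f883be05394970e7e0058747b399f06bd269d-json.log

$ docker run -d \
    --log-opt max-size=10m \
    --log-opt max-file=3 \
    gitlab-docker.company.com/customer/customer-wordpress-docker/mysql:latest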
At work we have a small project in which we get CSS and JavaScript from an external agency. Now and then updates come in and have to be integrated into the existing code base. Sometimes things break, and all I have is minified JavaScript.
I wanted to be able to properly see the changes in the minified JavaScript files when using git diff, and it turns out you can configure Git to do that.
First, install the JavaScript beautifier js-beautify via pip:
$ pip3 install jsbeautifier
Now we define a diff driver named minjs, which tells Git to prettify files with js-beautify:
$ git config --global diff.minjs.textconv js-beautify
If you have enough disk space, enable caching of the beautified files:
# takes extra space, but makes it faster:
$ git config --global diff.minjs.cachetextconv true
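After both commands, ~/.gitconfig contains a section like this, which you could also add by hand:

[diff "minjs"]
    textconv = js-beautify
    cachetextconv = true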
Finally, create a file called .gitattributes in your project root directory that tells Git to use the minjs diff configuration for file names ending in .min.js:
*.min.js diff=minjs
git diff now diffs minified JavaScript files in a readable way.
git show does not, unless you use the --ext-diff option.
When running a shell in a docker container, you only see a random hash as the hostname:
$ docker exec -it project_backend_1 bash
root@112adda3eb64:/#
Now imagine having a dozen terminals open, and then you run ./vendor/bin/phpunit in container 71f68dcd5379. The first thing that the PHPUnit bootstrap script does is empty the database and then run all migrations and seeds.
Unfortunately, you intended to run that command in 112adda3eb64, your local development container. Let's just say that 71f68dcd5379 was not the local dev one, but on a server in a data center, and the data thrown away were kind of important.
To prevent such mistakes in the future, the shell shall clearly show which environment you are in - local development, testing, staging or production.
This environment is available in our Laravel .env file, but it's not so easy to access in the terminal. So the first step is to add the current environment in the docker-compose.yml file:
---
version: "3"
services:
  backend:
    image: docker-hub.example.org/project/backend-dev:latest
    environment:
      - APP_ENV=local
Now we can access this variable in our shell via $APP_ENV.
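A quick check in a running container shows that the variable arrives:

$ docker exec -it project_backend_1 bash
root@112adda3eb64:/# echo "$APP_ENV"
local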
The bash prompt $PS1 is set in two places in the Ubuntu 16.04 images that we used: /etc/bash.bashrc and root's own /root/.bashrc.
Both files define $PS1, so we have to load our bash-coloring file in both of them:
FROM ubuntu:xenial
ADD bash.colorprompt /etc/bash.colorprompt
RUN echo '. /etc/bash.colorprompt' >> /etc/bash.bashrc \
    && echo '. /etc/bash.colorprompt' >> /root/.bashrc
Now the only thing left is to write that file that sets the prompt:
# color the prompt according to the $APP_ENV variable
case "$APP_ENV" in
    production)
        PS1='\e[41m\n=== $APP_ENV ===\e[m\n\u@\h:\w\$ '
        ;;
    testing)
        PS1='\e[43m$APP_ENV\e[m \u@\h:\w\$ '
        ;;
    local)
        PS1='\e[42m$APP_ENV\e[m \u@\h:\w\$ '
        ;;
esac
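One optional refinement (not part of our original file): bash only calculates the prompt width correctly when non-printing escape sequences are wrapped in \[ and \]; without them, long command lines may wrap oddly. The production branch would then read:

    production)
        # \[...\] marks the color codes as zero-width for bash
        PS1='\[\e[41m\]\n=== $APP_ENV ===\[\e[m\]\n\u@\h:\w\$ '
        ;;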
The obvious question is why PHPUnit was available on that system in the first place.
Our CI server runs unit/integration tests on every deployment, no matter which environment is being deployed to.
While this is in general a good idea, running the tests on the deployment to every environment is something we later stopped doing.
It turned out to be hard to make sure that every single configuration variable is overridden in phpunit.xml. And if you can't be sure of this, your tests suddenly use some obscure production service that you forgot to stub out.
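For illustration, this is roughly what such an override looks like in phpunit.xml (the variable names here are examples, not our complete list):

<php>
    <!-- force the testing environment, regardless of what .env says -->
    <env name="APP_ENV" value="testing"/>
    <!-- stub out external services explicitly -->
    <env name="MAIL_DRIVER" value="log"/>
</php>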
At work I was recently building a TYPO3 backend module to control a static HTML export script. I wanted the module to look native and sifted through the backend to find the UI elements I needed - which was cumbersome.
Thanks to the helpful people in the TYPO3 chat I was directed to the Styleguide extension.
It provides a list of all backend UI elements available in TYPO3 and was very helpful. It is important to install the git version, because the TER version was outdated.
Instead of writing our own search, we managed to integrate REST API data into TYPO3's native indexed_search results. This gives us a mix of website content and REST data in one result list.
A TYPO3 v7.6 site at work consists of a normal page tree with content that is searchable with indexed_search.
A separate management interface is used by editors to administrate some domain-specific data outside of TYPO3. Those data are available via a REST API, which is utilized by one of our TYPO3 extensions to display data on the website.
Those externally managed data should now be searchable on the TYPO3 website.
I pondered a long time how to tackle this task. There were two approaches:
1. Integrate the API data into the indexed_search index, so that the native website search finds them.
2. Write a separate search plugin that queries the API and renders its own result list.
The second option looked easier at first because it does not require one to dig into indexed_search. But after thinking long enough I found that I would be replicating all the basic features needed for search: listing data, paging, and those tabs as well.
The customer would then also demand that we'd have an overview page showing the first 3 results from each of the types, with a "view all" button.
In the end I decided to use option #1 because it would feel most integrated and would mean less code.
First of all, I have to recommend Indexed Search & Crawler - The Missing Manual, because it explains many things and helps with the basic setup.
You may create crawler configurations and indexed_search configurations in the TYPO3 page tree. Both are similar, yet different. How do they work together?
IS\CrawlerHook::crawler_execute_type4() gets a URL list via crawler_lib::getUrlsForPageRow().
Note that the crawler only processes entries that were in the queue when it started. Queue items added during a crawl run are not processed immediately, but in a later run.
This means that it may take 6 or 7 crawler runs until it gets to your page with the indexing and crawler configuration. It's better to use the backend module Info -> Site crawler to enqueue your custom URLs during development, or have a minimal page tree with one page :)
Crawler configuration records are URL generators.
Without special configuration, they return the URL for a page ID. Pretty dull.
The crawler manual shows that they can be used for more, and gives a language configuration as an example: &L=[1-3|5|7]. For each page ID this will generate 5 URLs, one for each of the listed languages 1, 2, 3, 5 and 7.
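For a hypothetical page with ID 10, the expansion looks like this:

/index.php?id=10&L=1
/index.php?id=10&L=2
/index.php?id=10&L=3
/index.php?id=10&L=5
/index.php?id=10&L=7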
Apart from those value ranges, you may specify a _TABLE configuration:
&myparam=[_TABLE:tt_myext_items;_PID:15, _WHERE: and hidden = 0]
This is where we need to step in: We may handle those [FOO] values and expand them ourselves with a hook:
$GLOBALS['TYPO3_CONF_VARS']['SC_OPTIONS']['crawler/class.tx_crawler_lib.php']
['expandParameters'][] = \Vnd\Ext\UrlGenerator::class . '->expandParameters';
The hook gets called for every bracketed URL parameter value. $params['currentValue'] contains the value without brackets.
The code in the hook method only has to expand the value to a list of IDs and set that into $params['paramArray'][$key]:
<?php
namespace Vnd\Ext;

/**
 * @see TYPO3\CMS\IndexedSearch\Example\CrawlerHook
 */
class UrlGenerator
{
    /**
     * Add GET parameters to crawler page.
     *
     * This method is registered as hook for crawler/class.tx_crawler_lib.php
     * and is called when crawler configuration "Configuration" fields
     * are expanded (`&L=[1-3]&bar=[FOO]`).
     *
     * @param array  $params Keys:
     *                       - pObj
     *                       - paramArray
     *                       - currentKey
     *                       - currentValue
     *                       - pid
     * @param object $pObj   Crawler lib instance
     *
     * @return void
     */
    public function expandParameters(&$params, $pObj)
    {
        if ($params['currentValue'] === 'FOO') {
            // replace this with your own ID generation code
            $params['paramArray'][$params['currentKey']] = [11, 23, 42];
        }
    }
}
Now when the crawler processes page id 1 and finds a matching configuration record that contains the following configuration:
&tx_myparam=[FOO]
our hook will be called and expand that config to three IDs:
/index.php?id=1&tx_myparam=11
/index.php?id=1&tx_myparam=23
/index.php?id=1&tx_myparam=42
The crawler will then put those three URLs into the queue and index them in the next run.
The page and the plugin that show the API data must be cacheable; the data are not indexed otherwise. Also make sure you set the page title for indexing.
Enable cHash generation in the crawler configuration.
When a visitor uses the website search and indexed_search generates a search result set, it checks whether each result's page ID is still available. Deactivated and deleted pages will thus not show up in the results. This does not work for API results for the obvious reason that they are not TYPO3 pages.
Deleted TYPO3 database records that were integrated into the search with an indexed_search configuration only get removed from the index on the next crawler run. Until then, they are still findable:
In fact, if a record is removed its indexing entry will also be removed upon next indexing - simply because the "set_id" is used to finally clear out old entries after a re-index!
This works as follows: Each indexing run gets a new set_id, and after re-indexing, all entries of the same indexing configuration that still carry an old set_id are cleared out.
This also works for API data. The indexing configuration "pagetree" processes the API page ID, which in turn creates the API detail URLs through the crawler configuration. After re-indexing the data, the old search index entries get deleted.
The only thing to remember is not to use a "Crawler Queue" scheduler task, because then the phash records will have no index configuration ID, and thus will not be deleted on the next run.
The "reset all index data" SQL script in invaluable during development:
TRUNCATE TABLE index_debug;
TRUNCATE TABLE index_fulltext;
TRUNCATE TABLE index_grlist;
TRUNCATE TABLE index_phash;
TRUNCATE TABLE index_rel;
TRUNCATE TABLE index_section;
TRUNCATE TABLE index_stat_search;
TRUNCATE TABLE index_stat_word;
TRUNCATE TABLE index_words;
TRUNCATE TABLE tx_crawler_process;
TRUNCATE TABLE tx_crawler_queue;
UPDATE index_config SET timer_next_indexing = 0;
Warming the page cache after a production deployment took up to two minutes for certain TYPO3 pages. We got that down to mere seconds by not throwing away scaled and cropped images.
🇩🇪 A German translation of this article is available at Mogic: Docker: Schnelleres Cache-Warming für TYPO3
At work we use docker for our TYPO3 projects. Deploying changes to the live system only requires us to push into the main extension's master branch, and Jenkins will do the rest: Build the web server image with all the PHP code, pull that onto the production server, start up the new container, clear the cache and stop the old container.
Because potentially any code could have changed during a deployment, we need to clear all of the TYPO3 caches. Apart from the database cache tables, all files in typo3temp/ are pruned during deployment.
Our TYPO3 projects have a responsive layout - they can be viewed in any resolution and will look good. Different resolutions and screen aspect ratios often need different image sizes and ratios - and those images need to be generated automatically.
To make sure that the important part of a picture is kept regardless of the targeted width-height-ratio, we utilize the focuspoint extension. Editors select the important part of the picture within the TYPO3 backend, and this part will be kept during image cropping.
Mix that with different image resolutions for normal and high-density displays, and we're up to 6 images that need to be generated for a single image on the website (2 aspect ratios + 2 resolutions each).
When clearing typo3temp/, all those cropped and scaled images are thrown away and need to be regenerated. Calling pages with many images needed up to two minutes until they had all their images regenerated, which was just too much.
Our goal was to keep the processed files. Their file names are a hash of the image processing configuration options, so they are stable over time. Cleaning caches has no effect on them.
Information about generated files is stored in the database as well, in the table sys_file_processedfile. Since the database is kept during deployments, the contents of this table are also stable.
When re-using a production database dump on the test or dev system, TYPO3 notices if files have an entry in the processed files table but are missing on disk, and recreates them.
focuspoint saved the cropped files into typo3temp/focuscrop, which was thrown away on deployments, so we made a patch to make that path configurable.
With that in place we created a new folder in the site's document root, processed. It was "mounted" into TYPO3 with a new file storage record (uid: 2) that has its base path set to processed/ (path type "relative").
The focuspoint extension was configured to store its generated files into processed/focuspoint. The file storage fileadmin (auto-generated) was configured to store its "manipulated and temporary images" into 2:_processed_.
With those two changes, all generated images now land in the processed directory. We configured our docker container to mount the processed folder from the host, so that it would keep its data when new CMS containers are deployed.
typo3cms:
  image: docker.example.com/project-typo3cms:latest
  volumes_from:
    - storage
  volumes:
    - ./semitemp/processed:/var/www/site/htdocs/processed
Fetching a page with over 200 images directly after deployment with empty caches now takes mere seconds instead of minutes. Mission accomplished.
Requests to the Telegram messaging API from a docker container at work took 5 seconds:
$ time curl --silent api.telegram.org --output /dev/null
real 0m5.577s
Doing an IPv4-only request was quick:
$ time curl -4 --silent api.telegram.org --output /dev/null
real 0m0.090s
Inspecting the network traffic with wireshark showed that two DNS requests are made: one for the IPv4 address, one for the IPv6 address.
The IPv4 address is resolved immediately, but the IPv6 request is cancelled after 5 seconds.
A thread on askubuntu.com gave me the hint what to do: disable parallel DNS requests, so that the IPv4 and IPv6 lookups are made one after another instead of simultaneously. This is done by adding one line to the container's /etc/resolv.conf:
options single-request
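Since /etc/resolv.conf inside a docker container is managed by the docker daemon, and manual edits are lost when the container is recreated, the option is better set at container start. A sketch (assuming a docker version that supports DNS options; myimage is a placeholder):

$ docker run --dns-opt single-request myimage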
At work I needed to test a locally installed IMAP server. Knowing that cURL speaks not only HTTP but a dozen other protocols as well - including IMAP and SMTP - I decided to give it a try.
debian-administration.org has a nice article about using curl for IMAP, which is where I got the commands from.
Trying to list the IMAP account's folders gave me an error:
$ curl -v imap://localhost --user "user11@example.org:user11"
* Rebuilt URL to: imap://localhost/
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 143 (#0)
< * OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS LOGINDISABLED] Dovecot ready.
> A001 CAPABILITY
< * CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE STARTTLS LOGINDISABLED
< A001 OK Pre-login capabilities listed, post-login capabilities have more.
* No known authentication mechanisms supported!
* Closing connection 0
curl: (67) Login denied
Looking at the capability line we see that login is disabled until the STARTTLS command is issued. curl does not do that, though - we need to force it by using the --ssl option:
$ curl -v -k --ssl imap://localhost --user "user11@example.org:user11"
* Rebuilt URL to: imap://localhost/
* Trying 127.0.0.1...
* Connected to localhost (127.0.0.1) port 143 (#0)
< * OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE
IDLE STARTTLS LOGINDISABLED] Dovecot ready.
> A001 CAPABILITY
< * CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE
STARTTLS LOGINDISABLED
< A001 OK Pre-login capabilities listed, post-login capabilities have more.
> A002 STARTTLS
< A002 OK Begin TLS negotiation now.
I spent the last couple of days at work integrating REST API data into the search result list of TYPO3's indexed_search extension. Yesterday I wanted to run a last test on my development machine to see if everything worked as it should and if API data would be indexed correctly. It did not work.
Why didn't it work?
After several hours I found out that the crawler extension did indeed process the page with my special crawling configuration, but stopped in the middle.
Why did it stop?
The crawler caught an exception and stopped processing. Unfortunately, it did not tell anyone about that. The exception was "HTTP/1.1 404 Not Found", from the API connector.
Why did the API connector throw an exception?
Our crawler hook thought it was running on the live (production) system and queried the production API. The new API methods had not yet been deployed to the production API system, and it returned a 404.
Why did the crawler think it was running on production?
The docker container is supposed to have an environment variable TYPO3_CONTEXT=Development, which tells the TYPO3 instance to use the development configuration. That variable was not set.
Why was the environment variable not set?
To make the crawler process run correctly (write access to temporary directories + files), it must be run as the same user that the nginx web server runs under, www-data. I switched to the www-data user as I always do:
$ su - www-data -s /bin/bash
The - resets all environment variables. TYPO3_CONTEXT was thus not set anymore.
After 6 hours, I removed that minus and everything worked as it should.
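The difference is easy to demonstrate (a quick sketch; su without the dash keeps most of the caller's environment):

# with the dash: login shell, environment is reset
$ su - www-data -s /bin/bash
$ echo "$TYPO3_CONTEXT"

# without the dash: the variable survives
$ su www-data -s /bin/bash
$ echo "$TYPO3_CONTEXT"
Development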
During development of the TYPO3-based Wohnglück project, fellow developers experienced the dreaded "white pages" when accessing a certain page in the CMS. This happened on production, test and local dev environments - but only now and then, and it was not reproducible.
Our log server only showed:
nginx stdout | 2017/02/02 11:26:55 [error] 24#24: *40 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 42.0.23.0, server: _, request: "GET /some/path/ HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "some.host", referrer: "https://some.host/other/path/"
I was totally certain that this must be PHP crashing. We were running PHP version 7.0.8ubuntu*, while the most recent one was 7.1.15. Upgrading would surely make the crash go away.
Another colleague did not want to go down that route and suggested collecting more error information first, which we did by raising the FPM log_level setting from error to warning. Then we waited until it happened again a day later. The log was more verbose now:
php7.0-fpm stderr | WARNING: [pool www] child 27, script '/var/www/site/htdocs/index.php' (request: "GET /index.php") execution timed out (152.506891 sec), terminating
php7.0-fpm stderr | WARNING: [pool www] child 27 exited on signal 15 (SIGTERM) after 240.011067 seconds from start
nginx stdout | 24#24: *40 recv() failed (104: Connection reset by peer) while reading response header from upstream, ...
So PHP did not crash - it was killed by php-fpm because the request took too long to run!
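Both the verbosity and the timeout are plain FPM configuration; a sketch with assumed values matching the log output above (our actual values may have differed):

; php-fpm.conf: log warnings, not only errors
log_level = warning

; pool.d/www.conf: kill a worker whose request runs longer than this -
; the source of the SIGTERM above
request_terminate_timeout = 150s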
The log also contained error messages about TYPO3 calling graphicsmagick but failing with strange errors like "invalid JPEG header data" and "no information read", which I could not make sense of before.
But now it actually did make sense: The page contained a whopping 219 images, which got lazy-loaded by the browser, but all had to be scaled and cropped during page generation. TYPO3's whole cache is cleared during our automated docker deployment, and the first persons accessing the page experienced that problem.
Multiple people accessing that page at the same time also explained the gm command errors: two PHP processes tried to generate the same files at once, and one read the partly generated data of the other.
The solution to this problem is to not throw away the scaled images during deployment.