Sometimes I download images from The Internet™ for later use. For reference, I'd like to store some meta data inside the image itself: the URL of the image and the URL of the website that linked to it.
The question is now: Which meta data field should I use for those URLs?
Basic meta data can be stored in the image's EXIF data, but there is no URL field:
$ exiftool -list -EXIF:All|grep -i url
$
The Exif 2.3 metadata for XMP document also does not list a single URL field, so exiftool was right.
The XMP standard is another way to store meta data in files. XMP Specification Part 1 defines multiple fields that one could use:
Property | Type | Description |
---|---|---|
dc:relation | Unordered array of Text | A related resource. |
dc:source | Text | A related resource from which the described resource is derived. |
Unfortunately there are no "real" URL fields.
There is another vocabulary, the IPTC Photo Metadata Standard.
Property | Type | Description |
---|---|---|
Iptc4xmpCore:Source | Text | The name of a person or party who has a role in the content supply chain. |
Iptc4xmpCore:CreatorContactInfoCiUrlWork | URL, multiple | The creator's contact information provides all necessary information to get in contact with the creator of this item and comprises a set of sub-properties for proper addressing. The contact information web address part. Multiple addresses can be given, separated by a comma. |
plus:ImageSupplier | Seq ImageSupplierDetail | Identifies the most recent supplier of the item, who is not necessarily its owner or creator. For identifying the supplier either a well known and/or registered company name or a URL of the company's web site may be used. |
Only Iptc4xmpCore:CreatorContactInfoCiUrlWork sounds like it could be used to identify the web site that linked to the image, but I think it is meant to link directly to the creator's homepage, not to a random URL that just contains an image tag.
The Metadata Working Group published the Guidelines for Handling Image Metadata spec in 2010, and it contains a tag that actually matches my idea of "URL of website that linked to the image":
Property | Type | Description |
---|---|---|
mwg-coll:CollectionURI | URI | URI describing the collection resource. |
A "collection" in MWG speak is a group of images that this specific image is part of. And a website that links to the image can be seen as such a group.
I'm not satisfied with the available properties I found. But instead of inventing my own namespace with source and website properties, I'll simply use the Dublin Core XMP properties:
Property | Usage |
---|---|
dc:source | URL of image that was downloaded |
dc:relation | URL of website that linked to the image |
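To make the mapping concrete, here is a minimal Python sketch of the RDF part of an XMP packet that uses these two properties. It relies on the types listed above: dc:source is a single Text value, and dc:relation is an unordered array, which XMP serializes as an rdf:Bag. The function name build_rdf is hypothetical, the surrounding x:xmpmeta wrapper is left out, and the URLs are the example URLs of this article.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def build_rdf(image_url, page_urls):
    """Return an rdf:RDF element holding dc:source and dc:relation.

    build_rdf is a hypothetical helper, not part of any library.
    """
    rdf = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description")
    desc.set(f"{{{RDF_NS}}}about", "")

    # dc:source is a single Text value: the URL of the downloaded image.
    ET.SubElement(desc, f"{{{DC_NS}}}source").text = image_url

    # dc:relation is an unordered array, written as an rdf:Bag of rdf:li items.
    relation = ET.SubElement(desc, f"{{{DC_NS}}}relation")
    bag = ET.SubElement(relation, f"{{{RDF_NS}}}Bag")
    for url in page_urls:
        ET.SubElement(bag, f"{{{RDF_NS}}}li").text = url
    return rdf


if __name__ == "__main__":
    rdf = build_rdf(
        "http://cweiske.de/graphics/bdrem/html.png",
        ["http://cweiske.de/bdrem.htm"],
    )
    print(ET.tostring(rdf, encoding="unicode"))
```

Because dc:relation is an array, the same image can carry several linking pages without a second property.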
Let's say that I visited http://cweiske.de/bdrem.htm and downloaded the image http://cweiske.de/graphics/bdrem/html.png. Now I want to add the website URL and image URL to its meta data.
Embedding the URLs in the downloaded image is easy with exiftool:
$ wget http://cweiske.de/graphics/bdrem/html.png -O bdrem-html.png
$ exiftool -source=http://cweiske.de/graphics/bdrem/html.png \
    -relation=http://cweiske.de/bdrem.htm bdrem-html.png
Warning: [minor] IPTC:Source exceeds length limit (truncated)
1 image files updated
Despite the warning, the full source URL is stored in the PNG file. In JPG files, however, the source really is truncated:
$ exiftool -S -source -relation bdrem-html.jpg
Source: http://cweiske.de/graphics/bdrem
Relation: http://cweiske.de/bdrem.htm
To work around this issue, we force exiftool to use the XMP source property instead of the IPTC source property:
$ exiftool -XMP:source=http://cweiske.de/graphics/bdrem/html.png \
    -XMP:relation=http://cweiske.de/bdrem.htm bdrem-html.jpg
1 image files updated
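When tagging many files, the same call can be scripted. The following is a minimal Python sketch, assuming exiftool is on the PATH. The helper names exiftool_args and tag_image are hypothetical. Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.

```python
import shutil
import subprocess


def exiftool_args(image_path, image_url, page_url):
    """Build the exiftool argument list that writes both URLs as XMP tags."""
    return [
        "exiftool",
        f"-XMP:source={image_url}",
        f"-XMP:relation={page_url}",
        image_path,
    ]


def tag_image(image_path, image_url, page_url):
    """Run exiftool if it is installed; return True only if it succeeded."""
    if shutil.which("exiftool") is None:
        return False  # exiftool not found on PATH
    result = subprocess.run(exiftool_args(image_path, image_url, page_url))
    return result.returncode == 0
```

A call such as tag_image("bdrem-html.jpg", "http://cweiske.de/graphics/bdrem/html.png", "http://cweiske.de/bdrem.htm") corresponds to the command shown above.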
Extracting the data is also possible:
$ exiftool bdrem-html.png
ExifTool Version Number : 9.46
File Name : bdrem-html.png
...
MIME Type : image/png
Image Width : 463
Image Height : 122
...
Software : Shutter
Source : http://cweiske.de/graphics/bdrem/html.png
XMP Toolkit : Image::ExifTool 9.46
Relation : http://cweiske.de/bdrem.htm
Image Size : 463x122
$ exiftool -S -source -relation bdrem-html.png
Source: http://cweiske.de/graphics/bdrem/html.png
Relation: http://cweiske.de/bdrem.htm
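The short output format is easy to consume from a script. Here is a minimal Python sketch that turns the "Tag: value" lines shown above into a dictionary. The function name parse_short_output is hypothetical, and the sketch assumes one tag per line, as in the example output.

```python
def parse_short_output(text):
    """Parse `exiftool -S` style lines ("Tag: value") into a dict."""
    tags = {}
    for line in text.splitlines():
        # Split on the first ": " only, so the colon in "http://" stays intact.
        name, sep, value = line.partition(": ")
        if sep:
            tags[name.strip()] = value.strip()
    return tags


if __name__ == "__main__":
    sample = (
        "Source: http://cweiske.de/graphics/bdrem/html.png\n"
        "Relation: http://cweiske.de/bdrem.htm\n"
    )
    print(parse_short_output(sample))
```

For anything beyond a quick script, exiftool's JSON output (-j) is probably the more robust choice.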
Adding the meta data manually is possible, but it would be best if it were added automatically when saving an image via right-click in my browser. Unfortunately, no browser supports this.
On macOS, downloaded files have the download source in the "Where from" file information. Safari, Chrome and Firefox (bug, commit) support this.
It is stored as extended attribute com.apple.metadata:kMDItemWhereFroms in the file system, so it is not tied to the file itself (but also does not modify the file, and works for all types of files).
2024-01: Kelvin Thompson sent me an e-mail explaining that exiftool can read this attribute and copy it into a different tag in the file itself:
$ exiftool '-XMP:source<MDItemWhereFroms' filename.jpg
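For scripts that get hold of the raw attribute bytes themselves instead of going through exiftool, decoding could look like the following Python sketch. It rests on an assumption that is not stated above: that the attribute value is a property list containing an array of URL strings. The helper name decode_where_froms is hypothetical, and how the raw bytes are read from the file system is left out.

```python
import plistlib


def decode_where_froms(raw):
    """Decode raw kMDItemWhereFroms bytes into a list of URL strings.

    Assumption: the value is a property list holding an array of strings.
    """
    value = plistlib.loads(raw)  # detects binary and XML plists automatically
    if not isinstance(value, list):
        return []
    return [item for item in value if isinstance(item, str)]


if __name__ == "__main__":
    # Build a stand-in value, since the real attribute only exists on macOS.
    raw = plistlib.dumps(
        ["http://cweiske.de/graphics/bdrem/html.png", "http://cweiske.de/bdrem.htm"],
        fmt=plistlib.FMT_BINARY,
    )
    print(decode_where_froms(raw))
```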
XDG defines a list of Common Extended Attributes, among them user.xdg.origin.url. It is meant to be stored as an extended file system attribute, similar to what macOS does.
curl supports writing this file system attribute:
$ curl --xattr --output html.png http://cweiske.de/graphics/bdrem/html.png
$ getfattr --dump html.png
# file: html.png
user.mime_type="image/png"
user.xdg.origin.url="http://cweiske.de/graphics/bdrem/html.png"
Chromium once supported it, but removed it in 2019 because "metadata doesn't provide any security guarantees on Linux, and is a privacy risk".
wget, just like curl, supports the --xattr option:
$ wget --xattr http://cweiske.de/graphics/bdrem/html.png
$ getfattr --dump html.png
# file: html.png
user.xdg.origin.url="http://cweiske.de/graphics/bdrem/html.png"
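The attribute can also be read and written from a script. This is a minimal Python sketch, assuming Linux (os.setxattr and os.getxattr are only available there) and a file system that allows user.* attributes. The helper names set_origin_url and get_origin_url are hypothetical. Both functions fail softly when extended attributes are not available.

```python
import os

ORIGIN_ATTR = "user.xdg.origin.url"


def set_origin_url(path, url):
    """Try to store the origin URL; return False if xattrs are unavailable."""
    if not hasattr(os, "setxattr"):
        return False  # not on Linux
    try:
        os.setxattr(path, ORIGIN_ATTR, url.encode("utf-8"))
    except OSError:
        return False  # e.g. the file system does not support user xattrs
    return True


def get_origin_url(path):
    """Return the stored origin URL, or None if it is missing or unreadable."""
    if not hasattr(os, "getxattr"):
        return None
    try:
        return os.getxattr(path, ORIGIN_ATTR).decode("utf-8")
    except OSError:
        return None
```

As with the macOS attribute, the value lives in the file system and not in the file, so it can get lost when the file is copied to a file system or through a tool that does not preserve extended attributes.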
Firefox has a feature request open since 2011 for supporting user.xdg.origin.url.
But it does write the origin URL to the GNOME gvfs meta data:
$ gio info --attributes=metadata:: html.png
[...]
attributes:
metadata::download-uri: http://cweiske.de/graphics/bdrem/html.png
Daniel Aleksandersen suggested that we already have a property that means "URL of the thing we talk about", and that is rdf:about:
[...] goes on to identify the resource the statement is about (the subject of the statement) using the rdf:about attribute to specify the URIref of the subject resource.
Today I (again) ran into the problem that not all programs honor the EXIF orientation in JPG photos, so they do not rotate them automatically. Although Digikam is supposed to reset the rotation automatically when downloading images, not all of my photos go through that process; some are copied directly from other sources.
With the help of exiv2 and convert I was able to write a small PHP script that takes a number of files as parameters and rotates them according to their EXIF orientation settings. After the rotation is done, the EXIF orientation is reset to 1.
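The core of such a tool is a lookup from orientation value to rotation angle. The following Python sketch is not the PHP script mentioned above, only an illustration of the idea. It covers the plain rotations (orientation values 3, 6 and 8) and deliberately ignores the mirrored orientations. The helper name convert_args is hypothetical.

```python
# Clockwise rotation in degrees that brings an image upright,
# keyed by EXIF orientation value. Mirrored orientations (2, 4, 5, 7)
# are left out of this sketch on purpose.
ROTATION_BY_ORIENTATION = {1: 0, 3: 180, 6: 90, 8: 270}


def convert_args(src, dst, orientation):
    """Return ImageMagick `convert` arguments, or None if nothing is to be done."""
    degrees = ROTATION_BY_ORIENTATION.get(orientation)
    if not degrees:
        return None  # already upright, or an orientation not handled here
    return ["convert", src, "-rotate", str(degrees), dst]
```

After rotating the pixels, the orientation tag still has to be set back to 1, otherwise viewers that do honor the tag would rotate the image a second time.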
You can find the source at svn.cweiske.de.
While looking for more EXIF information I found some interesting articles about hidden data in images through EXIF.
On Gentoo, media-libs/jpeg ships a tool called exifautotran ... which does exactly what I want.
exifautotran does not do exactly what I want, which is why I continue to use my own script. You can now install it using PEAR:
$ pear channel-discover zustellzentrum.cweiske.de
$ pear install zz/exifrotator-beta
$ exif-rotator.php /path/to/your/images/*.jpg
When archiving photos on a backup disk, you will eventually lose all locally stored thumbnail images. They are located in ~/.thumbnails/ and get cleaned up now and then. When accessing the pictures the next time, your file manager has a hard time re-creating all the thumbnails again, which is something you personally don't want either, since the time it takes to scale those images is lost time.
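The reason the central thumbnails do not survive a move is that each thumbnail is named after the MD5 hash of the original file's full URI. Here is a minimal Python sketch of that naming scheme. The helper names are hypothetical, and the percent-encoding done by urllib may differ in details from what a particular file manager produces.

```python
import hashlib
from urllib.parse import quote


def thumbnail_name(absolute_path):
    """Return the thumbnail file name for an absolute POSIX path."""
    # The hash input is the file URI, e.g. file:///home/user/photo.jpg
    uri = "file://" + quote(absolute_path)
    return hashlib.md5(uri.encode("utf-8")).hexdigest() + ".png"


def thumbnail_path(absolute_path, size="normal"):
    """Return the location below ~/.thumbnails/ ("normal" or "large")."""
    return f"~/.thumbnails/{size}/{thumbnail_name(absolute_path)}"


if __name__ == "__main__":
    print(thumbnail_path("/home/user/photos/me.png"))
    print(thumbnail_path("/mnt/backup/photos/me.png"))
```

The two calls print different file names for the same picture, which is why a thumbnail created for one location is useless after the file has been moved to the backup disk.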
Fortunately, the freedesktop.org Thumbnail Specification (mirror) also describes "Local Thumbnail repositories" which may be created per-directory in $folder/.thumblocal/. Now eix did not show me any thumbnail creator, so I was tempted to create one myself. Luckily I searched the web for .thumblocal and found updateThumbnails, a thumbnail creation program.
The tool supports the freedesktop specification and some other ones (XV and size-based directories [like 800x600/, 640x840/ etc])! Besides creating thumbnails, it also keeps the files up to date. What more could I want?