Sometimes I download images from The Internet™ for later use. For reference, I'd like to store some meta data inside the image itself: the URL of the image and the URL of the website that linked to it.
The question is now: Which meta data field should I use for those URLs?
Finding the right property
EXIF
Basic meta data can be stored in the image's EXIF data, but there is no URL field:
$ exiftool -list -EXIF:All | grep -i url
$
The Exif 2.3 metadata for XMP document also does not list a single URL field, so exiftool was right.
XMP
The XMP standard is another way to store meta data in files. XMP Specification Part 1 defines multiple fields that one could use:
Property | Type | Description |
---|---|---|
dc:relation | Unordered array of Text | A related resource. |
dc:source | Text | A related resource from which the described resource is derived. |
Unfortunately there are no "real" URL fields.
IPTC
There is another vocabulary, the IPTC Photo Metadata Standard.
Property | Type | Description |
---|---|---|
Iptc4xmpCore:Source | Text | The name of a person or party who has a role in the content supply chain. |
Iptc4xmpCore:CreatorContactInfoCiUrlWork | URL, multiple | The creator's contact information provides all necessary information to get in contact with the creator of this item and comprises a set of sub-properties for proper addressing. The contact information web address part. Multiple addresses can be given, separated by a comma. |
plus:ImageSupplier | Seq ImageSupplierDetail | Identifies the most recent supplier of the item, who is not necessarily its owner or creator. For identifying the supplier either a well known and/or registered company name or a URL of the company's web site may be used. |
Only Iptc4xmpCore:CreatorContactInfoCiUrlWork sounds like it could be used to identify the website that linked to the image, but I think it is meant to link directly to the creator's homepage, not to a random URL that just contains an image tag.
Metadata Working Group
The Metadata Working Group published the Guidelines for Handling Image Metadata spec in 2010, and it contains a tag that actually matches my idea of "URL of website that linked to the image":
Property | Type | Description |
---|---|---|
mwg-coll:CollectionURI | URI | URI describing the collection resource. |
A "collection" in MWG speak is a group of images that this specific image is part of. And a website that links to the image can be seen as such a group.
Conclusion
I'm not satisfied with the available properties I found. But instead of inventing my own namespace with source and website properties, I'll simply use the Dublin Core XMP properties:
Property | Usage |
---|---|
dc:source | URL of image that was downloaded |
dc:relation | URL of website that linked to the image |
exiftool
Let's say that I visited http://cweiske.de/bdrem.htm and downloaded the image http://cweiske.de/graphics/bdrem/html.png. Now I want to add the website URL and image URL to its meta data.
Embedding the URLs in the downloaded image is easy with exiftool:
$ wget http://cweiske.de/graphics/bdrem/html.png -O bdrem-html.png
$ exiftool -source=http://cweiske.de/graphics/bdrem/html.png \
    -relation=http://cweiske.de/bdrem.htm bdrem-html.png
Warning: [minor] IPTC:Source exceeds length limit (truncated)
    1 image files updated
Despite the warning, the full source URL is stored in the PNG file. In JPG files, however, the source really is truncated, because IPTC:Source is limited to 32 characters:
$ exiftool -S -source -relation bdrem-html.jpg
Source: http://cweiske.de/graphics/bdrem
Relation: http://cweiske.de/bdrem.htm
To work around this issue, we force exiftool to use the XMP source property instead of the IPTC source property:
$ exiftool -XMP:source=http://cweiske.de/graphics/bdrem/html.png \
    -XMP:relation=http://cweiske.de/bdrem.htm bdrem-html.jpg
    1 image files updated
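For repeated use, the call above could be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name tag_image_origin and the EXIFTOOL override variable are made up for illustration, and the tags are pinned to exiftool's XMP-dc group (Dublin Core) instead of the more general XMP group used above.

```shell
# Hypothetical helper: store the image URL and the linking website's URL
# in the Dublin Core XMP properties of an image file.
# Usage: tag_image_origin FILE IMAGE_URL PAGE_URL
tag_image_origin() {
    file=$1
    image_url=$2   # URL the image was downloaded from -> dc:source
    page_url=$3    # URL of the website that linked to it -> dc:relation
    # EXIFTOOL is an assumed override so a different binary can be used;
    # it falls back to "exiftool" from the PATH.
    "${EXIFTOOL:-exiftool}" \
        "-XMP-dc:Source=$image_url" \
        "-XMP-dc:Relation=$page_url" \
        "$file"
}
```

With the example from above this would be called as: tag_image_origin bdrem-html.jpg http://cweiske.de/graphics/bdrem/html.png http://cweiske.de/bdrem.htm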
Extracting the data is also possible:
$ exiftool bdrem-html.png
ExifTool Version Number         : 9.46
File Name                       : bdrem-html.png
...
MIME Type                       : image/png
Image Width                     : 463
Image Height                    : 122
...
Software                        : Shutter
Source                          : http://cweiske.de/graphics/bdrem/html.png
XMP Toolkit                     : Image::ExifTool 9.46
Relation                        : http://cweiske.de/bdrem.htm
Image Size                      : 463x122
$ exiftool -S -source -relation bdrem-html.png
Source: http://cweiske.de/graphics/bdrem/html.png
Relation: http://cweiske.de/bdrem.htm
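In scripts, a single value can be pulled out of the short -S output. A minimal sketch, assuming the "Tag: value" line format seen in that output and simple tag names without regular expression characters; the function name tag_value is made up for illustration:

```shell
# Hypothetical helper: read `exiftool -S` output on stdin and print the
# value of the tag named in $1 (for example "Source" or "Relation").
tag_value() {
    # -S prints one "Tag: value" line per tag; print only the matching
    # line, with the "Tag: " prefix stripped.
    sed -n "s/^$1: //p"
}
```

It would be used as a filter, for example: exiftool -S -source -relation bdrem-html.png | tag_value Relation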
Browsers
Adding the meta data manually is possible, but it would be best if it were
added automatically when saving an image via right-click in my browser.
Unfortunately, no browser supports this.
MacOS
On MacOS, downloaded files have the download source in the "Where from" file information. Safari, Chrome and Firefox (bug, commit) support this.
It is stored as the extended attribute com.apple.metadata:kMDItemWhereFroms in the file system, so it is not part of the file itself (but storing it also does not modify the file, and it works for all types of files).
2024-01: Kelvin Thompson sent me an e-mail explaining that exiftool can read this attribute and copy it into a tag inside the file itself:
$ exiftool '-XMP:source<MDItemWhereFroms' filename.jpg
curl
The XDG defines a list of Common Extended Attributes, among them user.xdg.origin.url. It is meant to be stored as an extended file system attribute, similar to what MacOS does.
curl supports writing this file system attribute:
$ curl --xattr --output html.png http://cweiske.de/graphics/bdrem/html.png
$ getfattr --dump html.png
# file: html.png
user.mime_type="image/png"
user.xdg.origin.url="http://cweiske.de/graphics/bdrem/html.png"
Chromium once supported it, but removed it in 2019 because "metadata doesn't provide any security guarantees on Linux, and is a privacy risk".
wget
wget, just like curl, supports the --xattr option:
$ wget --xattr http://cweiske.de/graphics/bdrem/html.png
$ getfattr --dump html.png
# file: html.png
user.xdg.origin.url="http://cweiske.de/graphics/bdrem/html.png"
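Since the attribute lives in the file system and not in the file, it could be copied into the image afterwards, much like the MacOS example. A rough sketch, not a tested workflow: the function name embed_origin_xattr and the GETFATTR and EXIFTOOL override variables are made up for illustration, and the XMP-dc group is my choice for the target tag.

```shell
# Hypothetical helper: copy user.xdg.origin.url from the extended file
# system attribute into the file's XMP dc:source property.
# Usage: embed_origin_xattr FILE
embed_origin_xattr() {
    file=$1
    # --only-values prints just the attribute value without the
    # "# file:" header; give up if the lookup fails.
    url=$("${GETFATTR:-getfattr}" --only-values \
        --name=user.xdg.origin.url "$file" 2>/dev/null) || return 1
    # An empty value is treated the same as a missing attribute.
    [ -n "$url" ] || return 1
    "${EXIFTOOL:-exiftool}" "-XMP-dc:Source=$url" "$file"
}
```

After one of the downloads above it would be called as: embed_origin_xattr html.png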
Firefox
Firefox has a feature request open since 2011 for supporting user.xdg.origin.url.
But it does write the origin URL to the GNOME gvfs meta data:
$ gio info --attributes=metadata:: html.png
[...]
attributes:
  metadata::download-uri: http://cweiske.de/graphics/bdrem/html.png
Update 2020-04: rdf:about
Daniel Aleksandersen suggested that we already have a property that means "URL of the thing we talk about", and that is rdf:about:
[...] goes on to identify the resource the statement is about (the subject of the statement) using the rdf:about attribute to specify the URIref of the subject resource.