Adding the source URL to an image's meta data

Sometimes I download images from The Internet™ for later use. For reference, I'd like to store some meta data inside the image itself:

- URL of the image that was downloaded
- URL of the website that linked to the image

The question is now: Which meta data field should I use for those URLs?

Finding the right property

EXIF

Basic meta data can be stored in the image's EXIF data, but there is no URL field:

$ exiftool -list -EXIF:All|grep -i url
$
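
The same check can be repeated for several metadata groups in one go. The following is a minimal sketch, assuming exiftool is on the PATH; the function name list_url_tags and the choice of groups (EXIF, IPTC, XMP) are illustrative assumptions, not part of exiftool.

```shell
# Sketch: print tag names containing "url" for several metadata groups.
# list_url_tags and the group list are illustrative, not part of exiftool.
list_url_tags() (
    for group in EXIF IPTC XMP; do
        printf '%s:\n' "$group"
        # "exiftool -list" prints tag names separated by whitespace.
        # Put each name on its own line and keep those containing "url".
        exiftool -list "-$group:All" | tr -s ' ' '\n' | grep -i url ||
            printf '(none)\n'
    done
)
```

Calling list_url_tags prints one heading per group, followed by the matching tag names or (none).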

The Exif 2.3 metadata for XMP document also does not list a single URL field, so exiftool was right.

XMP

The XMP standard is another way to store meta data in files. XMP Specification Part 1 defines multiple fields that one could use:

Possible XMP properties
Property | Type | Description
dc:relation | Unordered array of Text | A related resource.
dc:source | Text | A related resource from which the described resource is derived.

Unfortunately there are no "real" URL fields.

IPTC

There is another vocabulary, the IPTC Photo Metadata Standard.

Possible IPTC properties
Property | Type | Description
Iptc4xmpCore:Source | Text | The name of a person or party who has a role in the content supply chain.
Iptc4xmpCore:CreatorContactInfoCiUrlWork | URL, multiple | The creator's contact information provides all necessary information to get in contact with the creator of this item and comprises a set of sub-properties for proper addressing. The contact information web address part. Multiple addresses can be given, separated by a comma.
plus:ImageSupplier | Seq ImageSupplierDetail | Identifies the most recent supplier of the item, who is not necessarily its owner or creator. For identifying the supplier either a well known and/or registered company name or a URL of the company's web site may be used.

Only Iptc4xmpCore:CreatorContactInfoCiUrlWork sounds like it could be used to identify the web site that linked to the image, but I think it is meant to link directly to the creator's homepage, not to a random URL that just contains an image tag.

Metadata Working Group

The Metadata Working Group published the Guidelines for Handling Image Metadata spec in 2010, and it contains a tag that actually matches my idea of "URL of website that linked to the image":

Possible MWG properties
Property | Type | Description
mwg-coll:CollectionURI | URI | URI describing the collection resource.

A "collection" in MWG speak is a group of images that this specific image is part of. And a website that links to the image can be seen as such a group.

Conclusion

I'm not satisfied with the available properties I found. But instead of inventing my own namespace with source and website properties, I'll simply use the Dublin Core XMP properties:

XMP properties I am using now
Property | Usage
dc:source / rdf:about (see the 2020-04 update below) | URL of image that was downloaded
dc:relation | URL of website that linked to the image

exiftool

Let's say that I visited http://cweiske.de/bdrem.htm and downloaded the image http://cweiske.de/graphics/bdrem/html.png. Now I want to add the website URL and image URL to its meta data.

Embedding the URLs in the downloaded image is easy with exiftool:

$ wget http://cweiske.de/graphics/bdrem/html.png -O bdrem-html.png
$ exiftool -source=http://cweiske.de/graphics/bdrem/html.png\
           -relation=http://cweiske.de/bdrem.htm bdrem-html.png
Warning: [minor] IPTC:Source exceeds length limit (truncated)
    1 image files updated

Despite the warning, the full source URL is stored in the PNG file. In JPG files, however, the source really is truncated:

$ exiftool -S -source -relation bdrem-html.jpg
Source: http://cweiske.de/graphics/bdrem
Relation: http://cweiske.de/bdrem.htm

To work around this issue, we force exiftool to use the XMP source property instead of the IPTC source property:

$ exiftool -XMP:source=http://cweiske.de/graphics/bdrem/html.png\
           -XMP:relation=http://cweiske.de/bdrem.htm bdrem-html.jpg
    1 image files updated

Extracting the data is also possible:

$ exiftool bdrem-html.png
ExifTool Version Number         : 9.46
File Name                       : bdrem-html.png
...
MIME Type                       : image/png
Image Width                     : 463
Image Height                    : 122
...
Software                        : Shutter
Source                          : http://cweiske.de/graphics/bdrem/html.png
XMP Toolkit                     : Image::ExifTool 9.46
Relation                        : http://cweiske.de/bdrem.htm
Image Size                      : 463x122
 
$ exiftool -S -source -relation bdrem-html.png
Source: http://cweiske.de/graphics/bdrem/html.png
Relation: http://cweiske.de/bdrem.htm
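
The tagging and read-back steps above can be wrapped in a small shell function. This is only a sketch: tag_image_source and its argument order are hypothetical, and it assumes exiftool is on the PATH. It names the XMP group explicitly, as in the workaround above, and passes -overwrite_original so that exiftool does not keep a backup copy of the file.

```shell
# Hypothetical helper: store the image URL and the linking page URL
# in the XMP data of a downloaded image, then print both values.
# Usage: tag_image_source FILE IMAGE_URL PAGE_URL
tag_image_source() (
    file=$1
    image_url=$2
    page_url=$3

    # Name the XMP group explicitly so that exiftool does not pick
    # the length-limited IPTC:Source tag.
    exiftool -overwrite_original \
        "-XMP:Source=$image_url" \
        "-XMP:Relation=$page_url" \
        "$file" || exit 1

    # Read both values back (-S prints short "Tag: value" lines).
    exiftool -S -XMP:Source -XMP:Relation "$file"
)
```

For the example above, the call would be:

$ tag_image_source bdrem-html.jpg http://cweiske.de/graphics/bdrem/html.png http://cweiske.de/bdrem.htm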

Browsers

Adding the meta data manually is possible, but it would be best if it were added automatically when saving an image via right-click in my browser. Unfortunately, no browser supports this.

MacOS

On MacOS, downloaded files have the download source in the "Where from" file information. Safari, Chrome and Firefox (bug, commit) support this.

It is stored as the extended attribute com.apple.metadata:kMDItemWhereFroms in the file system, so it is not part of the file itself. On the other hand, this does not modify the file and works for all types of files.

2024-01: Kelvin Thompson sent me an e-mail explaining that exiftool can read this attribute and copy it into a different tag inside the file itself:

$ exiftool '-XMP:source<MDItemWhereFroms' filename.jpg

curl

XDG defines a list of Common Extended Attributes, among them user.xdg.origin.url. It is meant to be stored as an extended file system attribute, similar to what MacOS does.

curl supports writing this file system attribute:

$ curl --xattr --output html.png http://cweiske.de/graphics/bdrem/html.png
$ getfattr --dump html.png
# file: html.png
user.mime_type="image/png"
user.xdg.origin.url="http://cweiske.de/graphics/bdrem/html.png"

Chromium once supported it, but removed it in 2019 because metadata doesn't provide any security guarantees on Linux, and is a privacy risk.

wget

wget, just like curl, supports the --xattr option:

$ wget --xattr http://cweiske.de/graphics/bdrem/html.png
$ getfattr --dump html.png
# file: html.png
user.xdg.origin.url="http://cweiske.de/graphics/bdrem/html.png"
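
Since the extended attribute lives in the file system and not in the file, it may be useful to copy it into the image's XMP data after downloading with curl or wget. A minimal sketch, assuming getfattr and exiftool are installed; the function name xattr_to_xmp is hypothetical:

```shell
# Hypothetical helper: copy the user.xdg.origin.url extended attribute
# (as written by curl --xattr or wget --xattr) into the file's XMP data.
# Usage: xattr_to_xmp FILE
xattr_to_xmp() (
    file=$1

    # --only-values prints just the attribute value, without the
    # "# file:" header and the name="value" wrapping.
    url=$(getfattr --only-values -n user.xdg.origin.url "$file" 2>/dev/null)

    if [ -z "$url" ]; then
        echo "no user.xdg.origin.url on $file" >&2
        exit 1
    fi

    exiftool -overwrite_original "-XMP:Source=$url" "$file"
)
```

For the file downloaded above, this would be run as:

$ xattr_to_xmp html.png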

Firefox

Firefox has had a feature request open since 2011 for supporting user.xdg.origin.url.

But it does write the origin URL to GNOME GVFS meta data:

$ gio info --attributes=metadata:: html.png
[...]
attributes:
  metadata::download-uri: http://cweiske.de/graphics/bdrem/html.png
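
To use that value in a script, the URL can be cut out of the gio output. This is a sketch that assumes the output format shown above; the function name gvfs_download_uri is hypothetical:

```shell
# Hypothetical helper: print the download URL stored in the GVFS
# metadata of FILE, or return a non-zero status if there is none.
# Usage: gvfs_download_uri FILE
gvfs_download_uri() (
    file=$1

    # Keep only the value of the "metadata::download-uri: ..." line.
    uri=$(gio info --attributes=metadata::download-uri "$file" |
        sed -n 's/^ *metadata::download-uri: //p')

    [ -n "$uri" ] && printf '%s\n' "$uri"
)
```

The printed URL could then be passed on to exiftool, for example as the value of -XMP:source.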

Update 2020-04: rdf:about

Daniel Aleksandersen suggested that we already have a property that means "URL of the thing we talk about", and that is rdf:about:

[...] goes on to identify the resource the statement is about (the subject of the statement) using the rdf:about attribute to specify the URIref of the subject resource.

Written by Christian Weiske.
