Weak reference

In computer programming, a weak reference is a reference that does not protect the referenced object from collection by a garbage collector, unlike a strong reference. An object referenced only by weak references – meaning "every chain of references that reaches the object includes at least one weak reference as a link" – is considered weakly reachable, and can be treated as unreachable and so may be collected at any time. Some garbage-collected languages feature or support various levels of weak references, such as C#, Lua, Java, Lisp, OCaml, MATLAB[1], Perl, Python[2] and PHP since the version 7.4.[3]

Uses

Weak references have a number of common uses. When using reference counting garbage collection, weak references can break reference cycles, by using a weak reference for a link in the cycle. When one has an associative array (mapping, hash map) whose keys are (references to) objects, for example to hold auxiliary data about objects, using weak references for the keys avoids keeping the objects alive just because of their use as keys. When one has an object where other objects are registered, such as in the observer pattern (particularly in event handling), if a strong reference is kept, objects must be explicitly unregistered, otherwise a memory leak occurs (the lapsed listener problem), while a weak reference removes the need to unregister. When holding cached data that can be recreated if necessary, weak references allow the cache to be reclaimed, effectively producing discardable memory. This last case (a cache) is distinct from others, as it is preferable that the objects only be garbage collected if necessary, and there is thus a need for finer distinctions within weak references, here a stronger form of a weak reference. In many cases weak references do not need to be directly used, instead simply using a weak array or other container whose keys or values are weak references.

Garbage collection

Garbage collection is used to clean up unused objects and so reduce the potential for memory leaks and data corruption. There are two main types of garbage collection: tracing and reference counting. Reference counting schemes record the number of references to a given object and collect the object when the reference count becomes zero. Reference-counting cannot collect cyclic (or circular) references because only one object may be collected at a time. Groups of mutually referencing objects which are not directly referenced by other objects and are unreachable can thus become permanently resident; if an application continually generates such unreachable groups of unreachable objects this will have the effect of a memory leak. Weak references (references which are not counted in reference counting) may be used to solve the problem of circular references if the reference cycles are avoided by using weak references for some of the references within the group.

A very common case of such strong vs. weak reference distinctions is in tree structures, such as the Document Object Model (DOM), where parent-to-child references are strong, but child-to-parent references are weak. For example, Apple's Cocoa framework recommends this approach.[4] Indeed, even when the object graph is not a tree, a tree structure can often be imposed by the notion of object ownership, where ownership relationships are strong and form a tree, and non-ownership relationships are weak and not needed to form the tree – this approach is common in C++ (pre-C++11), using raw pointers as weak references. This approach, however, has the downside of not allowing the ability to detect when a parent branch has been removed and deleted. Since the C++11 standard, a solution was added by using shared_ptr and weak_ptr, inherited from the Boost library.

Weak references are also used to minimize the number of unnecessary objects in memory by allowing the program to indicate which objects are of minor importance by only weakly referencing them.[citation needed]

Variations

Some languages have multiple levels of weak reference strength. For example, Java has, in order of decreasing strength, soft, weak, and phantom references, defined in the package java.lang.ref.[5] Each reference type has an associated notion of reachability. The garbage collector (GC) uses an object's type of reachability to determine when to free the object. It is safe for the GC to free an object that is softly reachable, but the GC may decide not to do so if it believes the JVM can spare the memory (e.g. the JVM has much unused heap space). The GC will free a weakly reachable object as soon as the GC notices the object. Unlike the other reference types, a phantom reference cannot be followed. On the other hand, phantom references provide a mechanism to notify the program when an object has been freed (notification is implemented using ReferenceQueues).

In C#, weak references are distinguished by whether they track object resurrection or not. This distinction does not occur for strong references, as objects are not finalized if they have any strong references to them. By default, in C# weak reference do not track resurrection, meaning a weak reference is not updated if an object is resurrected; these are called short weak references, and weak references that track resurrection are called long weak references.[6]

Some non-garbage-collected languages, such as C++, provide weak/strong reference functionality as part of supporting garbage collection libraries. The Boost C++ library provides strong and weak references. It is a mistake to use regular C++ pointers as the weak counterparts of smart pointers because such usage removes the ability to detect when the strong reference count has gone to 0 and the object has been deleted. Worse yet, it does not allow for detection of whether another strong reference is already tracking a given plain pointer. This introduces the possibility of having two (or more) smart pointers tracking the same plain pointer (which causes corruption as soon as one of these smart pointers' reference count reaches 0 and the object gets deleted).

Examples

Weak references can be useful when keeping a list of the current variables being referenced in the application. This list must have weak links to the objects. Otherwise, once objects are added to the list, they will be referenced by it and will persist for the duration of the program.

Java

Java 1.2 in 1998 introduced[7] two kinds of weak references, one known as a "soft reference" (intended to be used for maintaining GC-managed in-memory caches, but which doesn't work very well in practice on some platforms with dynamic heap like Android[8]) and the other simply as a "weak reference". It also added a related experimental mechanism dubbed "phantom references" as an alternative to the dangerous and inefficient finalize() mechanism.[9]

If a weak reference is created, and then elsewhere in the code get() is used to get the actual object, the weak reference is not strong enough to prevent garbage collection, so it may be (if there are no strong references to the object) that get() suddenly starts returning null.[10]

import java.lang.ref.WeakReference;

public class ReferenceTest {
    public static void main(String[] args) throws InterruptedException {
        WeakReference r = new WeakReference("I'm here");
        StrongReference sr = new StrongReference("I'm here");
        System.out.println("Before gc: r=" + r.get() + ", static=" + sr.get());
        System.gc();
        Thread.sleep(100);

        // Only r.get() becomes null.
        System.out.println("After gc: r=" + r.get() + ", static=" + sr.get());
    }
}

Another use of weak references is in writing a cache. Using, for example, a weak hash map, one can store in the cache the various referred objects via a weak reference. When the garbage collector runs — when for example the application's memory usage gets sufficiently high — those cached objects which are no longer directly referenced by other objects are removed from the cache.

Smalltalk

|a s1 s2|

s1 := 'hello' copy.     "that's a strong reference"
s2 := 'world' copy.     "that's a strong reference"
a := WeakArray with:s1 with:s2.
a printOn: Transcript. 
ObjectMemory collectGarbage.
a printOn: Transcript.  "both elements still there"

s1 := nil.              "strong reference goes away" 
ObjectMemory collectGarbage.
a printOn: Transcript.  "first element gone"

s2 := nil.              "strong reference goes away" 
ObjectMemory collectGarbage.
a printOn: Transcript.  "second element gone"

Lua

weak_table = setmetatable({}, {__mode="v"})
weak_table.item = {}
print(weak_table.item)
collectgarbage()
print(weak_table.item)

Objective-C 2.0

In Objective-C 2.0, not only garbage collection, but also automatic reference counting will be affected by weak references. All variables and properties in the following example are weak.

@interface WeakRef : NSObject
{
    __weak NSString *str1;
    __unsafe_unretained NSString *str2;
}

@property (nonatomic, weak) NSString *str3;
@property (nonatomic, unsafe_unretained) NSString *str4;

@end

The difference between weak (__weak) and unsafe_unretained (__unsafe_unretained) is that when the object the variable pointed to is being deallocated, whether the value of the variable is going to be changed or not. weak ones will be updated to nil and the unsafe_unretained one will be left unchanged, as a dangling pointer. The weak references is added to Objective-C since Mac OS X 10.7 "Lion" and iOS 5, together with Xcode 4.1 (4.2 for iOS), and only when using ARC. Older versions of Mac OS X, iOS, and GNUstep support only unsafe_unretained references as weak ones.

class Node {
    public weak Node prev; // a weak reference is used to avoid circular references between nodes of a doubly-linked list
    public Node next;
}

Python

>>> import weakref
>>> import gc
>>> class Egg:
...     def spam(self):
...         print("I'm alive!")
...
>>> obj = Egg()
>>> weak_obj = weakref.ref(obj)
>>> weak_obj().spam()
I'm alive!
>>> obj = "Something else"
>>> gc.collect()
35
>>> weak_obj().spam()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'spam'

See also

References

C++

Java

PHP

Python