Developing a Shape-and-Composition CBIR Thesaurus for the Traditional Chinese Landscape1
This paper has been awarded the 2007 Gerd Muehsam Award, sponsored by the Art Libraries Society of North America (ARLIS/NA).
In the past decade, content-based image retrieval (CBIR) has been investigated extensively. Current research has suggested that the two elemental issues in CBIR, feature extraction and similarity measures, tend to be domain-specific. This paper develops a shape-and-composition CBIR thesaurus for Chinese landscape paintings dating from the Song to Qing periods (960-1911). The features were extracted from studying approximately 1,000 Chinese landscape paintings. The thesaurus emphasizes discrimination among object types in order to improve retrieval of relevant images. Therefore, it adopts not only basic shapes but also line and shape combinations. Furthermore, special shapes are developed for those object types that are either unique to Chinese art and culture, or are a peculiar shape that cannot easily be abstracted into basic shapes. Although it is domain-specific, the approach of developing and classifying the thesaurus may be applicable to CBIR of non-Chinese art images and CBIR in general.
Two major approaches have been identified in image indexing and retrieval: text-based (descriptor-based) and content-based. Text-based image retrieval (TBIR) has been widely adopted in the cataloging and indexing of image collections in libraries, museums, and archives, relying on manually assigned text descriptors to retrieve relevant images. More recently, automatic assignment of text attributes to images has been developed by extracting terms from captions and descriptions. Because it is based on traditional text Information Retrieval (IR) techniques, the TBIR system seems relatively easy to develop and use. However, as both Yang (2004) and Graham (2004) point out, manual assignment of text annotations to images is time-consuming and expensive. Further, it may be impossible to automate text assignment if no description accompanies an image. Moreover, although text-based image descriptors may seem comprehensive and objective, they are inevitably a partial and perhaps inadequate representation of visual information and are therefore subject to individual interpretation. Image catalogers and indexers focus on “important” aspects (such as major objects, subject, and relationships) of content (“aboutness”) contained in an image. It is not surprising to find that different people have different interpretations of the “aboutness” of an image; the interpretations of images by catalogers and indexers are not always consistent.
Developed in the early 1990s, Content-Based Image Retrieval (CBIR) uses automatic extraction of lower-level image features (such as texture, color, shape, and structure) to catalog and retrieve images. CBIR classifies and searches images according to similarities in automatically extracted visual features, and output is usually a ranked list of images in order of their similarity to the query.
A number of CBIR systems have been developed over the past ten years, including IBM’s Query by Image Content (QBIC) search engine. The State Hermitage Museum in St. Petersburg, Russia, was an early adopter of QBIC for its digitized image collections. The QBIC system allows users to query image databases based on color percentages, color layout, textures, and shapes occurring in the images. In addition, users can use examples (pictures or sketches drawn by the users) to formulate image queries. Content-based queries can also be supplemented with text and keyword queries.
Although there is great interest in research of this type, it is difficult to evaluate a CBIR system because of “lack of agreement on appropriate effectiveness measures and the inherent difficulty in establishing criteria for image similarity” (Graham, 2004, p. 329). As Jörgensen (1999) has asserted, many CBIR techniques are “computationally possible” (p. 302). However, little research has been done to examine how the search and retrieval performance of these CBIR systems corresponds with the visual information needs of actual users (Chen, 2001, p. 260). Accordingly, only a few museums, libraries, and archives have adopted CBIR.2 Nevertheless, compared to labor-intensive text-based image retrieval (TBIR), CBIR techniques are inherently faster, less expensive, and much more objective (Graham, 2004, p. 330). Further, a recent study of end users’ image query behaviors by Chen (2001) has suggested the practicability of CBIR-based tools in the field of art history. In particular, Zhang, Pham, and Li (2004) maintain that the CBIR approach is potentially useful to investigate relationships among paintings based on objective visual facts, rather than subjective interpretations (p. 258).
Previous research by Zhang et al. (2004) in applying CBIR to Chinese paintings has suggested that the two elemental issues in CBIR, feature extraction and similarity measures, tend to be domain-specific because image content is identified not only by data but also by context (i.e., domain-specific knowledge). This research demonstrated that a case study of a controlled and well-defined domain of images is useful to validate and further enhance existing CBIR techniques. Therefore, this paper focuses on the application of CBIR to traditional Chinese paintings, specifically Chinese landscape paintings.
With a long and glorious history, Chinese painting has developed its unique form and style by the manipulation of brushes to apply ink and colors to rice paper or silk. In terms of subject matter, Chinese painting can be categorized into three sets: landscape, bird-and-flower, and figure paintings. Of these, landscape painting is regarded as most important because it was a central topic for the literati painters who produced most of the extant Chinese paintings.
As stated by Zhang et al. (2004), CBIR is potentially an excellent and feasible retrieval mechanism for Chinese landscape painting in terms of its visual features. These artworks attempt to capture the essence rather than the real shape of nature to express the painters’ ideas and feelings. Painters of Chinese landscapes use relatively simple forms and textures, only a few colors, and a small number of brush strokes. A limited number of object types are depicted in Chinese landscape paintings, most commonly mountains, rocks, water, clouds, woods, and trees, and sometimes dwellings, pavilions, bridges, figures, and animals. For each object type, there are usually a small number of variations. Furthermore, composition of these paintings follows certain perspectives and models.
The shape-and-composition CBIR thesaurus below is intended specifically for use with Chinese landscape paintings dating from the Song to Qing periods (AD 960-1911). Approximately 1,000 paintings were studied to extract the features to be included in the thesaurus. Because there are not many extant Song and Yuan paintings, about 80 percent of these painting samples were dated to the Ming and Qing periods (AD 1368-1911).3 Color and texture, as features, are not yet included in the thesaurus. The shape-and-composition CBIR thesaurus emphasizes discrimination among object types to improve retrieval of relevant images. It adopts not only basic shapes (such as circles, rectangles, and triangles) but also combinations of lines (straight, arc, and wavy) and shapes. Special shapes are developed for those object types that are either unique to Chinese art and culture (such as linglong-shaped rocks, bamboos, and dragon boats), or are a peculiar shape that cannot easily be abstracted into basic shapes (such as birds and clouds).
The thesaurus is designed to facilitate the extraction and indexing of image content data for effective retrieval performance. Although it is domain-specific, the approach of developing and classifying the thesaurus may be applicable to CBIR of non-Chinese art images and CBIR in general.
This thesaurus is a controlled CBIR list of conceptual forms and compositions extracted from Chinese landscape art ranging from the Song to Qing dynasties (AD 960-1911). It consists of abstract shapes, lines, and composition templates. The Appendix contains a concise version of the thesaurus without explanatory notes.
Shapes can be divided into two main types: basic shapes, and special shapes. Shapes may be used individually or in combinations when necessary.
Lines include straight lines, arc lines, and wavy lines. Lines are used in groups or in combination with shapes.
Composition templates characterize visual layout structures that were commonly adopted in Chinese landscape painting techniques, and are generally used in combination with shapes and/or lines. Accordingly, the thesaurus is composed of two parts: Part I, which focuses on shapes and lines; and Part II, which provides composition templates.
Part I (shapes and lines), the core of the thesaurus, is arranged alphabetically and hierarchically by object types and their variations that frequently appear in Chinese landscape painting. Object types are divided into two main sets: (A) primary elements, and (B) secondary elements. Primary elements refer to object types that can always be found in Chinese landscape painting, and consist of five categories: clouds, mountains, plants, rocks, and water. Secondary elements refer to object types that occasionally appear in Chinese landscape painting, and comprise four categories: animals, architecture, persons, and transportation facilities. Each category is further divided hierarchically into subcategories.
Part II (composition templates) includes 14 templates, which are classified into two main categories (fully filled and one part). They are also organized in alphabetic and hierarchical order.
Each entry includes: a number indicating its order in the thesaurus; the name of the category/subcategory; a scope note (SN) that explains or defines the category; occasional supplemental notes (Note) attached to subcategories when they are not self-explanatory; and abstractions (shapes/shape combinations/line groups/line and shape combinations/composition templates).4
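The entry structure described above can be sketched as a small record type. This is only an illustrative sketch, not part of the thesaurus itself: the class name `Entry`, its field names, and the helper functions are hypothetical, and the sample values are taken from the water category (A5) described in Part I.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def parent_number(number: str) -> Optional[str]:
    """Return the number of the enclosing category, or None for a top-level set."""
    if "." in number:
        return number.rsplit(".", 1)[0]   # "A5.2" -> "A5"
    if len(number) > 1:
        return number[0]                  # "A5" -> "A"
    return None                           # "A" has no parent


def ancestors(number: str) -> List[str]:
    """List enclosing category numbers, nearest first."""
    chain = []
    current = parent_number(number)
    while current is not None:
        chain.append(current)
        current = parent_number(current)
    return chain


@dataclass
class Entry:
    """One thesaurus entry (hypothetical field names)."""
    number: str                       # order in the thesaurus
    name: str                         # category or subcategory name
    scope_note: Optional[str] = None  # SN: explains or defines the category
    note: Optional[str] = None        # occasional supplemental note
    abstractions: List[str] = field(default_factory=list)


entries = {
    e.number: e
    for e in [
        Entry("A", "Primary Elements",
              scope_note="Object types that are always found in Chinese landscape painting."),
        Entry("A5", "water"),
        Entry("A5.1", "non-waterfalls",
              abstractions=["group of horizontally wavy lines"]),
        Entry("A5.2", "waterfalls",
              abstractions=["group of three parallel steep arc lines"]),
    ]
}

# Walk from the top-level set down to the entry itself.
path = [entries[n].name for n in reversed(ancestors("A5.2"))] + [entries["A5.2"].name]
print(" > ".join(path))  # Primary Elements > water > waterfalls
```

Because each number encodes its position in the hierarchy, the parent of any entry can be derived from the number alone, without storing explicit links.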
Part I: Shapes and Lines
A. Primary Elements [SN: This set refers to object types that are always found in Chinese landscape painting. It consists of five categories: clouds (A1), mountains (A2), plants (A3), rocks (A4), and water (A5).]
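Since Part I is arranged alphabetically and hierarchically, the category numbers can be reproduced mechanically from the category names. The sketch below is illustrative only: the function name `number_children` is hypothetical, and the separator rule (a bare digit directly under a set letter, a dot at deeper levels) is inferred from the numbers quoted in this paper rather than stated as a rule in it.

```python
def number_children(parent, names):
    """Assign thesaurus numbers to sibling categories in alphabetical order."""
    # Directly under a set letter ("A", "B") the number is appended as-is
    # (A1, A2, ...); at deeper levels it is joined with a dot (A4.1, A4.2, ...).
    sep = "" if len(parent) == 1 else "."
    ordered = sorted(names, key=str.lower)
    return {f"{parent}{sep}{i}": name for i, name in enumerate(ordered, start=1)}


primary = number_children("A", ["water", "clouds", "rocks", "plants", "mountains"])
secondary = number_children(
    "B", ["persons", "animals", "transportation facilities", "architecture"]
)
rocks = number_children("A4", ["rectangle", "linglong", "ellipse", "irregular polygon"])

for number, name in primary.items():
    print(number, name)
```

Run on the category names given in the scope notes, this yields clouds (A1) through water (A5) and animals (B1) through transportation facilities (B4), and for the rock contours ellipse (A4.1) through rectangle (A4.4), matching the numbers used in the thesaurus.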
Whether depicted in a close view or in the far distance, a mountain in a Chinese landscape painting is generally composed of peaks and crags. Accordingly, each subcategory of the mountains category is divided into two main types: crags (A2.1.1/A2.2.1) and peaks (A2.1.2/A2.2.2). These two main types are further classified hierarchically into their varieties on the basis of their spatial relationships.
When their undulating skylines are abstracted into straight lines, peaks and crags are interpreted as two basic shapes: right trapezoids standing for crags, and triangles for peaks. Both shapes can be used in groups and combinations to represent multiple adjacent (A220.127.116.11/A18.104.22.168) or isolated (A22.214.171.124/A126.96.36.199) crags and peaks.
Because reeds and grass in Chinese landscape paintings are usually depicted with piles of simple brushstrokes, a group of five straight vertical lines is adopted to symbolize them, with longer lines for reeds and shorter lines for grass to illustrate their difference.
In addition, trees are further divided into two main types: popular (A3.3.1), and uncommon (A3.3.2). Popular species of trees include bamboos (A188.8.131.52), pine trees (A184.108.40.206), and willows (A220.127.116.11). Since popular species of trees are differentiated primarily by their leaves in Chinese landscape painting, abstract shapes of leaves are used to represent them. A generalized tree icon is employed to represent the less-frequently used species of trees in Chinese landscape painting.
According to their various contours, the rocks category is divided into four subcategories: ellipse (A4.1), irregular polygon (A4.2), linglong6 (pierced and rounded irregular polygon) (A4.3), and rectangle (A4.4).
[Note: The shape serves as a generalized icon because irregular polygon-shaped rocks in Chinese landscape painting have many varieties and it is unnecessary to categorize each of them.]
[Note: The shape serves as a generalized icon because linglong-shaped rocks in Chinese landscape painting were usually rendered from the artist’s imagination and thus have no standard forms. In addition, the one hole here symbolizes one or multiple holes that may be found on linglong-shaped rocks. The number of holes on such rocks is arbitrary, so it is unnecessary to specify it.]
According to major line types for water delineation, the water category is divided into two subcategories: non-waterfalls (A5.1), including rivers, springs, lakes, seas, and other forms of water which are usually portrayed in wavy lines; and waterfalls (A5.2), which are often depicted in parallel steep arc lines. Hence, a group of horizontally wavy lines represents all kinds of non-waterfalls (A5.1), while a group of three parallel steep arc lines stands for waterfalls (A5.2).
[Note: Representations of specific non-waterfall types (such as rivers, springs, lakes, and seas) are not differentiated in this subcategory because their brushstrokes are essentially the same in nature. Furthermore, visualization of a certain non-waterfall type is sometimes very subjective in Chinese landscape painting. Divergent illustrations can refer to the same non-waterfall type. Therefore, use of a generalized icon is the best way to represent non-waterfall elements for the purposes of this paper.]
B. Secondary Elements [SN: This set refers to object types that sometimes appear in a Chinese landscape painting. It contains four categories: animals (B1), architecture (B2), persons (B3), and transportation facilities (B4).]
The birds subcategory consists of three types specified by their various motions, specifically flying (B1.1.1), sitting on the ground or in the water (B1.1.2), and standing (B1.1.3). A V shape is used to represent flying (B1.1.1), and an S shape is used to represent sitting (B1.1.2). A combination of an S shape representing the body and two vertical lines for the feet is used to symbolize standing (B1.1.3).
The mammals subcategory is further divided into two types according to the popularity of their appearance in Chinese landscape painting. The first type, popular (B1.2.1), includes three varieties of mammals most commonly seen in Chinese landscape painting, namely deer (B18.104.22.168), donkey/horse (B22.214.171.124), and water buffalo (B126.96.36.199). These mammals are essentially differentiated by the depiction of the head in Chinese landscape painting; therefore, an abstract representation of the head is used to symbolize them. The second type, uncommon (B1.2.2), refers to mammals other than those in the popular type (B1.2.1). Since they are seldom found in Chinese landscape paintings, it is very difficult to specify and categorize them. Therefore, they are represented by a generalized shape combination which includes a triangle (head) and a rectangle (body).
[Note: Since the three types of popular mammals are mainly differentiated by representations of the head (including ears and horns), abstract shapes of their head and horns or ears are used to symbolize them.]
[Note: Two oblique lines are used to stand for the horns and one triangle for the head.]
[Note: Two oblique lines are used to stand for the ears and one ellipse for the head.]
[Note: Two curves are used to stand for the horns and one triangle for the head.]
The bridges subcategory (B2.1) is classified into two types according to the shape of the bridge deck: arch bridges (B2.1.1), and beam bridges (B2.1.2). A bridge is abstracted into a combination of a rainbow shape (arch) or a rectangle (beam) standing for the bridge deck, and two vertical rectangles for all bridge piers.
The buildings subcategory (B2.2), including dwellings, pavilions, and pagodas, is classified into two types according to the number of stories: multi-storied (B2.2.1), and single-storied (B2.2.2). A trapezoid is used for the roof and a rectangle for the building frame.
The first subcategory includes bending (B3.1.1) and walking (B3.1.2). The second subcategory is divided into three types: lying down (B3.2.1), sitting (B3.2.2), and standing (B3.2.3). Because walking and standing are abstracted into the same shape combination, the former is represented with a dashed line and the latter with a solid line to show the difference. The sitting type is further divided according to face orientation. A single person is interpreted as a composition of two basic shapes, namely a circle standing for the head, and a rectangle or triangle for the body. The rectangle is used when the body is stretched (vertically or horizontally). Otherwise, different types of triangle are used to represent the body.
The boats subcategory (B4.1) includes five types according to their various forms: canoes, dragon boats, fishing boats, passenger ships, and sailing boats. The canoe (B4.1.1) is interpreted as one semi-ellipse that symbolizes the body. The dragon boat (B4.1.2) is unique to China and thus interpreted as a special U shape. The fishing boat (B4.1.3), usually with an oval canopy, is represented by a combination of a smaller semi-ellipse (‘canopy’) and a bigger inverted semi-ellipse (‘body’). The passenger ship (B4.1.4), commonly with a rectangular canopy, is interpreted as one semi-ellipse standing for the body and one rectangle for the canopy. The sailing boat (B4.1.5) is divided into two varieties according to the form of its sail: trapezoid (B188.8.131.52), or triangle (B184.108.40.206). Therefore, a sailing boat is abstracted into a combination of a trapezoid/triangle standing for the sail with a semi-ellipse for the body.
A carriage (B4.2) is symbolized by a combination of a circle standing for the wheels and a rectangle for the body.
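The way shape combinations discriminate among object types can be illustrated with a small lookup over the transportation entries above. This is a sketch under simplifying assumptions: the names `ENTRIES`, `identify`, and `containing` are hypothetical, relative sizes (smaller/bigger) and spatial arrangement are ignored, and exact set matching stands in for whatever shape-matching procedure a real CBIR system would use.

```python
# (number, object type, shape combination) as described in the text above.
# The two sailing-boat varieties are listed under their shared parent number.
ENTRIES = [
    ("B4.1.1", "canoe", {"semi-ellipse"}),
    ("B4.1.2", "dragon boat", {"special U shape"}),
    ("B4.1.3", "fishing boat", {"semi-ellipse", "inverted semi-ellipse"}),
    ("B4.1.4", "passenger ship", {"semi-ellipse", "rectangle"}),
    ("B4.1.5", "sailing boat (trapezoid sail)", {"trapezoid", "semi-ellipse"}),
    ("B4.1.5", "sailing boat (triangle sail)", {"triangle", "semi-ellipse"}),
    ("B4.2", "carriage", {"circle", "rectangle"}),
]


def identify(shapes):
    """Return entries whose shape combination equals the observed shapes."""
    observed = set(shapes)
    return [(number, name) for number, name, combo in ENTRIES if combo == observed]


def containing(shape):
    """Return object types whose combination includes the given shape."""
    return [name for _, name, combo in ENTRIES if shape in combo]


# A single basic shape is ambiguous: five abstractions use a semi-ellipse.
print(containing("semi-ellipse"))

# The combination narrows the result to one object type.
print(identify(["semi-ellipse", "rectangle"]))  # [('B4.1.4', 'passenger ship')]
```

The contrast between the two calls is the point of the example: a rectangle or a semi-ellipse alone matches several object types, while the combination identifies a single entry.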
Part II: Composition Templates
This part consists of 14 composition templates that are commonly applied to a Chinese landscape painting. Based on the overall layout of objects in a given Chinese landscape painting, the composition templates are classified into two main categories: fully filled (C1), and one part (C2). Abstract rectangles are used to symbolize objects in the painting and illustrate their spatial relationships.
A Chinese landscape painting in the fully filled (C1) framework represents objects that cover most or all of the rice paper or silk. In terms of spatial relationships between objects, this category is further divided into two subcategories: non-symmetrical (C1.1), and symmetrical (C1.2). The non-symmetrical subcategory includes four types: extended (C1.1.1), fragmented (C1.1.2), vertically superimposed (C1.1.3), and zigzagged (C1.1.4). The extended type is further classified into two varieties: horizontally (C220.127.116.11), and vertically (C18.104.22.168). According to the orientation of balanced parts, the symmetrical subcategory consists of three types: bilateral (C1.2.1), diagonal (C1.2.2), and up and down (C1.2.3).
In contrast to the fully filled composition template, Chinese landscape paintings in the one part (C2) framework represent objects that lie within a specific part of the rice paper or silk. To further identify the location of these objects within the painting, this category contains six subcategories: center (C2.1), left (C2.2), lower (C2.3), lower left (C2.4), lower right (C2.5), and right (C2.6).
[Note: Objects are spread all over the painting without evident spatial relationships between one another.]
[Note: Objects are usually lofty and vertically superimposed.]
[Note: Objects are arranged in a zigzag route.]
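The classification of composition templates described in Part II can be written down as a nested structure and checked against the stated total of 14. This is a sketch only: the names `TEMPLATES` and `leaves` are hypothetical, and the two varieties of the extended type are listed by name rather than by number.

```python
# Nested mapping: each key is a category; an empty dict marks a template.
TEMPLATES = {
    "fully filled (C1)": {
        "non-symmetrical (C1.1)": {
            "extended (C1.1.1)": {"horizontally": {}, "vertically": {}},
            "fragmented (C1.1.2)": {},
            "vertically superimposed (C1.1.3)": {},
            "zigzagged (C1.1.4)": {},
        },
        "symmetrical (C1.2)": {
            "bilateral (C1.2.1)": {},
            "diagonal (C1.2.2)": {},
            "up and down (C1.2.3)": {},
        },
    },
    "one part (C2)": {
        "center (C2.1)": {},
        "left (C2.2)": {},
        "lower (C2.3)": {},
        "lower left (C2.4)": {},
        "lower right (C2.5)": {},
        "right (C2.6)": {},
    },
}


def leaves(tree, path=()):
    """Yield the full path of every template (a node with no children)."""
    for name, children in tree.items():
        here = path + (name,)
        if children:
            yield from leaves(children, here)
        else:
            yield here


all_templates = list(leaves(TEMPLATES))
print(len(all_templates))  # 14
for template in all_templates:
    print(" / ".join(template))
```

Counting only the lowest-level nodes gives eight fully filled templates and six one part templates, which agrees with the total given for Part II.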
To test the ability of the thesaurus to adequately represent the content of the paintings, it was applied to four randomly selected Chinese landscape paintings, which are dated to different time periods and had not been used previously in developing the thesaurus. The features and composition templates allowed adequate representation of the content of the paintings. One example is reported below.7 No attempt was made to test retrieval, which would require access to an entire collection indexed with the thesaurus. Nevertheless, users may run a query by selecting abstract shapes and/or compositions if they wish to find landscapes which bear certain shapes and/or compositions.
In Figure 1, the Song landscape consists of (A) primary elements (mainly peaks and crags in both the distance and in close view, pine trees, and a river), and (B) secondary elements (specifically sailing boats). The counterparts in the thesaurus are:
The composition of this Song landscape corresponds to C22.214.171.124 (fully-filled, non-symmetrical, and horizontally extended).
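Retrieval was not tested in this paper, but the idea of querying by selected shapes and compositions can be sketched as follows. Everything here is an assumption made for illustration: the painting identifiers and their indexed numbers are invented, the function names are hypothetical, and Jaccard similarity is only one possible ranking measure, not one proposed in this paper.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of thesaurus numbers."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(index, query):
    """Return paintings containing every selected number, most similar first."""
    hits = [
        (jaccard(codes, query), painting_id)
        for painting_id, codes in index.items()
        if query <= codes  # keep only paintings that contain all selected numbers
    ]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [(painting_id, score) for score, painting_id in hits]


# Hypothetical collection: each painting is indexed by a set of thesaurus numbers.
index = {
    "painting-001": {"A2.1.2", "A5.1", "B4.1.5"},
    "painting-002": {"A2.1.2", "A4.3", "A5.2", "B2.2.2"},
    "painting-003": {"A3.3.2", "A5.1", "B4.1.3", "B4.1.5", "C2.3"},
}

# Query: non-waterfall water (A5.1) together with a sailing boat (B4.1.5).
results = search(index, {"A5.1", "B4.1.5"})
for painting_id, score in results:
    print(painting_id, round(score, 2))
```

In this toy collection the query returns painting-001 ahead of painting-003, because the first painting contains fewer elements beyond those selected; painting-002 is excluded because it lacks both selected elements.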
This paper develops a shape-and-composition CBIR thesaurus for traditional Chinese landscape painting dating from the Song to Qing periods (960-1911). The thesaurus is based on visual features of Chinese landscape paintings, including relatively simple forms and textures, few colors, and a limited number of object types, varieties, and composition structures. Not only basic shapes (such as circle, triangle, and rectangle) but also lines and shape combinations are adopted in the thesaurus. Furthermore, special shapes are developed to represent object types with unique forms. By emphasizing discrimination among object types, this thesaurus aims to improve recall of relevant images. Results from testing the thesaurus on four paintings indicate that shape features and composition templates are sufficient to represent the content of the paintings. Therefore, this shape-and-composition CBIR thesaurus has the potential to be a feasible and effective means to index and retrieve Chinese landscape paintings. It may be applied in museums and libraries that have large collections of these works, such as the Freer Gallery of Art and the Metropolitan Museum of Art. As mentioned previously, Chinese landscape paintings were created primarily to convey the painters’ ideas and feelings. Textual interpretations of the hidden meanings in the paintings are inherently subjective and partial. The perceived objectivity of CBIR may be useful to discover knowledge that has not been studied or has been misinterpreted.
Further research includes processing additional paintings to test and ensure the completeness of the thesaurus with regard to shapes and composition abstractions in Chinese landscape paintings. Although numerous paintings were studied to develop the features in the thesaurus and the most common have been included, an even larger number of sample paintings may need to be processed. In addition, it may be necessary to develop algorithms and techniques to automate feature extraction. Furthermore, to make the thesaurus more robust and effective for a CBIR system, it needs to incorporate color and texture as well. Finally, the thesaurus should be tested to determine its usefulness for adequately representing users’ information needs and its retrieval effectiveness in differentiating among various elements of Chinese landscape paintings.
I would like to express many thanks to Professor Marilyn Domas White and Ms. Joan Stahl for their helpful advice and generous support. I would also like to thank Yu-tzu Chang, a former classmate who did the initial research and report with me.
Graham, M. E. (2004). Enhancing visual resources for searching and retrieval—Is content-based image retrieval a solution? Literary & Linguistic Computing: Journal of the Association for Literary and Linguistic Computing, 19(3), 321-333.
Zhang, D., Pham, B., & Li, Y. (2004). Modeling traditional Chinese paintings for content-based image classification and retrieval. Proceedings of the 10th International Multimedia Modeling Conference, 258-64.
1 This paper is a revised and shortened version of a term paper written in spring 2006. The original paper (approximately 48 pages), complete with illustrations from Chinese landscapes, is posted at: http://shinylee.googlepages.com/CBIRThesaurusforChineseLandscape_Tan.pdf.
2 Aside from the Hermitage Museum, other image collections using CBIR systems include the Leiden 19th-Century Portrait Database from Leiden University, The Netherlands, and The SCULPTEUR (Semantic and content-based multimedia exploitation for European benefit) Project from The Victoria & Albert Museum, U.K.
4 Chinese landscape examples with superimposed abstractions are not provided here because of limited space but are available in the original, lengthier version of the paper posted at: http://shinylee.googlepages.com/CBIRThesaurusforChineseLandscape_Tan.pdf
5 It should be noted that the meticulousness of the brushstrokes used to delineate a mountain is a key criterion for classifying a mountain as a ‘close view’ or ‘distant view.’ This is because Chinese landscapes were painted in a unique many-point instead of one-point perspective. For example, behind a mountain range, another range may loom with trees, houses, and streams in full view. As mentioned previously, Chinese landscapes seek to capture the essence rather than the reality of nature.
7 Other examples are available in the original paper posted at http://shinylee.googlepages.com/CBIRThesaurusforChineseLandscape_Tan.pdf
Tang Li recently received an MLS from the College of Information Studies, University of Maryland (UM), College Park. She is also a graduate assistant of UM art and architecture libraries. Her research interests are in art librarianship, reference, image searching and retrieval, and collection management.
Copyright, 2013 Library Student Journal