ARgo: Augmented Reality + Verbal Description

Project Overview

Team ARgo’s goal is to create access to museum objects through an interpretive lens. This project is directed towards visitors with visual impairments and aims to enable independent exploration, participation, and empowerment through design. The team is prototyping a mobile application that uses an augmented reality engine to recognize objects in an exhibit using the phone camera. When an object is detected and identified, the app will then ask the user if they want additional information, such as an audio description.

Team Members: Dong Chan Kim, Leah Kim, Karly Rosen

March 2nd – 8th (Week 6): Team Brainstorm

Photograph of post it notes with ideas from a team brainstorm
Output of Team ARgo’s post up session

For the team’s first meeting, we held a post-up session to brainstorm ideas to improve accessibility at the Cooper Hewitt, as well as to get to know each other’s strengths by mapping out our core competencies.

Key outputs from the brainstorm were:

  1. A suite of materials for visitors on the autism spectrum – An interactive sensory map and a social narrative that could be referred to before and during a visit so that visitors with autism know what to expect from the museum experience.
  2. An audio tour with verbal description accessed from the Cooper Hewitt API – During our initial contextual inquiries, providing accessibility solutions for persons with visual impairment seemed like a clear gap that needed to be addressed. Many institutions provide audio tours to fulfill this need, and our added goal was to connect to the Cooper Hewitt API to enable verbal description content to be incorporated into the Cooper Hewitt’s existing information architecture and used now or in the future.
  3. 3D Replicas of Pen Prototypes – Another sought-after experience for visitors with visual impairments is a touch tour or tactile graphics to better understand objects. This idea was to use the pen prototypes as the test objects, with the main project output being a documented and repeatable process for the Cooper Hewitt to cost-effectively and regularly produce 3D models for select objects in every exhibition.

After our post-up session, we decided to move forward with the sensory map and social narrative as our project focus.

March 9th – 15th (Week 7): Team Brainstorm, Part 2

After deciding on the suite of materials for visitors on the autism spectrum, we received word that the Cooper Hewitt had already created a sensory map, social narrative, and quiet mornings at the museum for children on the autism spectrum (kudos!). It was great news to hear that the Cooper Hewitt was addressing the needs of this community since it is often overlooked by many institutions, but it meant the team had to go back to the drawing board.

We went back to our initial brainstorm list of ideas and Dong Chan suggested that we create a virtual reality app. Initially, this sounded impractical because it would limit the use of the solution to visitors with some level of sight. After some research, we found that augmented reality is a growing field in assistive technology. While virtual reality involves the creation of a new virtual world, augmented reality layers data or experiences on top of the existing world. For example, a company called Float has built an app called Cydalion that offers audio feedback on obstacles using Google’s Tango platform, and a company called Aira equips blind and visually impaired persons with Google Glass apps so they can connect to sighted concierges who can orient them to their environment when needed.

This led to the idea of using a camera on a smartphone or tablet like a wand to sweep the area of a museum exhibit. Image recognition software trained with photos of exhibit objects would recognize an object in view and offer the user a verbal description of the object.

A line drawing showing a sample floor plan and a user icon browsing objects in a room using a camera
An early sketch of augmented reality application using the camera to “find” museum objects

We explored a number of different image-recognition engines. We considered CloudSight, the image recognition platform used by the app TapTapSee, which inspired our brainstorm, and Aurasma, a platform by HP. The CloudSight platform required use of its API, which did not support real-time image recognition, and Aurasma was not open enough to allow for prototyping of different applications and interactions.

We settled on the combination of Unity, a 3D engine used primarily for gaming, and Vuforia, an image-recognition augmented reality platform. Both were easy to access and modify, allowing us to start prototyping immediately.

March 23rd – 29th (Week 9): Sketching User Flows

A "back of the napkin" diagram showing how a user advances through screens in the app
User Flow Sketch #1
A whiteboard sketch with post it notes and dry erase marker showing a more detailed user flow
User Flow Sketch #2
A diagram using pink post it notes and blue dry erase marker on a whiteboard showing how a user advances through an app's functionality
User Flow Sketch #3

With our concept solidified, we moved on to the task of creating the initial user flows for the app. We progressed through several iterations of the flow and ended with a flow that incorporates the following basic features:

  1. Add to collection – The ability to save items to “My Visit,” similar to the Pen, by entering the code on the museum ticket
  2. Object detection – The ability to identify objects in an exhibit through image detection
  3. Verbal description – The ability to play, pause, and skip verbal description of an object

Interaction flow chart for ARgo. Open App > Vibration & Sound > Detect Object > Vibration & Sound > Names object detected > Ask for audio description. If YES > Play Audio > continue. If Skip is chosen > return to Detect Object. If NO > return to Detect Object. Prompted to save item to collection: If YES > Save object > vibrates to confirm > returns to Detect Object. If NO > vibrates to confirm > returns to Detect Object.
Initial interaction flow chart for ARgo
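
The flow chart can be read as a small state machine. The sketch below is one possible reading, written in Python for illustration only (the prototype itself is built in Unity). The state names, event names, and cue names are all hypothetical, and it assumes that the "save to collection" prompt follows a finished audio description, which the chart does not state outright.

```python
from enum import Enum, auto


class State(Enum):
    DETECT_OBJECT = auto()  # camera is sweeping for a known object
    ASK_AUDIO = auto()      # object named, user asked about audio description
    PLAY_AUDIO = auto()     # verbal description is playing
    ASK_SAVE = auto()       # user asked whether to save the object


# (current state, event) -> (next state, cues to emit).
# Event and cue names are illustrative placeholders, not the app's real API.
TRANSITIONS = {
    (State.DETECT_OBJECT, "object_found"): (State.ASK_AUDIO, ["vibrate", "sound", "speak_name"]),
    (State.ASK_AUDIO, "yes"): (State.PLAY_AUDIO, []),
    (State.ASK_AUDIO, "no"): (State.DETECT_OBJECT, []),
    (State.PLAY_AUDIO, "skip"): (State.DETECT_OBJECT, []),
    # Assumption: once the audio finishes, the app moves on to the save prompt.
    (State.PLAY_AUDIO, "finished"): (State.ASK_SAVE, []),
    (State.ASK_SAVE, "yes"): (State.DETECT_OBJECT, ["save_object", "vibrate"]),
    (State.ASK_SAVE, "no"): (State.DETECT_OBJECT, ["vibrate"]),
}


def step(state, event):
    """Return (next_state, cues); unknown events leave the state unchanged."""
    next_state, cues = TRANSITIONS.get((state, event), (state, []))
    return next_state, list(cues)


def run(events):
    """Feed a sequence of events through the flow, starting at Detect Object."""
    state, emitted = State.DETECT_OBJECT, []
    for event in events:
        state, cues = step(state, event)
        emitted.extend(cues)
    return state, emitted
```

Calling `run(["object_found", "yes", "finished", "yes"])` walks one full pass: the object is detected and named, the description plays, the object is saved, and the app returns to detection. Keeping the transitions in a table makes it easy to revise the flow as user testing changes it.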

March 30th – April 5th (Week 10): Contextual Inquiry at the Cooper Hewitt

Three people are standing in front of hanging fabric art panels. A woman with a white cane and sunglasses stands with her companion on the left side and the curator stands on the right side holding a bag of fabric samples
Dr. Leona Godin listening to a Cooper Hewitt curator describe an object in the ‘Scraps’ exhibit

The team visited the Cooper Hewitt again, this time in the company of Dr. Leona Godin, an accomplished writer, artist, and performer in her own right, to see the museum through her eyes. Leona is legally blind and navigates with the assistance of a white cane and her partner, Alabaster Rhumb.

Tactile Elements Enhance Verbal Description

Instead of a docent-led tour, two curators led us through two exhibits, Scraps and The World of Radio. The first curator, Susan, confessed that she had never given a descriptive tour before, but the textiles exhibit was very good, the tactile objects helped, and she was quick on her feet when it came to the descriptions. We are curious to see what a docent-led tour would be like in comparison.

A hand holds a white egg-shaped silk cocoon pod the size of a medium marshmallow
A silk cocoon pod
A woman wearing a black leather jacket, sunglasses and a white cane smiles as she feels an object with her fingers
Leona touching the silk cocoon
The photo is looking over the shoulder of a woman wearing a black leather jacket and holding a white cane feeling a thin piece of fabric with a floral print
Leona touching fabric scraps
Leona tugs on a clear, thin band of polyurethane
Leona touching a polyurethane sample from one of the artists in the Scraps exhibit

Challenges with a Verbal Description Only Approach

Three people stand in front of four framed illustrations of radios mounted on a peacock blue wall
Dr. Leona Godin at the World of Radio exhibit at the Cooper Hewitt

The second tour in The World of Radio revealed a few key challenges that we noted for design of all exhibits:

  1. Relying on verbal description only is taxing for users – The challenge of a verbal-description-only approach to accessibility for persons who are blind or have visual impairments became clear during this tour. Leona and other users that we interviewed stated that having to process information only through verbal description is mentally tiring. Pairing verbal description with a tactile element allows for the transfer of more information with less effort for the user. We started with a very large batik tapestry with an incredible amount of detail, moved on to a grouping of illustrations on the wall, and then finished with a case of radios. While all were described in great detail, after all the objects in the textiles tour and more than 30 minutes of attentive listening, even sighted users on the tour needed a break to process all the information.
  2. Effective and evocative verbal description is difficult – The curators are subject matter experts on the objects in the exhibits, and even they struggled to convey all the information needed using verbal description only. Some basic information, such as item names and dates, was left out, and creating an effective mental picture of an object for a user is a unique and challenging task.

Items Outside of the Project Scope the Cooper Hewitt Might Address

  • Video displays throughout the museum lacked headphones and offered no alternative way of receiving audio-described media
  • The Pen is currently not very accessible for people with vision impairments. There were several barriers for Leona in using the Pen:
    • She did not realize there was a unique code on her museum ticket, and she had no way to reliably remember her code after the visit
    • Finding the hash marks to save an object to her collection required the help of her partner
    • Without her partner, Leona would have had to keep her dominant hand free for her white cane


During our post-tour debrief with Leona, she confirmed interest in our app concept and suggested dividing information into a hierarchy and/or providing several tour options, such as a historical tour focused on the architecture of the Carnegie Mansion, a tour of the rotating exhibits, etc.

We had a discussion with Pamela Horn and the Cooper Hewitt digital team after the tours about the details of implementing some of our ideas and connecting with the Cooper Hewitt API. The Cooper Hewitt has several different ways of describing and providing interpretive information about objects. The most basic is an object description, which we confirmed we can access through the API. There are also descriptions called “chats,” which are not currently stored in the API and therefore will likely not be a scalable solution until they are regularly stored in the collections database. A fun fact from Pamela: the most common search of the collection is by color, although there is no data on the details.

Additionally, Pam let us know they are launching their first large print label project for the upcoming Jazz Age exhibit. The large print labels will be provided as physical books that visitors who need larger/high-contrast text can use to read labels. The Jazz Age exhibit has hundreds of items and Pam explained the challenge of organizing the large number of labels so that visitors can easily find the label they are interested in as they explore the exhibit. Hopefully, the ARgo app can solve that challenge by providing the same information as the large print labels, but on-demand via image recognition.

April 6th – 12th (Week 11): Surveying Our Users

On April 5th we deployed a 10-question online survey to gather qualitative and quantitative feedback from persons who are blind and visually impaired on their experiences in museums, comfort in using a smartphone device, and the types of information they would prefer to receive about objects in an exhibit.

  1. What is the most common way you access information in museums?
  2. What would enhance your experience as a museum visitor?
  3. Are you familiar with or have had experience with augmented reality?
  4. How comfortable are you with using your phone/apps for day-to-day activities?
  5. When using a device in a museum, would you prefer to use a secondary device instead of your phone?
  6. Would a large download size make you hesitant in using an application?
  7. Would you be willing to commit a large amount of memory on your mobile device if the application was an effective one?
  8. Which mobile application navigation method suits you best?
  9. Would you be interested in a mobile application that describes the room around it, historical information about the building, or objects on display in an exhibition?
  10. When you are on a tour in a museum, which information is most and least relevant?
    1-Description of the exhibition space/building
    2-Object facts, such as name, year it was made, and artist name.
    3-Object history, such as the process of designing and making the object
    4-Visual description of the object
    5-Similar objects on display

Claire Kearney-Volpe, director of the ABILITY Project, was kind enough to connect us with colleagues in the accessibility community who could distribute the survey to our user base. We also requested feedback from the blind community on the r/blind subreddit and on the AppleVis App Development Forum. In total, we received 11 survey replies. Key findings from the survey follow.

Museum Experiences

Most of our participants rely on a docent, friend, or other sighted person in order to experience museums. The majority of participants would prefer a tactile experience; the second most desired option is an audio-based tour and/or smartphone app.

Augmented Reality

Most of our participants have either never heard of augmented reality or have not experienced it (understandable given that it is more visually based).

Smartphone Ease of Use

Most of our survey participants indicated they would be comfortable using a mobile app and downloading a large app if it was effective. However, we recognize that our results may be skewed toward a tech-savvy audience, given that the survey was administered online and deployed through two websites geared towards heavy app or internet users. A broader sample that includes in-person survey results would better represent the views and preferences of persons who are blind or have visual impairments. Most survey participants indicated they would prefer voice control for an application.

Information Hierarchy

Most of our participants primarily want a verbal description of objects and want information about the museum architecture itself the least, supporting a possible split of “tour modes.”

Full Survey Replies

April 13th – 19th (Week 12): User Testing with the Accessibility Community

[pictures from user test TK]

We were fortunate to have the opportunity to test our early prototype on experts in the accessibility community and receive valuable feedback from users and people who have devoted many hours thinking about the challenges of designing solutions for people with disabilities.

We asked users to perform a simple task: see if they could locate two target objects mounted on the wall by pointing the camera on a mobile phone. This is the basic action the fully finished app will request of users. Feedback was detailed, enthusiastic, and varied based on each tester’s preferences, as was to be expected. Our takeaways from the user test are below.

Requirement for a constant feedback mechanism

The largest area of friction for our testers with visual impairments was understanding if they were on the “right track.” Both of our testers had some light and movement perception, but understanding whether they were pointing the device in a direction that would yield a result was a main challenge. They suggested a constant feedback mechanism such as a sound or vibration used as a directional indicator to let the user know either there was nothing there or the user is moving closer to an object.
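
One way to picture the constant feedback the testers asked for is a pulse that speeds up as the user gets closer to an object. The Python sketch below is illustrative only and is not taken from the prototype. It assumes the recognition engine can supply a hypothetical "proximity" score between 0 and 1 (or nothing at all when no object is in view), and the interval values are placeholder guesses that would need tuning with users.

```python
from typing import Optional

MIN_INTERVAL_S = 0.15   # fastest pulse: target nearly centered (assumed value)
MAX_INTERVAL_S = 1.50   # slowest pulse: target barely in range (assumed value)
IDLE_INTERVAL_S = 3.00  # slow "still searching" tick when nothing is in view (assumed value)


def pulse_interval(proximity: Optional[float]) -> float:
    """Return the number of seconds between vibration or sound pulses.

    `proximity` is a hypothetical score in [0, 1], where 1 means the camera
    is pointed straight at a recognized object. None means nothing detected.
    """
    if proximity is None:
        # Nothing in view: keep a slow tick so the user knows the app is working.
        return IDLE_INTERVAL_S
    clamped = min(1.0, max(0.0, proximity))
    # Linear interpolation: a higher score gives a shorter gap between pulses.
    return MAX_INTERVAL_S - clamped * (MAX_INTERVAL_S - MIN_INTERVAL_S)
```

With these placeholder values, a user sweeping an empty wall would feel a tick every three seconds, and the pulses would quicken steadily as the camera approaches a target.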

A similar challenge to the pen: distracting from navigation

A mobile device poses the same challenge as holding the Pen: it occupies a white cane user’s hands, which distracts from the task of navigating through a space. One user also indicated that holding up the phone is an uncomfortable position. Devices should be provisioned with neck loops so that users can carry them hands-free, and headphones will need to be provided so that users can listen in private.

Focus on Verbal Description

We received a lot of feedback on the type of audio to provide. It was strongly emphasized that we should provide a description of an object rather than an interpretation of the object. This includes prioritizing physical description (size, height, color, etc.) after an object’s name. If an object has text on it, the app should describe the text for the user.

Some testers requested that the app provide a sense of the orientation of objects in relation to each other. For example, if there are multiple objects in a display case, a good option would be to receive a description of a group of the objects (e.g., “in the upper-left quadrant are objects a, b, and c”).

Additionally, we heard again, as with the feedback from Dr. Leona Godin, that concentration on just one sense, such as audio feedback, is mentally taxing. As a result of this feedback, we are including the ability to adjust the “verbosity” of the verbal description, allowing users to receive just basic title and designer information, a physical description of the object, or finally the historical context and positioning of the object in the room.
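
The verbosity setting could be modeled as a set of levels that each add more detail. The Python sketch below is an assumption-laden illustration rather than the prototype's code: the level names, the `ObjectRecord` fields, and the choice to make the levels cumulative (each level includes everything from the level before it) are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum


class Verbosity(IntEnum):
    BASIC = 1     # title and designer only
    PHYSICAL = 2  # adds a physical description
    CONTEXT = 3   # adds historical context and position in the room


@dataclass
class ObjectRecord:
    # Field names are placeholders, not the Cooper Hewitt API's schema.
    title: str
    designer: str
    physical_description: str
    context: str


def build_description(obj: ObjectRecord, level: Verbosity) -> str:
    """Assemble the text to be spoken for the chosen verbosity level."""
    parts = [f"{obj.title}, by {obj.designer}."]
    if level >= Verbosity.PHYSICAL:
        parts.append(obj.physical_description)
    if level >= Verbosity.CONTEXT:
        parts.append(obj.context)
    return " ".join(parts)
```

Ordering the output as name first, then physical description, then context follows the testers' request that physical description come directly after an object's name.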

Watch Outs

  • Art comes in different sizes, so a user may point the camera at only a part of an object
  • App content needs to be simple enough for museum staff to update for each new exhibit

Ideas to explore

  • Navigational assistance – Create the ability to guide a user to a specific piece of art they are interested in or the ability to detect objects from farther away so that someone with a visual impairment can stand in the center of a room and get information about objects without having to walk from piece to piece
  • Extending the concept to the gift shop – Implement the app in the Cooper Hewitt gift shop so that persons with visual impairments can shop for items as part of the full museum experience.

Initial user feedback notes

April 20th – 26th (Week 13): Diving into the Cooper Hewitt API and Creating UI Mockups

Black marker sketches of a smartphone screen with buttons and flow diagrams
Sketches for the first ARgo screen mockups

We are continuing to develop the basic functionality of image recognition triggering a description of the object. Dong Chan and Karly reached a major milestone in our prototype development this week: they successfully made a call to the Cooper Hewitt API for object information from the Unity platform. This was an area of concern for the team since, at the outset, we weren’t sure if this would be possible.
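
For readers curious what such a call involves, here is a rough Python sketch (the prototype makes the call from Unity, so this is not the team's code). The endpoint URL, the `cooperhewitt.objects.getInfo` method name, the `object_id` and `access_token` parameters, and the shape of the JSON response are all assumptions that should be checked against the Cooper Hewitt API documentation before use. The sketch only builds the request URL and parses a response body, so it runs without network access.

```python
import json
from urllib.parse import urlencode

# Assumed REST endpoint; verify against the Cooper Hewitt API documentation.
API_ENDPOINT = "https://api.collection.cooperhewitt.org/rest/"


def build_object_request(object_id: str, access_token: str) -> str:
    """Build the URL for an object-information request."""
    params = {
        "method": "cooperhewitt.objects.getInfo",  # assumed method name
        "access_token": access_token,              # assumed parameter name
        "object_id": object_id,                    # assumed parameter name
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_description(response_text: str) -> str:
    """Pull text to read aloud out of a JSON response body.

    Assumes a payload shaped like {"object": {"title": ..., "description": ...}}.
    """
    payload = json.loads(response_text)
    obj = payload.get("object", {})
    # Fall back to the title when no description is stored for the object.
    return obj.get("description") or obj.get("title") or "No description available."
```

The fallback in `extract_description` matters for an audio-first app: an object with no stored description should still produce something speakable rather than silence.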

Additionally, Karly sketched and created mockups of the UI to help demonstrate what the final product might look like for users. Note: In the mockup below the “Adjust Size” button will be changed to “Verbosity” to allow users to change the level of information they hear.

Splash page
Tour options page
Image detection
Image pulled from Cooper Hewitt API of identified object
Scrolling action of large print text

Breakdown of the Alt Text