New innovation projects on VR-based therapies

We had an excellent start to the new year with two new innovation projects on VR-based therapies.

Swimming with AI dolphins

“Swimming with Dolphin” was previously developed by the UON Games Team (Iain Douglas and Rob Lambert)

The first Knowledge Exchange project, in partnership with the Northampton-based company VR Therapies (which has provided private therapy sessions for over 1,000 people), will deliver an innovative VR therapy, “Swimming with AI Dolphins”, offering a unique interactive underwater experience to ease the symptoms of mental illness.

The figure above shows the outcome of the original “Swimming with Dolphin” project. Users can book a therapy session to be virtually submerged in water with a 360-degree view as the dolphin slowly and peacefully encircles them. The current application is based on conventional 360 video delivery: the content is the same for all users and does not respond to user activities.

With our expertise in game art and machine learning, the application will be transformed with interactive features and an AI-assisted dolphin character. We will use hand/eye tracking sensors and integrated microphones of new-generation headsets to capture and model user movements. Machine learning will be employed to develop an “AI dolphin” that can respond to viewer activities (such as voice and hand gestures) with natural movements.

The unique offering of the “Swimming with AI Dolphins” experience will help the company stand out from its competitors. Besides improved sales of onsite private therapy sessions, the solution can become a “killer app” of the company’s new Headsets@Home service, which allows people with mental illness to rent headsets with pre-loaded therapy content for self-administered therapy at home.

Feasibility study of an innovative VR-based psychotherapy

This Innovation and Commercialisation project will conduct a feasibility study of an innovative VR psychotherapy as a pathway to commercialisation. The VR application automates a comprehensive psychotherapy that is widely used for life-changing treatment of anxiety disorders and depression. This revolutionary design can help improve public access to the therapy amid ongoing challenges in the NHS. We will conduct a small-scale user trial, assisted by partnering health institutions (St Andrew’s Healthcare and Cardinal Clinic).

The project will support the following activities.

  • Product development. Transform the prototype into a clinically ready product that can be operated by patients. This step requires significant input from patients and the public, so we will invite public volunteers from different age, gender, ethnic, and socioeconomic backgrounds to support the product development.
  • User trial preparation. This activity will focus on preparing all necessary documents, protocols and procedures for the trial. Involvement from the public is also critical for this activity. We will seek public participation in developing the trial.
  • User study. The study includes recruiting and screening 5-10 participants. EDI will be considered an important part of the recruitment. The study will be carried out by a trained therapist. The study will assess the feasibility of the solution and its pathway for adoption by the health services. We will collect feedback from patients and therapists at partner institutions.
  • Result analysis and dissemination. We will seek public involvement (PPI) to help the team to interpret the data. We will gather public opinions on our VR digital health innovation.

Evidence from the study will inform a commercialisation strategy for the effective delivery of services to patients, working within the NHS delivery structures, and maximising the number of patients that benefit from this work.

Marker-based multi-camera extrinsic calibration for 3D body tracking

One of the main use cases of our metaverse lab is 3D body tracking. With the Kinect DK’s Body Tracking SDK, 32 body joints can be detected or estimated from a single camera feed. The data for each joint include 3D coordinates (x, y, z) in the depth camera’s coordinate system, orientation as a quaternion (qw, qx, qy, qz), and a confidence level. More details can be found in the SDK documentation.
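For orientation, the per-joint record can be pictured roughly as below. This is an illustrative Python sketch only; the field names are mine and the real SDK exposes its own body-frame structures.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Joint:
    """Illustrative layout of one tracked joint (field names are not the SDK's own)."""
    x: float          # position in the depth camera's coordinate system (millimetres)
    y: float
    z: float
    qw: float         # orientation expressed as a quaternion
    qx: float
    qy: float
    qz: float
    confidence: int   # e.g. 0 = none, 1 = low, 2 = medium, 3 = high

# A single body frame then carries 32 such joints, indexed by joint ID
BodyFrame = List[Joint]
```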

The results are already pretty good for application scenarios where there is a single subject facing the camera. Once there are multiple subjects in the scene, or when the subject makes significant body movements, parts of the bodies are likely to be obstructed in the camera’s view. Although the SDK still returns data for all 32 joints, the estimated positions of obstructed joints are often poor and should not be used for research. Another problem with single-camera tracking is the limited area coverage: tracking performance art or sports activities would be difficult.

same subject – blue: camera 1, purple: camera 2

One solution is to simply add more cameras. Because each camera uses itself as the reference point to express the location of any object it sees, the same object gets different location readings from different cameras. For instance, the images above show data from two cameras observing a single subject. We therefore need to calibrate the data feeds from all cameras. This is normally done by transforming data from one coordinate system (e.g., a secondary camera) into a reference coordinate system (e.g., the master camera). Ideally, the process will reshape the blue figure in the image above to match the purple figure exactly, or vice versa. The transformation itself is a straightforward matrix multiplication, but some work is needed to derive the transformation matrix for each camera pair. Luckily, OpenCV already includes a function, estimateAffine3D(), which computes an optimal affine transformation between two 3D point sets. So our main task is to obtain the associated 3D point sets from the two cameras. The easiest option is to reuse the 32 joint coordinates from both cameras, since they are tracking the same subject.

Feeding the joint coordinates to estimateAffine3D() produces the transformation matrix above in homogeneous coordinates. I eliminated all low-confidence joints to reduce noise. In this case, the matrix transforms readings from device 1 to device 2. The image below shows how the blue figure is mapped to the coordinate system of the purple figure. The result is nearly perfect from the chest up. The lower body is not as good because those joints are not captured directly by our cameras.
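A minimal sketch of this step, assuming the joints from each device have already been collected into NumPy arrays (the array names, confidence threshold, and helper functions are my own, not part of any SDK):

```python
import cv2
import numpy as np

def fit_camera_transform(joints_dev1, joints_dev2, conf_dev1, conf_dev2, min_conf=2):
    """Derive a 4x4 homogeneous transform mapping device 1 readings into device 2's frame."""
    # Keep only joints that both cameras report with reasonable confidence
    mask = (conf_dev1 >= min_conf) & (conf_dev2 >= min_conf)
    src = joints_dev1[mask].astype(np.float64)   # (N, 3) points from device 1
    dst = joints_dev2[mask].astype(np.float64)   # (N, 3) matching points from device 2

    # estimateAffine3D computes an optimal 3x4 affine transform between two 3D point sets
    ok, affine, inliers = cv2.estimateAffine3D(src, dst)
    if not ok:
        raise RuntimeError("estimateAffine3D could not find a transformation")

    # Promote to 4x4 so it can be applied in homogeneous coordinates
    return np.vstack([affine, [0.0, 0.0, 0.0, 1.0]])

def apply_transform(transform, points):
    """Apply the homogeneous transform to an (N, 3) array of points."""
    homogeneous = np.hstack([points, np.ones((points.shape[0], 1))])
    return (transform @ homogeneous.T).T[:, :3]
```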

Using body joints as markers for camera calibration is promising, but our results also clearly show a major issue: we cannot really trust the joint readings for accurate calibration. After all, the initial premise of the project was that each camera may have an obstructed view of the body joints. To find more reliable markers, I am again borrowing an idea from the computer vision field: ChArUco.

ArUco markers are binary square fiducial markers commonly used for camera pose estimation in computer vision and robotics. Using OpenCV’s aruco library, one can create a set of markers by defining the marker size and dictionary size; the same library can be used to detect markers and their corners (x, y coordinates in a 2D image). The marker size determines the information fidelity, i.e., how many distinct markers are allowed. Each marker has its own ID for identification when multiple markers are present. The maximum dictionary size is therefore determined by the marker size, but normally a much smaller dictionary size is chosen to increase the inter-marker differences. ChArUco combines ArUco markers with a chessboard to take advantage of ArUco’s fast detection and the more accurate corner detection permitted by the high-contrast chessboard pattern. For my application scenario, ArUco’s corner detection seems accurate enough, so ChArUco is only used to better match ChArUco boards on the front and back of a sheet of paper (more explanation below). The image below is a 3-by-5 ChArUco board with 7 ArUco markers (marker size 5 by 5 and dictionary size 250). This particular board has markers with IDs from 0 to 6.
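For illustration, generating and detecting such a board with OpenCV’s aruco module looks roughly like the sketch below. I am using the pre-4.7 function names; newer OpenCV releases expose the same functionality through ArucoDetector and a CharucoBoard constructor, and the square/marker lengths here are arbitrary placeholders.

```python
import cv2

# 5x5-bit markers, dictionary of 250 distinct IDs
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_5X5_250)

# A 5 x 3 ChArUco board carries 7 ArUco markers (IDs 0-6)
board = cv2.aruco.CharucoBoard_create(5, 3, 0.04, 0.03, dictionary)
cv2.imwrite("charuco_board.png", board.draw((1000, 600)))

# Detect markers on a colour frame from one of the cameras
frame = cv2.imread("colour_frame.png")      # placeholder file name
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
corners, ids, rejected = cv2.aruco.detectMarkers(gray, dictionary)
# corners: one (1, 4, 2) array of pixel coordinates per detected marker
# ids: the corresponding marker IDs, in detection order
cv2.aruco.drawDetectedMarkers(frame, corners, ids)
```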

The idea is now to print this ChArUco board on a piece of paper and let both cameras detect all marker corners for calibration. So I fire up the colour camera of the Kinect DK and get the following result. Yes, I am holding the paper upside down, but that’s OK.

ChArUco marker detection on camera 2

With 28 reference points from each camera, the next step is to repeat what was done with the 32 body joints and generate a new transformation matrix. However, an additional step is needed. The marker detection was done using the colour camera, because the depth camera could only see a flat surface and no markers. So all the marker coordinates are in the colour camera’s 2D coordinate system, i.e., all the red marker points in the image above are flat with no depth. These points are then mapped to the depth camera’s 3D coordinate system using the Kinect DK SDK’s transformation and calibration functions.

https://learn.microsoft.com/en-us/azure/kinect-dk/use-calibration-functions

I am still looking for a better option, but here is the 2-step approach:

Firstly, all marker points are transformed from the 2D colour space to the 2D depth space, as seen above (markers superimposed on the depth image). Knowing the locations of the markers on the depth image allows me to find the depth information for all markers.

Next, the markers are transformed from the 2D depth space to the 3D depth space to match the coordinate system of the body joints data. The images above show both markers and joints.
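Conceptually, the two steps look like the sketch below. The calls mirror the SDK’s 2D-to-2D and 2D-to-3D calibration functions documented at the link above, but I am writing them as a Python-style pseudocode wrapper, so the exact signatures here are assumptions rather than a verbatim API.

```python
# `calibration` is the device calibration object; `depth_image` is the depth frame
# captured alongside the colour frame; `marker_corners_colour` holds the detected
# corner pixels (u, v) in the colour image.
marker_points_3d = []
for (u, v) in marker_corners_colour:
    # Step 1: colour 2D -> depth 2D, using the depth image to resolve the unknown depth
    du, dv = calibration.convert_2d_to_2d((u, v), source_camera="color",
                                          target_camera="depth", depth_image=depth_image)

    # Read the depth value (millimetres) at the mapped pixel
    depth_mm = depth_image[int(dv), int(du)]

    # Step 2: depth 2D + depth value -> depth 3D, the same space as the body joints
    x, y, z = calibration.convert_2d_to_3d((du, dv), depth_mm,
                                           source_camera="depth", target_camera="depth")
    marker_points_3d.append((x, y, z))
```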

With the new sets of marker points, a new transformation can be derived from one camera to the other. All marker points are correctly mapped compared with the data from the other camera. The joints are for reference only; I do not expect them to be mapped perfectly because they are mostly obstructed by the desk or the ChArUco board. I have also left out many small details here, such as the functions that keep the solution robust when some markers are blocked or fail to be recognised for any reason. Needless to say, this is only an early evaluation using a single ChArUco board on an A4 sheet. I will certainly experiment with multiple strategically positioned boards and with boards of different configurations.

Before I take this prototype out of the lab for a more extensive evaluation, there is another problem to solve. The current solution relies on both cameras having a good view of the same markers, which is fine only when the two cameras are not far apart. If we were to place two cameras diametrically opposite each other to capture a subject from the front and back, it would be very hard to position a ChArUco card viewable by both cameras. It would probably have to lie on the floor while both cameras are tilted downwards. To solve this issue, I borrowed the idea of CWIPC‘s 2-sided calibration card.

This 2-sided card has a standard ChArUco board on each side. The image above shows one side with marker IDs 0 to 6 and the other side with marker IDs 10 to 16. The corners of each marker on one side are aligned with a corresponding marker on the other side, so marker corners detected on one side are practically identical to the corresponding corners detected by a different camera on the other side (with an error of the paper thickness, which can be offset if necessary). A custom mapping function synchronises the markers reported by cameras on each side of the paper. For instance, marker IDs 0, 1, 2 are mapped to marker IDs 12, 11, 15. The corner point order also needs to be changed so that all 28 points are in the correct order on both sides. This approach requires some hard-coding for each 2-sided card, so I am hoping to automate this process in the future.
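A hedged sketch of that mapping is below. The pairs 0→12, 1→11 and 2→15 are the ones mentioned above; the remaining pairs and the corner permutation are placeholders that depend on the physical card.

```python
# Maps a marker ID seen on the front of the card to the ID printed at the same
# position on the back. Only the first three pairs come from the card described
# above; the rest are omitted here.
FRONT_TO_BACK_ID = {0: 12, 1: 11, 2: 15}

# Viewed from the other side of the paper a marker appears mirrored, so its four
# corners come back in a different order. This permutation is an example only.
CORNER_REORDER = [1, 0, 3, 2]

def match_back_marker(front_id, front_corners):
    """Return the (id, corners) that a camera on the back of the card would report."""
    back_id = FRONT_TO_BACK_ID[front_id]
    back_corners = [front_corners[i] for i in CORNER_REORDER]
    return back_id, back_corners
```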

The following images show a test where I place this card between the two cameras.

The transformation result is shown below. The solution now also detects whether multiple cameras are viewing the same side of the card or different sides, and activates different transformation options accordingly.
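The side detection itself can be as simple as the sketch below (reusing the hypothetical FRONT_TO_BACK_ID mapping from earlier): if the two cameras report overlapping marker IDs they must be looking at the same side; otherwise the 2-sided mapping is applied.

```python
def choose_transform_mode(ids_cam1, ids_cam2):
    """Decide how to pair markers reported by two cameras (rough sketch)."""
    shared = set(ids_cam1) & set(ids_cam2)
    if shared:
        # Both cameras see the same side of the card: pair markers by identical IDs
        return "same_side", {i: i for i in shared}
    # Otherwise assume the cameras face opposite sides and pair IDs via the card mapping
    paired = {front: back for front, back in FRONT_TO_BACK_ID.items()
              if front in ids_cam1 and back in ids_cam2}
    return "opposite_sides", paired
```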

Overall, this is a simple and lightweight solution for multi-camera body tracking when the requirements for extrinsic calibration are not as demanding as those of volumetric capturing. The next step for this project is a real-world evaluation with selected use cases. There are still a lot of improvements to be made, especially to the automation and robustness of the detection and calibration.

[Publication] Unstuck in Metaverse: Persuasive User Navigation Using Automated Avatars

Mu, M., Dohan, M. “Unstuck in Metaverse: Persuasive User Navigation using Automated Avatars”, to appear in IEEE Communications Magazine, IEEE, 2023

Have you ever been lost in a new place you were visiting? What do you do when that happens? In an established and populous area, Google Maps or asking someone for directions may be the best choice. In rural locations, experienced mountaineers use their surroundings, such as terrain features, to track where they are.

Now, how about getting lost in VR? As the metaverse (large-scale virtual environments) becomes increasingly grand and complex, it is inevitable that VR users will find themselves disoriented and effectively stuck in a strange corner of the virtual space. Research has shown that humans must plan their movements with sufficient spatial knowledge to navigate successfully. In the metaverse, users may not always be willing to spend the time to develop that knowledge. If the navigation support provided by the user interfaces of virtual environments is insufficient, people become disoriented when there is no vantage point from which the entire world can be seen in detail. Other research has also shown that VR users are susceptible to disorientation, particularly when using locomotion interfaces that lack self-motion cues. This is often caused by a conflict between the visual sense and other bodily senses while viewing an augmented or virtual world through a head-mounted display (HMD) that is not synchronised to real-world movements.

unstuck in the MMO game WOW (https://forum.turtle-wow.org/viewtopic.php?t=1628)

We clearly observed instances of user disorientation in our previous VR experiment involving large-scale abstract VR paintings, and we were determined to develop an unstuck feature to support user navigation in the metaverse. The term unstuck stems from the user function offered in open-world computer games such as World of Warcraft and New World, which frees players from irreconcilable situations where their in-game characters cannot move or interact with the virtual environment due to software bugs, graphics glitches or connection issues.

The plan is to design an unstuck feature that develops itself organically and does not require humans to insert waypoints, routes, etc. This can be achieved by observing and modelling how the virtual space is used by its users (community activities). For instance, we could comfortably identify a walkable path between locations A and B because a number of different users moved from A to B in similar ways. The same principle can be applied to the entire virtual space, so our model can learn: 1) all the possible paths discovered by users, and 2) how users navigate using these paths. The model can then infer where a “normal” user would go (i.e., the next path they would use) based on where they have been. For new users, these inferences serve as recommendations for their next move. Once a user makes a new move (whether they pick one of the recommendations or not), their movement history is updated and new recommendations are generated. The idea is very similar to some language models: by studying how humans construct sentences, a machine learning model can look at a prompt (a few leading words) and predict what the next word would be, gradually generating an entire sentence.

unstuck feature

Before we apply any time-series machine learning, there are a few things to sort out. I mentioned locations A and B as examples, but in the metaverse there might not be any pre-defined landmarks, and generally speaking it is not a good idea to set some up arbitrarily. An easy solution would be a grid system with uniformly distributed waypoints, but that would mean some popular areas don’t have enough waypoints to capture different paths while some deserted areas have too many waypoints for no reason. The density and distribution of the location waypoints should roughly match how an area is accessed by users. The solution we came up with was simply to cluster the user locations observed from 35 users, taking into account the centroid locations and the size of each cluster.

clustering of user locations in VR
User movements across clusters (waypoints)
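As a rough illustration of the clustering step above (the method and parameters actually used are not reproduced here), a k-means pass over the pooled user positions could look like this, with the ground-plane coordinates and file name as placeholders:

```python
import numpy as np
from sklearn.cluster import KMeans

# positions: (num_samples, 2) ground-plane coordinates pooled from the 35 participants
positions = np.load("user_positions.npy")        # placeholder file name

n_waypoints = 30                                 # matches the ~30 classes mentioned below
kmeans = KMeans(n_clusters=n_waypoints, n_init=10, random_state=0).fit(positions)

waypoint_centres = kmeans.cluster_centers_       # centroid location of each waypoint
waypoint_sizes = np.bincount(kmeans.labels_)     # how many observations each cluster absorbed

def to_waypoint_sequence(trace):
    """Convert one user's raw movement trace into a sequence of waypoint IDs."""
    labels = kmeans.predict(trace)
    # Collapse consecutive duplicates so the sequence records transitions only
    return [int(l) for i, l in enumerate(labels) if i == 0 or l != labels[i - 1]]
```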

The next step is the easy part. We used a moving window to take a series of 5 consecutive steps from each user’s movements. The idea is to use the first four steps to predict the fifth. We tried a classical feedforward network, where the order of the input data is not considered, and an LSTM-based network, which treats the data as a time series. Needless to say, the LSTM showed better accuracy on all the metrics we employed. A further improvement was made when we added location information to the input data, meaning the model knows the ID of each location in the input as well as its coordinates. The top-1 accuracy is around 0.7 and the top-2 accuracy is around 0.9, which is pretty good for a 30-class classifier using a lightweight RNN architecture.
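A minimal sketch of the LSTM variant with location information added to the input is below. The layer sizes, feature encoding, and training settings are illustrative and not the configuration reported in the paper.

```python
import numpy as np
import tensorflow as tf

NUM_WAYPOINTS = 30
WINDOW = 4                      # the first four steps predict the fifth

# X: (num_windows, WINDOW, 3) where each step is [waypoint_id, x, z], with the
# coordinates taken from the cluster centroids; y: (num_windows,) next waypoint ID
X = np.load("windows.npy")      # placeholder preprocessed data
y = np.load("next_steps.npy")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 3)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(NUM_WAYPOINTS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["sparse_categorical_accuracy",
                       tf.keras.metrics.SparseTopKCategoricalAccuracy(k=2, name="top2")])
model.fit(X, y, epochs=50, validation_split=0.2)

# At inference time, the top-3 predictions can drive the three "quiet visitor" avatars
probs = model.predict(X[:1])[0]
top3_waypoints = probs.argsort()[-3:][::-1]
```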

ground truth (left) and ML prediction (right)

The next step was to determine how the ML outcomes are communicated to the users in VR applications. A related work (https://ieeexplore.ieee.org/document/9756757) studied the effectiveness of 10 types of user navigation instructions in mixed reality setups. Arrows and avatars were the most preferred methods “due to their simplicity, clarity, saliency, and informativeness.” In their study, the arrows are “an array of consecutive arrows on the ground” and the avatars are “a humanoid figure resembling a tour guide”.

Navigation instructions compared in user study (https://ieeexplore.ieee.org/document/9756757)

We chose Arrows and Avatars as the two navigation methods for a comparative study. For the arrow method, the conventional choice of superimposing arrows on the ground would not work, because there is no defined path in our virtual environment and the users’ view of the ground is often obstructed by artwork at waist level. We went for semi-transparent overhead arrows, which are more likely to stay in sight. They do slightly block the users’ view at a particular angle; users can see through the arrows and no one has complained about them, but we do need to explore different designs in the future. The avatar method was more successful than we anticipated. Three avatars spawn in the virtual environment as “quiet visitors”. Each avatar takes one of the Top 3 recommendations from the ML model and travels in the recommended direction. They then change their directions when new recommendations are given, normally when the human user makes a substantial location change (i.e., reaches a new waypoint).

The avatars are configured to be shorter than the average adult height to keep them less intimidating. They do not interact with human users; their role is to implicitly persuade users to investigate recommended areas of the artwork. We use cartoonish virtual characters instead of more realistic ones as they are more generally acceptable (Valentin Schwind, Katrin Wolf, and Niels Henze. 2018. Avoiding the uncanny valley in virtual character design. Interactions 25, 5 (September-October 2018), 45–49. https://doi.org/10.1145/3236673). We thought about adding head and eye movements but decided to leave them for future investigation due to concerns that these features might look too creepy.

The figure above shows data from participant iys, who self-reported during the experiment that he was following avatars “lady” and “claire”. The participant started his exploration by walking into the artwork in a straight line. He then stood in one place for a while and asked where he should go, before deciding by himself to follow the avatars and eventually making a counter-clockwise circular walk to experience the artwork. This circular path correlates with a similar counter-clockwise walk made by avatar “claire”. We also used quantitative measurements such as walk distance (WD) to compare how users’ movements were affected by the two guidance methods. We noticed that users do walk longer distances and explore wider areas when arrows and avatars are enabled, though the differences may not be statistically significant. The paper also includes further analysis using eye-gaze data to evaluate how users engage with the navigation feature.

There is still so much to do on this research topic, but I am quite pleased to see another close-the-loop project where we started everything from scratch, completed the prototyping, data collection, and machine learning modelling, and then put the results back into the application to evaluate its effectiveness.

Smart campus data visualisation “zoo”

[UPDATE] I stopped the AWS instances due to cost. They are now “on-demand”.

Finally got a bit of time to move the code and some sample data to a public server. The data are all real but not live. Also, I am using an old version of Highcharts.js. Everything sits on a tiny t3.micro, so be gentle.

data volume
area heatmap
floor activities (wait a few seconds for data to load)
crowd distribution dial
cross-area movements dependency wheel

crowd distribution streamgraph

area heatmap on floor plan

Lastly, the monstrous scatterplot shows how each device moves on a day. This may freeze your browser. Wait >20 seconds for data to load and DO NOT refresh the page.

device movements scatter plot (wait many seconds for data to load and DO NOT refresh)

Treating mental illness in the comfort of an English cottage garden

The SIRIPP project team has made significant progress in our VR therapy project and successfully delivered our first VR prototype! It’s really amazing to see how things have come together. Developing a 3D VR game with professionally generated content and intuitive user interactions is already a great achievement. We managed to achieve this while integrating nearly the entire clinical procedure of a well-established mental health treatment. With the help of our partners Cardinal Clinic and St Andrew’s Healthcare, the game designer Andy Debus, game developer Murtada Dohan and I spent a lot of time reading and understanding the clinical procedure and experimenting with different game mechanics and visual objects in order to keep the whole VR experience playful and clinically effective at the same time. I believe this is a leap ahead from the “conventional” VR-based exposure therapy.

The “game” is set in a bespoke English cottage garden. Patients start their journey by finding themselves sitting on a bench in the back garden. It is a safe place where they can simply relax and be introduced to the therapy. This is also where clinical preparations are completed through some simple gameplay. The main procedures of the treatment are carried out in the cottage. Each room of the cottage will be configured differently to accommodate different preferences or stages of the treatment (we will show the treatment rooms in the future). It might not look like it, but many virtual objects in the scene are interactive and genuinely carry the patient interaction and feedback functions critical for assessment and treatment.

VR treatment reviewed by a psychologist

While patients perform game tasks, the VR headset tracks a variety of data including eye gaze, gestures, body movements, facial expressions, etc. Our in-game features can respond to the live data feed and help improve the effectiveness of the treatment and potentially protect patients from overwhelming emotions while they process life events. The data will also provide psychologists with some insights into the patient’s responses to the treatment which wouldn’t be available using conventional methods.

More updates to come soon. Feel free to contact me if you’d like to volunteer for our upcoming clinical trials.

Metaverse Lab – volumetric / motion capturing and streaming

I’ve led a successful Research Capital Fund bid at UON, which helps the university invest in key areas that can extend its research and innovation impact ahead of the next REF submission. The Fund will support the first-phase development of a Metaverse Lab for health services, education, training, and industrial innovations.

The Metaverse Lab will address the single biggest challenge of VR/XR work at the university: many colleagues who wanted to experiment with immersive technologies for teaching and research simply didn’t have the resources and technical know-how to set up the technology for their work. We’ve witnessed how this technical barrier has blocked many great ideas from further development. My aim is to build an environment where researchers can simply walk into the Lab and start experimenting with the technologies, conducting user experiments, and collecting research-grade data.

Volumetric capturing using multiple Kinect DK (k4a) RGB-D cameras

The Lab includes an end-to-end solution, from content generation to distribution and consumption. At the centre of the Metaverse Lab sits an audio-visual volumetric capturing system with 8 RGB-depth cameras and microphones. This will allow us to seamlessly link virtual and physical environments for complex interactive tasks. The capturing system will link up with our content processing and network emulation toolkit to prepare the raw data for different use scenarios such as online multiparty interaction. Needless to say, artificial intelligence will be an important part of the system for optimisation and data-driven designs. There will be dedicated VR/XR headsets added to our arsenal to close the loop.

The two screen recordings below show the 3D volumetric capturing of human subjects using 4 calibrated cameras. This particular demo was developed based on cwipc – the CWI Point Clouds software suite. The cameras are diagonally placed to cover all viewing angles of the subjects, which means you can change your view by moving around the subject. The cameras complement each other when the view from one camera is obstructed. One of the main advantages of such a live capturing system is its flexibility: no objects need to be scanned in advance, and you can simply walk into the recording area and bring any object with you.

Single-subject volumetric capturing using 4 camera feeds.
Volumetric capturing of 2 subjects using 4 camera feeds.
Depth camera view

The system can be used for motion capturing using the Kinect’s Body Tracking SDK. With 32 tracked joints, human activities and social behaviour can be analysed. The following two demos show two scenes that I created based on live tracking of human activities. The first one shows two children playing. The blue child tickles the red child while the red child holds her arms together, turns her body and moves away. The second scene is an adult doing pull-ups. The triangle on the subject’s face marks their eyes and nose. The two isolated marker points near the eyes are the ears.

“Two children playing”
“Pull ups”

We envisage multiple impact areas including computational psychiatry (VR health assessment and therapies), professional training (policing, nursing, engineering, etc.), arts and performance, social science (e.g., ethical challenges in Metaverse), esports (video gaming industry), etc. We also look forward to expanding our external partnerships with industrial collaborations, business applications, etc.

Paper on data-driven smart communities to appear in IEEE Network Magazine

A smart campus project started in 2019 has finally seen its first academic paper, titled “Network as a sensor for smart crowd analysis and service improvement”, appear in a Smart Communities special issue of IEEE Network Magazine. It was meant to be a pure engineering project to showcase the potential of campus WiFi data for service optimisation and automation, but it quickly became a data science project too when we started to gather and process hundreds of millions of anonymised connectivity records. In summary, we monitor how connected devices switch between WiFi APs and use machine learning to model crowd behaviours for predictive analysis, anomaly detection, etc. Compared with conventional crowd analysis solutions based on video cameras or WiFi probing, our solution is less intrusive and does not require the installation of additional equipment. Our SDN infrastructure is the icing on the cake as it offers a single point for data aggregation.

Abstract:

With the growing availability of data processing and machine learning infrastructures, crowd analysis is becoming an important tool to tackle economic, social, and environmental challenges in smart communities. The heterogeneous crowd movement data captured by IoT solutions can inform policy-making and quick responses to community events or incidents. However, conventional crowd-monitoring techniques using video cameras and facial recognition are intrusive to everyday life. This article introduces a novel non-intrusive crowd monitoring solution which uses 1,500+ software-defined networks (SDN) assisted WiFi access points as 24/7 sensors to monitor and analyze crowd information. Prototypes and crowd behavior models have been developed using over 900 million WiFi records captured on a university campus. We use a range of data visualization and time-series data analysis tools to uncover complex and dynamic patterns in large-scale crowd data. The results can greatly benefit organizations and individuals in smart communities for data-driven service improvement.

An associated dataset that includes over 300 million records of WiFi access data is available at: https://bit.ly/3Dmi6X1.

Automating mental health treatment

Today marks the start of a new research project on automating mental health treatment using VR and game design. This short project is funded by the University’s Support for Innovation and Research Ideas, Policy and Participation (SIRIPP) grant. The SIRIPP grant supports staff in developing their idea and activity and helps progress to further external funding and support routes.

The project aims to prototype a VR-based mental health treatment solution for internalising disorders that can be administered by patients at home. The solution must be effective, fun, trustworthy, and secure. To achieve this goal, we’ll need to find ways for innovations from human-computer interaction, game design, psychology and artificial intelligence to work together and synergise.

An eye-gaze controlled virtual game prototype for mental health treatment (Developed by Murtada Dohan and Andrew Debus. All Rights Reserved)

The project is led by:

  • Mu Mu (HCI and Data Science), Faculty of Arts, Science and Technology, UON
  • Jacqueline Parkes (Applied Mental Health), Faculty of Health, Education and Society, UON
  • Andrew Debus (Game Design), Faculty of Arts, Science and Technology, UON
  • Kieran Breen (Psychology), Head of Research and Innovation, St Andrew’s Healthcare
  • Paul Wallang (Psychology), Director of Innovation and Improvement, Cardinal Clinic

The main objectives of the project are:

  • Develop research protocols and ethics guidelines for automated VR treatment.
  • Prototype a VR game with interactive tasks that mimic manualised psychotherapy treatment.
  • Conduct small-scale user trials and capture research-grade data to support follow-on projects.
  • Expand our network of collaborators (communities, academics, businesses, policymakers, etc.).

Feel free to contact me (mu.mu@northampton.ac.uk) if you wish to know more about our project.