[Publication] Unstuck in Metaverse: Persuasive User Navigation Using Automated Avatars

Mu, M., Dohan, M. “Unstuck in Metaverse: Persuasive User Navigation using Automated Avatars”, to appear in IEEE Communications Magazine, IEEE, 2023

Have you ever been lost in a new place you were visiting? What do you do when that happens? In an established and populous area, Google Maps or asking someone for directions may be the best choice. In rural locations, experienced mountaineers use their surroundings, such as terrain features, to track where they are.

Now, how about getting lost in VR? As the metaverse (large-scale virtual environments) becomes increasingly grand and complex, it is inevitable that VR users will find themselves disoriented and effectively stuck in a strange corner of the virtual space. Research has shown that humans must plan their movements with sufficient specialist knowledge to navigate successfully. In the metaverse, users may not always be willing to spend the time to develop the required spatial knowledge. If the navigation support provided by the user interface of a virtual environment is insufficient, people become disoriented whenever there is no vantage point from which the entire world can be seen in detail. Other research has shown that VR users are susceptible to disorientation, particularly when using locomotion interfaces that lack self-motion cues. This is often caused by conflict between the visual sense and other bodily senses when viewing an augmented or virtual world through a head-mounted display (HMD) that is not synchronized with real-world movements.

The unstuck command in the MMO game World of Warcraft (https://forum.turtle-wow.org/viewtopic.php?t=1628)

We clearly observed instances of user disorientation in our previous VR experiment involving large-scale abstract VR paintings, so we were determined to develop an unstuck feature to support user navigation in the metaverse. The term unstuck stems from the user function offered in open-world computer games such as World of Warcraft and New World. The function frees players from irreconcilable situations in which their in-game characters cannot move or interact with the virtual environment due to software bugs, graphics glitches or connection issues.

The plan is to design an unstuck feature that develops itself organically and does not require human insertion of waypoints, routes, etc. This can be achieved by observing and modelling how the virtual space is used by its community of users. For instance, we can confidently identify a walkable path between locations A and B because a number of different users moved from A to B in similar ways. The same principle can be applied to the entire virtual space, so our model can learn: 1) all the possible paths discovered by users, and 2) how users navigate using these paths. The model can then infer where a “normal” user would go next (i.e., the next path they would use) based on where they have been. For new users, the inferences serve as recommendations for their next move. Once a user makes a new move (whether they pick one of the recommendations or not), their movement history is updated and new recommendations are generated. The idea is very similar to some language models: by studying how humans construct sentences, a machine learning model can look at a prompt (a few leading words) and predict what the next word would be, hence gradually generating an entire sentence.

The unstuck feature
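To make the language-model analogy concrete, here is a minimal, hypothetical sketch (not the model from the paper, which is described below) that treats waypoint-to-waypoint transitions like word bigrams and recommends the most frequently observed next waypoints:

```python
from collections import Counter, defaultdict

# Hypothetical example trajectories: each is a sequence of waypoint IDs
# visited by one user (real data would come from logged VR positions).
trajectories = [
    ["A", "B", "C", "D"],
    ["A", "B", "D"],
    ["A", "B", "C", "E"],
]

# Count how often each waypoint follows another (a bigram model).
transitions = defaultdict(Counter)
for path in trajectories:
    for here, nxt in zip(path, path[1:]):
        transitions[here][nxt] += 1

def recommend(current, k=3):
    """Return the k most commonly observed next waypoints after `current`."""
    return [wp for wp, _ in transitions[current].most_common(k)]

print(recommend("B"))  # ['C', 'D'] — C is the better-travelled choice
```

A sequence model such as the LSTM described later replaces these simple counts with predictions conditioned on a longer movement history.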

Before we apply any time-series machine learning, there are a few things to sort out. I mentioned locations A and B as examples, but in the metaverse there may not be any pre-defined landmarks, and generally speaking it is not a good idea to set some up arbitrarily. An easy solution would be a grid system with uniformly distributed waypoints, but that would mean some popular areas do not have enough waypoints to capture different paths while some deserted areas have too many waypoints for no reason. The density and distribution of the location waypoints should roughly match how an area is accessed by users. The solution we came up with was simply clustering the user locations observed from 35 users, while considering the centroid location and the size of each cluster.

Clustering of user locations in VR
User movements across clusters (waypoints)
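As a rough sketch of this clustering step (the exact algorithm and parameters are my assumptions; k-means via scikit-learn is one natural choice), the logged user positions can be clustered and each centroid treated as a waypoint:

```python
import numpy as np
from sklearn.cluster import KMeans

# Placeholder for the pooled (x, z) floor positions logged from the 35 users.
positions = np.random.rand(5000, 2) * 20

# Cluster the positions; each centroid becomes a candidate waypoint, so the
# waypoint density roughly follows how the space is actually used.
kmeans = KMeans(n_clusters=30, n_init=10, random_state=0).fit(positions)
waypoints = kmeans.cluster_centers_          # waypoint coordinates
cluster_sizes = np.bincount(kmeans.labels_)  # how heavily each area is used

# A raw movement track can now be quantised into a sequence of waypoint IDs,
# which is the input format the time-series model expects.
def to_waypoint_ids(track):
    return kmeans.predict(track)
```

The choice of 30 clusters here mirrors the 30-class prediction problem described below; in practice the cluster count would be tuned against the centroid spread and cluster sizes.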

The next step is the easy part. We used a moving window to take series of five consecutive steps from each user’s movements. The idea is to use the first four steps to predict the fifth. We tried a classical feedforward network, where the order of the input data is not considered, and an LSTM-based network, which treats the data as a time series. Needless to say, the LSTM showed better accuracy on all the metrics we employed. A further improvement was made when we added location information to the input data, so the model is aware of both the ID of each location in the input and where it is (its coordinates). The top-1 accuracy is around 0.7 and the top-2 accuracy is around 0.9, which is pretty good for a 30-class classifier using a lightweight RNN architecture.

Ground truth (left) and ML prediction (right)
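A minimal sketch of such a model in PyTorch (layer sizes and the exact input encoding are my assumptions; the text above specifies only a lightweight LSTM that predicts the fifth waypoint out of 30 classes from the previous four steps):

```python
import torch
import torch.nn as nn

NUM_WAYPOINTS = 30

class NextWaypointLSTM(nn.Module):
    """Predict the fifth waypoint from the previous four steps."""
    def __init__(self, hidden=64):
        super().__init__()
        # Each step is encoded as a one-hot waypoint ID plus its (x, z)
        # coordinates, reflecting the location-aware input described above.
        self.lstm = nn.LSTM(NUM_WAYPOINTS + 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, NUM_WAYPOINTS)

    def forward(self, x):            # x: (batch, 4, NUM_WAYPOINTS + 2)
        _, (h, _) = self.lstm(x)     # h: (1, batch, hidden)
        return self.head(h[-1])      # logits over the 30 waypoint classes

model = NextWaypointLSTM()
window = torch.randn(8, 4, NUM_WAYPOINTS + 2)  # a batch of 4-step windows
top3 = model(window).topk(3, dim=-1).indices   # top-3 recommended waypoints
```

Training would minimise cross-entropy between the logits and the observed fifth waypoint; the top-3 outputs feed the guidance methods described next.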

The next step was to determine how the ML outcomes are communicated to the users in VR applications. A related work (https://ieeexplore.ieee.org/document/9756757) studied the effectiveness of 10 types of user navigation instructions in mixed-reality setups. Arrows and avatars were the most preferred methods “due to their simplicity, clarity, saliency, and informativeness.” In that study, the arrows were “an array of consecutive arrows on the ground” and the avatar was “a humanoid figure resembling a tour guide”.

Navigation instructions compared in user study (https://ieeexplore.ieee.org/document/9756757)

We chose arrows and avatars as the two navigation methods for a comparative study. For the arrow method, the conventional choice of superimposing arrows on the ground would not work, because there is no defined path in our virtual environment and the user’s view of the ground is often obstructed by artwork at waist level. We went for semi-transparent overhead arrows, which are more likely to remain in sight. They do slightly block the user’s view at certain angles; users can see through the arrows and no one has complained about them, but we do need to explore different designs in the future. The avatar method was more successful than we anticipated. Three avatars spawn in the virtual environment as “quiet visitors”. Each avatar takes one of the top-3 recommendations from the ML model and travels in the recommended direction. They then change direction when new recommendations are given, normally when the human user makes a substantial location change (i.e., reaches a new waypoint).
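A hedged, engine-agnostic sketch of that control loop (class and method names here are placeholders, not the actual implementation, which lives inside the VR application):

```python
import numpy as np

class GuideAvatar:
    """One 'quiet visitor' that walks towards a recommended waypoint."""
    def __init__(self, name, position):
        self.name = name
        self.position = np.asarray(position, dtype=float)  # (x, z)
        self.target = None  # waypoint ID assigned from the top-3 list

    def step(self, waypoints, speed=0.02):
        """Advance one frame towards the assigned waypoint, if any."""
        if self.target is None:
            return
        direction = waypoints[self.target] - self.position
        dist = np.linalg.norm(direction)
        if dist > 1e-6:
            self.position += direction / dist * min(speed, dist)

def retarget(avatars, top3_waypoint_ids):
    """Assign each avatar one of the model's top-3 recommendations.

    Called whenever the human user reaches a new waypoint, i.e. whenever
    fresh recommendations become available.
    """
    for avatar, wp_id in zip(avatars, top3_waypoint_ids):
        avatar.target = wp_id
```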

The avatars are configured to be shorter than the average adult height to make them less intimidating. They do not interact with human users; their role is to implicitly persuade users to investigate recommended areas of the artwork. We use cartoonish virtual characters instead of more realistic ones, as they are more generally acceptable (Valentin Schwind, Katrin Wolf, and Niels Henze. 2018. Avoiding the uncanny valley in virtual character design. Interactions 25, 5 (September–October 2018), 45–49. https://doi.org/10.1145/3236673). We thought about adding head and eye movements but decided to leave them for future investigation, due to concerns that these features might look too creepy.

The figure above shows data from participant iys, who self-reported during the experiment that he was following avatars “lady” and “claire”. The participant started his exploration by walking into the artwork in a straight line. He then stood in one place for a while and asked where he should go, before deciding by himself to follow the avatars and eventually making a counter-clockwise circular walk to experience the artwork. The circular path correlates with a similar counter-clockwise circular walk made by avatar “claire”. We also used quantitative measurements such as walk distance (WD) to compare how users’ movements were affected by the two guidance methods. We noticed that users do walk longer distances and explore wider areas when arrows and avatars are enabled, though the differences may not be statistically significant. The paper also includes further analysis using eye-gaze data to evaluate how users engage with the navigation feature.
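For reference, a minimal sketch of the walk-distance measurement, under the assumption that WD is the accumulated path length of positions logged at regular intervals:

```python
import numpy as np

def walk_distance(track):
    """Total walked distance for an (n, 2) array of logged (x, z) positions:
    the sum of the distances between consecutive samples."""
    return float(np.linalg.norm(np.diff(track, axis=0), axis=1).sum())
```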

There is still so much to do on this research topic, but I am quite pleased to see another close-the-loop project: we started everything from scratch, completed the prototyping, data collection and machine-learning modelling, then put the results back into the application to evaluate their effectiveness.
