I’ve led a successful Research Capital Fund at UON to help the university invest in key areas that can extend its research and innovation impact leading to the next REF submission.
[insert fancy marketing images of equipment and hardworking researchers here]
The Fund will support the first-phase development of a Metaverse Lab for health services, education, training, and industrial innovations within but not limited to the Faculty of Arts, Science and Technology (FAST), Centre for Advanced and Smart Technologies (CAST), and Centre for Active Digital Education (CADE).
The Metaverse Lab will address the single biggest challenge of VR/XR work at the university: many colleagues who want to experiment with immersive technologies for teaching and research simply don’t have the resources and technical skills to set up the technology for their work. We’ve witnessed how this technical barrier has blocked so many great ideas from being further developed. My aim is to build an environment where researchers can simply walk into the Lab and start experimenting with the technologies, conducting user experiments, and collecting research-grade data.
The Lab will include an end-to-end solution, from content generation to distribution and consumption. At the centre of the Metaverse Lab sits an audio-visual volumetric capturing system with several RGB-depth cameras and microphones. This will allow us to seamlessly link virtual and physical environments for complex interactive tasks. The capturing system will link up with our content processing and network emulation toolkit to prepare the raw data for different use scenarios such as online multiparty interaction. Needless to say, artificial intelligence will be an important part of the system for optimisation and data-driven designs. There will be dedicated VR/XR headsets added to our arsenal to close the loop.
We envisage multiple impact areas including computational psychiatry (VR health assessment and therapies), professional training (policing, nursing, engineering, etc.), arts and performance (UKRI just announced a new framework “Enter the metaverse: Investment into UK creative industries”), social science (e.g., ethical challenges in Metaverse), esports (video gaming industry), etc. We are also looking forward to expanding our external partnerships with industrial collaborations, business support, etc.
A smart campus project started in 2019 finally sees its first academic paper, titled “Network as a sensor for smart crowd analysis and service improvement”, appear in a Smart Communities special issue of IEEE Network Magazine. It was meant to be a pure engineering project to showcase the potential of campus WiFi data for service optimisation and automation, but it quickly became a data science project too when we started to gather and process hundreds of millions of anonymised connectivity records. In summary, we monitor how connected devices switch between WiFi APs and use machine learning to model crowd behaviours for predictive analysis, anomaly detection, etc. Compared with conventional crowd analysis solutions based on video cameras or WiFi probing, our solution is less intrusive and does not require the installation of additional equipment. Our SDN infrastructure is the icing on the cake as it offers a single point for data aggregation.
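To make the idea of "devices switching between APs" concrete, here is a minimal sketch that counts AP-to-AP handovers from a list of association records. The record layout (timestamp, device_id, ap_id) and the AP names are assumptions for illustration, not the project's actual schema.

```python
from collections import Counter


def count_handovers(events):
    """Count AP-to-AP transitions from (timestamp, device_id, ap_id) records."""
    last_ap = {}             # device_id -> most recent AP seen for that device
    transitions = Counter()  # (from_ap, to_ap) -> number of handovers
    for _, device, ap in sorted(events):  # process in timestamp order
        previous = last_ap.get(device)
        if previous is not None and previous != ap:
            transitions[(previous, ap)] += 1
        last_ap[device] = ap
    return transitions


# Hypothetical, already-anonymised records
events = [
    (1, "dev-a", "ap-library"),
    (2, "dev-b", "ap-library"),
    (3, "dev-a", "ap-cafe"),
    (4, "dev-a", "ap-cafe"),
    (5, "dev-b", "ap-cafe"),
    (6, "dev-a", "ap-library"),
]
handovers = count_handovers(events)
```

Aggregated over many devices, counts like these give a rough picture of movement between areas without tracking any individual.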
With the growing availability of data processing and machine learning infrastructures, crowd analysis is becoming an important tool to tackle economic, social, and environmental challenges in smart communities. The heterogeneous crowd movement data captured by IoT solutions can inform policy-making and quick responses to community events or incidents. However, conventional crowd-monitoring techniques using video cameras and facial recognition are intrusive to everyday life. This article introduces a novel non-intrusive crowd monitoring solution which uses 1,500+ software-defined networks (SDN) assisted WiFi access points as 24/7 sensors to monitor and analyze crowd information. Prototypes and crowd behavior models have been developed using over 900 million WiFi records captured on a university campus. We use a range of data visualization and time-series data analysis tools to uncover complex and dynamic patterns in large-scale crowd data. The results can greatly benefit organizations and individuals in smart communities for data-driven service improvement.
An associated dataset that includes over 300 million records of WiFi access data is available at: https://bit.ly/3Dmi6X1.
Today marks the start of a new research project on automating mental health treatment using VR and game design. This short project is funded by the University’s Support for Innovation and Research Ideas, Policy and Participation (SIRIPP) grant. The SIRIPP grant supports staff in developing their idea and activity and helps progress to further external funding and support routes.
The project aims to prototype a VR-based mental health treatment solution for internalising disorders that can be administered by patients at home. The solution must be effective, fun, trustworthy, and secure. To achieve this goal, we’ll need to find ways for innovations from human-computer interaction, game design, psychology and artificial intelligence to work together and synergise.
The project is led by:
Mu Mu (HCI and Data Science), Faculty of Arts, Science and Technology, UON
Jacqueline Parkes (Applied Mental Health), Faculty of Health, Education and Society, UON
Andrew Debus (Game Design), Faculty of Arts, Science and Technology, UON
Kieran Breen (Psychology), Head of Research and Innovation, St Andrew’s Healthcare
Paul Wallang (Psychology), Director of Innovation and Improvement, Cardinal Clinic
The main objectives of the project are:
Develop research protocols and ethics guidelines for automated VR treatment.
Prototype a VR game with interactive tasks that mimic manualised psychotherapy treatment.
Conduct small-scale user trials and capture research-grade data to support follow-on projects.
Expand our network of collaborators (communities, academics, businesses, policymakers, etc.).
Feel free to contact me (firstname.lastname@example.org) if you wish to know more about our project.
It was a pleasure to be invited by Prof. Eduardo Cerqueira to meet with his postgraduate students at the Institute of Technology, Federal University of Pará. We had some interesting discussions on mobile VR, content distribution, and AI ethics in VR designs.
In the past few years, we have had a series of projects on capturing and modelling human attention in VR applications. Our research shows that eye gaze and body movements share a pivotal role in capturing human perception, intent, and experience. We truly believe that VR is not just another computerised environment with fancy graphics. With the help of biometric sensors and machine learning, VR can become the best persuasive technology known to HCI designers. In a recent project, we demonstrated how machine learning can be automated to study visitor behaviours in a VR art exhibition without any prior knowledge of the artwork. The resultant model then drives autonomous avatars (see below) to guide other visitors based on their eye gaze and mobility patterns. With the “AI avatars”, we observed a significant increase in visitors’ interactions with the VR artwork and very positive feedback on the overall user experience.
The COVID-19 pandemic and its prolonged impact on health services made us rethink our research priorities. While we are still enthusiastic about digital arts, we wanted to make good use of our VR and data science know-how for healthcare innovations. Using VR and AI in healthcare is not a new idea. There is already a great deal of existing research on VR-based therapies, especially for the treatment of phobia and dementia. AI has been used to develop chatbots, to detect COVID-19 symptoms, etc. The research we’ve seen so far is very promising from an academic perspective, but most of it aims at augmenting traditional practices for improved outcomes. This means that any developed application will still need to be operated by a technician in a controlled setting. Recognising the healthcare innovations in the research communities, we are interested in a new form of design that can deliver automated or even autonomous assessment and treatment of diseases in a remote location, e.g., patients’ own homes or an easily accessible community centre. This will ultimately help reduce the number of healthcare appointments and patients’ trips to hospitals.
The pandemic has added long-lasting impacts on public mental health due to social isolation, loss of coping mechanisms, reduced access to health services, etc. We believe VR and AI research should see a major shift from exploratory proof-of-concept to product-focused development with wider public engagement. Just like how every Tesla car and every Google search improves their underlying ML models, mental health innovation must aim at large scale user trials to achieve any major transformation. To this end, we now pair with the R&D department of a leading mental health institution to engineer new VR applications for new adventures. We hope that customised VR stimuli and NLP dialogue engines will lead to more effective treatment that was not possible in the past due to constraints in the physical world. We are also quite excited about the opportunities to automate the assessment of mental disorders through biometric sensors and machine learning.
This is a belated post on developing a new BSc AI and Data Science (Hons) programme. The programme passed validation in early 2022 and we are now accepting applications for the 22/23 academic year.
The development of the new programme is an answer to the growing demand for machine learning engineers and scientists in the UK job market. Using AI and machine learning to increase productivity, save cost, and assist new designs is no longer a privilege for large tech companies and government organisations. In the past few years, we have worked with many small and micro-businesses that are enthusiastic about adopting AI techniques and recruiting AI talents. Although we have been teaching AI-related topics such as computer vision, deep learning and graph databases within our existing programmes for many years, it is now imperative to design a dedicated BSc programme to capture recent advancements in AI as well as the legal, ethical, and environmental challenges that may follow. I am pleased to have the chance to be part of this development as the programme lead.
We had two parallel procedures taking place: Computing market research and CAIeRO Planner. The market research was carried out by key academics who are currently teaching AI-related modules. We did a few case studies of similar programmes offered by our main competitors and current job vacancies for ML engineers, researchers, and data analysts. We noticed that a lot of AI programmes are offered as a collection of discrete data science and machine learning modules that don’t synergise with each other. While this setting may give prospective students the impression of a rich and sophisticated course, students do not get the best value while hopping between those modules. We wanted to follow the theme of responsible and human-centred AI while providing a clear path to success and a sense of accomplishment along the way. The research on the job market was especially important because we wanted to continuously champion hands-on learning and practical skills. This practice gave us a general idea of the toolset, frameworks, workflow, and R&D environment that our students will be expected to master in their future workplace.
Planning on the technical content is only half of the story. The University has a large and dedicated Learning Technology team to support any activities on the module and programme development and improvement. We had two learning technologists assigned to our programme to support detailed designs at both programme and module levels. We used an in-house planner Creating Aligned Interactive educational Resource Opportunities (CAIeRO) to guide the exercises.
We started with the “look and feel”, learning outcomes, mission statement and assessment strategy for the programme as a whole using interactive tools and sharable environments such as padlet. All members of the programme team had equal inputs to the design. The whole process was carried out through multiple online sessions over a few weeks. Because everyone came to the meeting fully prepared, the sessions were really effective and super engaging. The programme level design then became the blueprint for module-level designs to ensure coherence and consistency across all modules.
We then identified four new modules for the programme: Mathematics for Computer Science, Introduction to AI, Natural Language Processing, and Cloud Computing and Big Data. We also reworked some existing modules such as Advanced AI and Applications, and Media Technology to better accommodate the programme learning outcomes.
Developing module-level learning outcomes can be challenging, especially when we need to maintain the coherence between modules at the same level. As student-facing documents, the module specifications also need to be clear and concise. We used a toolkit called COGS, which stands for Changemaker Outcomes for Graduate Success. It includes a series of guidelines that help staff write clear and robust learning outcomes that are appropriate to the academic level of study, in order to clarify for students what is expected of them across the different stages of their study. I found this tool extremely useful when I developed the new modules, knowing that my colleagues would be using similar language for the related modules.
We also took a few extra steps to make sure that the learning outcomes will be assessed using a range of tools including assignment, project, time-constrained assessment and dissertation. Most modules also offer a mix of face-to-face and a small number of online contact hours for active and blended learning. This will allow students to work on subject tasks online before they join the classes, a practice that could greatly improve student engagement.
If you are interested in more details about our programme, please don’t hesitate to contact me.
In Part 4, I made a start with establishing a new training dataset by harvesting publicly accessible photos on social media. The main benefit of using user-generated content is that the photos were taken in a real-world setting, hence close to what the target logos would look like in a film. For content selection and labelling, my own filtering tool and Yolo_Mark worked pretty well. It wasn’t easy to label 600+ images but the workflow is decent. The three classes are: 0 – Cadbury, 1 – ROSES, and 2 – HEROES. There are some typeface variations of ROSES. You need to be patient and consistent with the labelling strategy. As humans, we are able to acquire information from different sources very quickly while making a decision. So if I were actively looking for a particular logo while knowing the logo is definitely present, I could still point at an unidentifiable blob of pixels and be 100% certain that it’s a Cadbury logo on a discarded purple wrapper. It may not be realistic to expect a “low-level” machine learning model with a small training set to capture what a human could do in this case. Therefore I limit the labelling to only the logos that I could visually identify directly.
The training process wasn’t much different from the previous modelling for the Coca-Cola logo, except for some further tidying of the dataset (minor issues with missing files, etc.). With a baseline configuration, it took about 6 hours to complete 6000 epochs with a pretty good result based on the detection of three logos.
The images below illustrate what the model picks up from some standard photos (using the slider to see “before” and “after”).
I’ve also tested the model on some videos provided by our partner. I won’t be able to show them here due to copyright but it’s safe to say that it works very well, with room for improvement. Some adjustments can be made on the modelling side, such as increasing the size of training images (currently downsampled to 608×608), increasing the number of detection layers to accommodate a larger range of logo sizes, or perhaps giving the new YOLOv4 a go!
This update concludes the “Product detection in movies and TV shows using machine learning” series. The dataset used for Cadbury, Roses, and Heroes training will be made public for anyone interested in giving it a go or expanding their own logo detector. I am still pushing this topic forward and will start a new series soon!
I have been playing with the project data to study the impact of COVID-19 social distancing / lockdown on the university, especially the use of campus facilities. Meanwhile there is some time series analysis and behavioural modelling that I’d like to complete sooner rather than later. Everything has taken me so much longer than I planned. Here are some previews followed by moaning, i.e., the painful processes to generate these.
The above shows some regular patterns of crowd density and how the numbers dropped due to COVID-19 lockdown. Students started to reduce their time on campus in the week prior to the official campus closure.
The autocorrelation graph shows a possible daily pattern (data resampled at 5-minute intervals, so 288 samples make a day, hence the location of the first peak).
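As a quick sanity check of the 288-sample reasoning, the sketch below computes the sample autocorrelation of a synthetic series with a pure daily cycle. The sine-wave data is a stand-in and not the campus data; only the 5-minute sampling interval comes from the post.

```python
import math

SAMPLES_PER_DAY = 24 * 60 // 5  # 288 five-minute samples in one day


def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at the given lag."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Synthetic stand-in for crowd counts: one clean daily cycle repeated for 7 days
series = [
    100 + 50 * math.sin(2 * math.pi * t / SAMPLES_PER_DAY)
    for t in range(7 * SAMPLES_PER_DAY)
]

acf_one_day = autocorrelation(series, SAMPLES_PER_DAY)        # strongly positive
acf_half_day = autocorrelation(series, SAMPLES_PER_DAY // 2)  # strongly negative
```

With a daily rhythm in the data, the autocorrelation peaks when the lag lines up with a whole day and dips at half a day, which is the shape to look for in the graph.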
Seasonal decomposition based on the hypothesis of a weekly pattern. There is also a strong hourly pattern, which I’ll explain in the paper (not written yet!).
The ones above show the area crowd density dynamics of one floor of an academic building. The one on the left shows how an academic workspace, a few classrooms and study areas were used during a normal week when few people in the UK felt that COVID-19 was relevant to them. The middle one shows the week when there were increasing reports of COVID-19 cases in the UK and the government was changing its tone and advising social distancing. Staff and students reduced their hours on campus. The one on the right shows a week during university closure (building still accessible for exceptional purposes).
Using the system to monitor real-time crowd data provides a lot of insights but it’s somewhat passive. It’s the modelling, simulation and predictions that make the system truly useful. I have done some work on this and I’ll gradually update this post with some analysis results:
The first thing I tried is standard time-series analysis. A lot of people don’t think it’s a big deal but it’s tricky to get things right. There are many models to try and they are all based on the assumption that we can predict future data based on previous observations. ARIMA (Auto Regressive Integrated Moving Average) is a common time-series analysis method characterised by 3 terms: p, d, q. Together they determine which part(s) of the observed data to use and how to adjust the data (based on how things change over time) to form a prediction. The seasonal variation of ARIMA (SARIMA) introduces additional seasonal terms to capture seasonal differences. Our campus WiFi data is not only non-stationary but also has multiple seasonalities embedded: from a high level, the university has terms, each term has a start and end with special activities, each week of the term has weekdays and weekends, each weekday has lecturing hours and non-lecturing hours. The standard SARIMA can only capture one seasonality but it will be our starting point to experiment with crowd predictions.
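For readers new to the terms: p is the number of autoregressive lags, d the number of times the series is differenced, and q the number of moving-average lags. The sketch below illustrates only the differencing part, on a made-up series with a trend and a period-4 season. It shows how seasonal and ordinary differencing strip out the non-stationary structure, and it is not a full ARIMA implementation.

```python
def difference(series, lag=1):
    """Subtract the value `lag` steps earlier from each observation."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]


# Made-up series: linear trend (2 per step) plus a repeating period-4 pattern
season = [0, 5, 3, 1]
series = [2 * t + season[t % 4] for t in range(12)]

# Seasonal differencing (lag = season length) removes the repeating pattern
seasonal_diff = difference(series, lag=4)

# One further ordinary difference (d = 1) removes what is left of the trend
stationary = difference(seasonal_diff, lag=1)
```

After the seasonal difference every value is the same (the trend accumulated over one season), and the extra ordinary difference leaves a flat series, which is the kind of input the AR and MA terms are then fitted to.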
The figure above shows the predictions of campus occupancy level on Friday 6th March. The blue curve plots the observed data for that week (Monday to Friday). The green curve depicts the “intra-week” predictions based on data observed during the same week, i.e., using Monday-Thursday’s data to guess Friday’s data. This method can respond to extraordinary situations in a particular week. If we chain all weekday data then in theory it’s possible to make a prediction for any weekday of the week, practically ignoring the differences across weekdays. However, we know that people’s activities across weekdays are not entirely identical. Students have special activities on Wednesdays and everyone tries to finish early on Fridays. This explains why the intra-week predictions overestimate the occupancy level for Friday afternoon. The orange curve gives the “inter-week” prediction based on the previous four Fridays. This method captures normal activities on Fridays but is agnostic to week-specific changes (e.g., the week prior to exams). Balancing intra- and inter-week predictions using a simple element-wise mean, the red curve shows the “combined” prediction. For this particular prediction exercise, the combined method does not show a better MSE than the inter-week version, partially due to the overestimates.
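The combination step is simple enough to sketch in a few lines. The numbers below are invented purely to illustrate the mechanics (an intra-week forecast that overshoots in the afternoon) and are not taken from the campus data or the figure.

```python
def mse(observed, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def combine(intra, inter):
    """Element-wise mean of the intra-week and inter-week predictions."""
    return [(a + b) / 2 for a, b in zip(intra, inter)]


# Hypothetical occupancy values at four points of a Friday
observed = [120, 300, 280, 150]
intra_week = [130, 320, 340, 230]  # overestimates the afternoon
inter_week = [115, 290, 270, 160]

combined = combine(intra_week, inter_week)

mse_intra = mse(observed, intra_week)
mse_inter = mse(observed, inter_week)
mse_combined = mse(observed, combined)
```

In this toy case the combined error sits between the two, so averaging helps only when neither model is far worse than the other. A weighted mean would be an obvious next thing to try.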
The figure above shows the week prior to the university’s closure in response to COVID-19. This week is considered an “abnormal” week as students and staff started to spend more time studying or working from home. In this case, the intra-week model successfully captures the changes in that week. There must be a better way to balance the two models to take the best from both worlds but I will try other options first.
All modelling above was done using pmdarima, a Python implementation of R’s auto.arima feature. To speed up the process, the data was subsampled to a 30-minute interval. The number of observations per seasonal cycle m was set to 48 (24 hours x 2 samples per hour) to define a daily cycle.
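A minimal sketch of that setup follows. The post does not say how the subsampling was done, so averaging blocks of six 5-minute samples is an assumption, and the auto_arima arguments other than m=48 are illustrative defaults rather than the configuration actually used.

```python
SAMPLES_PER_HOUR = 2              # 30-minute interval
M_DAILY = 24 * SAMPLES_PER_HOUR   # 48 observations per daily cycle


def subsample_mean(series, factor):
    """Average consecutive blocks of `factor` samples (6 x 5 min -> 30 min)."""
    usable = len(series) - len(series) % factor  # drop an incomplete last block
    return [
        sum(series[i:i + factor]) / factor for i in range(0, usable, factor)
    ]


def fit_daily_sarima(series_30min, horizon=M_DAILY):
    """Fit a seasonal model with a daily cycle and forecast `horizon` steps."""
    import pmdarima as pm  # imported lazily so the helper above has no dependency

    model = pm.auto_arima(
        series_30min,
        seasonal=True,
        m=M_DAILY,               # one season = one day of 30-minute samples
        stepwise=True,           # stepwise search is faster than a full grid
        suppress_warnings=True,
        error_action="ignore",   # skip candidate orders that fail to fit
    )
    return model.predict(n_periods=horizon)


# Tiny demonstration of the subsampling helper
half_hourly = subsample_mean(list(range(12)), factor=6)
```

Calling fit_daily_sarima on Monday-Thursday data would correspond to the intra-week forecast described above, assuming the series is already at 30-minute resolution.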
[more to come soon]
Some technical details…
The main tables on the live DB have 500+ million records (which take up about 300 GB of space). It took a week to replicate them on a second DB so I can decouple the messy analysis queries from the main one.
A few Python scripts correlated loose data in the DB, which got me a 150+ GB CSV file for time series analysis. From there, the lovely Pandas happily chews the CSV like it’s bamboo shoots.
The crowd density floor map was done for live data (10 minute coverage). To reprogramme it for historical data and generate the animation above, a few things have to be done:
A Python script ploughed through the main DB table (yes, the one with 500 million records) and derived area density at 10-minute intervals. The script also did a few other things at the same time so the whole thing took a few hours.
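The core of such a script can be sketched as below: floor each record's timestamp to its 10-minute bin and count distinct devices per area and bin. The record layout (timestamp, device_id, area) and the area names are assumptions for illustration. The real script reads from the DB and does more than this.

```python
from collections import defaultdict
from datetime import datetime


def floor_to_10_minutes(ts):
    """Round a datetime down to the start of its 10-minute bin."""
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


def area_density(records):
    """Count distinct devices per (bin_start, area) from (ts, device, area) rows."""
    devices = defaultdict(set)
    for ts, device, area in records:
        devices[(floor_to_10_minutes(ts), area)].add(device)
    return {key: len(ids) for key, ids in devices.items()}


# Hypothetical rows; a device seen twice in the same bin is counted once
records = [
    (datetime(2020, 3, 2, 9, 1), "dev-a", "study-area"),
    (datetime(2020, 3, 2, 9, 4), "dev-b", "study-area"),
    (datetime(2020, 3, 2, 9, 8), "dev-a", "study-area"),
    (datetime(2020, 3, 2, 9, 12), "dev-a", "classroom-1"),
]
density = area_density(records)
```

Each (bin, area) count then maps onto one cell of one frame in the floor-map animation.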
Future work will get all these tidied up and automated.
Public image datasets are very handy when it comes to ML training but at some point you’ll face a product/logo that is not covered by any existing dataset. In our case, we are experimenting with detecting Cadbury Roses and Cadbury Heroes products. We need to construct an image dataset to cover these two products.
Instagram – use an image acquisition tool such as Instaloader (https://instaloader.github.io) to fetch images based on hashtags (#cadburyheroes and #cadburyroses).
The three sources provide around 5,000 raw images with a significant number of duplicates and unrelated items. A manual process is needed to filter the dataset. Going through thousands of files is tedious, so to make things slightly easier, I made a small GUI application. When you first start the application, it prompts for your image directory. Then it loads the first image. You then use the Left and Right arrow keys to decide whether to keep the image for ML training or discard it (LEFT to skip and RIGHT to keep). No files are deleted; instead they are moved to the corresponding sub-directories “skip” and “keep”. Once one of the two arrow keys is pressed, the application loads the next image. It’s pretty much a one-hand operation so you have the other hand free to feed yourself coffee/tea… The tool is available on GitHub. It’s based on wxPython and I’ve only tested it on Mac (pythonw).
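The file-handling logic behind the two arrow keys is roughly the following. This is a stand-alone sketch rather than the actual tool's code: the function name and key labels are made up, and the wxPython window and key-event wiring are left out.

```python
import shutil
import tempfile
from pathlib import Path

# LEFT skips the image, RIGHT keeps it (as described above)
KEY_TO_FOLDER = {"LEFT": "skip", "RIGHT": "keep"}


def triage(image_path, key):
    """Move an image into the 'keep' or 'skip' sub-directory next to it."""
    image_path = Path(image_path)
    target_dir = image_path.parent / KEY_TO_FOLDER[key]
    target_dir.mkdir(exist_ok=True)  # create the sub-directory on first use
    destination = target_dir / image_path.name
    shutil.move(str(image_path), str(destination))
    return destination


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "photo_001.jpg"
    second = Path(tmp) / "photo_002.jpg"
    first.write_bytes(b"placeholder")
    second.write_bytes(b"placeholder")

    kept = triage(first, "RIGHT")
    skipped = triage(second, "LEFT")

    kept_ok = kept.exists() and kept.parent.name == "keep"
    skipped_ok = skipped.exists() and skipped.parent.name == "skip"
    originals_moved = not first.exists() and not second.exists()
```

Because files are moved rather than deleted, a wrong key press is easy to undo by dragging the file back.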
Labelling the dataset requires manual input of bounding box coordinates and label. A few tools are available including: LabelImg and Yolo_mark. I also set up “Video labelling tool” as one of the assignment topics for my CS module Media Technology. So hopefully we’ll see some interesting designs for video labelling. In this case we use Yolo_mark as it directly exports in the labelling format required by our framework.
Depending on the actual product and packaging, the logo layout and typeface vary. I am separating them into four classes: the Cadbury logo, “Heroes” (including the one with the star), “Roses” in precursive (new), and “Roses” in cursive (old), coded as cadbury_logo, cadbury_heroes, cadbury_roses_precursive, and cadbury_roses_cursive.
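For reference, the Darknet/YOLO label format stores one line per box: the class index followed by the box centre, width and height, all normalised by the image size. The sketch below converts a pixel-space box into such a line. The order of class indices is an assumption (it simply follows the order listed above), and the box and image size are made up.

```python
# Assumed index order; the real order is whatever the project's names file defines
CLASSES = [
    "cadbury_logo",
    "cadbury_heroes",
    "cadbury_roses_precursive",
    "cadbury_roses_cursive",
]


def yolo_label(class_name, box, image_size):
    """Format a pixel box (x_min, y_min, x_max, y_max) as a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    class_id = CLASSES.index(class_name)
    x_centre = (x_min + x_max) / 2 / img_w
    y_centre = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_centre:.6f} {y_centre:.6f} {width:.6f} {height:.6f}"


# Hypothetical "Heroes" logo in a 1000x800 photo
line = yolo_label("cadbury_heroes", (100, 200, 300, 400), (1000, 800))
```

Yolo_mark writes these lines for you. The sketch is mainly useful when converting labels produced by another tool.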
Training has been an ongoing process to test which configurations work best for us. Normally you set the training config, dataset and validation strategy, then sit back and wait for the model performance to peak. The figures below show plots of loss and mAP@0.5 for 4000 iterations of training with an input size of 608 and a batch size of 32. Loss generally drops as expected and we can get mAP around 90% with careful configurations. The training process saves model weights every 1000 iterations plus the best, last and final version of weights. The training itself takes a few hours so I usually run it overnight.
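The "@0.5" in mAP@0.5 refers to the intersection-over-union (IoU) threshold: a detection only counts as correct if its box overlaps a ground-truth box of the same class with IoU of at least 0.5. A minimal IoU sketch with made-up boxes:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union else 0.0


IOU_THRESHOLD = 0.5

ground_truth = (0, 0, 100, 100)
close_detection = (20, 0, 120, 100)  # shifted by 20 px
loose_detection = (60, 0, 160, 100)  # shifted by 60 px

close_is_match = iou(ground_truth, close_detection) >= IOU_THRESHOLD
loose_is_match = iou(ground_truth, loose_detection) >= IOU_THRESHOLD
```

The first detection clears the threshold and the second does not, even though both overlap the logo, which is why small, loosely labelled boxes can drag mAP down.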
The performance measures are based on our image dataset. To evaluate how the model actually performs on test videos, it is essential to do manual verification. This means feeding the videos frame by frame to the pre-loaded model, then assembling the results as videos. Because YOLO detects objects at 3 scales, the input test image size has a great influence on recall. Our experiments suggest that an input size of 1600 (for full HD videos) leads to the best results. So input HD content is slightly downsampled and padded. The images below show the detections of multiple logos in the test video.
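To show what "slightly downsampled and padded" works out to for a full HD frame, here is a sketch of the geometry. It assumes a square 1600x1600 network input with aspect-preserving resize and centred padding (often called letterboxing). The framework's actual resizing behaviour may differ.

```python
def letterbox_geometry(src_w, src_h, target=1600):
    """Resized size and per-side padding to fit a frame into a square input."""
    scale = min(target / src_w, target / src_h)  # keep the aspect ratio
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (target - new_w) // 2  # padding on the left and on the right
    pad_y = (target - new_h) // 2  # padding on the top and on the bottom
    return new_w, new_h, pad_x, pad_y


# Full HD frame: 1920x1080
new_w, new_h, pad_x, pad_y = letterbox_geometry(1920, 1080)
```

Under these assumptions a 1920x1080 frame shrinks to 1600x900 and gets 350 rows of padding above and below, so logos lose only about 17% of their linear size before detection.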
It is clear that training a model on an image dataset for video content CAN work, but there are many challenges. Many factors such as brightness, contrast, motion blur, and video compression all impact the outcomes of the detection. Some of the negative impacts can be mitigated by tuning the augmentation (to mimic how things look in motion pictures) and I suspect a lot can be done once we start to exploit the temporal relationship between video frames (instead of considering them as independent images).