Hi! This is Chen, a computer science Ph.D. candidate in the Pervasive Computing Group at Tsinghua University, supervised by Prof. Yuanchun Shi and Prof. Chun Yu. I received my Bachelor's degree in Computer Science from Tsinghua University in 2019. My major research direction is Human-Computer Interaction (HCI), and I also follow research advances in natural language processing and computer vision. My research focuses on enabling natural and efficient interaction with compact sensor form factors in XR and mobile scenarios by leveraging multi-modal sensing (e.g., vision, audio, inertial signals, and RF). My previous and ongoing work expands the user's input capability in both the spatial domain (e.g., enabling precise, unobtrusive input on the fingertip) and the temporal domain (e.g., enhancing the recognition of fast and transient gestures). My goal is to develop fundamental interaction techniques with which a user can seamlessly interact with everything around them in the next-generation XR interface, just as the mouse and keyboard did for the GUI.
Recent News:
1. It was a great pleasure to give a talk at HKUST(GZ). Thanks to Prof. Mingming Fan for the invitation.
2. One paper has been conditionally accepted by IMWUT (May 2023). Congratulations to my collaborators!
3. One paper has been conditionally accepted by UIST 2023 (pending minor revision). Congratulations to my collaborators!
4. Our paper "Enabling Voice-Accompanying Hand-to-Face Gesture Recognition with Cross-Device Sensing" received an Honorable Mention Award at CHI 2023! Congratulations to Zisu!
UIST 2023
We present ShadowTouch, a novel sensing method that recognizes the subtle hand-to-surface touch state of individual fingers with the aid of an auxiliary light source. ShadowTouch mounts a forward-facing light source on the user's wrist to cast shadows on the surface in front of the fingers as they approach it. With this optical design, the subtle vertical movements of near-surface fingers are magnified and turned into shadow features on the surface that are recognizable by computer vision algorithms. To efficiently recognize the touch state of each finger, we devised a two-stage CNN-based algorithm that first extracts all fingertip regions from each frame and then classifies the touch state of each region from the cropped consecutive frames. Evaluations showed that our touch state detection algorithm achieved a recognition accuracy of 99.1% and an F-1 score of 96.8% in the leave-one-out cross-user setting. We further outlined the hand-to-surface interaction space enabled by ShadowTouch's sensing capability in terms of touch-based interaction, stroke-based interaction, and out-of-surface information, and developed four application prototypes to showcase its interaction potential. The usability study showed the advantages of ShadowTouch over threshold-based techniques: lower mental demand, lower effort, lower frustration, greater willingness to use, greater ease of use, better integrity, and higher confidence.
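As a rough illustration of the two-stage recognition pipeline described above, the sketch below wires a fingertip-region detector to a per-region touch-state classifier over consecutive cropped frames. The architecture, the names (`TouchStateNet`, `detect_fingertips`), and all sizes are hypothetical placeholders, not the paper's actual models.

```python
import torch
import torch.nn as nn

class TouchStateNet(nn.Module):
    """Toy touch-state classifier over a short stack of cropped fingertip frames."""
    def __init__(self, n_frames=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(n_frames, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 2)  # touch vs. no-touch

    def forward(self, crops):  # crops: (batch, n_frames, H, W)
        return self.head(self.features(crops).flatten(1))

def classify_touches(frames, detect_fingertips, classifier, crop=32):
    """Stage 1: locate fingertip regions in the latest frame; stage 2: classify
    each region from the same crop across the consecutive grayscale frames."""
    results = []
    for cx, cy in detect_fingertips(frames[-1]):  # hypothetical detector stub
        patch = torch.stack([f[cy - crop // 2:cy + crop // 2,
                               cx - crop // 2:cx + crop // 2] for f in frames])
        logits = classifier(patch.unsqueeze(0))  # shape (1, n_frames, crop, crop)
        results.append(((cx, cy), logits.argmax(dim=1).item()))
    return results
```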
IMWUT 2023
Storytelling with family photos, an important mode of reminiscence-based activity, can be instrumental in promoting intergenerational communication between grandparents and grandchildren by strengthening generational bonds and shared family values. Motivated by the challenges that existing technology approaches encounter in improving intergenerational storytelling (e.g., the need to hold a tablet, the potential detachment from the physical world in Virtual Reality (VR)), we sought new ways of using Augmented Reality (AR) to support intergenerational storytelling, as AR offers new capabilities (e.g., 3D models, new interactivity) to enhance the storyteller's expression. We conducted a two-part exploratory study in which pairs of grandparents and grandchildren 1) participated in an in-person storytelling activity with a semi-structured interview and 2) then took part in a participatory design session with AR technology probes that we designed to inspire their exploration. Our findings revealed insights into possible ways of intergenerational storytelling, the feasibility and usages of AR in facilitating it, and key design implications for leveraging AR in intergenerational storytelling.
IMWUT 2023
Programming a smart home is an iterative process in which users configure and test automations during their in-situ experience of the IoT space. However, current end-user programming mechanisms are primarily preset configurations on a GUI and fail to leverage in-situ behaviors and context. This paper proposes in-situ programming (ISP), a novel programming paradigm for AIoT automation that extensively leverages users' natural in-situ interaction with the smart environment. We built a Wizard-of-Oz system and conducted a user-enactment study to explore users' behavior models in this paradigm. We identified a dynamic programming flow in which participants iteratively configure and confirm through query, control, edit, and test. In particular, we identified a novel method, "snapshot", for automation configuration and a novel method, "simulation", for automation testing, in which participants leverage ambient responses and in-situ interaction. Based on our findings, we propose design spaces covering the dynamic programming flow, the coherency and clarity of the interface, and state and scene management to build an ideal in-situ programming experience.
IMWUT 2023
Mid-air text entry on virtual keyboards suffers from the lack of tactile feedback, which brings challenges to both tap detection and input prediction. In this paper, we explored the feasibility of single-finger typing on virtual QWERTY keyboards in mid-air. We first conducted a study to examine users' 3D typing behavior on virtual keyboards of different sizes. Results showed that participants perceived the vertical projection of the lowest point of a tap as the target location, and that inferring taps from the intersection between the finger and the keyboard was not applicable. To address this challenge, we derived a novel input prediction algorithm that modeled the uncertainty of tap detection as a probability and performed probabilistic decoding that could tolerate false detections ...
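To make the idea of probabilistic decoding concrete, here is a minimal sketch of one way a decoder could combine per-tap detection confidence, a Gaussian spatial model over key centers, and a language-model prior. The keyboard geometry, the sigma value, and the function names are illustrative assumptions, not the paper's actual algorithm or parameters.

```python
import math

# Illustrative key centers for a unit-width QWERTY layout (not the study's geometry).
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEYS = {c: ((i + 0.5 + 0.25 * r) / 10, (r + 0.5) / 3)
        for r, row in enumerate(ROWS) for i, c in enumerate(row)}

def log_tap_likelihood(tap, key, sigma=0.04):
    """Log-probability of an observed tap location under a 2D Gaussian
    centred on the intended key (independent x/y, shared sigma)."""
    (tx, ty), (kx, ky) = tap, KEYS[key]
    return (-((tx - kx) ** 2 + (ty - ky) ** 2) / (2 * sigma ** 2)
            - math.log(2 * math.pi * sigma ** 2))

def score(word, taps, tap_probs, lm_logprob):
    """Combine spatial likelihoods, per-tap detection confidence, and a
    language-model prior for one candidate word."""
    if len(word) != len(taps):
        return float("-inf")
    s = lm_logprob(word)
    for tap, p, ch in zip(taps, tap_probs, word):
        s += math.log(p) + log_tap_likelihood(tap, ch)
    return s

def decode(taps, tap_probs, lexicon, lm_logprob):
    """Return the highest-scoring word in the lexicon."""
    return max(lexicon, key=lambda w: score(w, taps, tap_probs, lm_logprob))
```

A real decoder would also handle inserted and missed taps; this sketch assumes a one-to-one tap-to-letter alignment for brevity.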
CHI 2023
Gestures performed while speaking are essential for voice interaction: they convey complementary semantics for interaction purposes such as wake-up state and input modality. In this paper, we investigated voice-accompanying hand-to-face (VAHF) gestures for voice interaction. We targeted hand-to-face gestures because they relate closely to speech and yield significant acoustic features (e.g., impeding voice propagation). We conducted a user study to explore the design space of VAHF gestures, in which we gathered candidate gestures and then applied a structural analysis to them along different dimensions (e.g., contact position and type), yielding a total of 8 VAHF gestures with good usability and minimal confusion. To facilitate VAHF gesture recognition, we proposed a novel cross-device sensing method that leverages heterogeneous data channels (vocal, ultrasound, and IMU) from commodity devices.
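As a loose illustration of fusing such heterogeneous channels, the sketch below runs one small encoder per channel and concatenates the features for gesture classification. All shapes, channel counts, and the class name are hypothetical; only the three modalities and the 8-gesture output come from the abstract.

```python
import torch
import torch.nn as nn

class CrossDeviceVAHFNet(nn.Module):
    """Toy fusion model: one encoder per channel (vocal spectrogram,
    ultrasound echo profile, IMU window), concatenated for classification."""
    def __init__(self, n_gestures=8):
        super().__init__()
        self.vocal = nn.Sequential(nn.Conv1d(40, 32, 5, padding=2),  # 40 mel bands (assumed)
                                   nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.ultra = nn.Sequential(nn.Conv1d(1, 32, 5, padding=2),   # 1 echo channel (assumed)
                                   nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.imu = nn.Sequential(nn.Conv1d(6, 32, 5, padding=2),     # 3-axis accel + gyro
                                 nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.head = nn.Linear(96, n_gestures)

    def forward(self, vocal, ultra, imu):  # each shaped (batch, channels, time)
        f = torch.cat([self.vocal(vocal).flatten(1),
                       self.ultra(ultra).flatten(1),
                       self.imu(imu).flatten(1)], dim=1)
        return self.head(f)
```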
CHI 2023
Perceiving the region of interest (ROI) and the target object from the user's first-person perspective can enable diverse spatial interactions on smartphones. In this paper, we propose a novel ROI input method and a target selection method for smartphones that utilize user-perspective phone occlusion. Turning the phone into a real-world physical cursor benefits from proprioception, removes the constraint of the camera preview, and allows users to rapidly and accurately select the target object ...
IMWUT 2022
We present DRG-Keyboard, a gesture keyboard enabled by dual IMU rings that allows the user to swipe the thumb on the index fingertip to perform word gesture typing, as if typing on a miniature QWERTY keyboard. With dual IMUs attached to the user's thumb and index finger, DRG-Keyboard can 1) measure the relative attitude of the two rings and map it to 2D fingertip coordinates and 2) detect the thumb's touch-down and touch-up events by combining the relative attitude data with synchronous frequency-domain data, based on which a fingertip gesture keyboard can be implemented ...
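For intuition on the first step, here is a minimal sketch of computing the thumb ring's attitude relative to the index ring from two IMU quaternions and mapping it to 2D coordinates. The yaw/pitch mapping and scaling are illustrative assumptions; the paper's actual coordinate mapping is calibrated rather than fixed.

```python
import numpy as np

def quat_conj(q):
    w, x, y, z = q
    return np.array([w, -x, -y, -z])

def quat_mul(a, b):
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return np.array([
        aw*bw - ax*bx - ay*by - az*bz,
        aw*bx + ax*bw + ay*bz - az*by,
        aw*by - ax*bz + ay*bw + az*bx,
        aw*bz + ax*by - ay*bx + az*bw,
    ])

def relative_attitude(q_thumb, q_index):
    """Attitude of the thumb ring expressed in the index ring's frame;
    this cancels the shared global (ground) reference."""
    return quat_mul(quat_conj(q_index), q_thumb)

def to_keyboard_xy(q_rel, scale=1.0):
    """Toy mapping: treat the relative yaw/pitch of the thumb as 2D
    coordinates on the fingertip 'keyboard'."""
    w, x, y, z = q_rel
    yaw = np.arctan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    pitch = np.arcsin(np.clip(2 * (w * y - z * x), -1, 1))
    return scale * yaw, scale * pitch
```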
IMWUT 2021
We present DualRing, a novel ring-form input device that can capture the state and movement of the user's hand and fingers. With two IMU rings attached to the user's thumb and index finger, DualRing can sense not only the absolute hand gesture relative to the ground but also the relative pose and movement among hand segments. To enable natural thumb-to-finger interaction, we develop a high-frequency AC circuit for on-body contact detection ...
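The on-body contact detection can be pictured with the toy sketch below: when the thumb touches the finger, the body closes the circuit and the received amplitude at the carrier frequency jumps, which a threshold then flags. The Goertzel-style single-bin DFT and all numeric parameters are illustrative assumptions, not DualRing's actual circuit or thresholds.

```python
import numpy as np

def carrier_amplitude(samples, fs, f_carrier):
    """Amplitude of the received signal at the carrier frequency,
    estimated via a single-bin DFT."""
    t = np.arange(len(samples)) / fs
    ref = np.exp(-2j * np.pi * f_carrier * t)
    return 2 * np.abs(np.mean(samples * ref))

def is_contact(samples, fs=1_000_000, f_carrier=100_000, threshold=0.2):
    """Thumb-to-finger contact closes the body circuit, raising the
    received carrier amplitude above the (assumed) threshold."""
    return carrier_amplitude(samples, fs, f_carrier) > threshold
```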
CHI 2021
We propose Auth+Track, a novel authentication model that reduces redundant authentication in everyday smartphone usage. Through sparse authentication and continuous tracking of the user's status, Auth+Track eliminates the "gap" authentication between fragmented sessions and enables "Authentication Free when User is Around". To instantiate the Auth+Track model, we present PanoTrack, a prototype that integrates body and near-field hand information for user tracking. We install a fisheye camera on the top of the phone to achieve a panoramic view that captures both the user's body and their hands on the screen ...
CHI 2019
We present HandSee, a novel sensing technique that captures the state of the user's hands touching or gripping a smartphone. We place a prism mirror on the front camera to achieve stereo vision of the scene above the touchscreen surface. HandSee enables a variety of novel interaction techniques and expands the design space for full-hand interaction on smartphones ...
AAAI 2019
We propose DeepChannel, a robust, data-efficient, and interpretable neural model for extractive document summarization. Given any document-summary pair, we estimate a salience score, modeled with an attention-based deep neural network, that represents how salient the summary is for generating the document. We devise a contrastive training strategy to learn the salience estimation network and then use the learned salience score as a guide, iteratively extracting the most salient sentences from the document as our generated summary ...
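The training and extraction loop described above might look roughly like the sketch below: a margin-based contrastive objective pushes the salience of true document-summary pairs above corrupted ones, and a greedy loop then extracts sentences. The `salience` callable, the margin, and the greedy formulation are illustrative assumptions, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(salience, doc, pos_summary, neg_summary, margin=1.0):
    """Margin-based contrastive objective: the salience score of the true
    document-summary pair should exceed that of a corrupted (negative) pair."""
    return F.relu(margin - salience(doc, pos_summary)
                  + salience(doc, neg_summary)).mean()

def extract_summary(doc, sentences, salience, k=3):
    """Greedy extraction guided by the learned salience score: repeatedly
    add the sentence that most increases the summary's salience."""
    summary, pool = [], list(sentences)
    for _ in range(k):
        best = max(pool, key=lambda s: salience(doc, summary + [s]).item())
        summary.append(best)
        pool.remove(best)
    return summary
```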
VR 2022 Workshop
Mid-air text entry on virtual keyboards suffers from the lack of tactile feedback, which brings challenges to both tap detection and input prediction. In this poster, we demonstrated the feasibility of efficient single-finger typing in mid-air through probabilistic touch modeling. We first collected users' typing data on virtual keyboards of different sizes. Based on an analysis of these data, we derived an input prediction algorithm that incorporated probabilistic touch detection and elastic probabilistic decoding. In an evaluation study where participants performed real text entry tasks with this technique, they reached a pick-up single-finger typing speed of 24.0 WPM with a 2.8% word-level error rate.
Tsinghua University | 09/2019 - present
Ph.D. student in Computer Science and Technology
Tsinghua University | 09/2015 - 06/2019
B.Eng. in Computer Science and Technology
National University of Singapore | 07/2018 - 09/2018
Research Intern
CHI Honorable Mention Award (Top 5%) | 2023
84 Innovation and Future Scholarship (Top 6), Tsinghua University | 2021
Excellent Comprehensive Scholarship, Tsinghua University | 2021, 2022
The Second Prize of the 39th "Challenge Cup", Tsinghua University | 2021
The First Prize of the 37th "Challenge Cup", Tsinghua University | 2019
Excellent Graduate of Computer Science, Tsinghua University | 2019
Reviewer, CHI 2020, TURC 2020, CHI LBW 2021, ACL 2021, CHI 2022, TIOT 2022 | 2019 - present
Teaching Assistant (TA), Fundamentals of Computer Programming | 2019 Fall, 2020 Fall, 2021 Fall, 2022 Fall
Teaching Assistant (TA), Essentials to Signal Processing and Data Management for AIoT Applications | 2022 Fall
Teaching Assistant (TA), Financial Big Data and Quantitative Analysis | 2021 Spring
Teaching Assistant (TA), Calculus A(2) | 2020 Spring
Towards Ubiquitous and Intelligent Hand Interaction | 07/2023 |
HKUST(GZ), Host: Prof. Mingming Fan |
Programming Languages: C/C++, Python, Java, JavaScript, CUDA, R, MATLAB, VHDL, Assembly
Professional Skills: Machine Learning (CV, NLP, and RL tasks in PyTorch and TensorFlow), Digital Image Processing, Audio Processing, Embedded Systems (e.g., Arduino, nRF BLE)
Updated in Jun. 2023