ICPP 2023

International Conference on Parallel Processing

Program at a Glance

This program is subject to change. Authors should plan to attend the entire conference, because presentation times may shift when the program is updated.

 

Monday, August 7, 2023

8:00 AM

Registration Open

Please come to the Warnock Engineering Building 3rd Floor reception area to check in.

Time | Warnock Eng Bldg 2460 | Warnock Eng Bldg 2470 | Warnock Eng Bldg 3780 | Warnock Eng Bldg 2760
8:30 AM | Workshop 1: DUAC | Workshop 2: PDADS | Workshop 3: AWASN | Workshop 4: LLPP
10:00 AM | Break - 3rd Floor Warnock
 | Workshop 1: DUAC Continued | Workshop 2: PDADS Continued | Workshop 3: AWASN Continued | Workshop 4: LLPP Continued
12:00 PM | Lunch - 3rd Floor Warnock
1:30 PM | Workshop 5: EMS | Workshop 6: SANDY | Workshop 7: P2S2 |
3:00 PM | Break - 3rd Floor Warnock
 | Workshop 5: EMS Continued | Workshop 6: SANDY Continued | Workshop 7: P2S2 Continued |
5:00 PM Workshops End; Dinner on your own

Tuesday, August 8, 2023

7:30 AM

Registration Open - Foyer

Continental Breakfast

  Ballroom A Ballroom B Ballroom C
8:30 AM

Welcome and Introduction

-------

2023 Best Paper and Test of Time Award Announced

-------

Keynote:

Torsten Hoefler
Scalable and Efficient AI: From Supercomputers to Smartphones

Chair: David Abramson, University of Queensland

10:00 AM Coffee Break - Foyer
10:30 AM

Plenary

ICPP Time Machine: 2022 Best Paper Presentations

BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers. Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han and Jiwon Seo

Online Scheduling of Moldable Task Graphs under Common Speedup Models. Anne Benoit, Lucas Perotin, Yves Robert and Hongyang Sun

Chairs: David Abramson, University of Queensland
Bronis R. de Supinski, Lawrence Livermore National Laboratory

12:00 PM Lunch
  Career Mentoring Panel in the Beehive Room 
1:30 PM

Numerics (In Person)

O(N) distributed direct factorization of structured dense matrices using runtime systems. Sameer Deshmukh, Rio Yokota, George Bosilca, Qinxiang Ma 

Computing the k-th Eigenvalue of Symmetric H2-Matrices. M. Ridwan Apriansyah, Rio Yokota

EC-SpMM: Efficient Compilation of SpMM Kernel on GPUs. Junqing Lin, Honghe Zhang, Xiaolong Shi, Jingwei Sun, Xianzhi Yu, Jun Yao, Guangzhong Sun

Chair: Srinivas Aluru, Georgia Institute of Technology,
School of Computational Science and Engineering

 

Compression and Encoding (In Person)

Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability. Fangzheng Lin, Kasidis Arunruangsirilert, Heming Sun, Jiro Katto 

Minimizing Network and Storage Costs for Consensus with Flexible Erasure Coding. Mi Zhang, Qihan Kang, Patrick P. C. Lee

SNICIT: Accelerating Sparse Neural Network Inference via Compression at Inference Time on GPU. Shui Jiang, Tsung-Wei Huang, Bei Yu, Tsung-Yi Ho

Chair: William Godoy, Oak Ridge National Laboratory

 

AI/ML Performance (Remote Session)

DiffLex: A High-Performance, Memory-Efficient and NUMA-Aware Learned Index using Differentiated Management. Lixiao Cui, Kedi Yang, Yusen Li, Gang Wang, Xiaoguang Liu 

BIRP: Batch-aware Inference Workload Redistribution and Parallel Scheme for Edge Collaboration. Hesheng Sun, Xinyi Chen, Zhuzhong Qian, Zengji Li, Ning Chen, Tuo Cao, Suwei Xu, Yitong Zhou

PSRA-HGADMM: A Communication Efficient Distributed ADMM Algorithm. Yongwen Qiu, Yongmei Lei, Guozheng Wang

CoTrain: Efficient Scheduling for Large-Model Training upon GPU and CPU in Parallel. Zhenxing Li, Qiang Cao, Yajie Chen, Wenrui Yan

OSP: Boosting Distributed Model Training with 2-stage Synchronization. Zixuan Chen, Lei Shi, Xuandong Liu, Jiahui Li, Sen Liu, Yang Xu

ITIF: Integrated Transformers Inference Framework for Multiple Tenants on GPU. Yuning Zhang, Zao Zhang, Wei Bao, Dong Yuan

Chair: David Abramson, University of Queensland, Australia

3:00 PM Coffee Break
3:30 - 5:00 PM

Graph Algorithms (In Person)

Parallel Order-Based Core Maintenance in Dynamic Graphs. Bin Guo, Emil Sekerinski

Fast Parallel Index Construction for Efficient K-truss-based Local Community Detection in Large Graphs. Md Abdul Motaleb Faysal, Maximilian Bremer, Cy Chan, John Shalf, Shaikh Arifuzzaman

BEEP: Balanced Efficient subgraph Enumeration in Parallel. Samiran Kawtikwar, Mohammad Almasri, Wen-mei Hwu, Rakesh Nagi, Jinjun Xiong

Chair: Veronika Sonigo, FEMTO-ST Institute

3:30 - 5:00 PM

Programming Models (In Person)

Improving the Scaling of an Asynchronous Many-Task Runtime with a Lightweight Communication Engine. Omri Mor, George Bosilca, Marc Snir 

Investigating Dependency Graph Discovery Impact on Task-based MPI+OpenMP Applications Performances. Romain PEREIRA, Adrien ROUSSEL, Patrick CARRIBAULT, Thierry GAUTIER

Implementing OpenMP's SIMD Directive in LLVM’s GPU Runtime. Eric Wright, Johannes Doerfert, Shilei Tian, Barbara Chapman, Sunita Chandrasekaran

Chair: Ignacio Laguna, Lawrence Livermore National Laboratory

3:30 - 4:15 PM

Applications (Remote Session)

Smart Cache Insertion and Promotion Policy for Content Delivery Networks. Peng Wang, Yu Liu, Zhelong Zhao, Ke Zhou, Zhihai Huang, Yanxiong Chen 

BlockPilot: A Proposer-Validator Parallel Execution Framework for Blockchain. Haowen Zhang, Jing Li, He Zhao, Tong Zhou, Nianzu Sheng, Hengyu Pan

Communication Optimizations for State-vector Quantum Simulator on CPU+GPU Clusters. Chenyang Jiao, Weihua Zhang, Li Shen

Chair: Dan Reed, University of Utah

4:15 - 5:00 PM

LSM-Tree Research (Remote Session)

RBC: A bandwidth controller to reduce write-stalls and tail latency. Zepeng Wang, Shu Yin, Yanjie Song

PMLDS: An LSM-tree Direct Managed Storage for Key-value Stores on Byte-addressable Devices. Ziyi Lu, Qiang Cao, Shucheng Wang, Jie Yao, Xiangrui Yang

DComp: Efficient Offload of LSM-tree Compaction with Data Processing Units. Chen Ding, Jian Zhou, Jiguang Wan, Yiqin Xiong, Sicen Li, Shuning Chen, Hanyang Liu, Liu Tang, Ling Zhan, Kai Lu, Peng Xu

Chair: Dan Reed, University of Utah

5:00 - 5:15 PM

Applications (Remote Session, Part II)

RadarSSD: A Computational Storage for Radar Signal Processing. Jiali Li, Xianzhang Chen, Duo Liu, Ao Ren, Zhaoyang Zeng, Yujuan Tan

Chair: Dan Reed, University of Utah

6:30 PM - 9:00 PM Banquet at the Natural History Museum of Utah

Wednesday, August 9, 2023

8:00 AM

Registration Open - Foyer

Continental Breakfast

8:30 AM

Keynote:

Katherine A. Yelick
Beyond Exascale Computing

Chair: Dan Reed, University of Utah

10:00 AM Coffee Break
10:30 AM

Training (In Person)

Communication-Efficient Generalized Neuron Matching for Federated Learning. Sixu Hu, Qinbin Li, Bingsheng He

Group-based Hierarchical Federated Learning: Convergence, Group Formation, and Sampling. Jiyao Liu, Xinliang Wei, Xuanzhang Liu, Hongchang Gao, Yu Wang

FastDimeNet++: Training DimeNet++ in 22 Minutes. Feiwen Zhu, Michal Futrega, Han Bao, Sukru Burc Eryilmaz, Fei Kong, Kefeng Duan, Xinnian Zheng, Nimrod Angel, Matthias Jouanneaux, Maximilian Stadler, Fung Xie, June Yang, Michael Andersch, Michal Marcinkiewicz

Chair: Konstantinos Parasyris, Lawrence Livermore National Laboratory

 

Communication (In Person)

Quantifying the Performance Benefits of Partitioned Communication in MPI. Thomas Gillis, Ken Raffenetti, Hui Zhou, Yanfei Guo, Rajeev Thakur

Impact of Cache Coherence on the Performance of Shared-Memory based MPI Primitives: A Case Study for Broadcast on Intel Xeon Scalable Processors. George Katevenis, Manolis Ploumidis, Manolis Marazakis

Modeling and Benchmarking the Potential Benefit of Early-Bird Transmission in Fine-Grained Communication. Whit Schonbein, Scott Levy, Matthew Dosanjh, William Marts, Elizabeth Reid, Ryan Grant

Chair: Johannes Doerfert, Lawrence Livermore National Laboratory

 

System Software (Remote Session)

CoTuner: A Hierarchical Learning Framework for Coordinately Optimizing Resource Partitioning and Parameter Tuning. Tiannuo Yang, Ruobing Chen, Yusen Li, Xiaoguang Liu, Gang Wang

DeepPower: Deep Reinforcement Learning based Power Management for Latency Critical Applications in Multi-core Systems. Jingrun Zhang, Guangba Yu, Zilong He, Liang Ai, Pengfei Chen

AsyncGBP: Unleashing the Potential of Heterogeneous Computing for SSL/TLS with GPU-based Provider. Yi Bian, Fangyu Zheng, Yuewu Wang, Lingguang Lei, Yuan Ma, Jiankuo Dong, Jiwu Jing

MARS: Fault Localization in Programmable Networking Systems with Low-cost In-Band Network Telemetry. BenRan Wang, Hongyang Chen, PengFei Chen, ZiLong He, GuangBa Yu

On Optimizing Traffic Scheduling for Containerized Microservices. Xianzhi Zhu, Yongkun Li, Lulu Yao, Zhihao Qi, Yinlong Xu, Pengcheng Wang, Weiguang Wang, Xia Zhu

HighRPM: Combining Integrated Measurement and Software Power Modeling for High-Resolution Power Monitoring. Xinxin Qi, Juan Chen, Yong Dong, Yuan Yuan, Tao Xu, Rongyu Deng, Zekai Li, Kexing Zhou, Zheng Wang

Chair: David Abramson, University of Queensland, Australia

12:00 PM Lunch
1:30 - 3:00 PM

Applications (In Person)

Communication-Avoiding Optimizations for Large-Scale Unstructured-Mesh Applications with OP2. Suneth Ekanayake, István Reguly, Fabio Luporini, Gihan Mudalige

WFAsic: A High-Performance ASIC Accelerator for DNA Sequence Alignment on a RISC-V SoC. Abbas Haghi, Lluc Alvarez, Jordi Front, Juan Miguel de Haro Ruiz, Roger Figueras, Max Doblas, Santiago Marco-Sola, Miquel Moreto

PFDRL: Personalized Federated Deep Reinforcement Learning for Residential Energy Management. Jiechao Gao, Wenpeng Wang, Fateme Nikseresht, Viswajith Govinda Rajan, Bradford Campbell

Chair: J. Nelson Amaral, University of Alberta

1:30 - 3:00 PM

Resource Scheduling and Adaptation (In Person)

Mercury: Fast and Optimal Device Placement for Large Deep Learning Models. Hengwei Xu, Pengyuan Zhou, Haiyong Xie, Yong Liao

Embracing Uncertainty for Equity in Resource Allocation in ML Training. Suraiya Tairin, Haiying Shen, Zeyu Zhang

Performance-Aware Energy-Efficient GPU Frequency Selection using DNN-based Models. Ghazanfar Ali, Mert Side, Sridutt Bhalachandra, Nicholas J. Wright, Yong Chen

Chair: Robert W. Wisniewski, Samsung

1:30 - 2:15 PM

Federated Learning (Remote Session)

ASFL: Adaptive Semi-asynchronous Federated Learning for Balancing Model Accuracy and Total Latency in Mobile Edge Networks. Jieling Yu, Ruiting Zhou, Chen Chen, Bo Li, Fang Dong

Credit-based Differential Privacy Stochastic Model Aggregation Algorithm for Robust Federated Learning via Blockchain. Mengyao Du, Miao Zhang, Lin Liu, Kai Xu, Quanjun Yin

Learning From Your Neighbours: Mobility-Driven Device-Edge-Cloud Federated Learning. Songli Zhang, Zhenzhe Zheng, Fan Wu, Bingshuai Li, Yunfeng Shao, Guihai Chen

Chair: Wu-Chun Feng, Virginia Tech

2:15 - 3:00 PM

Graph and Data Analytics (Remote Session)

DAG-Aware Optimization for Geo-Distributed Data Analytics. Qingyuan Wang, Bin Gao, Zhi Zhou, Fei Xu, Chenghao Ouyang

Connectivity-Aware Link Analysis for Skewed Graphs. YuAng Chen, Yeh-Ching Chung

BitColor: Accelerating Large-Scale Graph Coloring on FPGA with Parallel Bit-Wise Engines. Haishuang Fan, Ming Li, Jingya Wu, Wenyan Lu, Xiaowei Li, Guihai Yan

Chair: Wu-Chun Feng, Virginia Tech

3:00 PM Coffee Break
3:30 - 5:00 PM

Graph-Related Techniques (In Person)

Fast tree-based algorithms for DBSCAN for low-dimensional data on GPUs. Andrey Prokopenko, Damien Lebrun-Grandie, Daniel Arndt

GFFT: a Task Graph Based Fast Fourier Transform Optimization Framework. Qinglin Lu, Xinyu Wang, Wenjing Ma, Yuwen Zhao, Daokun Chen, Fangfang Liu

ADARNet: Deep Learning Predicts Adaptive Mesh Refinement. Octavi Obiols-Sales, Abhinav Vishnu, Aparna Chandramowlishwaran, Nicholas Malaya

Chair: Tsung-Wei Huang, University of Utah

3:30 - 5:00 PM

Memory and Storage (In Person)

Hector: A Framework to Design and Evaluate Scheduling Strategies in Persistent Key-Value Stores. Louis-Claude Canon, Anthony Dugois, Loris Marchal, Etienne Rivière

Warped-MC: An Efficient Memory Controller Scheme for Massively Parallel Processors. Jong Hyun Jeong, Myung Kuk Yoon, Yunho Oh, Gunjae Koo

Chair: Michael Gerndt, Technical University of Munich

3:30 - 4:15 PM

Networks (Remote Session)

WRHT: Efficient All-reduce for Distributed DNN Training in Optical Interconnect Systems. Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang

SEECHIP: A Scalable and Energy-Efficient Chiplet-based GPU Architecture Using Photonic Links. Hao Zhang, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fei Dai

RLB: Reordering-Robust Load Balancing in Lossless Datacenter Networks. Jinbin Hu, Yi He, Jin Wang, Wangqing Luo, Jiawei Huang

Chair: Martin Berzins, University of Utah

4:15 - 5:15 PM

Scheduling (Remote Session)

Scheduling Dependent Batching Tasks. Hehuan Shi, Lin Chen, Ming Lin, Raphael Phan

Tango: Harmonious Management and Scheduling for Mixed Services Co-located among Distributed Edge-Clouds. Yicheng Feng, Shihao Shen, Mengwei Xu, Yuanming Ren, Xiaofei Wang, Victor C.M. Leung, Wenyu Wang

SPLIT: QoS-Aware Inference Resource Allocator via Evenly-sized Model Splitting. Diaohan Luo, Tian Yu, Yuewen Wu, Heng Wu, Tao Wang, Wenbo Zhang

NeiLatS: Neighbor-Aware Latency-Sensitive Application Scheduling in Heterogeneous Cloud-Edge Environment. Huadong Li, Hui Liu, Changyuan Liu, Aoqi Chen, Zhaocheng Niu, Junzhao Du

Chair: Martin Berzins, University of Utah

5:00 PM Break
6:00 PM - 9:00 PM Reception/Poster Session

Thursday, August 10, 2023

8:00 AM

Registration Open - Foyer

Continental Breakfast

8:30 AM

Keynote:

Robert W. Wisniewski
Attacking the Memory and Communication Wall with Memory Coupled Compute

Chair: Bronis R. de Supinski, Lawrence Livermore National Laboratory

10:00 AM Coffee Break
10:30 AM

Plenary

Daniel Reed: The Future of HPC

David Abramson: Translational Computer Science

Chair: Manish Parashar, University of Utah

12:00 PM Lunch
1:30 PM

Inference (In Person)

Dystri: A Dynamic Inference based Distributed DNN Service Framework on Edge. Xueyu Hou, Yongjie Guan, Tao Han

FaST-GShare: Enabling Efficient Spatio-Temporal GPU Sharing in Serverless Computing for Deep Learning Inference. Jianfeng Gu, Yichao Zhu, Puxuan Wang, Mohak Chadha, Michael Gerndt

Output-Directed Dynamic Quantization for DNN Acceleration. Beilei Jiang, Xianwei Cheng, Yuan Li, Song Fu, Qing Yang, Mingxiong Liu, Alejandro Olvera

Chair:

 

Compilation and Checkpointing Techniques (In Person)

ORAQL — Optimistic Responses to Alias Queries in LLVM. Jan Hueckelheim, Johannes Doerfert

Scalable Checkpointing of Applications with Sparsely Updated Data. Nigel Tan, Bogdan Nicolae, Jakob Luettgau, Jack Marquez, Keita Teranishi, Nicolas Morales, Sanjukta Bhowmick, Michela Taufer, Franck Cappello

General-purpose Asynchronous Periodic Checkpointing in Hybrid Memory. Masaki Nakata, Shigeyuki Sato, Tomoharu Ugawa

Chair: Christian Engelmann, Oak Ridge National Laboratory

 

Memory and Storage (Remote Session)

Conflux: Exploiting Persistent Memory and RDMA Bandwidth via Adaptive I/O Mode Selection. Zhenlin Qi, Shengan Zheng, Yifeng Hui, Bowen Zhang, Linpeng Huang

Marlin: A Concurrent and Write-Optimized B+-tree Index on Disaggregated Memory. Hang An, Fang Wang, Dan Feng, Xiaomin Zou, Zefeng Liu, Jianshun Zhang

GPU Performance Acceleration via Intra-Group Sharing TLB. Weiming Huang, Yajuan Du, Mingyang Liu

DArray: A High Performance RDMA-Based Distributed Array. Baorong Ding, Mingcong Han, Rong Chen

Toward Optimal Repair and Load Balance in Locally Repairable Codes. Hao Zhao, Si Wu, Haifeng Liu, Zhixiang Tang, Xiaochun He, Yinlong Xu

Re-aligning Across-page Requests for Flash-based Solid-state Drives. Zhigang Cai, Chengyong Tang, Minjun Li, Jun Li, Zhibing Sha, François Trahay, Jiaojiao Wu, Fan Yang, Jianwei Liao

Chair: Manish Parashar, University of Utah

3:00 PM Coffee Break
3:30 PM

Optimization of AI/ML (In Person)

DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification. Daegun Yoon, Sangyoon Oh

Composable Workflow for Accelerating Neural Architecture Search Using In Situ Analytics for Protein Characterization. Georgia Channing, Ria Patel, Paula Olaya, Ariel Rorabaugh, Osamu Miyashita, Silvina Caino-Lores, Catherine Schuman, Florence Tama, Michela Taufer

Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. Shenggui Li, Hongxin Liu, Zhengda Bian, Jiarui Fang, Haichen Huang, Yuliang Liu, Boxiang Wang, Yang You

Chair: Bogdan Nicolae, Argonne National Laboratory

 

Numerics (Remote Session)

JSweep: A Patch-centric Data-driven Approach for Parallel Sweeps on Large-scale Meshes. Jie Yan, Zhang Yang, Aiqing Zhang, Zeyao Mo

Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs. Mingzhen Li, Hailong Yang, Shanjun Zhang, Fengwei Yu, Ruihao Gong, Yi Liu, Zhongzhi Luan, Depei Qian

Accelerating Large-Scale CFD Simulations with Lattice Boltzmann Method on a 40-Million-Core Sunway Supercomputer. Zhao Liu, Xuesen Chu, Xiaojing Lv, Hanyue Liu, Haohuan Fu, Guangwen Yang

HASpGEMM: Heterogeneity-Aware Sparse General Matrix-Matrix Multiplication on Modern Asymmetric Multicore Processors. Helin Cheng, Wenxuan Li, Yuechen Lu, Weifeng Liu

An Improved Parallel Overset Grid method for fluid simulation with moving boundary. Ran Zhao, Chao Li, Xiaowei Guo, Yi Liu, Sifan Long, Sen Zhang, Yanlong Qiu, Canqun Yang

JOSS: Joint Exploration of CPU-Memory DVFS and Task Scheduling for Energy Efficiency. Jing Chen, Madhavan Manivannan, Bhavishya Goel, Miquel Pericàs

Chair: Dan Reed, University of Utah

5:00 PM Conference Ends