Hao Zhang
I currently lead the research and development of next-generation AI-accelerated data systems, as well as research on knowledge systems, at Huawei Cloud Database Innovation Lab.
We are actively seeking talented individuals for full-time positions and internships who share our passion for systems research and development. If you are interested, please reach out with your details or resume!
News
- 11/2024 One paper accepted by SIGMOD’25. Congrats to Prof. Shixuan Sun.
- 11/2024 One paper accepted by SIGMOD’25. Congrats to Yongliang Zhang and Prof. Yuanyuan Zhu.
- 09/2024 Built the world’s fastest graph database, reaching Top-1 in the LDBC Benchmark and outperforming the previous Top-1 by 3000X! Big congratulations to all my team members and collaborators at Huawei!
- 06/2024 One paper accepted by VLDB’24 Research Track. Congrats to Guanghua Li and my collaborators!
- 06/2024 Joined the Linked Data Benchmark Council (LDBC) as an MPC member on behalf of Huawei Cloud.
- 10/2023 Two papers accepted by ICDE’24 Research Track.
- 05/2023 One paper accepted by The VLDB Journal.
- 10/2022 Joined Huawei Cloud Database Innovation Lab!
Biography
I am a Research Scientist at Huawei Cloud Database Innovation Lab, recruited through the prestigious TopMinds program. I obtained my Ph.D. from the Database Group at The Chinese University of Hong Kong in 2022, under the mentorship of Prof. Jeffrey Xu Yu. Prior to that, I obtained my B.E. degree from the HongYi Honor Class at Wuhan University in 2017.
Research Interests
My current research focuses on preparing data and knowledge systems for the AI era, particularly in the following areas:
- Data System: Design and optimization of next-generation AI-accelerated data systems for compound AI workloads:
  - Building SQL, Cypher (graph query), and graph analytics engines on top of Tensor Computation Runtimes (TCRs), e.g., PyTorch, to support diverse compound AI workloads.
  - Optimizing cross-domain ML pipelines, encompassing SQL, graph, and ML queries.
  - Enhancing query optimization through machine learning-based approaches (AI4DB) to improve processing efficiency and effectiveness.
- Knowledge System: Development of high-performance graph databases (knowledge graphs) and ANN systems for LLMs:
  - Building a highly optimized ANN engine on top of a Tensor Computation Runtime (TCR), so that it resides in the same system as the LLM.
  - Building a world-record-level graph database to accelerate the serving of graph data to LLMs.
  - Optimizing mixed queries that combine ANN search with relational queries.
My previous research interests include:
- Distributed query processing systems that focus on solving complex relational/graph queries.
- Distributed algorithms for fundamental graph problems, such as subgraph matching.
Systems
Next-gen GES (2022-Present): The Graph Engine Service (GES) is a high-performance, fully managed graph database service developed by Huawei to handle complex graph-based queries and large-scale graph computing tasks. We have developed the next generation of GES, employing advanced techniques to enhance performance by several orders of magnitude.
TCRDS (2022-Present): TCRDS is a unified analytic engine for SQL queries, subgraph queries, and graph analytic queries, built upon a Tensor Computation Runtime (TCR) such as PyTorch. Leveraging a highly optimized and cross-platform TCR backend, TCRDS achieves full-speed operation on all platforms (including NVIDIA GPUs, AMD GPUs, Apple M-series SoCs, and Huawei Ascend), outperforming traditional purpose-built systems by orders of magnitude.
SeccoSQL (2020-2022): SeccoSQL (Separate communication from computation) is an experimental distributed SQL engine on Spark designed for processing complex SQL/Graph queries. It explicitly decouples Relational Algebra (RA) operators into pure communication and computation operators. SeccoSQL can reorder operators at a finer granularity than existing SQL engines, enabling a greater search space of plans and further reducing communication costs.
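To make the idea of separating communication from computation concrete, here is a minimal single-process sketch in Python. It is not SeccoSQL code: the function names (`shuffle`, `local_join`, `distributed_join`) and the simulated workers are assumptions made for illustration. It splits an equi-join into a pure communication step (routing rows by join key) and a pure computation step (a local hash join on co-located rows).

```python
from collections import defaultdict


def shuffle(rows, key_index, num_workers):
    """Communication-only step: route each row to a worker by hashing its join key."""
    partitions = [[] for _ in range(num_workers)]
    for row in rows:
        partitions[hash(row[key_index]) % num_workers].append(row)
    return partitions


def local_join(left_rows, right_rows, left_key, right_key):
    """Computation-only step: hash join over rows already placed on one worker."""
    index = defaultdict(list)
    for r in right_rows:
        index[r[right_key]].append(r)
    return [l + r for l in left_rows for r in index.get(l[left_key], [])]


def distributed_join(left, right, left_key, right_key, num_workers=4):
    """Hypothetical driver: shuffle both inputs, then join each partition locally."""
    left_parts = shuffle(left, left_key, num_workers)
    right_parts = shuffle(right, right_key, num_workers)
    out = []
    for lp, rp in zip(left_parts, right_parts):
        out.extend(local_join(lp, rp, left_key, right_key))
    return out
```

Because the two steps are separate functions, a planner could in principle move or merge the shuffles of several joins independently of the local joins, which is the kind of finer-grained reordering described above.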
DISC (2018-2020): DISC is a specialized graph system on Spark for computing subgraph counts of arbitrary patterns and orbits in a relational manner. Unlike existing subgraph counting approaches that operate directly on graphs, DISC decomposes subgraph counting queries into a sequence of relational queries, enabling efficient execution.
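As a rough illustration of treating subgraph counting as a sequence of relational queries, the Python sketch below counts triangles with a join followed by a semi-join and an aggregate. It is not DISC's actual decomposition: the function name `count_triangles` and the edge-relation layout are assumptions made for this example.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph using relational-style steps."""
    # Edge relation E(src, dst), normalized so that src < dst (self-loops dropped).
    E = {(min(u, v), max(u, v)) for u, v in edges if u != v}

    # Query 1: wedge relation W(a, b, c) = E(a, b) JOIN E(b, c).
    by_src = defaultdict(list)
    for s, d in E:
        by_src[s].append(d)
    wedges = [(a, b, c) for a, b in E for c in by_src.get(b, [])]

    # Query 2: semi-join W with E(a, c) to close each wedge, then aggregate.
    return sum(1 for a, b, c in wedges if (a, c) in E)
```

The ordering a < b < c ensures each triangle is produced exactly once, so the final aggregate needs no correction for duplicates.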
Crystal (2016-2017): Crystal is a novel method for distributed subgraph matching on very large graphs. It differs from existing subgraph matching approaches by computing compressed results of subgraph matching directly, greatly reducing computation costs.
Publications
2025
Yongliang Zhang, Yuanyuan Zhu, Hao Zhang, Congli Gao, Yuyang Wang, et al. TGraph: A Tensor-centric Graph Processing Framework. SIGMOD International Conference on Management of Data (SIGMOD), 2025, To Appear.
Jixian Su, Chiyu Hao, Shixuan Sun, Hao Zhang, Sen Gao, et al. Revisiting the Design of In-Memory Dynamic Graph Storage. SIGMOD International Conference on Management of Data (SIGMOD), 2025, To Appear.
2024
Guanghua Li, Hao Zhang, Xibo Sun, Qiong Luo, Yuanyuan Zhu. TenGraph: A Tensor-Based Graph Query Engine. International Conference on Very Large Data Bases (VLDB), 2024, To Appear.
Yishu Wang, Jinlong Chu, Ye Yuan, Yu Gu, Hangxu Ji, Hao Zhang. Label Constrained Reachability Queries on Time Dependent Graphs. IEEE International Conference on Data Engineering (ICDE), 2024.
Anbiao Wu, Ye Yuan, Changsheng Li, Yuliang Ma, Hao Zhang. Attributed Network Embedding in Streaming Style. IEEE International Conference on Data Engineering (ICDE), 2024.
2023
- Kangfei Zhao, Jeffrey Xu Yu, Qiyan Li, Hao Zhang, Yu Rong. Learned sketch for subgraph counting: a holistic approach. The VLDB Journal 32 (5), 937-962, 2023.
2022
Hao Zhang, Jeffrey Xu Yu, Yikai Zhang, Kangfei Zhao. Parallel Query Processing: To Separate Communication from Computation. ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD), 2022.
Hao Zhang, Qiyan Li, Kangfei Zhao, Jeffrey Xu Yu, Yuanyuan Zhu. How Learning Can Help Complex Cyclic Join Decomposition (Demo). IEEE International Conference on Data Engineering (ICDE), 2022.
Kangfei Zhao, Jeffrey Xu Yu, Zongyan He, Rui Li, Hao Zhang. Lightweight and Accurate Cardinality Estimation by Neural Network Gaussian Process. ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD), 2022.
2021
Kangfei Zhao, Jeffrey Xu Yu, Hao Zhang, Qiyan Li, Yu Rong. A Learned Sketch for Subgraph Counting. ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD), 2021.
Hao Zhang, Miao Qiao, Jeffrey Xu Yu, Hong Cheng. Fast Distributed Complex Join Processing (Short). IEEE International Conference on Data Engineering (ICDE), 2021.
2020
- Hao Zhang, Jeffrey Xu Yu, Yikai Zhang, Kangfei Zhao, Hong Cheng. Distributed Subgraph Counting: A General Approach. International Conference on Very Large Data Bases (VLDB), 2020.
2019
- Kangfei Zhao, Jiao Su, Jeffrey Xu Yu, Hao Zhang. SQL-G: Efficient Graph Analytics by SQL. IEEE Transactions on Knowledge and Data Engineering (TKDE), 2019.
2018
- Miao Qiao, Hao Zhang, Hong Cheng. Subgraph matching: on compression and computation. International Conference on Very Large Data Bases (VLDB), 2018.
2017
Yuanyuan Zhu, Hao Zhang, Lu Qin, Hong Cheng. Efficient MapReduce algorithms for triangle listing in billion-scale graphs. Distributed and Parallel Databases (DPD), 2017.
Hao Zhang, Yuanyuan Zhu, Lu Qin, Hong Cheng, Jeffrey Xu Yu. Efficient Local Clustering Coefficient Estimation in Massive Graphs. Database Systems for Advanced Applications (DASFAA), 2017.
2016
- Hao Zhang, Yuanyuan Zhu, Lu Qin, Hong Cheng, Jeffrey Xu Yu. Efficient triangle listing for billion-scale graphs. IEEE International Conference on Big Data (BigData), 2016.
Honors & Awards
- First Class, TopMinds Program in Huawei, 2022
- Meritorious Winner, COMAP’s Mathematical Contest in Modeling, 2016
- Second Class, Hongyi Scholarship, 2015
- Second Class, HuaZhong Area Mathematical Modelling, 2015
Professional Activities
- Reviewer: TKDD, PAKDD’21,22,23,24, KDD’20, CIKM’20,21, AAAI’21
- External Reviewer: SIGMOD’21,22, VLDB’19,20,21,22, ICDE’19,20,21,22
- External Organization: LDBC MPC
[updated on 2024/11/01]