A Decade of CVPR

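2016 was the first time I attended CVPR. I registered as a FAIR intern, met up at the venue with my Megvii (旷视) colleagues Tim and Shuchang, and together we set up the Megvii booth. That year, Megvii was a sponsor of CVPR at the highest tier, and the booth showcased DoReFa-Net, the quantized CNN work I took part in; Scott Gray even came by to discuss how its quantized kernels should be written. DoReFa-Net was the first time Megvii ran a CNN on an FPGA chip, and this technical groundwork became the seed of Axera (爱芯元智), which went public ten years later. Next to us, Xiaodi Hou and his colleagues were setting up the TuSimple booth.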

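That year in Las Vegas, the most discussed piece of gossip in the Chinese community was: "I hear Jian Sun is going to Megvii." Jian Sun stopped by the Megvii booth to say hello to us; naive as I was, I did not yet understand how much weight that name carried. That year, ResNet unsurprisingly won the Best Paper Award. A month after CVPR, Kaiming joined FAIR and was my neighbor for a while in the apartments arranged by the company. In the years that followed, the team called "win vision", with Kaiming, Ross, Piotr, and others at its core, was the best team in the world in CV.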

Read more

Common Python Reference Cycle Patterns

In Python, when a set of objects forms a reference cycle, none of them can reach a zero refcount. In this case, even if these objects all go out of scope and are no longer accessible, they will not be released immediately.

The Python ecosystem typically accepts reference cycles as an inevitable issue, and relies on garbage collection (GC) to avoid leaks. A GC is triggered by the Python interpreter from time to time; it detects all unreachable objects and releases them regardless of their refcount.
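The behavior described above can be observed with a small experiment. The following is a minimal sketch, assuming CPython (where refcounting and the cyclic collector work as described); the `Node` class and `make_cycle` helper are hypothetical names used only for illustration.

```python
import gc
import weakref


class Node:
    """A plain object that can hold a reference to another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a  # a -> b -> a forms a reference cycle
    return weakref.ref(a)    # a weak reference does not keep `a` alive


gc.disable()                 # keep the automatic collector out of the demo
probe = make_cycle()         # both objects are now out of scope
alive_before_gc = probe() is not None  # refcounts never hit zero
gc.collect()                 # an explicit collection finds the unreachable cycle
alive_after_gc = probe() is not None
gc.enable()

print(alive_before_gc, alive_after_gc)
```

On CPython, the objects survive going out of scope and are only released once the collector runs.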

However, in high performance deep learning systems, GC is not always a good choice.

Read more

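Written in the Tenth Year of the wechat-dump Project

Over the past few days of the Lunar New Year holiday, to take my mind off an anxious stretch of work, I added a few features to my wechat-dump project that I could not build back then, and fixed some long-standing issues. I was surprised to find that the project started at the end of 2014, which is now more than ten years ago. How many people get to add new features to code they wrote ten years ago? Some thoughts came up that I wanted to write down.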

Read more

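Why You Should Use Stacked Diffs / Stacked PRs

The internal code management tools at both Meta and Google support a workflow called "stacked diffs / stacked PRs". However, the mainstream git-based platforms (github, gitlab) do not support this workflow. Many friends who had to use github after leaving Meta say that stacked diffs are an "ultimate productivity tool" for engineers, and I strongly agree. This article introduces what the stacked diffs workflow is, and why it can greatly improve a team's development efficiency.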

Read more

Registration Does Not Scale Well

People have many different opinions about config systems. Having worked with various styles of configs, I also want to write about what a great config subsystem in a large-scale (in terms of system complexity, number of users, etc.) system should look like.

The design space is complex, so in this article I'll start with a smaller topic: registration in config systems. I'll show why this common pattern, though it works fine for small-scale projects, does not scale well in the long term. I'll also discuss an alternative.
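For readers unfamiliar with the pattern, here is a minimal sketch of what registration in a config system commonly looks like. All names (`MODEL_REGISTRY`, `register`, `SmallNet`, `build_from_config`) are hypothetical and not taken from any particular library; the sketch only illustrates the pattern, not the alternative the article goes on to discuss.

```python
MODEL_REGISTRY = {}  # hypothetical global registry: string name -> class


def register(name):
    """Decorator that records a class in the registry under `name`."""

    def decorator(cls):
        if name in MODEL_REGISTRY:
            raise KeyError(f"{name!r} is already registered")
        MODEL_REGISTRY[name] = cls
        return cls

    return decorator


@register("small_net")
class SmallNet:
    def __init__(self, width=16):
        self.width = width


def build_from_config(cfg):
    """Look up the class by its registered name and construct it."""
    cls = MODEL_REGISTRY[cfg["name"]]
    return cls(**cfg.get("args", {}))


# The config refers to the class only through a string.
model = build_from_config({"name": "small_net", "args": {"width": 32}})
print(type(model).__name__, model.width)
```

Note that the string `"small_net"` only resolves if the module defining `SmallNet` has already been imported, since registration happens as a side effect of the decorator running.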

Read more

Safe Static Initialization, No Destruction

After joining Google Brain, I brought PyTorch to Google's internal infra and have owned its maintenance since. Google is well known as a "tech island": almost everything in Google works differently from the outside world, and that creates many challenges when building a massive library like PyTorch.

Among those challenges, there are a few tricky bugs related to the static initialization order fiasco (SIOF) and the destruction of static objects. This time I was forced to learn a lot more details than I'd like to know about these topics, so it's good to write them down before I forget.

Read more

Some Useful Terminal Escape Sequences

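I recently learned about some terminal escape sequences, and I especially wish I had discovered OSC52 sooner. Here are some brief notes on the various sequences.

Terminal escape sequences are strings with special meanings that a terminal application writes to stdout. When the terminal sees these strings, it does not display them; instead, it executes the advanced terminal feature that each string corresponds to.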
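As an illustration, here is a minimal sketch that builds two such sequences: an SGR sequence for colored text, and an OSC 52 sequence that asks the terminal to set the system clipboard. The helper names `sgr` and `osc52_copy` are hypothetical, and whether OSC 52 has any effect depends on the terminal emulator and its settings.

```python
import base64
import sys

ESC = "\x1b"  # escape character that starts every sequence
BEL = "\x07"  # bell character, used here to terminate the OSC sequence


def sgr(text, code):
    """Wrap text in an SGR (color/style) sequence and reset afterwards."""
    return f"{ESC}[{code}m{text}{ESC}[0m"


def osc52_copy(text):
    """Build an OSC 52 sequence carrying `text` as base64 for the clipboard."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"{ESC}]52;c;{payload}{BEL}"


if __name__ == "__main__":
    # The terminal interprets these bytes instead of displaying them.
    sys.stdout.write(sgr("hello in red", 31) + "\n")
    sys.stdout.write(osc52_copy("copied via OSC 52"))
    sys.stdout.flush()
```

Because the sequences are just bytes written to stdout, they also pass through an SSH session, so a supporting terminal can apply them on the local machine.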

Read more

Demystify RAM Usage in Multi-Process Data Loaders

A typical PyTorch training program on 8 GPUs with 4 dataloader workers per GPU would create at least 8 × (4 + 1) = 40 processes. A naive use of PyTorch dataset and dataloader can easily replicate your dataset's RAM usage by 40 times. This issue has probably affected everyone who has done anything nontrivial with PyTorch. In this post, we will explain why it happens, and how to avoid the 40x RAM usage.

Read more

Not Every Model Has a Separate "Loss Function"

"Loss function" is one of the most basic concepts today in deep learning. Despite that, it is actually not necessarily a good programming abstraction when designing general-purpose systems. A system should not assume that a model always comes together with a separate "loss function".

Read more

How to Maintain Clean Core APIs for Research

Building a library for research and experiments is quite different from building other types of software. A key challenge is that, in research, abstractions and APIs are rarely set in stone: users may want to propose a slight variant or modification to literally ANYWHERE in the whole program, just because they have a new idea.

Read more