Why Did Uber Announce Its Switch from Postgres to MySQL?

2016-07-29 Evan Klitzke 高可用架构

Editor's note: Uber recently announced that it was migrating its databases from Postgres to MySQL, which caused quite a stir across a number of technical communities. In this article we take a detailed look at the reasoning behind that decision.

Introduction

Uber's early architecture was a monolithic backend application written in Python that used Postgres for data persistence. Since then, Uber's architecture has changed significantly, moving toward microservices and new data platforms. Specifically, in many of the cases where we previously used Postgres, we now use Schemaless, a storage system built on top of MySQL (editor's note: Uber's data middleware). In this article we explore some of the drawbacks we found with Postgres and explain why we moved Schemaless and other backend services over to MySQL.

Overview of the Postgres Architecture

We encountered a large number of Postgres limitations, including:

  • An inefficient architecture for writes
  • Inefficient data replication
  • Issues with table corruption
  • Poor replica MVCC support
  • Difficulty upgrading to newer releases

We will look at all of these limitations, starting with an analysis of how Postgres organizes tables and indexes on disk, and comparing it with the way MySQL, with the InnoDB storage engine, lays out the same data. Note that the analysis presented here is based primarily on our experience with the somewhat old Postgres 9.2 release series. To our knowledge, however, the internal architecture discussed in this article has not changed significantly in newer Postgres releases; the on-disk design in 9.2 has not changed significantly since at least Postgres 8.3 (now nearly 10 years old).

On-Disk Data Format

A relational database must perform a few key tasks:

  • Provide insert/update/delete capability
  • Provide the ability to make schema changes
  • Implement a multiversion concurrency control (MVCC) mechanism, so that different connections each have their own transactional view of the data

How all of these features work together is a major part of how a database is designed.

One of Postgres's core design choices is immutable row data. These immutable rows are called "tuples" in Postgres parlance. Internally, each tuple is uniquely identified by a CTID. A CTID represents the tuple's on-disk location (i.e., its physical disk offset). Multiple ctids can potentially describe a single row (for example, when multiple versions of the row exist for MVCC purposes and older row versions have not yet been reclaimed by autovacuum). A collection of organized tuples forms a table. Tables have indexes, which are typically organized as B-tree data structures mapping indexed fields to a CTID payload.

Normally these ctids are transparent to users, but knowing how they work can help you understand Postgres's on-disk data structures. To see a row's current CTID, you can explicitly add "ctid" to the column list of a query:
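The example query itself did not survive in this copy of the article. As a stand-in, here is a minimal Python sketch using psycopg2; the connection string and table name are placeholders, not values from the original:

import psycopg2

# Minimal sketch: ctid is a hidden system column, but it can be selected explicitly.
# 'dbname=example' and 'some_table' are placeholders for illustration only.
conn = psycopg2.connect('dbname=example')
with conn.cursor() as cur:
    cur.execute('SELECT ctid, * FROM some_table;')
    for row in cur.fetchall():
        print(row)  # row[0] is the tuple's physical location, e.g. '(0,1)'
conn.close()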

To explain the layout details, let's consider a simple users table. For each user we have an auto-incrementing user ID as the primary key, the user's first and last name, and the user's birth year. We also define a composite secondary index on the user's full name (first and last) and another secondary index on the birth year. The DDL to create such a table might look something like the following.
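The original DDL block was not preserved here. As a rough stand-in, expressed with SQLAlchemy rather than raw SQL and with assumed index names, a matching definition could look like this:

from sqlalchemy import Column, Integer, Text, Index
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'users'
    # auto-incrementing integer primary key
    id = Column(Integer, primary_key=True, autoincrement=True)
    first = Column(Text)
    last = Column(Text)
    birth_year = Column(Integer)

# composite secondary index on the full name, plus a secondary index on birth year
Index('ix_users_first_last', User.first, User.last)
Index('ix_users_birth_year', User.birth_year)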

Note the three indexes in this definition: the primary key index plus the two secondary indexes.

For the examples in this article, we'll consider table data seeded with a selection of influential historical mathematicians.

As described above, each row implicitly has a unique, opaque CTID. So we can think of the table's internal structure as a mapping from each CTID to the row's data.

The primary key index maps ids to ctids and is defined on the id field.

A B-tree is built on the id field, and the nodes of the B-tree hold the CTID values. Note that in this case the order of the fields in the B-tree happens to match the order of the rows in the table, because of the auto-incrementing id, but that is not necessarily always the case.

The secondary indexes look similar; the main difference is that the fields are stored in a different order, since a B-tree must be organized lexicographically. The name index (first, last) is ordered alphabetically by name.

Likewise, the birth_year index is ordered by ascending year.

As you can see, in both cases the CTID values in the respective secondary indexes are not in increasing order, unlike the first case of the auto-incrementing primary key.

Suppose we need to update a record in this table. For instance, let's update al-Khwārizmī's birth year to 770 CE. As mentioned earlier, row tuples are immutable, so to update the record we must add a new tuple to the table. The new tuple has a new opaque CTID, which we'll call I. Postgres needs to be able to distinguish the new tuple I from the old tuple D. Internally, Postgres stores a version field in each tuple, as well as a pointer to the previous tuple's ctid (if any). Accordingly, the table now contains both versions of the al-Khwārizmī row.

As long as both versions of al-Khwārizmī exist, the indexes must hold entries for both rows. For brevity, we omit the primary key index and show only the secondary indexes here.

In the original article's diagram, the old version is marked in red and the new version in green. Underneath, Postgres uses another field to hold the row version, in order to determine which tuple is the newest. This added field lets the database decide which row tuple a given transaction is allowed to see.

In Postgres, both the primary index and the secondary indexes point directly to the tuple's on-disk offset. When a tuple's location changes, every index must be updated.

Replication

When we insert data into a table, Postgres replicates it if streaming replication is enabled. For crash-recovery purposes, the database maintains a write-ahead log (WAL) and uses it to implement two-phase commit. The database must keep the WAL even when replication is not enabled, because the WAL is what provides the atomicity and durability of ACID.

The WAL is easier to understand through the following scenario: if the database crashes unexpectedly, say because of a sudden power failure, the WAL acts as a ledger of the updates made to on-disk tables and indexes. When the Postgres daemon starts up again, it compares the ledger entries against the actual data on disk. If the ledger contains data that is not yet reflected on disk, the WAL records are used to correct the on-disk data.

Postgres also uses the WAL for streaming replication, sending it from the master to the replicas. Each replica applies the data much like the crash-recovery process described above. The only difference between streaming replication and actual crash recovery is whether the data can be served to queries while it is being applied.

Because the WAL is really designed for crash recovery, it contains low-level information about physical on-disk updates. The content of a WAL record is at the level of the actual on-disk representation of row tuples and their disk offsets (i.e., row ctids). If you pause a Postgres master and let a replica catch up completely, the replica's actual on-disk content exactly matches the master's. Consequently, a tool like rsync can be used to repair a replica that has fallen out of sync.

The Big Pitfalls of the Postgres Design

The design described above caused inefficiencies and many other problems for Uber's use of Postgres.

1. Write Amplification

The first problem with the Postgres design is what is known as write amplification.

Write amplification usually refers to a data-writing problem, for example on SSDs, where a small logical update (say, writing a few bytes) becomes a much larger, much more expensive update once translated to the physical layer.

The same problem shows up in Postgres. In the example above, when we make a small logical update, such as changing al-Khwārizmī's birth year, we have to issue at least four physical updates:

  1. Write the new row tuple into the tablespace;
  2. Update the primary key index for the new tuple;
  3. Update the (first, last) name index for the new tuple;
  4. Update the birth_year index to add an entry for the new tuple.

In fact, these four updates only reflect the writes to the main tablespace; each of these writes also needs to be reflected in the WAL, so the total number of on-disk writes is even larger than four.

Updates 2 and 3 are worth emphasizing. When we updated al-Khwārizmī's birth year, we didn't actually change his primary key, nor did we change his first or last name. Yet those indexes must still be updated to point at the new row tuple created in the database. For tables with a large number of secondary indexes, these superfluous steps can cause enormous inefficiency. For example, if we have a table with a dozen secondary indexes defined on it, an update to a field covered by only a single one of them must be propagated into all 12 indexes to reflect the new row's CTID.

2. Replication

Because replication happens at the level of on-disk changes, the write amplification problem naturally translates into replication amplification. A small logical change, such as "change the birth year for CTID D to 770", becomes the four WAL writes described above, and all of them are streamed over the network to the replicas. Write amplification is therefore also a replication amplification problem, and the Postgres replication data stream quickly becomes extremely verbose, potentially consuming a large amount of bandwidth.

When Postgres replication happens entirely within a single data center, the replication bandwidth may not be a problem. Modern networking gear and switches can handle a great deal of bandwidth, and many hosting providers offer free or inexpensive intra-data-center bandwidth. However, when replication has to happen between data centers, the problems escalate quickly.

For example, Uber originally used physical servers in a West Coast colocation facility. For disaster-recovery purposes, we added a batch of servers in an East Coast colocation space. In this design, the West Coast data center hosted the master database, and the East Coast servers acted as replicas.

Cascading replication reduces the cross-data-center bandwidth requirement to the bandwidth and traffic needed to ship one copy of the data between the master and a single replica, even if there are many replicas in the second data center. However, the verbosity of the Postgres replication protocol can still mean a huge volume of data for a database that makes heavy use of secondary indexes. Purchasing long-haul, cross-country bandwidth is expensive, and even for a company with deep pockets, cross-country bandwidth will never match local bandwidth.

This bandwidth problem also caused us trouble with WAL archiving. In addition to sending all WAL updates from the West Coast to the East Coast, we archived all WAL records to a web-based file storage service so that, in a data-disaster scenario, we could restore from the archived WAL files. During peak traffic, however, our bandwidth to the storage service simply could not keep up with the rate at which the WAL was being written.

3. Data Corruption

During a routine master database scaling change, we ran into a Postgres 9.2 bug. Replicas followed a timeline switchover incorrectly, which led some of them to misapply some WAL records. Because of this bug, some records that should have been marked invalid were not marked as such.

The following query illustrates how this bug affected our users table:

SELECT * FROM users WHERE ID = 4;

This query would return two records: the old row from before the birth-year update, plus the new, updated row. If we added the CTID to the column list, we would see two different CTID values in the returned records, which, as you would expect, means they really are two distinct row tuples.

This problem was extremely vexing for several reasons. First, we couldn't easily tell how many rows were affected. The duplicated results returned from the database caused application logic to fail in many cases. We eventually added defensive programming statements to detect the problem for tables known to be affected (see the sketch below). Because the bug affected all of the servers, the corrupted rows could differ from node to node: on one replica row X might be bad and row Y good, while on another replica row X might be good and row Y bad. In fact, we weren't sure how many replicas had corrupted data, or whether the master itself was affected.
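The "defensive programming" mentioned above could be as simple as the following sketch (an illustration only, with placeholder names, not Uber's actual code):

def assert_no_duplicate_ids(rows, key='id'):
    # Detect the symptom described above: the same logical row returned twice.
    seen = set()
    for row in rows:
        if row[key] in seen:
            raise RuntimeError('duplicate row returned for %s=%r' % (key, row[key]))
        seen.add(row[key])
    return rows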

Although we knew the problem affected only a handful of rows per database, we were still very worried: because Postgres replication happens at the physical level, any small formatting error could end up completely corrupting our database indexes. An important aspect of B-trees is that they must be rebalanced periodically, and these rebalancing operations can completely change the structure of the tree as subtrees are moved to new locations on disk. If the wrong data gets moved, large parts of the tree can be rendered completely invalid.

In the end, we tracked down the actual bug and used it to determine that the newly promoted master had no corrupted rows. We then resynced all of the replicas from a snapshot of that master, a laborious, manual process (editor's note: it is oddly comforting to see that DBAs in the US suffer just like we do), since we could only take a few hosts out of the serving pool at a time.

The bug we hit affected only certain releases of Postgres 9.2 and has long since been fixed. However, we still find it worrisome that this class of bug can happen at all. A new Postgres release could ship with a bug of this fatal nature at any time, and because of the way replication is designed, the problem can spread to every database in the replication hierarchy as soon as it appears.

4. No MVCC on Replicas

Postgres does not have true replica MVCC support. The fact that replicas apply WAL updates means that, at any moment, they hold a copy of the on-disk data that is physically identical to the master's. This design causes a problem for Uber.

To support MVCC, Postgres needs to keep old versions of rows around. If a streaming replica has a transaction in progress, updates that conflict with that transaction are blocked. In this situation Postgres pauses the WAL application thread until the transaction ends. That is a potential problem if the transaction takes a long time, so Postgres imposes a timeout: if a transaction blocks WAL application for a configured amount of time, Postgres kills that transaction.

This design means replicas can regularly lag behind the master, and it also makes it easy to write code that gets transactions killed. The problem may not be obvious. For example, suppose a developer writes some code that emails a receipt to a user. Depending on how it's written, the code may implicitly keep a database transaction open until after the email has finished sending. While it is always bad form to keep a database transaction open while performing unrelated blocking I/O, the reality is that most engineers are not database experts and may not always understand the problem, especially when using an ORM that hides low-level details such as open transactions. (Editor's note: coding habits in the US look a lot like ours.) A sketch of the anti-pattern follows.
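As an illustration only (every name in this snippet is a stand-in, not a real database API), the problematic shape looks like this:

import time
from contextlib import contextmanager

@contextmanager
def transaction():
    # Stand-in for a real database transaction context manager.
    print('BEGIN')
    yield
    print('COMMIT')

def send_email(address, body):
    # Stand-in for a slow, unrelated network call.
    time.sleep(5)

with transaction():
    receipt = 'your receipt'
    # Blocking I/O while the transaction is held open: on a Postgres replica,
    # a transaction left open this long can block WAL application and be killed.
    send_email('user@example.com', receipt)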

Upgrading Postgres

Because replication records work at the physical level, it is not possible to replicate between different Postgres general-availability (GA) releases. A master running Postgres 9.3 cannot replicate to a replica running Postgres 9.2, nor can a 9.2 master replicate to a 9.3 replica.

We followed these steps to upgrade from one Postgres GA release to another:

  • Shut down the master database.
  • Run the pg_upgrade command on the master, which updates the master's data in place. This can easily take hours for a large database, and no traffic can be served while it runs.
  • Start the master again.
  • Create a new snapshot of the master. This step copies all of the master's data, so again it takes hours for a large database.
  • Wipe each replica and restore the new master snapshot onto it.
  • Bring each replica back into the original replication hierarchy and wait for it to catch up with the master's latest updates.

We used this approach to successfully upgrade from Postgres 9.1 to Postgres 9.2. However, the process took so long that we could not accept going through it again. By the time Postgres 9.3 came out, Uber's growth had increased our data set substantially, so the upgrade would have been even lengthier. For this reason, our legacy Postgres instances run Postgres 9.2 to this day, even though the current Postgres GA release is 9.5.

If you are running Postgres 9.4 or later, you can use something like pglogical, which implements a logical replication layer for Postgres. With pglogical you can replicate data between different Postgres versions, which means upgrades such as 9.4 to 9.5 are possible without significant downtime. The capability is still questionable, though, since pglogical is not integrated into the Postgres mainline, and it is not available at all to users of older releases.

Overview of the MySQL Architecture

To further explain Postgres's limitations, let's look at why MySQL is the underlying storage for Schemaless, Uber's new storage project. In many cases, we found MySQL more favorable to our use cases. To understand the differences, we examined MySQL's architecture and compared it with Postgres's. We specifically analyzed how MySQL works with the InnoDB storage engine. InnoDB is not only used heavily at Uber; it is also the most widely used MySQL storage engine in the world.

 

InnoDB's On-Disk Data Structures

Like Postgres, InnoDB supports advanced features such as MVCC and mutable data. A detailed discussion of InnoDB's on-disk format is beyond the scope of this article; here, we focus on the key differences from Postgres.

The most important architectural difference is that while Postgres index records point directly to on-disk locations, InnoDB maintains a secondary structure. Instead of holding a pointer to the row's on-disk location (like the CTID in Postgres), an InnoDB secondary index record holds a pointer to the primary key value. Thus, a secondary index in MySQL is associated with the corresponding primary key.

To perform an index lookup on (first, last), we actually need to do two lookups. The first lookup searches the secondary index and finds the record's primary key. Once the primary key is found, a second lookup on the primary key index locates the row on disk.

This design means InnoDB is at a slight disadvantage to Postgres for non-primary-key lookups, since MySQL must do two index lookups where Postgres needs only one. However, because the data is normalized in this way, a row update only needs to touch the index records that the update actually changes.

Moreover, InnoDB generally updates row data in place. If old transactions need to reference the row for MVCC purposes (for example, on a MySQL replica), the old data is copied into a special area called the rollback segment.

Let's look at what happens when we update al-Khwārizmī's birth year. If there is space, the database updates the row with ID 4 in place (updating the birth year needs no extra space, since the year is a fixed-width integer). The index on the birth year column is also updated, and the old version of the row is copied into the rollback segment. The primary key index does not need to be updated, and neither does the name index. If the table had a large number of indexes, the database would still only have to update the indexes that actually contain birth_year. Indexes on, say, signup_date or last_login_time would not need to be updated, whereas Postgres would have to update all of them.

This design also makes vacuuming and compaction more efficient. All of the data eligible for vacuuming lives in the rollback segment. By comparison, the Postgres autovacuum process has to do full table scans to identify deleted rows.

 

MySQL uses an extra layer of indirection: secondary index records point to primary index records, and the primary index itself holds the row's on-disk location. If a row's offset changes, only the primary index needs to be updated.

Replication

MySQL supports several different replication modes:

  • Statement-based replication replicates the logical SQL statements (e.g., it would literally replicate a statement such as: UPDATE users SET birth_year = 770 WHERE id = 4)
  • Row-based replication replicates the changed row records
  • Mixed replication mixes the two modes

Each mode has trade-offs. Statement-based replication is usually the most compact, but it can require replicas to apply expensive statements in order to update only a small amount of data. Row-based replication, somewhat like Postgres WAL replication, is more verbose, but it results in more predictable and more efficient updates on the replicas.

In MySQL, only the primary index holds a pointer to the row's on-disk location. This matters for replication. The MySQL replication stream only needs to contain information about logical updates to rows. Given a replication update such as "change row X's timestamp from T_1 to T_2", the replica automatically updates any relevant indexes as needed.

By contrast, the Postgres replication stream contains physical changes, such as "at disk offset 8382491, write bytes XYZ." In Postgres, every physical change to disk has to be recorded in the WAL. A tiny logical change (such as updating a timestamp) triggers many on-disk changes: Postgres must insert the new tuple and update every index to point to that tuple, so all of those changes are written to the WAL. This design difference means the MySQL replication binary log is significantly more compact than the PostgreSQL WAL stream.

How replication works also affects MVCC on replicas. Because the MySQL replication stream consists of logical updates, replicas can have true MVCC semantics; therefore, read queries on a replica do not block the replication stream. By contrast, the Postgres WAL stream contains physical on-disk changes, so Postgres replicas cannot apply replication updates that conflict with in-flight queries, which is why Postgres streaming replication cannot offer MVCC on replicas.

MySQL's replication architecture means that while a bug might still corrupt a table, it is unlikely to cause a catastrophic failure. Replication happens at the logical level, so an operation like rebalancing a B-tree cannot corrupt an index on a replica. A typical MySQL replication issue is a statement being skipped (or, less frequently, applied twice). This may leave data missing or invalid, but it will not take the database down.

Finally, MySQL's replication architecture makes it possible to replicate between different MySQL versions. MySQL only bumps its replication version number when the replication format changes, which is rare. The logical replication format also means that changes to the on-disk format at the storage-engine layer do not affect the replication format. The typical way to do a MySQL upgrade is to upgrade the replicas one at a time and, once all of the replicas are upgraded, promote one of them to be the new master. This can be done with nearly zero downtime, which also keeps MySQL easy to keep up to date.

Other MySQL Design Advantages

So far we have focused on the on-disk architecture of Postgres and MySQL. MySQL's architecture also leads to significant performance advantages over Postgres in other important areas.

The Buffer Pool

First, the two databases implement caching very differently. Postgres allocates some memory for its own caches, but those caches are usually quite small compared with the total memory on the machine. To improve performance, Postgres lets the kernel automatically cache recently accessed disk data via the page cache. For example, our largest Postgres nodes have 768 GB of memory available, but only about 25 GB of it is actually resident (RSS) memory used by Postgres, which leaves more than 700 GB of memory to the Linux page cache.

The problem with this design is that accessing data through the page cache is actually somewhat expensive compared to accessing RSS memory. To look up data from disk, Postgres issues lseek and read system calls to locate the data. Each of these system calls incurs a context switch, which is more expensive than accessing data from main memory. In fact, Postgres isn't even fully optimized in this respect: it doesn't make use of the pread(2) system call, which combines the seek and read operations into a single system call.
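As a small, self-contained illustration of the difference in Python (a sketch only, not how Postgres itself reads pages; the file name and offsets are made up):

import os

# Sketch: read 8 KB starting at byte offset 8192 of a file ('example.dat' is a placeholder).
fd = os.open('example.dat', os.O_RDONLY)

# Two system calls: an explicit seek followed by a read.
os.lseek(fd, 8192, os.SEEK_SET)
data = os.read(fd, 8192)

# One system call: pread combines the positioning and the read,
# and it does not disturb the descriptor's current offset.
data = os.pread(fd, 8192, 8192)

os.close(fd)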

By comparison, the InnoDB storage engine implements its own LRU in something it calls the InnoDB buffer pool. This is logically similar to the Linux page cache but implemented in user space, which makes it significantly more complex than the Postgres design. The buffer pool design has some huge advantages:

  • It makes it possible to implement a custom LRU design. For example, it can detect pathological access patterns and prevent them from doing too much damage to the buffer pool.
  • It results in fewer context switches. Data accessed through the InnoDB buffer pool does not require any user/kernel context switches. The worst-case behavior is a TLB miss, which is relatively cheap and can be mitigated by using huge pages.

 

Connection Handling

MySQL implements concurrent connections by spawning a thread per connection, which is relatively low overhead; each thread has some memory overhead for its stack, plus some heap memory allocated for connection-specific buffers. It is not uncommon to scale MySQL to 10,000 or so concurrent connections, and in fact we are close to that connection count on some of our current MySQL instances.

Postgres, however, uses a process-per-connection design. This is clearly more expensive than a thread-per-connection design: starting a new process takes more memory than starting a new thread, and inter-process communication (IPC) is much more expensive than communication between threads. Postgres 9.2 uses System V IPC for its IPC primitives rather than the lightweight futexes used in threading models; since the uncontended case is the common one for futexes, they are faster than System V IPC and avoid a context switch.

Apart from the memory and IPC overhead associated with Postgres's design, Postgres simply seems to have poor support for large connection counts, even when there is enough memory available. We have hit significant problems scaling Postgres to just a few hundred active connections. The official documentation does not give a precise reason, but it strongly recommends an out-of-process connection pooler to handle large numbers of connections, so using pgbouncer for connection pooling has mostly worked for us. However, we have had occasional bugs in our backend services that caused them to open far more active connections than they needed, and those bugs have already caused several outages.

Conclusion

Postgres served Uber well in the early days, but unfortunately it did not adapt well to our data growth. Today we have a few legacy Postgres instances, but the bulk of our databases have been migrated to MySQL (typically behind our Schemaless layer) or, in some special cases, to NoSQL databases such as Cassandra. We are very happy with MySQL, and we may cover some of its more advanced uses at Uber in future blog posts.

The author, Evan Klitzke, is a senior software engineer on Uber's core infrastructure team. He is also a database enthusiast and an early bird who joined Uber in September 2012.

Original English article:

Why Uber Engineering Switched from Postgres to MySQL


Deploying Flask with Gunicorn and Nginx


Recently a lot of friends have been asking me about deploying Flask. Honestly, I'm happy to see and answer these questions; at the very least it proves that more and more people are using Flask.

A while back I published a post on deploying Flask on Ubuntu with uwsgi + nginx. Frankly, uwsgi is a huge pain. It may be fine with Django, but not necessarily with Flask; at the very least, uwsgi is an extremely fiddly thing. In general, I've always felt that complex things aren't necessarily bad, but they are certainly not easy to use.

After plenty of back and forth and experimentation, I finally found a reliable way to deploy Flask. The WeChat backend platform at my current company uses this setup as well. If you're interested, read on, or suggest a better approach to me; after all, knowledge only shows its value when it is shared.

I found something interesting in the official Flask documentation; here is the original link: Standalone WSGI Containers. uwsgi is nowhere to be seen in it. A little disappointing, perhaps, but I found a no-fuss Flask deployment option there instead: Gunicorn.

Preparation on Ubuntu

Assuming you bought a VPS on Tencent Cloud or Alibaba Cloud, just run the following commands. There isn't much to explain; we're simply preparing the Python environment.

$ sudo apt-get update
$ sudo apt-get install python-dev python-pip python-virtualenv

Then install nginx:

$ sudo apt-get install nginx

Create a myflask folder (your project directory) under /var/www, then adjust its permissions with chmod:

$ sudo mkdir /var/www/myflask
$ sudo chmod 777 /var/www/myflask

Note: you can of course use nginx's default web root, /usr/share/nginx/html, instead.

Then use scp to copy your local Flask project straight to the server:

$ scp -r myflask root@www.mydomain.com:/var/www/myflask

Replace the domain with your server's IP address or whatever domain the server actually uses. I logged in as root here; adjust the user to match your server. The default login users on the two big clouds are:

  • Tencent Cloud: ubuntu
  • Alibaba Cloud: root

Gunicorn

Gunicorn ("Green Unicorn") is a Python WSGI HTTP server for UNIX. It uses a pre-fork worker model, ported from Ruby's Unicorn project. Gunicorn is broadly compatible with a variety of web frameworks, is very simple to run, is light on resources, and is fairly fast.

I once googled "Gunicorn vs uwsgi"; everyone said uwsgi performed better than gunicorn, and that's how I ended up in trouble. Still, it's not too late to come back to this "unicorn" now.

Installing Gunicorn

Gunicorn should be installed inside your virtualenv. I won't say much about virtualenv itself; if you haven't used it yet, go read up on it right away. Remember to activate the venv before installing:

(venv) $ pip install gunicorn

Running Gunicorn

(venv) $ gunicorn -w 4 -b 127.0.0.1:8080 wsgi:application

That's all! Running it really is that simple. One thing does need explaining, though: the final argument, wsgi:application, is the program entry point. Let me write a tiny example to illustrate:

Create a new file named wsgi.py. Note that this has nothing whatsoever to do with the manage.py bootstrap script commonly used in Flask projects. (That one's on me; I never kept them straight before because uwsgi had me confused.)

# wsgi.py

from flask import Flask

def create_app():
    # This factory function can instead be imported from your existing
    # `__init__.py` or from anywhere else.
    app = Flask(__name__)
    return app

application = create_app()

if __name__ == '__main__':
    application.run()

Now the wsgi:application argument is easy to understand. It has two parts: wsgi is the name of the bootstrap Python file (without the extension, i.e., the module name), and application is the name of the Flask instance. This is how gunicorn knows exactly which Flask instance to host.

This is where gunicorn starts to shine: we don't need to write any configuration file at all; a single command starts it.

Configuring Nginx

I won't go into much detail about Nginx either. Let's get straight to the point and dive into Nginx's default site configuration file:

sudo nano /etc/nginx/sites-available/default

and replace its contents with the following.

It's a good idea to back up the default file first:
sudo cp /etc/nginx/sites-available/default /etc/nginx/sites-available/default.bak

server {
    listen 80;
    server_name example.org; # the host machine's external domain name; an IP address also works

    location / {
        proxy_pass http://127.0.0.1:8080; # points at the address gunicorn is serving on
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}

Remember to restart the nginx service after changing its configuration!

sudo service nginx restart

Running Gunicorn as a Service

This is the last step. We'll use Upstart to run the Flask application as a service that starts when Linux boots. First create the service configuration file:

sudo nano /etc/init/myflask.conf

Then add the following configuration:

description "The myflask service"

start on runlevel [2345]
stop on runlevel [!2345]

respawn
setuid root
setgid www-data

env PATH=/var/www/myflask/venv/bin
chdir /var/www/myflask/

exec gunicorn -w 4 -b 127.0.0.1:8080 wsgi:application

That's it. Start the myflask service:

sudo service myflask start

One thing worth pointing out: in myflask.conf,

env PATH=/var/www/myflask/venv/bin
chdir /var/www/myflask/

these two lines point to your project path and your virtualenv path.

Wrap-up

This deployment process feels much simpler than the uwsgi approach I described before, doesn't it? One small tip: if you drive this process with Fabric, the deployment becomes fully automated, which is well worth trying.


Let’s Build A Web Server. Part 3.


Date Wed, May 20, 2015

“We learn most when we have to invent” —Piaget

In Part 2 you created a minimalistic WSGI server that could handle basic HTTP GET requests. And I asked you a question, “How can you make your server handle more than one request at a time?” In this article you will find the answer. So, buckle up and shift into high gear. You’re about to have a really fast ride. Have your Linux, Mac OS X (or any *nix system) and Python ready. All source code from the article is available on GitHub.

First let’s remember what a very basic Web server looks like and what the server needs to do to service client requests. The server you created in Part 1 and Part 2 is an iterative server that handles one client request at a time. It cannot accept a new connection until after it has finished processing a current client request. Some clients might be unhappy with it because they will have to wait in line, and for busy servers the line might be too long.

Here is the code of the iterative server webserver3a.py:

#####################################################################
# Iterative server - webserver3a.py                                 #
#                                                                   #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X  #
#####################################################################
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    while True:
        client_connection, client_address = listen_socket.accept()
        handle_request(client_connection)
        client_connection.close()

if __name__ == '__main__':
    serve_forever()

To observe your server handling only one client request at a time, modify the server a little bit and add a 60 second delay after sending a response to a client. The change is only one line to tell the server process to sleep for 60 seconds.

And here is the code of the sleeping server webserver3b.py:

#########################################################################
# Iterative server - webserver3b.py                                     #
#                                                                       #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X      #
#                                                                       #
# - Server sleeps for 60 seconds after sending a response to a client   #
#########################################################################
import socket
import time

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)
    time.sleep(60)  # sleep and block the process for 60 seconds


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    while True:
        client_connection, client_address = listen_socket.accept()
        handle_request(client_connection)
        client_connection.close()

if __name__ == '__main__':
    serve_forever()

Start the server with:

$ python webserver3b.py

Now open up a new terminal window and run the curl command. You should instantly see the “Hello, World!” string printed on the screen:

$ curl http://localhost:8888/hello
Hello, World!

And without delay open up a second terminal window and run the same curl command:

$ curl http://localhost:8888/hello

If you’ve done that within 60 seconds then the second curl should not produce any output right away and should just hang there. The server shouldn’t print a new request body on its standard output either. Here is how it looks on my Mac (the window at the bottom right corner highlighted in yellow shows the second curl command hanging, waiting for the connection to be accepted by the server):

After you’ve waited long enough (more than 60 seconds) you should see the first curl terminate and the second curl print “Hello, World!” on the screen, then hang for 60 seconds, and then terminate:

The way it works is that the server finishes servicing the first curl client request and then it starts handling the second request only after it sleeps for 60 seconds. It all happens sequentially, or iteratively, one step, or in our case one client request, at a time.

Let’s talk about the communication between clients and servers for a bit. In order for two programs to communicate with each other over a network, they have to use sockets. And you saw sockets both in Part 1 and Part 2. But what is a socket?

A socket is an abstraction of a communication endpoint and it allows your program to communicate with another program using file descriptors. In this article I’ll be talking specifically about TCP/IP sockets on Linux/Mac OS X. An important notion to understand is the TCP socket pair.

The socket pair for a TCP connection is a 4-tuple that identifies two endpoints of the TCP connection: the local IP address, local port, foreign IP address, and foreign port. A socket pair uniquely identifies every TCP connection on a network. The two values that identify each endpoint, an IP address and a port number, are often called a socket.[1]

So, the tuple {10.10.10.2:49152, 12.12.12.3:8888} is a socket pair that uniquely identifies two endpoints of the TCP connection on the client and the tuple {12.12.12.3:8888, 10.10.10.2:49152} is a socket pair that uniquely identifies the same two endpoints of the TCP connection on the server. The two values that identify the server endpoint of the TCP connection, the IP address 12.12.12.3 and the port 8888, are referred to as a socket in this case (the same applies to the client endpoint).

The standard sequence a server usually goes through to create a socket and start accepting client connections is the following:

  1. The server creates a TCP/IP socket. This is done with the following statement in Python:
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    
  2. The server might set some socket options (this is optional, but you can see that the server code above does just that to be able to re-use the same address over and over again if you decide to kill and re-start the server right away).
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    
  3. Then, the server binds the address. The bind function assigns a local protocol address to the socket. With TCP, calling bind lets you specify a port number, an IP address, both, or neither.[1]
    listen_socket.bind(SERVER_ADDRESS)
    
  4. Then, the server makes the socket a listening socket
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    

The listen method is only called by servers. It tells the kernel that it should accept incoming connection requests for this socket.

After that’s done, the server starts accepting client connections one connection at a time in a loop. When there is a connection available the accept call returns the connected client socket. Then, the server reads the request data from the connected client socket, prints the data on its standard output and sends a message back to the client. Then, the server closes the client connection and it is ready again to accept a new client connection.

Here is what a client needs to do to communicate with the server over TCP/IP:

Here is the sample code for a client to connect to your server, send a request and print the response:

 import socket

 # create a socket and connect to a server
 sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
 sock.connect(('localhost', 8888))

 # send and receive some data
 sock.sendall(b'test')
 data = sock.recv(1024)
 print(data.decode())

After creating the socket, the client needs to connect to the server. This is done with theconnect call:

sock.connect(('localhost', 8888))

The client only needs to provide the remote IP address or host name and the remote port number of a server to connect to.

You’ve probably noticed that the client doesn’t call bind and accept. The client doesn’t need to call bind because the client doesn’t care about the local IP address and the local port number. The TCP/IP stack within the kernel automatically assigns the local IP address and the local port when the client calls connect. The local port is called an ephemeral port, i.e. a short-lived port.

A port on a server that identifies a well-known service that a client connects to is called a well-known port (for example, 80 for HTTP and 22 for SSH). Fire up your Python shell and make a client connection to the server you run on localhost and see what ephemeral port the kernel assigns to the socket you’ve created (start the server webserver3a.py or webserver3b.py before trying the following example):

>>> import socket
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> sock.connect(('localhost', 8888))
>>> host, port = sock.getsockname()[:2]
>>> host, port
('127.0.0.1', 60589)

In the case above the kernel assigned the ephemeral port 60589 to the socket.

There are some other important concepts that I need to cover quickly before I get to answer the question from Part 2. You will see shortly why this is important. The two concepts are that of a process and a file descriptor.

What is a process? A process is just an instance of an executing program. When the server code is executed, for example, it’s loaded into memory and an instance of that executing program is called a process. The kernel records a bunch of information about the process – its process ID would be one example – to keep track of it. When you run your iterative server webserver3a.py or webserver3b.py you run just one process.

Start the server webserver3b.py in a terminal window:

$ python webserver3b.py

And in a different terminal window use the ps command to get the information about that process:

$ ps | grep webserver3b | grep -v grep
7182 ttys003    0:00.04 python webserver3b.py

The ps command shows you that you have indeed run just one Python process webserver3b. When a process gets created the kernel assigns a process ID to it, PID. In UNIX, every user process also has a parent that, in turn, has its own process ID called parent process ID, or PPID for short. I assume that you run a BASH shell by default and when you start the server, a new process gets created with a PID and its parent PID is set to the PID of the BASH shell.

Try it out and see for yourself how it all works. Fire up your Python shell again, which will create a new process, and then get the PID of the Python shell process and the parent PID (the PID of your BASH shell) using os.getpid() and os.getppid() system calls. Then, in another terminal window run ps command and grep for the PPID (parent process ID, which in my case is 3148). In the screenshot below you can see an example of a parent-child relationship between my child Python shell process and the parent BASH shell process on my Mac OS X:

Another important concept to know is that of a file descriptor. So what is a file descriptor? A file descriptor is a non-negative integer that the kernel returns to a process when it opens an existing file, creates a new file or when it creates a new socket. You’ve probably heard that in UNIX everything is a file. The kernel refers to the open files of a process by a file descriptor. When you need to read or write a file you identify it with the file descriptor. Python gives you high-level objects to deal with files (and sockets) and you don’t have to use file descriptors directly to identify a file but, under the hood, that’s how files and sockets are identified in UNIX: by their integer file descriptors.

By default, UNIX shells assign file descriptor 0 to the standard input of a process, file descriptor 1 to the standard output of the process and file descriptor 2 to the standard error.

As I mentioned before, even though Python gives you a high-level file or file-like object to work with, you can always use the fileno() method on the object to get the file descriptor associated with the file. Back to your Python shell to see how you can do that:

>>> import sys
>>> sys.stdin
<open file '<stdin>', mode 'r' at 0x102beb0c0>
>>> sys.stdin.fileno()
0
>>> sys.stdout.fileno()
1
>>> sys.stderr.fileno()
2

And while working with files and sockets in Python, you’ll usually be using a high-level file/socket object, but there may be times where you need to use a file descriptor directly. Here is an example of how you can write a string to the standard output using a write system call that takes a file descriptor integer as a parameter:

>>> import sys
>>> import os
>>> res = os.write(sys.stdout.fileno(), 'hello\n')
hello

And here is an interesting part – which should not be surprising to you anymore because you already know that everything is a file in Unix – your socket also has a file descriptor associated with it. Again, when you create a socket in Python you get back an object and not a non-negative integer, but you can always get direct access to the integer file descriptor of the socket with the fileno() method that I mentioned earlier.

>>> import socket
>>> sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
>>> sock.fileno()
3

One more thing I wanted to mention: have you noticed that in the second example of the iterative server webserver3b.py, when the server process was sleeping for 60 seconds you could still connect to the server with the second curl command? Sure, the curl didn’t output anything right away and it was just hanging out there but how come the server was not accepting a connection at the time and the client was not rejected right away, but instead was able to connect to the server? The answer to that is the listen method of a socket object and its BACKLOG argument, which I called REQUEST_QUEUE_SIZE in the code. The BACKLOG argument determines the size of a queue within the kernel for incoming connection requests. When the server webserver3b.py was sleeping, the second curl command that you ran was able to connect to the server because the kernel had enough space available in the incoming connection request queue for the server socket.

While increasing the BACKLOG argument does not magically turn your server into a server that can handle multiple client requests at a time, it is important to have a fairly large backlog parameter for busy servers so that the accept call would not have to wait for a new connection to be established but could grab the new connection off the queue right away and start processing a client request without delay.

Whoo-hoo! You’ve covered a lot of ground. Let’s quickly recap what you’ve learned (or refreshed if it’s all basics to you) so far.

  • Iterative server
  • Server socket creation sequence (socket, bind, listen, accept)
  • Client connection creation sequence (socket, connect)
  • Socket pair
  • Socket
  • Ephemeral port and well-known port
  • Process
  • Process ID (PID), parent process ID (PPID), and the parent-child relationship.
  • File descriptors
  • The meaning of the BACKLOG argument of the listen socket method

Now I am ready to answer the question from Part 2: “How can you make your server handle more than one request at a time?” Or put another way, “How do you write a concurrent server?”

The simplest way to write a concurrent server under Unix is to use a fork() system call.

Here is the code of your new shiny concurrent server webserver3c.py that can handle multiple client requests at the same time (as in our iterative server example webserver3b.py, every child process sleeps for 60 secs):

###########################################################################
# Concurrent server - webserver3c.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
#                                                                         #
# - Child process sleeps for 60 seconds after handling a client's request #
# - Parent and child processes close duplicate descriptors                #
#                                                                         #
###########################################################################
import os
import socket
import time

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(
        'Child PID: {pid}. Parent PID {ppid}'.format(
            pid=os.getpid(),
            ppid=os.getppid(),
        )
    )
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)
    time.sleep(60)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))
    print('Parent PID (PPID): {pid}\n'.format(pid=os.getpid()))

    while True:
        client_connection, client_address = listen_socket.accept()
        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)  # child exits here
        else:  # parent
            client_connection.close()  # close parent copy and loop over

if __name__ == '__main__':
    serve_forever()

Before diving in and discussing how fork works, try it, and see for yourself that the server can indeed handle multiple client requests at the same time, unlike its iterative counterparts webserver3a.py and webserver3b.py. Start the server on the command line with:

$ python webserver3c.py

And try the same two curl commands you’ve tried before with the iterative server and see for yourself that, now, even though the server child process sleeps for 60 seconds after serving a client request, it doesn’t affect other clients because they are served by different and completely independent processes. You should see your curl commands output “Hello, World!” instantly and then hang for 60 secs. You can keep on running as many curl commands as you want (well, almost as many as you want 🙂) and all of them will output the server’s response “Hello, World” immediately and without any noticeable delay. Try it.

The most important point to understand about fork() is that you call fork once but it returns twice: once in the parent process and once in the child process. When you fork a new process the process ID returned to the child process is 0. When the fork returns in the parent process it returns the child’s PID.
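A tiny stand-alone sketch (not from the article) makes this "returns twice" behaviour visible:

import os

pid = os.fork()
if pid == 0:
    # In the child, fork() returned 0.
    print('child : pid={}, fork() returned {}'.format(os.getpid(), pid))
    os._exit(0)
else:
    # In the parent, fork() returned the child's PID.
    print('parent: pid={}, fork() returned child pid {}'.format(os.getpid(), pid))
    os.waitpid(pid, 0)  # reap the child so it does not linger as a zombie

Run it a few times; the order of the two printed lines is not guaranteed, which is exactly the kind of concurrency discussed below.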

I still remember how fascinated I was by fork when I first read about it and tried it. It looked like magic to me. Here I was reading a sequential code and then “boom!”: the code cloned itself and now there were two instances of the same code running concurrently. I thought it was nothing short of magic, seriously.

When a parent forks a new child, the child process gets a copy of the parent’s file descriptors:

You’ve probably noticed that the parent process in the code above closed the client connection:

else:  # parent
    client_connection.close()  # close parent copy and loop over

So how come a child process is still able to read the data from a client socket if its parent closed the very same socket? The answer is in the picture above. The kernel uses descriptor reference counts to decide whether to close a socket or not. It closes the socket only when its descriptor reference count becomes 0. When your server creates a child process, the child gets the copy of the parent’s file descriptors and the kernel increments the reference counts for those descriptors. In the case of one parent and one child, the descriptor reference count would be 2 for the client socket and when the parent process in the code above closes the client connection socket, it merely decrements its reference count which becomes 1, not small enough to cause the kernel to close the socket. The child process also closes the duplicate copy of the parent’s listen_socket because the child doesn’t care about accepting new client connections, it cares only about processing requests from the established client connection:

listen_socket.close()  # close child copy

I’ll talk about what happens if you do not close duplicate descriptors later in the article.

As you can see from the source code of your concurrent server, the sole role of the server parent process now is to accept a new client connection, fork a new child process to handle that client request, and loop over to accept another client connection, and nothing more. The server parent process does not process client requests – its children do.

A little aside. What does it mean when we say that two events are concurrent?

When we say that two events are concurrent we usually mean that they happen at the same time. As a shorthand that definition is fine, but you should remember the strict definition:

Two events are concurrent if you cannot tell by looking at the program which will happen first.[2]

Again, it’s time to recap the main ideas and concepts you’ve covered so far.

  • The simplest way to write a concurrent server in Unix is to use the fork() system call
  • When a process forks a new process it becomes a parent process to that newly forked child process.
  • Parent and child share the same file descriptors after the call to fork.
  • The kernel uses descriptor reference counts to decide whether to close the file/socket or not
  • The role of a server parent process: all it does now is accept a new connection from a client, fork a child to handle the client request, and loop over to accept a new client connection.

Let’s see what is going to happen if you don’t close duplicate socket descriptors in the parent and child processes. Here is a modified version of the concurrent server where the server does not close duplicate descriptors, webserver3d.py:

###########################################################################
# Concurrent server - webserver3d.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import os
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def handle_request(client_connection):
    request = client_connection.recv(1024)
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    clients = []
    while True:
        client_connection, client_address = listen_socket.accept()
        # store the reference otherwise it's garbage collected
        # on the next loop run
        clients.append(client_connection)
        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)  # child exits here
        else:  # parent
            # client_connection.close()
            print(len(clients))

if __name__ == '__main__':
    serve_forever()

Start the server with:

$ python webserver3d.py

Use curl to connect to the server:

$ curl http://localhost:8888/hello
Hello, World!

Okay, the curl printed the response from the concurrent server but it did not terminate and kept hanging. What is happening here? The server no longer sleeps for 60 seconds: its child process actively handles a client request, closes the client connection and exits, but the client curl still does not terminate.

So why does the curl not terminate? The reason is the duplicate file descriptors. When the child process closed the client connection, the kernel decremented the reference count of that client socket and the count became 1. The server child process exited, but the client socket was not closed by the kernel because the reference count for that socket descriptor was not 0, and, as a result, the termination packet (called FIN in TCP/IP parlance) was not sent to the client and the client stayed on the line, so to speak. There is also another problem. If your long-running server doesn’t close duplicate file descriptors, it will eventually run out of available file descriptors:

Stop your server webserver3d.py with Control-C and check out the default resources available to your server process set up by your shell with the shell built-in command ulimit:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 3842
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3842
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

As you can see above, the maximum number of open file descriptors (open files) available to the server process on my Ubuntu box is 1024.

Now let’s see how your server can run out of available file descriptors if it doesn’t close duplicate descriptors. In an existing or new terminal window, set the maximum number of open file descriptors for your server to be 256:

$ ulimit -n 256

Start the server webserver3d.py in the same terminal where you’ve just run the $ ulimit -n 256 command:

$ python webserver3d.py

and use the following client client3.py to test the server.

#####################################################################
# Test client - client3.py                                          #
#                                                                   #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X  #
#####################################################################
import argparse
import errno
import os
import socket


SERVER_ADDRESS = 'localhost', 8888
REQUEST = b"""\
GET /hello HTTP/1.1
Host: localhost:8888

"""


def main(max_clients, max_conns):
    socks = []
    for client_num in range(max_clients):
        pid = os.fork()
        if pid == 0:
            for connection_num in range(max_conns):
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                sock.connect(SERVER_ADDRESS)
                sock.sendall(REQUEST)
                socks.append(sock)
                print(connection_num)
                os._exit(0)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Test client for LSBAWS.',
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument(
        '--max-conns',
        type=int,
        default=1024,
        help='Maximum number of connections per client.'
    )
    parser.add_argument(
        '--max-clients',
        type=int,
        default=1,
        help='Maximum number of clients.'
    )
    args = parser.parse_args()
    main(args.max_clients, args.max_conns)

In a new terminal window, start the client3.py and tell it to create 300 simultaneous connections to the server:

$ python client3.py --max-clients=300

Soon enough your server will explode. Here is a screenshot of the exception on my box:

The lesson is clear – your server should close duplicate descriptors. But even if you close duplicate descriptors, you are not out of the woods yet because there is another problem with your server, and that problem is zombies!

Yes, your server code actually creates zombies. Let’s see how. Start up your server again:

$ python webserver3d.py

Run the following curl command in another terminal window:

$ curl http://localhost:8888/hello

And now run the ps command to show running Python processes. Here is an example of the ps output on my Ubuntu box:

$ ps auxw | grep -i python | grep -v grep
vagrant   9099  0.0  1.2  31804  6256 pts/0    S+   16:33   0:00 python webserver3d.py
vagrant   9102  0.0  0.0      0     0 pts/0    Z+   16:33   0:00 [python] <defunct>

Do you see the second line above where it says the status of the process with PID 9102 is Z+ and the name of the process is <defunct>? That’s our zombie there. The problem with zombies is that you can’t kill them.

Even if you try to kill zombies with $ kill -9, they will survive. Try it and see for yourself.

What is a zombie anyway and why does our server create them? A zombie is a process that has terminated, but its parent has not waited for it and has not received its termination status yet. When a child process exits before its parent, the kernel turns the child process into a zombie and stores some information about the process for its parent process to retrieve later. The information stored is usually the process ID, the process termination status, and the resource usage by the process. Okay, so zombies serve a purpose, but if your server doesn’t take care of these zombies your system will get clogged up. Let’s see how that happens. First stop your running server and, in a new terminal window, use the ulimit command to set the max user processes to 400 (make sure to set open files to a high number too, let’s say 500):

$ ulimit -u 400
$ ulimit -n 500

Start the server webserver3d.py in the same terminal where you’ve just run the $ ulimit -u 400 command:

$ python webserver3d.py

In a new terminal window, start the client3.py and tell it to create 500 simultaneous connections to the server:

$ python client3.py --max-clients=500

And, again, soon enough your server will blow up with an OSError: Resource temporarily unavailable exception when it tries to create a new child process, but it can’t because it has reached the limit for the maximum number of child processes it’s allowed to create. Here is a screenshot of the exception on my box:

As you can see, zombies create problems for your long-running server if it doesn’t take care of them. I will discuss shortly how the server should deal with that zombie problem.

Let’s recap the main points you’ve covered so far:

  • If you don’t close duplicate descriptors, the clients won’t terminate because the client connections won’t get closed.
  • If you don’t close duplicate descriptors, your long-running server will eventually run out of available file descriptors (max open files).
  • When you fork a child process and it exits and the parent process doesn’t wait for it and doesn’t collect its termination status, it becomes a zombie.
  • Zombies need to eat something and, in our case, it’s memory. Your server will eventually run out of available processes (max user processes) if it doesn’t take care of zombies.
  • You can’t kill a zombie, you need to wait for it.

So what do you need to do to take care of zombies? You need to modify your server code to wait for zombies to get their termination status. You can do that by modifying your server to call a wait system call. Unfortunately, that’s far from ideal because if you call wait and there is no terminated child process the call to wait will block your server, effectively preventing your server from handling new client connection requests. Are there any other options? Yes, there are, and one of them is the combination of a signal handler with the wait system call.

Here is how it works. When a child process exits, the kernel sends a SIGCHLD signal. The parent process can set up a signal handler to be asynchronously notified of that SIGCHLD event and then it can wait for the child to collect its termination status, thus preventing the zombie process from being left around.

By the way, an asynchronous event means that the parent process doesn’t know ahead of time that the event is going to happen.

Modify your server code to set up a SIGCHLD event handler and wait for a terminated child in the event handler. The code is available in webserver3e.py file:

###########################################################################
# Concurrent server - webserver3e.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import os
import signal
import socket
import time

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 5


def grim_reaper(signum, frame):
    pid, status = os.wait()
    print(
        'Child {pid} terminated with status {status}'
        '\n'.format(pid=pid, status=status)
    )


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)
    # sleep to allow the parent to loop over to 'accept' and block there
    time.sleep(3)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    signal.signal(signal.SIGCHLD, grim_reaper)

    while True:
        client_connection, client_address = listen_socket.accept()
        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)
        else:  # parent
            client_connection.close()

if __name__ == '__main__':
    serve_forever()

Start the server:

$ python webserver3e.py

Use your old friend curl to send a request to the modified concurrent server:

$ curl http://localhost:8888/hello

Look at the server:

What just happened? The call to accept failed with the error EINTR.

The parent process was blocked in the accept call when the child process exited, which caused a SIGCHLD event. That, in turn, activated the signal handler, and when the signal handler finished the accept system call got interrupted:

Don’t worry, it’s a pretty simple problem to solve, though. All you need to do is to re-start the accept system call. Here is the modified version of the server webserver3f.py that handles that problem:

###########################################################################
# Concurrent server - webserver3f.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import errno
import os
import signal
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 1024


def grim_reaper(signum, frame):
    pid, status = os.wait()


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    signal.signal(signal.SIGCHLD, grim_reaper)

    while True:
        try:
            client_connection, client_address = listen_socket.accept()
        except IOError as e:
            code, msg = e.args
            # restart 'accept' if it was interrupted
            if code == errno.EINTR:
                continue
            else:
                raise

        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)
        else:  # parent
            client_connection.close()  # close parent copy and loop over


if __name__ == '__main__':
    serve_forever()

Start the updated server webserver3f.py:

$ python webserver3f.py

Use curl to send a request to the modified concurrent server:

$ curl http://localhost:8888/hello

See? No EINTR exceptions any more. Now, verify that there are no more zombies either and that your SIGCHLD event handler with wait call took care of terminated children. To do that, just run the ps command and see for yourself that there are no more Python processes with Z+ status (no more <defunct> processes). Great! It feels safe without zombies running around.

  • If you fork a child and don’t wait for it, it becomes a zombie.
  • Use the SIGCHLD event handler to asynchronously wait for a terminated child to get its termination status
  • When using an event handler you need to keep in mind that system calls might get interrupted and you need to be prepared for that scenario

Okay, so far so good. No problems, right? Well, almost. Try your webserver3f.py again, but instead of making one request with curl use client3.py to create 128 simultaneous connections:

$ python client3.py --max-clients 128

Now run the ps command again

$ ps auxw | grep -i python | grep -v grep

and see that, oh boy, zombies are back again!

What went wrong this time? When you ran 128 simultaneous clients and established 128 connections, the child processes on the server handled the requests and exited almost at the same time, causing a flood of SIGCHLD signals to be sent to the parent process. The problem is that the signals are not queued and your server process missed several of them, which left several zombies running around unattended:

The solution to the problem is to set up a SIGCHLD event handler, but instead of wait use a waitpid system call with the WNOHANG option in a loop to make sure that all terminated child processes are taken care of. Here is the modified server code, webserver3g.py:

###########################################################################
# Concurrent server - webserver3g.py                                      #
#                                                                         #
# Tested with Python 2.7.9 & Python 3.4 on Ubuntu 14.04 & Mac OS X        #
###########################################################################
import errno
import os
import signal
import socket

SERVER_ADDRESS = (HOST, PORT) = '', 8888
REQUEST_QUEUE_SIZE = 1024


def grim_reaper(signum, frame):
    while True:
        try:
            pid, status = os.waitpid(
                -1,          # Wait for any child process
                 os.WNOHANG  # Do not block and return EWOULDBLOCK error
            )
        except OSError:
            return

        if pid == 0:  # no more zombies
            return


def handle_request(client_connection):
    request = client_connection.recv(1024)
    print(request.decode())
    http_response = b"""\
HTTP/1.1 200 OK

Hello, World!
"""
    client_connection.sendall(http_response)


def serve_forever():
    listen_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listen_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listen_socket.bind(SERVER_ADDRESS)
    listen_socket.listen(REQUEST_QUEUE_SIZE)
    print('Serving HTTP on port {port} ...'.format(port=PORT))

    signal.signal(signal.SIGCHLD, grim_reaper)

    while True:
        try:
            client_connection, client_address = listen_socket.accept()
        except IOError as e:
            code, msg = e.args
            # restart 'accept' if it was interrupted
            if code == errno.EINTR:
                continue
            else:
                raise

        pid = os.fork()
        if pid == 0:  # child
            listen_socket.close()  # close child copy
            handle_request(client_connection)
            client_connection.close()
            os._exit(0)
        else:  # parent
            client_connection.close()  # close parent copy and loop over

if __name__ == '__main__':
    serve_forever()

Start the server:

$ python webserver3g.py

Use the test client client3.py:

$ python client3.py --max-clients 128

And now verify that there are no more zombies. Yay! Life is good without zombies :)

Congratulations! It’s been a pretty long journey but I hope you liked it. Now you have your own simple concurrent server and the code can serve as a foundation for your further work towards a production grade Web server.

I’ll leave it as an exercise for you to update the WSGI server from Part 2 and make it concurrent. You can find the modified version here. But look at my code only after you’ve implemented your own version. You have all the necessary information to do that. So go and just do it :)

What’s next? As Josh Billings said,

“Be like a postage stamp — stick to one thing until you get there.”

Start mastering the basics. Question what you already know. And always dig deeper.

“If you learn only methods, you’ll be tied to your methods. But if you learn principles, you can devise your own methods.” —Ralph Waldo Emerson

Below is a list of books that I’ve drawn on for most of the material in this article. They will help you broaden and deepen your knowledge about the topics I’ve covered. I highly recommend you to get those books somehow: borrow them from your friends, check them out from your local library, or just buy them on Amazon. They are the keepers:

  1. Unix Network Programming, Volume 1: The Sockets Networking API (3rd Edition)

  2. Advanced Programming in the UNIX Environment, 3rd Edition

  3. The Linux Programming Interface: A Linux and UNIX System Programming Handbook

  4. TCP/IP Illustrated, Volume 1: The Protocols (2nd Edition) (Addison-Wesley Professional Computing Series)

  5. The Little Book of SEMAPHORES (2nd Edition): The Ins and Outs of Concurrency Control and Common Mistakes. Also available for free on the author’s site here.

BTW, I’m writing a book “Let’s Build A Web Server: First Steps” that explains how to write a basic web server from scratch and goes into more detail on the topics I just covered. Subscribe to the mailing list to get the latest updates about the book and the release date.

Categories: life is fun

PARALLELISING PYTHON WITH THREADING AND MULTIPROCESSING


Summary: Do not use multithreading for CPU-bound work in Python. Use multiprocessing instead to leverage all CPU cores.

 

One aspect of coding in Python that we have yet to discuss in any great detail is how to optimise the execution performance of our simulations. While NumPy, SciPy and pandas are extremely useful in this regard when considering vectorised code, we aren’t able to use these tools effectively when building event-driven systems. Are there any other means available to us to speed up our code? The answer is yes – but with caveats!

In this article we are going to look at the different models of parallelism that can be introduced into our Python programs. These models work particularly well for simulations that do not need to share state. Monte Carlo simulations used for options pricing and backtesting simulations of various parameters for algorithmic trading fall into this category.

In particular we are going to consider the Threading library and the Multiprocessing library.

Concurrency in Python

One of the most frequently asked questions from beginning Python programmers when they explore multithreaded code for optimisation of CPU-bound code is “Why does my program run slower when I use multiple threads?”.

The expectation is that on a multi-core machine a multithreaded code should make use of these extra cores and thus increase overall performance. Unfortunately the internals of the main Python interpreter, CPython, negate the possibility of true multi-threading due to a process known as the Global Interpreter Lock (GIL).

The GIL is necessary because the Python interpreter is not thread safe. This means that there is a globally enforced lock when trying to safely access Python objects from within threads. At any one time only a single thread can acquire a lock for a Python object or C API. The interpreter reacquires this lock every 100 bytecode instructions and around (potentially) blocking I/O operations. Because of this lock, CPU-bound code will see no gain in performance when using the Threading library, but it will likely gain performance increases if the Multiprocessing library is used.

Parallelisation Libraries Implementation

We are now going to utilise the above two separate libraries to attempt a parallel optimisation of a “toy” problem.

Threading Library

Above we alluded to the fact that Python on the CPython interpreter does not support true multi-core execution via multithreading. However, Python DOES have a Threading library. So what is the benefit of using the library if we (supposedly) cannot make use of multiple cores?

Many programs, particularly those relating to network programming or data input/output (I/O) are often network-bound or I/O bound. This means that the Python interpreter is awaiting the result of a function call that is manipulating data from a “remote” source such as a network address or hard disk. Such access is far slower than reading from local memory or a CPU-cache.

Hence, one means of speeding up such code if many data sources are being accessed is to generate a thread for each data item needing to be accessed.

For example, consider a Python code that is scraping many web URLs. Given that each URL will have an associated download time well in excess of the CPU processing capability of the computer, a single-threaded implementation will be significantly I/O bound.

By adding a new thread for each download resource, the code can download multiple data sources in parallel and combine the results at the end of every download. This means that each subsequent download is not waiting on the download of earlier web pages. In this case the program is now bound by the bandwidth limitations of the client/server(s) instead.
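
To make the thread-per-resource idea concrete, here is a minimal sketch (mine, not from the article): the URLs are placeholders and a one-second time.sleep() stands in for network latency, so the script runs without network access. With four simulated downloads, the threaded version finishes in roughly one second of wall-clock time rather than four, because the GIL is released while each thread is blocked waiting:

# io_thread_sketch.py

import threading
import time

# Placeholder "URLs" -- in real code these would be actual web addresses.
URLS = ['http://example.com/resource-%d' % i for i in range(4)]


def fetch(url, results, index):
    # Pretend this is a slow network read; while this thread sleeps,
    # the other threads are free to run.
    time.sleep(1.0)
    results[index] = 'data from %s' % url


if __name__ == "__main__":
    results = [None] * len(URLS)
    start = time.time()

    threads = [
        threading.Thread(target=fetch, args=(url, results, i))
        for i, url in enumerate(URLS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    elapsed = time.time() - start
    print("Fetched %d resources in %.2f seconds" % (len(results), elapsed))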

However, many financial applications ARE CPU-bound since they are highly numerically intensive. They often involve large-scale numerical linear algebra solutions or random statistical draws, such as in Monte Carlo simulations. Thus as far as Python and the GIL are concerned, there is no benefit to using the Python Threading library for such tasks.

Python Implementation

The following code illustrates a multithreaded implementation for a “toy” code that sequentially adds numbers to lists. Each thread creates a new list and adds random numbers to it. This has been chosen as a toy example since it is CPU heavy.

The following code will outline the interface for the Threading library but it will not grant us any additional speedup beyond that obtainable in a single-threaded implementation. When we come to use the Multiprocessing library below, we will see that it will significantly decrease the overall runtime.

Let’s examine how the code works. Firstly we import the threading library. Then we create a function list_append that takes three parameters. The first, count, determines the size of the list to create. The second, id, is the ID of the “job” (which can be useful if we are writing debug info to the console). The third parameter, out_list, is the list to append the random numbers to.

The __main__ section creates a list size of 10^7 (10,000,000 random numbers) and uses two threads to carry out the work. It then creates a jobs list, which is used to store the separate threads. Each threading.Thread object takes the list_append function as its target (with the arguments passed via args) and is then appended to the jobs list.

Finally, the jobs are sequentially started and then sequentially “joined”. The join() method blocks the calling thread (i.e. the main Python interpreter thread) until the thread has terminated. This ensures that all of the threads are complete before printing the completion message to the console:

# thread_test.py

import random
import threading


def list_append(count, id, out_list):
	"""
	Creates an empty list and then appends a 
	random number to the list 'count' number
	of times. A CPU-heavy operation!
	"""
	for i in range(count):
		out_list.append(random.random())

if __name__ == "__main__":
	size = 10000000   # Number of random numbers to add
	threads = 2   # Number of threads to create

	# Create a list of jobs and then iterate through
	# the number of threads appending each thread to
	# the job list 
	jobs = []
	for i in range(0, threads):
		out_list = list()
		thread = threading.Thread(target=list_append, args=(size, i, out_list))
		jobs.append(thread)

	# Start the threads (i.e. calculate the random number lists)
	for j in jobs:
		j.start()

	# Ensure all of the threads have finished
	for j in jobs:
		j.join()

	print "List processing complete."

We can time this code using the following console call:

time python thread_test.py

It produces the following output:

List processing complete.

real    0m2.003s
user    0m1.838s
sys     0m0.161s

Notice that the user and sys both approximately sum to the real time. This is indicative that we gained no benefit from using the Threading library. If we had then we would expect the real time to be significantly less. These concepts within concurrent programming are usually known as CPU-time and wall-clock time respectively.
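
If you prefer to see the distinction from inside Python rather than via the shell's time command, here is a small sketch (mine, not from the article) that reports both wall-clock time via time.time() and CPU time via os.times() around a deliberately CPU-heavy loop:

# cpu_vs_wall_sketch.py

import os
import time


def burn_cpu(n):
    # A deliberately CPU-heavy loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    wall_start = time.time()
    cpu_start = os.times()

    burn_cpu(5000000)

    cpu_end = os.times()
    wall_elapsed = time.time() - wall_start
    # os.times() returns (user, system, ...) CPU times for this process.
    cpu_elapsed = (cpu_end[0] - cpu_start[0]) + (cpu_end[1] - cpu_start[1])

    print("wall-clock: %.2fs   CPU (user + sys): %.2fs" % (wall_elapsed, cpu_elapsed))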

Multiprocessing Library

In order to actually make use of the extra cores present in nearly all modern consumer processors we can instead use the Multiprocessing library. This works in a fundamentally different way to the Threading library, even though the syntax of the two is extremely similar.

The Multiprocessing library actually spawns multiple operating system processes for each parallel task. This nicely side-steps the GIL, by giving each process its own Python interpreter and thus own GIL. Hence each process can be fed to a separate processor core and then regrouped at the end once all processes have finished.

There are some drawbacks, however. Spawning extra processes introduces I/O overhead, as data has to be shuffled around between the processes. This can add to the overall run-time. However, assuming the data is restricted to each process, it is possible to gain significant speedup. Of course, one must always be aware of Amdahl’s Law!
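
Amdahl's Law itself fits in a couple of lines; this sketch (mine, not from the article) assumes a fraction p of the runtime can be parallelised across n processes, in which case the best possible overall speed-up is 1 / ((1 - p) + p / n):

# amdahl_sketch.py


def amdahl_speedup(p, n):
    """Theoretical best-case speed-up for parallel fraction p on n processes."""
    return 1.0 / ((1.0 - p) + p / float(n))


if __name__ == "__main__":
    # Even with 95% of the work parallelised, four processes give ~3.5x,
    # not 4x, and the curve flattens quickly as n grows.
    for n in (2, 4, 8, 16):
        print("n=%2d  speed-up=%.2fx" % (n, amdahl_speedup(0.95, n)))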

Python Implementation

The only modifications needed for the Multiprocessing implementation include changing the import line and the functional form of the multiprocessing.Process line. In this case the arguments to the target function are passed separately. Beyond that the code is almost identical to the Threading implementation above:

# multiproc_test.py

import random
import multiprocessing


def list_append(count, id, out_list):
	"""
	Creates an empty list and then appends a 
	random number to the list 'count' number
	of times. A CPU-heavy operation!
	"""
	for i in range(count):
		out_list.append(random.random())

if __name__ == "__main__":
	size = 10000000   # Number of random numbers to add
	procs = 2   # Number of processes to create

	# Create a list of jobs and then iterate through
	# the number of processes appending each process to
	# the job list 
	jobs = []
	for i in range(0, procs):
		out_list = list()
		process = multiprocessing.Process(target=list_append, 
			                              args=(size, i, out_list))
		jobs.append(process)

	# Start the processes (i.e. calculate the random number lists)		
	for j in jobs:
		j.start()

	# Ensure all of the processes have finished
	for j in jobs:
		j.join()

	print "List processing complete."

We can once again time this code using a similar console call:

time python multiproc_test.py

We receive the following output:

List processing complete.

real    0m1.045s
user    0m1.824s
sys     0m0.231s

In this case you can see that while the user and sys times have remained approximately the same, the real time has dropped by a factor of almost two. This makes sense since we’re using two processes. Scaling to four processes while halving the list size for comparison gives the following output (under the assumption that you have at least four cores!):

List processing complete.

real    0m0.540s
user    0m1.792s
sys     0m0.269s

This is an approximate 3.8x speed-up with four processes. However, we must be careful of generalising this to larger, more complex programs. Data transfer, hardware cache-levels and other issues will almost certainly reduce this sort of performance gain in “real” codes.

In later articles we will be modifying the Event-Driven Backtester to use parallel techniques in order to improve the ability to carry out multi-dimensional parameter optimisation studies.

Categories: life is fun

How to install Caffe on Mac (OS X Yosemite 10.10.4)

February 18, 2016


After the famous Google Research group post about Deep Dream, they released an ipynb notebook to mess around with dream generation (available on GitHub).

It requires Caffe, and there are a bunch of problems with installing the Caffe framework on a Mac.

There is an official installation manual with specific instructions for the OS X part.

In the general case you should be able to install Caffe on Mac without any problems. But in case you experience any, you can follow this guide.

Caffe requirements:

  • CUDA is required for GPU mode.
    This is the greatest problem, since the CPU-only version builds out-of-the-box. You need to download CUDA from the official site and make sure you download version 7.0+. Also you may want to apply for cuDNN, NVIDIA's library of deep learning primitives, which may also speed up picture generation later. I’ve applied, but they are still reviewing my application, so I built Caffe without cuDNN support so far.
  • BLAS via ATLAS, MKL, or OpenBLAS.
    The official Mac guide suggests that you might already have BLAS installed, but I didn’t, so I used brew to get OpenBLAS (which also provides a speedup compared to the reference BLAS)
  • Boost >= 1.55
  • OpenCV >= 2.4 including 3.0
    Downloading and building OpenCV from the official site was my primary mistake, since I was building it with the default compiler and standard library rather than libstdc++, which is what CUDA works with
  • protobuf, glog, gflags
  • IO libraries: hdf5, leveldb, snappy, lmdb

So, I uninstalled the OpenBLAS and OpenCV I had built on my machine and made sure that I didn’t have any out-of-date brew packages installed.

All these steps of removing previously built libraries and setting up all the dependencies are covered in the section "Errors I encountered during the build"; in the meantime, the following section covers the steps required to build Caffe on a Mac.

If you encounter any errors, feel free to ask Google or consult the supplementary links at the end of this post.

How to build Caffe on Mac from scratch

First, we have to edit Makefile.config. My config (only the lines which differ from the original):

# CUDA directory contains bin/ and lib/ directories that we need.
CUDA_DIR := /usr/local/cuda
# BLAS choice:
# atlas for ATLAS (default)
# mkl for MKL
# open for OpenBlas
BLAS := open
# Custom (MKL/ATLAS/OpenBLAS) include and lib directories.
# Leave commented to accept the defaults for your choice of BLAS
# (which should work)!
# BLAS_INCLUDE := /path/to/your/blas
# BLAS_LIB := /path/to/your/blas

# Homebrew puts openblas in a directory that is not on the standard search path
#BLAS_INCLUDE := $(shell brew --prefix openblas)/include
#BLAS_LIB := $(shell brew --prefix openblas)/lib
BLAS_INCLUDE := /usr/local/Cellar/openblas/0.2.14_1/include
BLAS_LIB := /usr/local/Cellar/openblas/0.2.14_1/lib

# This is required only if you will compile the matlab interface.
# MATLAB directory should contain the mex binary in /bin.
# MATLAB_DIR := /usr/local
# MATLAB_DIR := /Applications/MATLAB_R2012b.app

# NOTE: this is required only if you will compile the python interface.
# We need to be able to find Python.h and numpy/arrayobject.h.
PYTHON_INCLUDE := /usr/include/python2.7 \
/usr/lib/python2.7/dist-packages/numpy/core/include
# Anaconda Python distribution is quite popular. Include path:
# Verify anaconda location, sometimes it’s in root.
ANACONDA_HOME := $(HOME)/anaconda
PYTHON_INCLUDE := $(ANACONDA_HOME)/include \
$(ANACONDA_HOME)/include/python2.7 \
$(ANACONDA_HOME)/lib/python2.7/site-packages/numpy/core/include \

# We need to be able to find libpythonX.X.so or .dylib.
PYTHON_LIB := /usr/lib
PYTHON_LIB := $(ANACONDA_HOME)/lib
# Whatever else you find you need goes here.
INCLUDE_DIRS := $(PYTHON_INCLUDE) /usr/local/include
LIBRARY_DIRS := $(PYTHON_LIB) /usr/local/lib /usr/lib

After editing Makefile.config you should be able to install Caffe:

brew update
brew tap homebrew/science
for x in snappy leveldb gflags glog szip hdf5 lmdb homebrew/science/opencv; do brew uninstall $x; brew install --fresh -vd $x; done
brew uninstall --force protobuf; brew install --with-python --fresh -vd protobuf
brew uninstall boost boost-python; brew install --fresh -vd boost boost-python
#we have to export this to make sure we include all the cuda and anaconda resources
export DYLD_FALLBACK_LIBRARY_PATH=/usr/local/cuda/lib:$HOME/anaconda/lib:/usr/local/lib:/usr/lib:$DYLD_FALLBACK_LIBRARY_PATH
export CPLUS_INCLUDE_PATH=$HOME/anaconda/include/python2.7/:
#using the -j8 flag to enable parallel building; the number after j is the number of CPU cores available, in my case it's 8
make all -j8
make test -j8
make runtest -j8
make pycaffe
#to run caffe we need to export path to the built caffe:
export PYTHONPATH=/Users/kupa/Desktop/caffe-master/python/:$PYTHONPATH

Errors I encountered during the build


The most drastic option is to remove everything Homebrew has installed and start from scratch:

$ rm -rf /usr/local/Cellar && brew prune

If you don’t want to remove all packages, you can first try removing only those which are required by Caffe:

$ for x in snappy leveldb gflags glog szip hdf5 lmdb homebrew/science/opencv; do brew uninstall $x; brew install --fresh -vd $x; done

In my case, after removing them, I also had to link a couple of packages:

$ rm /usr/local/bin/f2py
$ brew link numpy
$ brew install opencv
$ rm -r /usr/local/include/opencv2/
$ rm -r /usr/local/share/OpenCV/
$ brew link opencv

Still, the build failed:

$ make all -j8
****
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [.build_release/lib/libcaffe.so] Error 1

By the way, before deleting everything I was getting an annoying ‘-pthread’ error:

$ make all -j8
*****
clang: warning: argument unused during compilation: '-pthread'
ld: warning: directory not found for option '-L/opt/intel/mkl/lib/intel64'
Undefined symbols for architecture x86_64:
"leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB*)", referenced from:
caffe::db::LevelDB::Open(std::string const&, caffe::db::Mode) in db.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [.build_release/lib/libcaffe.so] Error 1

That was due to using the wrong protobuf version. I reinstalled protobuf and boost (as suggested here):

$ brew uninstall --force protobuf; brew install --with-python --fresh -vd protobuf
$ brew uninstall boost boost-python; brew install --fresh -vd boost boost-python

This also helps if you have the following error:

$ make all -j8
PROTOC src/caffe/proto/caffe.proto
make: protoc: No such file or directory
make: *** [.build_release/src/caffe/proto/caffe.pb.cc] Error 1
make: *** Waiting for unfinished jobs...

There is a possibility that you encounter this output:

./include/caffe/util/mkl_alternate.hpp:11:10: fatal error: 'cblas.h' file not found
#include <cblas.h>
^
1 error generated.
make: *** [.build_release/src/caffe/layer_factory.o] Error 1

Which can be dealt with:

$ brew uninstall openblas; brew install --fresh -vd openblas
...
This formula is keg-only, which means it was not symlinked into /usr/local.

OS X already provides this software and installing another version in
parallel can cause all kinds of trouble.

Generally there are no consequences of this for you. If you build your
own software and it requires this formula, you’ll need to add to your
build variables:

LDFLAGS: -L/usr/local/opt/openblas/lib
CPPFLAGS: -I/usr/local/opt/openblas/include

==> Finishing up
Changing dylib ID of /usr/local/Cellar/openblas/0.2.14_1/lib/libopenblasp-r0.2.14.dylib
from @@HOMEBREW_PREFIX@@/opt/openblas/lib/libopenblasp-r0.2.14.dylib
to /usr/local/opt/openblas/lib/libopenblasp-r0.2.14.dylib
Changing install name in /usr/local/Cellar/openblas/0.2.14_1/lib/libopenblasp-r0.2.14.dylib
from @@HOMEBREW_PREFIX@@/lib/gcc/5/libgfortran.3.dylib
to /usr/local/lib/gcc/5/libgfortran.3.dylib
Changing install name in /usr/local/Cellar/openblas/0.2.14_1/lib/libopenblasp-r0.2.14.dylib
from @@HOMEBREW_PREFIX@@/lib/gcc/5/libquadmath.0.dylib
to /usr/local/lib/gcc/5/libquadmath.0.dylib
Changing install name in /usr/local/Cellar/openblas/0.2.14_1/lib/libopenblasp-r0.2.14.dylib
from @@HOMEBREW_PREFIX@@/lib/gcc/5/libgcc_s.1.dylib
to /usr/local/lib/gcc/5/libgcc_s.1.dylib
Changing dylib ID of /usr/local/Cellar/openblas/0.2.14_1/lib/libopenblasp-r0.2.14.dylib
from /usr/local/opt/openblas/lib/libopenblasp-r0.2.14.dylib
to /usr/local/opt/openblas/lib/libopenblasp-r0.2.14.dylib

I gave the output for installing openblas because you might need to alter Makefile.config or export global variables accordingly:

$ export LDFLAGS="-L/usr/local/opt/openblas/lib $LDFLAGS"
$ export CPPFLAGS="-I/usr/local/opt/openblas/include $CPPFLAGS"

Then I encountered an error:

$ make all -j8
CXX src/caffe/blob.cpp
In file included from src/caffe/blob.cpp:4:
In file included from ./include/caffe/blob.hpp:8:
./include/caffe/common.hpp:6:10: fatal error: 'glog/logging.h' file not found

Which I resolved by installing glog:

$ rm -r /usr/local/lib/cmake
$ brew link gflags
$ brew install glog

Then the build hit an error limit:

$ make all -j8
*****

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__functional_03(218): error: unary_function is not a template

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__functional_03(218): error: not a class or struct name

 Error limit reached.
 100 errors detected in the compilation of "/var/folders/ct/3_xtyk5n427_pvx3bfdf5jdm0000gn/T//tmpxft_00002db4_00000000-16_absval_layer.compute_50.cpp1.ii".
 Compilation terminated.
 make: *** [.build_release/cuda/src/caffe/layers/absval_layer.o] Error 1
 
 Which was fixed according to a solution I found (rolling Boost back to an older formula):
 
 $ cd Library/Formula/
 $ cp boost.rb boost_backup.rb
 $ cp boost-python.rb boost-python_backup.rb
 $ wget https://raw.githubusercontent.com/Homebrew/homebrew/6fd6a9b6b2f56139a44dd689d30b7168ac13effb/Library/Formula/boost.rb
 $ mv boost.rb.1 boost.rb
 $ wget https://raw.githubusercontent.com/Homebrew/homebrew/3141234b3473717e87f3958d4916fe0ada0baba9/Library/Formula/boost-python.rb
 $ mv boost-python.rb.1 boost-python.rb
 $ brew uninstall --force boost
 $ brew install boost
 

After that I was actually able to build a caffe with:


$ make clean
$ make all -j8
$ make test -j8

But I forgot to export the fallback paths and got an error:

$ make runtest
.build_release/tools/caffe
dyld: Library not loaded: @rpath/libcudart.7.0.dylib
Referenced from: /Users/kupa/Desktop/caffe-master/.build_release/tools/caffe
Reason: image not found
make: *** [runtest] Trace/BPT trap: 5

and fixed it according to the original manual:

$ export DYLD_FALLBACK_LIBRARY_PATH=/usr/local/cuda/lib:/Users/kupa/anaconda/lib:/usr/local/lib:/usr/lib:/usr/local/Cellar/hdf5/:/usr/local/Cellar/:$DYLD_FALLBACK_LIBRARY_PATH

Then, if you followed the guide, you should be able to run $ make runtest -j8:

$ make runtest -j8
*********
[----------] Global test environment tear-down
[==========] 1356 tests from 214 test cases ran. (260250 ms total)
[ PASSED ] 1356 tests.

YOU HAVE 2 DISABLED TESTS

$ make pycaffe
CXX/LD -o python/caffe/_caffe.so python/caffe/_caffe.cpp
touch python/caffe/proto/__init__.py
PROTOC (python) src/caffe/proto/caffe.proto

Now, all that is left to do is to export the correct Python path and run $ python -c "import caffe" to test that everything is OK:

$ export PYTHONPATH=$(installed dir)/caffe-master/python/

So now, if you’ve done everything right, you should be able to proceed to testing with the google/deepdream GitHub repo.

Useful links

  1. Official caffe manual
  2. Caffe OS X installation manual
  3. Official CUDA download page
  4. Great post about installing caffe
  5. Caffe incompatible with Boost 1.58
  6. Another installing report
  7. One more post about successful experience
  8. Error importing caffe
  9. Error dyld: Library not loaded: libhdf5_hl.7.dylib on github. And one more.
  10. Post which didn’t work for me (but might for you)
  11. Protobuf error
  12. ‘-pthread’ error.
  13. More about protobuf error
  14. Great article about installing caffe on mac.
Categories: life is fun

What Questions to ask a VC/PE

January 17, 2016

I was fortunate enough to have a fairy tale startup–we bootstrapped, raised $5M, grew to be a market leader, and had a great exit. Since then I’ve played all the other roles around the table in other companies’ stories–some fairy tales, some tragedies–as an angel, VC, seller, acquirer, advisor, and Board member.

What I’ve found is that those seeking capital usually don’t understand the motivations and limitations of those providing capital. When I was running my business, I certainly didn’t.

This lack of understanding makes it harder than it has to be on the entrepreneur. Running your business, figuring out a growth strategy, getting an investor to believe in your story–these things are hard enough. Divining the opaque rationale behind decisions that investors make is the burden I’m trying to alleviate here.

Find a structural fit, not a strategic fit.

Since equity is the most high profile of capital options, in Part 1 of this series I’m going to concentrate on “The First Questions You Should Ask a VC/PE.”

These questions are not intended to address strategic fit, such as whether the investor has market knowledge of your industry, or whether the partners are value-add board members. The focus here is finding the right structural fit–whether the investors have the ability and desire to put cash into companies like yours right now.

All things being equal, I always found that structural fit was a higher priority than strategic fit. My company was a great strategic fit for lots of investors, but it wasn’t a structural fit for most. So I got nowhere. In the end, I raised money from a group that was not a strategic fit, but was a terrific structural fit.

Go get the cash.

In the end, strategic fit is neither necessary nor sufficient to raise capital, but structural fit is. So, onto the questions.

Your question: “When did you close your current fund?”

Their answer: “We closed our fund within the last 24 months.”

What it means: “We have money burning a hole in our pockets and need to invest in as many businesses as we can right now.”

VC/PE firms are General Partners in 10 year “closed-end” legal Limited Partnerships. That means the fund is contractually bound to invest and divest its investments and return money to its Limited Partner (“LP”) investors within 10 years of when it is created (investments can and do go longer–frequently—but it’s not the goal).

Firms want to put their money into companies within 3 years so they have time to mature and get to exit, because average hold times are approximately 7 years. If their last fund closed four to five or more years ago, they still have money in their fund, but it’s almost all reserved for their existing investments.

Timing is everything.

All this means that your company could be a perfect strategic fit that the firm loves, but they can’t invest because it’s not a structural fit for the fund due to timing–because they’ve already placed their bets. There are two exceptions to this:

  • If the firm has had a lot of big exits in the first few years (this is statistically rare), they’ll be able to recycle the returns into new investments like yours and still have time to exit within the 10-year time frame.
  • If your company has a direct and high probability path to exit in a very short time period

Probability-wise, though, your chance for greatest success is to concentrate your efforts on VCs/PEs that are in the honeymoon phase of their latest fund (within 3 years of latest close). They’re ready, willing, and able to invest–indeed, they have to get the cash out as soon as possible because they have a “burning platform” of the 10-year total time window.

A firm’s initial investments define its style.

Your question: “What size is your average total investment?”

Their answer: “We typically invest $2M at first and reserve $4-6M total for each investment.”

The answer tells you a lot about the firm’s style–early stage, late stage, growth, buyout, etc. For example, you can have two identical size funds, but a larger initial investment means they are looking for either larger companies or to control positions in smaller companies.

Importantly, this also means they will have fewer investments to get to their target return–that means they want less risk (more mature companies means less chance of an investment going to zero) and have a narrower target return window (they don’t need investments to be a Google-type exit to offset a lot of zeros). This makes sense because when fewer companies fail, the remaining ones don’t have to be intergalactic home runs for them to get the fund to its target returns. (The next section will elucidate why this is important.)

Your relationship with risk should guide your search.

PE firms focused on more mature businesses average a capital loss ratio (how much of their investment capital goes to zero) of about 15 percent, whereas early stage VCs average about 35 percent. That means those PE firms take on much less risk than the typical VC, and that means they are looking for more experienced management, and companies with more revenue and profits, diverse and large customer bases, and a leading market position.

Knowing this is important because the VC/PE firm’s style has to suit who you are and what you want. Let’s say you’ve been slaving away to get your business to $10M in revenue. You see big opportunities but you’re sick of the grind and want some cash now while someone else sweats the pressure of growth and competition.

You need a firm whose strategy will allow a control recap (give you some cash for a majority of your equity) and growth capital (new cash on top of the cash that goes to you) and is comfortable finding new management. This is typically a firm with a larger fund (>$200M) that makes larger first investments (>$5M).

You might find a VC/PE firm that’s a great strategic fit (that knows your product type and market), but if they make smaller investments, they are looking for minority equity in a hungry entrepreneur who wants to lead the company to world domination. Your situation won’t be a structural fit and therefore they would be unlikely to invest.

Understand their funding from their point of view.

Start by asking the following:

Your question: “How big is your current fund?”

Their answer: “Our current fund is $100M.”

What it means: “We need to return $300M to our Limited Partners.”

VC and PE behavior is driven by aggregated returns to their LPs. A typical target return for a fund is 3x cash-on-cash return and >20% IRR (net of fees) to LPs over 10 to 12 years. With such returns they’ll be a top performer in most years and will be able to raise another fund–which is most professional investors’ ultimate long-term goal.

This return target is a pretty universal rule for all equity investors, whether a $25M or $500M fund, a VC or PE, or early or late stage specialists. An individual firm’s focus may vary–which determines strategic fit–but all firms need to rattle their cups to get money from big pools of money (ie LPs), and managers of those pools are similar in looking for such returns.

Here’s why this matters: Fund size determines everything in a deal that matters to an entrepreneur.

Do the math.

Think of it this way: If I’m a $100M, early stage VC fund, about $20M goes to management fees (10-year fund, 1.5-2 percent management fee per year, plus deal fees, plus dead deal costs, etc), and about $30M goes to zero (average VC capital loss ratio is 35 percent). That means only $50M will actively make returns. On average, about $25M will just return the original investment, leaving $25M of investment bets to make $300M.

A typical such fund averages 15 bets, starting off with a $2M investment per company and reserving an average of $5M (which not every company uses). Based on the average numbers, 5 companies will be a total loss, 5 will return the original investment, and 5 will be responsible for returning the lion’s share of the $300M.

That means each of those winners must return $50M+ apiece. That’s a 10x multiple of the total $5M investment in each company. We’ve all heard how VCs look for “10 baggers” and this math substantiates why. What entrepreneurs don’t know is that this drives the types of companies the VC/PE can invest in, and the price they can put on those companies (their valuation).
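
If it helps, the back-of-the-envelope numbers above can be written out explicitly. This rough sketch (mine, not from the article) uses only the illustrative averages quoted here, describes no particular fund, and lands in the same ballpark as the "$50M+ apiece, roughly 10x" conclusion:

# fund_math_sketch.py
# Rough sanity check of the $100M early-stage fund example above.
# All inputs are the illustrative averages quoted in the article.

fund_size = 100.0                # $M raised from LPs
target_to_lps = 3 * fund_size    # 3x cash-on-cash target: $300M back to LPs

fees = 20.0                      # management fees, deal costs, dead deals, etc.
investable = fund_size - fees    # ~$80M actually deployed
to_zero = 30.0                   # ~35% capital loss ratio on deployed capital
returns_cost_only = 25.0         # bets that only hand back the original investment
winners_capital = investable - to_zero - returns_cost_only  # ~$25M doing the real work

winners = 5                      # companies expected to carry the fund
invested_per_winner = 5.0        # ~$5M total per company (initial + reserves)

# The break-even bets return their $25M; the winners must supply the rest.
per_winner = (target_to_lps - returns_cost_only) / winners
multiple = per_winner / invested_per_winner

print("Capital in the winners: $%.0fM" % winners_capital)
print("Each winner must return roughly $%.0fM, about %.0fx the $%.0fM put into it"
      % (per_winner, multiple, invested_per_winner))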

If they love your company, can it still work out? (No.)

Let’s say a VC loves a company–the industry, the product type, the stage, the entrepreneur. It’s a 100 percent strategic fit. They can’t make the investment unless they can structure the deal to potentially get them their target return of $50M+. How do they do this?

First of all, they need to believe the business can eventually sell for enough to make the return. For example, you have a $2M revenue software business, and your plan says it’ll grow in five years to $15M with a $2M profit. Excluding the fairy tale world of Instagram, a good price is 2-4x revenues, so let’s say 3x revenue, or $45M.

That means even if the VC owned 100 percent of your business, they can’t reach their target return of $50M+ (absent an Instagram lotto ticket). So even if a VC loves everything about your company, and objectively growing from $2M to $15M in five years is great, the investment is not a structural fit for the fund because of return potential.

Second, a VC/PE’s ownership percentage needs to support the return target. Let’s say your projections show explosive growth from $2M to $50M revenue and $10M EBITDA in five years. That’s an eventual projected sale price of $150M (at that size, EBITDA multiples are more common, and 15x EBITDA is a generous middle-of-the-road valuation). The VC needs to own one-third of your business at that sale price to get their $50M. If you want $5M, the maximum valuation you’ll get is likely $10M or less.

Here’s the take-home: Listen to the math.

Math around what the VC/PE believes will happen at sale–not your plan, your product, or you–is what drives valuation. And valuation is usually the hot button issue for entrepreneurs.

So if you know how fund math contributes to VC/PE investment decisions, you can more efficiently determine whether your company is a structural fit with a particular VC/PE.

These questions should help you ascertain the key characteristics of a VC/PE to see if they’re a structural fit for your company. In Part 2 of this series, I’m going to go over the questions you should ask yourself so that you know if, what type, and what amount of capital is appropriate.

Categories: life is fun

what the hell DO you do if you are unlucky enough to win the lottery?

January 13, 2016


This is the absolutely most important thing you can do right away: NOTHING.

Yes. Nothing.

DO NOT DECLARE YOURSELF THE WINNER yet.

Do NOT tell anyone. The urge is going to be nearly irresistible. Resist it. Trust me.

/ 1. IMMEDIATELY retain an attorney.

Get a partner from a larger, NATIONAL firm. Don’t let them pawn off junior partners or associates on you. They might try, all law firms might, but insist instead that your lead be a partner who has been with the firm for a while. Do NOT use your local attorney. Yes, I mean your long-standing family attorney who did your mother’s will. Do not use the guy who fought your dry-cleaner bill. Do not use the guy you have trusted your entire life because of his long and faithful service to your family. In fact, do not use any firm that has any connection to family or friends or community. TRUST me. This is bad. You want someone who has never heard of you, any of your friends, or any member of your family. Go to the closest big city and walk into one of the national firms asking for one of the “Trust and Estates” partners you have previously looked up on http://www.martindale.com from one of the largest 50 firms in the United States which has an office near you. You can look up attorneys by practice area and firm on Martindale.

/ 2. Decide to take the lump sum.

Most lotteries pay a really pathetic rate for the annuity. It usually hovers around 4.5% annual return or less, depending. It doesn’t take much to do better than this, and if you have the money already in cash, rather than leaving it in the hands of the state, you can pull from the capital whenever you like. If you take the annuity you won’t have access to that cash. That could be good. It could be bad. It’s probably bad unless you have a very addictive personality. If you need an allowance managed by the state, it is because you didn’t listen to point #1 above.

Why not let the state just handle it for you and give you your allowance?

Many state lotteries pay you your “allowance” (the annuity option) by buying U.S. treasury instruments and running the interest payments through their bureaucracy before sending it to you along with a hunk of the principal every month. You will not be beating inflation by much, if at all. There is no reason you couldn’t do this yourself, if a low single-digit return is acceptable to you.

You aren’t going to get even remotely the amount of the actual jackpot. Take our old friend Mr. Whittaker. Using Whittaker is a good model both because of the reminder of his ignominious decline, and the fact that his winning ticket was one of the larger ones on record. If his situation looks less than stellar to you, you might have a better perspective on how “large” your winnings aren’t. Whittaker’s “jackpot” was $315 million. He selected the lump-sum cash up-front option, which knocked off $145 million (or 46% of the total) leaving him with $170 million. That was then subject to withholding for taxes of $56 million (33%) leaving him with $114 million.

In general, you should expect to get about half of the original jackpot if you elect a lump sum (maybe better, it depends). After that, you should expect to lose around 33% of your already pruned figure to state and federal taxes. (Your mileage may vary, particularly if you live in a state with aggressive taxation schemes).

/ 3. Decide right now, how much you plan to give to family and friends.

This really shouldn’t be more than 20% or so. Figure it out right now. Pick your number. Tell your lawyer. That’s it. Don’t change it. 20% of $114 million is $22.8 million. That leaves you with $91.2 million. DO NOT CONSULT WITH FAMILY when deciding how much to give to family. You are going to get advice that is badly tainted by conflict of interest, and if other family members find out that Aunt Flo was consulted and they weren’t you will never hear the end of it. Neither will Aunt Flo. This might later form the basis for an allegation that Aunt Flo unduly influenced you and a lawsuit might magically appear on this basis. No, I’m not kidding. I know of one circumstance (related to a business windfall, not a lottery) where the plaintiffs WON this case.

Do NOT give anyone cash. Ever. Period. Just don’t. Do not buy them houses. Do not buy them cars. Tell your attorney that you want to provide for your family, and that you want to set up a series of trusts for them that will total 20% of your after tax winnings. Tell him you want the trust empowered to fund higher education, some help with (but not the entire) purchase of their first home, some provision for weddings and the like, whatever. Do NOT put yourself in the position of handing out cash. Once you do, if you stop, you will be accused of being a heartless bastard (or bitch). Trust me. It won’t go well.

It will be easy to lose perspective. It is now the duty of your friends, family, relatives, hangers-on and their inner circle to skew your perspective, and they take this job quite seriously. Setting up a trust, a managed fund for your family that is in the double digit millions is AMAZINGLY generous. You need never have trouble sleeping because you didn’t lend Uncle Jerry $20,000 in small denomination unmarked bills to start his chain of deep-fried peanut butter pancake restaurants. (“Deep’n ‘nutter Restaurants”) Your attorney will have a number of good ideas how to parse this wealth out without turning your siblings/spouse/children/grandchildren/cousins/waitresses into the latest Paris Hilton.

 

 

/ 4. You will be encouraged to hire an investment manager. Considerable pressure will be applied. Don’t.

Investment managers charge fees, usually a percentage of assets. Consider this: If they charge 1% (which is low, I doubt you could find this deal, actually) they have to beat the market by 1% every year just to break even with a general market index fund. It is not worth it, and you don’t need the extra return or the extra risk. Go for the index fund instead if you must invest in stocks. This is a hard rule to follow. They will come recommended by friends. They will come recommended by family. They will be your second cousin on your mother’s side. Investment managers will sound smart. They will have lots of cool acronyms. They will have nice PowerPoint presentations. They might (MIGHT) pay for your shrimp cocktail lunch at TGI Friday’s while reminding you how poor their side of the family is. They live for this stuff.

You should smile, thank them for their time, and then tell them you will get back to them next week. Don’t sign ANYTHING. Don’t write it on a cocktail napkin (lottery lawsuit cases have been won and lost over drunkenly scrawled cocktail napkin addition and subtraction figures with lots of zeros on them). Never call them back. Trust me. You will thank me later. This tactic, smiling, thanking people for their time, and promising to get back to people, is going to have to become familiar. You will have to learn to say no gently, without saying the word “no.” It sounds underhanded. Sneaky. It is. And it’s part of your new survival strategy. I mean the word “survival” quite literally.

Get all this figured out BEFORE you claim your winnings. They aren’t going anywhere. Just relax.

/ 5. If you elect to be more global about your paranoia, use between 20.00% and 33.00% of what you have not decided to commit to a family fund IMMEDIATELY to purchase a combination of longer term U.S. treasuries (5 or 10 year are a good idea) and perhaps even another G7 treasury instrument. This is your safety net. You will be protected… from yourself.

You are going to be really tempted to start being a big investor. You are going to be convinced that you can double your money in Vegas with your awesome Roulette system/by funding your friend’s amazing idea to sell Lemming dung/buying land for oil drilling/by shorting the North Pole Ice market (global warming, you know). This all sounds tempting because “Even if I lose it all I still have $XX million left! Anyone could live on that comfortably for the rest of their life.” Yeah, except for 33% of everyone who won the lottery.

You’re not going to double your money, so cool it. Let me say that again. You’re not going to double your money, so cool it. Right now, you’ll get around 3.5% on the 10 year U.S. treasury. With $18.2 million (20% of $91.2 mil after your absurdly generous family gift) invested in those you will pull down $638,400 per year. If everything else blows up, you still have that, and you will be in the top 1% of income in the United States. So how about you not fuck with it. Eh? And that’s income that is damn safe. If we get to the point where the United States defaults on those instruments, we are in far worse shape than worrying about money.

If you are really paranoid, you might consider picking another G7 or otherwise mainstream country other than the U.S. according to where you want to live if the United States dissolves into anarchy or Britney Spears is elected to the United States Senate. Put some fraction in something like Swiss Government Bonds at 3%. If the Swiss stop paying on their government debt, well, then you know money really means nothing anywhere on the globe anymore. I’d study small field sustainable agriculture if you think this is a possibility. You might have to start feeding yourself.

/ 6. That leaves, say, 80% of $91.2 million or $72.9 million.

Here is where things start to get less clear. Personally, I think you should dump half of this, or $36.4 million, into a boring S&P 500 index fund. Find something with low fees. You are going to be constantly tempted to retain “sophisticated” advisers who charge “nominal fees.” Don’t. Period. Even if you lose every other dime, you have $638,400 per year you didn’t have before that will keep coming in until the United States falls into chaos. Fuck advisers and their fees. Instead, drop your $36.4 million in the market in a low fee vehicle. Unless we have an unprecedented downturn the likes of which the United States has never seen, it should return around 7.00% or so over the next 10 years. You should expect to touch not even a dime of this money for 10 or 15 or even 20 years. In 20 years $36.4 million could easily become $115 million.

/ 7. So you have put a safety net in place.

You have provided for your family beyond your wildest dreams. And you still have $36.4 million in “cash.” You know you will be getting $638,400 per year unless the Capitol building is burning, you don’t ever need to give anyone you care about cash, since they are provided for generously and responsibly (and can’t blow it in Vegas) and you have a HUGE nest egg that is growing at market rates. (Given the recent dip, you’ll be buying in at great prices for the market). What now? Whatever you want. Go ahead and burn through $36.4 million in hookers and blow if you want. You’ve got more security than 99% of the country. A lot of it is in trusts so even if you are sued your family will live well, and progress across generations. If your lawyer is worth his salt (I bet he is) then you will be insulated from most lawsuits anyhow. Buy a nice house or two, make sure they aren’t stupid investments though. Go ahead and be an angel investor and fund some startups, but REFUSE to do it for anyone you know. (Friends and money, oil and water – Michael Corleone) Play. Have fun. You earned it by putting together the shoe sizes of your whole family on one ticket and winning the jackpot.

Categories: life is fun