2016-08-19

Summer 2016: the trendy client-side JS ecosystem

JS react

The client-side JS ecosystem

react + flux + react-router + material-ui + axios + ES6 + 
babel + webpack + ESLint + Airbnb JavaScript Style Guide

Flux Utils

A library that packages the best practices Facebook arrived at while running React in-house.

Flux | Application Architecture for Building User Interfaces

Hot Reloading

The thing that detects file changes during development and reloads for you as appropriate. react-hot-loader looks promising.

Cross-browser + hot-reload development with webpack, React Hot Loader + Browsersync - blog by a liberal-arts university student programmer with a TOEIC score of 940

Asynchronous ReactJS Component loading

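If you bundle everything into a single bundle.js, a large app ends up pulling down something like 10 MB on initial load, which makes users furious. So what is the best way to deal with it?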

Code Splitting for React Router with ES6 Imports - Modus Create

Implicit Code Splitting and Chunk Loading with React Router and Webpack | Henley Edition

2016-08-11

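Self-loathing at the moment: on not being sincere enough

I had convinced myself that sincerity was one of my strengths, but I am now painfully aware that I am not sincere at all.

What I think (what I believe) and what I actually say and do often contradict each other.

I have noticed this, and I regret it. I want to be sincere precisely when I am drunk, when I am not calm, and when I have no room to spare.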

2016-08-07

Deep learning. Representation learning. A roundup of good material for learning the basics, plus books

Deep learning TensorFlow Keras

For people who want an intuitive picture of the basics and the theory. Compiled as part of my own study.

Qiita

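I'm neither a programmer nor a data scientist, but I played with TensorFlow for a month, so here is a super easy-to-follow explanation - Qiita

Implementing a deep generative model in TensorFlow (Importance Weighted Autoencoders) - Qiita

Trying to tell from a face photo whether someone has large breasts, using deep learning (not sure it worked) - Qiita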

SlideShare

Summary of "Deep Learning" (the blue book) http://www.slideshare.net/kenkurisu1/clipboards/clipboad3

Other

vaaaaaanquish.hatenablog.com

tjo.hatenablog.com

kivantium.hateblo.jp

Books

2016-05-24

Data distribution and ID generation strategies for sharding

MySQL

In addition to how to distribute the data, you also have to think about ID generation.

Data distribution strategies

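fixed mapping
1. Computed as id % shard. The drawback is that the result changes when the number of shards changes.
dynamic mapping
1. Build a mapping table such as user_to_shard. The drawback is that once the shard key's cardinality reaches something like 10 billion, the mapping table itself becomes too large to handle.
mixed mapping
1. Addresses the drawbacks above by first passing the key through something like a hash function to narrow the range of possible values, and then mapping.
explicit mapping
1. The approach used by Instagram and Pinterest: the shard number is embedded in the ID itself. The drawback shows up when some constraint prevents you from using that ID as the shard key.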
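
To make the fixed-mapping drawback concrete, here is a minimal Python sketch. The function names `fixed_shard` and `moved_fraction` are my own, made up for illustration. It just counts how many IDs would land on a different shard after the shard count changes.

```python
def fixed_shard(record_id: int, num_shards: int) -> int:
    """Fixed mapping: the shard is a pure function of the ID and the shard count."""
    return record_id % num_shards


def moved_fraction(ids, old_shards: int, new_shards: int) -> float:
    """Fraction of IDs whose shard changes when the shard count changes."""
    ids = list(ids)
    moved = sum(
        1 for i in ids
        if fixed_shard(i, old_shards) != fixed_shard(i, new_shards)
    )
    return moved / len(ids)


# Going from 4 to 5 shards relocates most IDs; doubling relocates half.
print(moved_fraction(range(20), 4, 5))
print(moved_fraction(range(8192), 4096, 8192))
```

Even the friendliest case (doubling the shard count) forces half of the rows to move, which is why the other strategies exist.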

ID generation strategies

Use a numbering table
1. MySQL's own auto_increment
2. Centralized management in an in-memory DB such as Redis
UUID
1. MySQL's UUID_SHORT(), etc.
2. A custom UUID scheme (Instagram, Pinterest)

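Thoughts

If data distribution is explicit, ID generation necessarily becomes a custom UUID scheme.
Mixed mapping seems fairly flexible. However, there is a lot to think through: how to generate IDs, and how to build the hash-to-shard lookup table and where to keep it.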
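
A rough sketch of what mixed mapping could look like, under my own assumptions: 1024 buckets, MD5 as the stable hash, and an in-memory dict as the lookup table. None of these choices come from a real system; `bucket_for`, `build_lookup`, and `shard_for` are hypothetical names.

```python
import hashlib

NUM_BUCKETS = 1024  # assumed bucket count, fixed for the lifetime of the system


def bucket_for(key: str) -> int:
    """Hash the shard key into a small, fixed range of buckets."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def build_lookup(num_shards: int) -> dict:
    """Initial bucket-to-shard table; here buckets are simply dealt out round-robin."""
    return {bucket: bucket % num_shards for bucket in range(NUM_BUCKETS)}


def shard_for(key: str, lookup: dict) -> int:
    """Two-step mapping: key -> bucket (hash), bucket -> shard (lookup table)."""
    return lookup[bucket_for(key)]


lookup = build_lookup(4)
print(shard_for("user:42", lookup))
```

The lookup table has at most NUM_BUCKETS rows no matter how many keys exist, and rebalancing means editing a bucket's entry (plus moving that bucket's data) rather than rehashing everything. Where to store the table is still the open question from above.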

2016-05-24

Issuing queries when sharding

MySQL

use db1;
select * from table1;

The above works, but when multiple connections are in play, another command may slip in between the first and second statements.

create table test.personal(id int, name varchar(20));
insert into test.personal (id,name) VALUES (11, "abc");
select * from test.personal;
update test.personal set name = "bbb";
delete from test.personal ;

So, as a rule, I think database-qualified (dot-joined) names are the better choice.
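
On the application side, that could be a small helper that always emits the qualified name. This is only a sketch: `qualified` and `select_all` are hypothetical helpers, and the identifier check is a conservative assumption of mine, not a MySQL rule.

```python
import re

# Conservative whitelist for identifiers (an assumption for this sketch).
_IDENT = re.compile(r"[A-Za-z0-9_]+")


def qualified(db: str, table: str) -> str:
    """Return a backtick-quoted `db`.`table` name, rejecting odd identifiers."""
    for name in (db, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"`{db}`.`{table}`"


def select_all(db: str, table: str) -> str:
    """Build a SELECT that does not depend on a preceding USE statement."""
    return f"SELECT * FROM {qualified(db, table)}"


print(select_all("db1", "table1"))
```

Since each statement names its database, it no longer matters which database the connection currently has selected.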

2016-05-18

MySQL Scaling with Sharding

MySQL

Sharding Pinterest: How we scaled our MySQL fleet - Pinterest Engineering

One MySQL instance on each of 8 servers. Each instance is master-master replicated onto a backup host in case the primary fails.

use master, not slave...

Our production servers only read/write to the master. I recommend you do the same. It simplifies everything and avoids lagged replication bugs.

Each MySQL instance holds about 512 databases.

shard

We made a design decision that once a piece of data lands in a shard, it never moves outside that shard. However, you can get more capacity by moving shards to other machines (we’ll discuss this later).

[{"range":     (0,511), "master": "MySQL001A", "slave": "MySQL001B"},
 {"range": (512, 1023), "master": "MySQL002A", "slave": "MySQL002B"},
    ...
 {"range": (3584, 4095), "master": "MySQL008A", "slave": "MySQL008B"}]
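
As a sketch of how an application might consume a config like this: the snippet below keeps only the three entries that appear in the excerpt (the elided middle stays elided) and assumes the ranges are inclusive. `master_for` is a hypothetical helper name.

```python
# Only the entries visible in the excerpt; the "..." part is intentionally absent.
SHARD_CONFIG = [
    {"range": (0, 511), "master": "MySQL001A", "slave": "MySQL001B"},
    {"range": (512, 1023), "master": "MySQL002A", "slave": "MySQL002B"},
    {"range": (3584, 4095), "master": "MySQL008A", "slave": "MySQL008B"},
]


def master_for(shard_id: int, config=SHARD_CONFIG) -> str:
    """Find the master host whose (inclusive) range contains the shard ID."""
    for entry in config:
        low, high = entry["range"]
        if low <= shard_id <= high:
            return entry["master"]
    raise KeyError(f"no host configured for shard {shard_id}")


print(master_for(700))
```

Moving a shard range to another machine then means editing this table, not touching any IDs.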

Each shard contains the same set of tables: pins, boards, users_has_pins, users_likes_pins, pin_liked_by_user, etc. I’ll expand on that in a moment.

how do we distribute our data to these shards

We created a 64 bit ID that contains the shard ID, the type of the containing data, and where this data is in the table (local ID). The shard ID is 16 bits, type ID is 10 bits and local ID is 36 bits.
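
A minimal sketch of that bit layout in Python. The field widths come from the quote; placing the shard ID in the highest bits, then the type, then the local ID (which leaves the top two bits of the 64 unused, since 16 + 10 + 36 = 62) is how I read it. `pack_id` and `unpack_id` are names I made up.

```python
SHARD_BITS, TYPE_BITS, LOCAL_BITS = 16, 10, 36  # widths from the quote above


def pack_id(shard_id: int, type_id: int, local_id: int) -> int:
    """Combine shard, type, and local ID into one integer (assumed field order)."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    if not 0 <= type_id < (1 << TYPE_BITS):
        raise ValueError("type_id out of range")
    if not 0 <= local_id < (1 << LOCAL_BITS):
        raise ValueError("local_id out of range")
    return (shard_id << (TYPE_BITS + LOCAL_BITS)) | (type_id << LOCAL_BITS) | local_id


def unpack_id(packed: int):
    """Recover (shard_id, type_id, local_id) from a packed ID."""
    local_id = packed & ((1 << LOCAL_BITS) - 1)
    type_id = (packed >> LOCAL_BITS) & ((1 << TYPE_BITS) - 1)
    shard_id = (packed >> (TYPE_BITS + LOCAL_BITS)) & ((1 << SHARD_BITS) - 1)
    return shard_id, type_id, local_id


packed = pack_id(3429, 1, 7075733)
print(packed, unpack_id(packed))
```

Because the shard ID can be read straight out of the ID, no lookup table is needed to find where a row lives.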

question

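For example, if the shard is chosen by userID % shard, increasing the number of shards from 4096 to 8192 changes the result, right? In other words, the shard where the data should live changes. So how does Pinterest decide which shard to use?

→ Once decided, it is fixed. Because the ID itself carries the information about which shard the data is stored on, the assignment can be immutable. To imitate this, would the UserID (or rather the shard key) always have to contain the shard number?

If the assignment is fixed, then inserting blindly by a mod calculation or the like means older shards accumulate more data, right? Unless the target shard is chosen with the current shard load in mind.

→ After adding new shards, perhaps they work hard on the application side to implement logic that distributes newly inserted records only among the new shards...?

→ It seems to be completely random? ( > New users are randomly distributed across shards. )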

The Mod Shard

Scaling Pinterest - From 0 To 10s Of Billions Of Page Views A Month In Two Years

ID Structure

64 bits: shard ID: 16 bits type : 10 bits - Pin, Board, User, or any other object type local ID - rest of the bits for the ID within the table. Uses MySQL auto increment.

Enough shards IDs for 65536 shards, but they only opened 4096 at first, they’ll expand horizontally. When the user database gets full they’ll open up more shards and allow new users to go to the new shards.

New users are randomly distributed across shards.

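So that means they do not mind if a user is assigned to an old shard. When a shard simply gets close to full, they move the whole shard to another DB server, or something like that.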
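
My reading of "randomly distributed", as a tiny sketch: pick uniformly among the shards that are currently open. The 4096 figure is from the quote; `assign_shard` and the use of Python's `random` module are just my illustration, not how Pinterest actually does it.

```python
import random

OPEN_SHARDS = 4096  # shards opened at first, per the quote above


def assign_shard(rng: random.Random, open_shards: int = OPEN_SHARDS) -> int:
    """Pick a shard uniformly at random among the currently open shards."""
    return rng.randrange(open_shards)


rng = random.Random()
new_user_shard = assign_shard(rng)
print(new_user_shard)
```

The chosen shard would then be baked into the user's ID, so the randomness is only needed once, at creation time.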

All data (pins, boards, etc) for a user is collocated on the same shard. Huge advantage. Rendering a user profile, for example, does not take multiple cross shard queries. It’s fast

Objects And Mappings

Queries are primary key or index lookups (no joins).

Data doesn’t move across database as it does with clustering. Once a user lands on shard 20, for example, and all the user data is collocated, it will never move. The 64 bit ID has contains the shard ID so it can’t be moved. You can move the physical data to another database, but it’s still associated with the same shard.

In other words, once data has been placed, its shard number never changes afterwards, even if the data is physically moved to a different DB server.

Database Sharding

Database Sharding, The “Shared-Nothing” Approach

2016-05-18

The point of a GlobalDB

MySQL

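I thought about why you would need an unsharded GlobalDB when sharding a database.

When you want auto_increment
- For example, when the id (auto_increment) of the foo table is the shard_key for other tables such as bar and baz. In that case the foo table itself cannot be sharded, because splitting it makes auto_increment unworkable. If you still want to split it, as a last resort you could assign a UUID as the id instead of auto_increment and use its hash as the shard_key.
- That said, consider separately whether bar and baz really have to use the auto_increment value as their shard_key. For example, wouldn't adding something like a shard column to foo and putting a random value into it at INSERT time already spread the data reasonably well? To do it a bit more properly, create a shard mapping table such as bar_to_shard. If that is enough, nothing depends on foo's id, so foo can be split.
When you want a UNIQUE constraint
- For example, user information. With columns such as user_id, name, address, and e-mail, you may want UNIQUE constraints on both user_id and e-mail. If you split the table in that case, you have to scan every DB shard, which is painful.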
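
A sketch of the UNIQUE case, using Python's built-in sqlite3 as a stand-in for the unsharded MySQL GlobalDB (so the SQL dialect differs slightly from MySQL). The `user_directory` table and the helper names are hypothetical.

```python
import sqlite3


def open_global_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the unsharded global database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_directory ("
        " user_id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " email TEXT NOT NULL UNIQUE,"
        " shard_id INTEGER NOT NULL)"
    )
    return conn


def register_user(conn: sqlite3.Connection, email: str, shard_id: int):
    """Insert into the global table; return the new user_id, or None on a duplicate e-mail."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO user_directory (email, shard_id) VALUES (?, ?)",
                (email, shard_id),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


conn = open_global_db()
print(register_user(conn, "a@example.com", 3))
print(register_user(conn, "a@example.com", 5))  # duplicate e-mail is rejected
```

One global table enforces uniqueness in a single place and also records which shard holds the rest of the user's data, so no shard has to be scanned.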