Shawn Shan's Paper Reading Everyday

11 Dec 2016

API Design Ebook

- https://pages.apigee.com/rs/apigee/images/api-design-ebook-2012-03.pdf
- Brian Mulloy

- Keep your base URL simple and intuitive
- Keep verbs out of your base URLs
- Use HTTP verbs to operate on the collections and elements. (POST, GET, PUT, and DELETE -- CRUD)
- Plural nouns and concrete names (Being consistent)
- Simplify associations - sweep complexity under the ‘?’
	○ GET /dogs?color=red&state=running&location=park
- Errors
	○ Errors become a key tool providing context and visibility into how to use an API.
	○ Use HTTP status codes
	○ Start by using the following 3 codes. If you need more, add them. But you shouldn't need to go beyond 8. 
		§ • 200 - OK 
		§ • 400 - Bad Request 
		§ • 500 - Internal Server Error
	○ Make messages returned in the payload as verbose as possible.
- Tips for versioning
	○ Never release an API without a version. 
	○ Make the version mandatory. Specify the version with a 'v' prefix. Move it all the way to the left in the URL so that it has the highest scope (e.g. /v1/dogs). 
	○ Use a simple ordinal number. Don't use the dot notation like v1.2 because it implies a granularity of versioning that doesn't work well with APIs--it's an interface not an implementation. Stick with v1, v2, and so on
	○ Maintain at least one version back.
	○ Should version and format be in URLs or headers?
		§ If it changes the logic you write to handle the response, put it in the URL so you can see it easily. 
		§ If it doesn't change the logic for each response, like OAuth information, put it in the header.
- Pagination and partial response
	○ Support partial response by adding optional fields in a comma delimited list. 
		§ /dogs?fields=name,color,location
	○ Use limit and offset to make it easy for developers to paginate objects.
		§ /dogs?limit=25&offset=50
- What about responses that don’t involve resources?
	○ Some responses might not deal with a resource (e.g. the result of an action).
	○ Use verbs not nouns
		§ Calculate, Translate, Convert
		§ /convert?from=EUR&to=CNY&amount=100
- Supporting multiple formats
	○ /dogs/1234.json
	○ JSON is a good default format
- What about attribute names?
	○ Use JSON as default 
	○ Follow JavaScript conventions for naming attributes 
		§  Use medial capitalization (aka CamelCase) 
		§  Use uppercase or lowercase depending on type of object
- Tips for search
	○ Use verbs, not nouns, when results don't return a resource from the database - rather the result is some action or calculation
		§ Global search: /search?q=fluffy+fur
		§ Scoped search: /owners/5678/dogs?q=fluffy+fur
		§ Formatted results: /search.xml?q=fluffy+fur
- Consolidate all API requests under one API subdomain.
	○ api.teachdogrest.com
	○ developers.yourtopleveldomain
		§ Web (browser) requests: redirect from api... to developers..., and from developer... to developers...
- For specific scenarios, one may suppress error codes (example: Adobe)
	○ /public_timelines.json?suppress_response_codes=true
		§ HTTP status code: 200 {"error":"Could not authenticate you."}
	○ Overall recommendations: 
		§ Use suppress_response_codes = true 
		§ The HTTP code is no longer just for the code
		§ Push any response code that we would have put in the HTTP response down into the response message
- API façade pattern
	○ Design the ideal API – design the URLs, request parameters and responses, payloads, headers, query parameters, and so on. The API design should be self-consistent. 
	○ Implement the design with data stubs. This allows application developers to use your API and give you feedback even before your API is connected to internal systems. 
	○ Mediate or integrate between the façade and the systems.
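
Several of the tips above (version prefix, limit/offset pagination, partial response via fields, and verbose error payloads with a small set of status codes) can be sketched together in a few lines. This is only a minimal sketch, not code from the ebook: the DOGS data, the handle_get and error helpers, and the default limit of 25 are all assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical in-memory collection standing in for a real data store.
DOGS = [
    {"id": 1, "name": "Rex", "color": "red", "location": "park"},
    {"id": 2, "name": "Fido", "color": "brown", "location": "home"},
    {"id": 3, "name": "Bella", "color": "red", "location": "park"},
    {"id": 4, "name": "Max", "color": "black", "location": "yard"},
]


def error(status, message):
    """Return an HTTP status code plus a verbose, human-readable payload."""
    return status, {"status": status, "message": message}


def handle_get(url):
    """Handle GET on the versioned collection URL /v1/dogs (hypothetical)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    # The version is mandatory and sits at the far left of the path.
    if segments != ["v1", "dogs"]:
        return error(400, f"Unknown path {parts.path!r}; expected /v1/dogs.")

    query = parse_qs(parts.query)
    try:
        limit = int(query.get("limit", ["25"])[0])    # assumed default
        offset = int(query.get("offset", ["0"])[0])
    except ValueError:
        return error(400, "limit and offset must be integers, e.g. ?limit=25&offset=50.")
    if limit < 0 or offset < 0:
        return error(400, "limit and offset must not be negative.")

    page = DOGS[offset:offset + limit]

    # Partial response: keep only the comma-delimited fields that were asked for.
    if "fields" in query:
        fields = query["fields"][0].split(",")
        unknown = [f for f in fields if f not in DOGS[0]]
        if unknown:
            return error(400, f"Unknown fields: {', '.join(unknown)}.")
        page = [{f: dog[f] for f in fields} for dog in page]

    return 200, page
```

For example, handle_get("/v1/dogs?fields=name,color&limit=2&offset=1") returns status 200 with the second and third dogs reduced to name and color, while a malformed limit yields a 400 whose message says how to fix the request.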

08 Dec 2016

Computing machinery and intelligence

- COMPUTING MACHINERY AND INTELLIGENCE
- A. M. Turing
- https://www.csee.umbc.edu/courses/471/papers/turing.pdf
- The Imitation Game
	○ Three players, A, B, C
		§ C wants to tell the gender of A and B, communicating only by teleprinter
		§ A can tell lies, B wants to help C win
	○ The reader must accept it as a fact that digital computers can be constructed, and indeed have been constructed, according to the principles we have described, and that they can in fact mimic the actions of a human computer very closely. 
	○ Even when we consider the actual physical machines instead of the idealised machines, reasonably accurate knowledge of the state at one moment yields reasonably accurate knowledge any number of steps later. 
	○ This special property of digital computers, that they can mimic any discrete-state machine, is described by saying that they are universal machines. The existence of machines with this property has the important consequence that, considerations of speed apart, it is unnecessary to design various new machines to do various computing processes. They can all be done with one digital computer, suitably programmed for each case. It will be seen that as a consequence of this all digital computers are in a sense equivalent. 
	○ Likewise according to this view the only way to know that a man thinks is to be that particular man. 
	○ Processes that are learnt do not produce a hundred per cent certainty of result; if they did they could not be unlearnt. 
	○ We can only see a short distance ahead, but we can see plenty there that needs to be done.
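
The point about universal machines can be made concrete with a tiny simulator: one general program that mimics any discrete-state machine once it is given that machine's tables. This is a minimal sketch; the three-state machine, its state names, and the "step"/"hold" inputs are hypothetical and chosen only for illustration.

```python
def run_machine(transitions, outputs, start, inputs):
    """Simulate a discrete-state machine described purely by its tables.

    transitions maps (state, input) -> next state; outputs maps state -> output.
    Returns the output observed after each input step.
    """
    state = start
    observed = []
    for symbol in inputs:
        state = transitions[(state, symbol)]
        observed.append(outputs[state])
    return observed


# Hypothetical three-state machine: "step" advances the state, "hold" keeps it.
TRANSITIONS = {
    ("q1", "step"): "q2", ("q2", "step"): "q3", ("q3", "step"): "q1",
    ("q1", "hold"): "q1", ("q2", "hold"): "q2", ("q3", "hold"): "q3",
}
OUTPUTS = {"q1": "off", "q2": "off", "q3": "on"}
```

Swapping in different tables changes the machine being imitated without touching run_machine, which is the sense in which one suitably programmed computer can stand in for many special-purpose ones.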

06 Dec 2016

Tidy Data

- https://github.com/papers-we-love/papers-we-love/blob/master/data_science/tidy_data.pdf
- Hadley Wickham
- Data Structure
	○ A dataset is a collection of values, usually either numbers or strings
	○ Every value belongs to a variable and an observation. 
		§ A variable contains all values that measure the same underlying attribute (height, duration..)
		§ An observation contains all values measured on the same unit (person, day..) across attributes
- In tidy data: 
	○  Each variable forms a column.
	○  Each observation forms a row.
	○  Each type of observational unit forms a table. 
- Example:
	year  artist        time  track                    date        week  rank
	2000  2 Pac         4:22  Baby Don’t Cry           2000-02-26  1     87
	2000  2 Pac         4:22  Baby Don’t Cry           2000-03-04  2     82
	2000  2 Pac         4:22  Baby Don’t Cry           2000-03-11  3     72
	2000  2 Pac         4:22  Baby Don’t Cry           2000-03-18  4     77
	2000  2 Pac         4:22  Baby Don’t Cry           2000-03-25  5     87
	2000  2 Pac         4:22  Baby Don’t Cry           2000-04-01  6     94
	2000  2 Pac         4:22  Baby Don’t Cry           2000-04-08  7     99
	2000  2Ge+her       3:15  The Hardest Part Of ...  2000-09-02  1     91
	2000  2Ge+her       3:15  The Hardest Part Of ...  2000-09-09  2     87
	2000  2Ge+her       3:15  The Hardest Part Of ...  2000-09-16  3     92
	2000  3 Doors Down  3:53  Kryptonite               2000-04-08  1     81
	2000  3 Doors Down  3:53  Kryptonite               2000-04-15  2     70
	2000  3 Doors Down  3:53  Kryptonite               2000-04-22  3     68
	2000  3 Doors Down  3:53  Kryptonite               2000-04-29  4     67
	2000  3 Doors Down  3:53  Kryptonite               2000-05-06  5     66
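
To see how a table like the one above comes about, here is a minimal sketch of turning a "wide" table into tidy rows. The wide layout (one wk1, wk2, ... column per chart week plus a date_entered column), the WIDE sample, and the tidy function are assumptions made for illustration; only the rank values and dates are taken from the example rows.

```python
from datetime import date, timedelta

# Hypothetical wide layout: one row per track, one column per chart week.
WIDE = [
    {"year": 2000, "artist": "2 Pac", "time": "4:22", "track": "Baby Don't Cry",
     "date_entered": date(2000, 2, 26), "wk1": 87, "wk2": 82, "wk3": 72},
    {"year": 2000, "artist": "3 Doors Down", "time": "3:53", "track": "Kryptonite",
     "date_entered": date(2000, 4, 8), "wk1": 81, "wk2": 70, "wk3": None},
]


def tidy(rows):
    """Melt the wk* columns so each (track, week) observation forms one row."""
    result = []
    for row in rows:
        for key, rank in row.items():
            if not key.startswith("wk") or rank is None:
                continue  # skip identifier columns and weeks without a rank
            week = int(key[2:])
            result.append({
                "year": row["year"], "artist": row["artist"],
                "time": row["time"], "track": row["track"],
                # The chart date is derived from the entry date and the week number.
                "date": row["date_entered"] + timedelta(weeks=week - 1),
                "week": week, "rank": rank,
            })
    return result
```

After melting, week and rank are ordinary columns (variables) and every row is a single observation, which is what makes filtering or aggregating by week straightforward.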

03 Dec 2016

Large-Scale Internet Services

The paper is selected from http://pages.cs.wisc.edu/~remzi/Classes/739/Fall2016/.

On Designing and Deploying Internet-Scale Services - James Hamilton – Windows Live Services Platform

1. Three tenets 
	a. Expect failures. 
		i. Failures may cause dependent components to fail. 
	b. Keep things simple. 
		i. Simple things are easier to get right. 
		ii. Avoid unnecessary dependencies. 
		iii. Simple installation.
		iv. Failure isolation. One server failure has no impact on other data centers. 
	c. Automate everything. 
		i. People make mistakes. 
2. Deploy an operations-friendly service
	a. Overall Application Design
		i. system fails --> look first to operations
		ii. simplicity is the key to efficient operations
			1) Design for failure. The entire service must be capable of surviving failure without human administrative interaction. To test the failure path --> just hard-fail it. 
			2) Redundancy and fault recovery
				a) Is the operations team willing and able to bring down any server in the service at any time without draining the workload first? 
				b) Security threat modeling
					i) consider each possible security threat and implement adequate mitigation for each
				c) Document all conceivable component failure modes and combinations. 
					i) make sure that the service can continue to operate without unacceptable loss in service quality.
					ii) Rare combinations of errors can become commonplace. 
			3) Commodity hardware slice
				a) large clusters of commodity servers $ << small # of large servers 
				b) I/O is the constraint. Server performance continues to increase much faster than I/O performance -> a small server is a more balanced system for the given amount of disk
				c) power consumption scales linearly with servers but cubically with clock frequency --> higher-performance servers $$$
				d) small server failure --> smaller share of the overall service workload affected
			4) Single-version software
				a) target a single internal deployment
				b) previous versions don't have to be supported for a decade
					i) The most economic services don't give customers control over the version they run and host only one version. 
						One. Few user experience (UE) changes between releases
						Two. willingness to allow customers that need this level of control to either host internally or switch to an application service provider 
			5) Multi-tenancy
				a) hosting all companies or end users of a service in the same service without physical isolation
				b) Single tenancy: segregation of groups of users in an isolated cluster
			6) Quick service health check
				a) the service's version of a build verification test
				b) ensure that the service isn't broken in any substantive way
			7) Develop in the full environment
				a) unit-test components, and test the full service with their component changes 
			8) zero trust of underlying components
				a) assume that underlying components will fail 
			9) understand access patterns 
				a) "What impacts will this feature have on the rest of the infrastructure"
				b) measure and validate the feature for load when live
			10) Version everything
				a) Expect a mixed version environment
				b) run single version software but multiple versions will be live for production and test
			11) Keep the unit/functional tests from the last release
				a) Keep n-1 version tests
			12) Avoid single points of failure
				a) Prefer stateless implementations. Don't affinitize requests. Static allocation is bad (example, hashing)
				b) Use Fine-grained partitioning (where related individual tuples (e.g., cliques of friends) are co-located together in the same partition) and don't support cross-partition operations to allow efficient scaling across many database servers. 
	b. Automatic Management and Provisioning
		i. it can be hard because human judgement is sometimes needed (dependency)
		ii. Be restartable and redundant
			1) persistent state stored redundantly
		iii. Support geo-distribution
			1) support running across several hosting data centers. 
		iv. Automatic provisioning and installation
		v. Configuration and code as a unit
			1) code and configuration as a single unit
			2) operations deploys them as a unit
			3) services should treat config and code as a unit
			4) audit log is required if config change must be made in production
		vi. Manage server roles or personalities rather than servers 
		vii. Multi-system failures are common
		viii. Recover at the service level 
			1) handle failures and correct errors at the service level with full context rather than in lower software levels 
		ix. Never rely on local storage for non-recoverable information 
			1) duplicate all the non-ephemeral service state 
		x. keep deployment simple 
			1) file copy, minimal external dependencies. 
		xi. fail services regularly
			1) unwilling?
	c. Dependency Management
		i. Expect latency. Calls to external components may take long to complete. 
			1) set timeout
			2) operational idempotency allows the restart of the requests after timeout even though those requests may have partially or even fully completed. 
			3) ensure all restarts are reported, and bound restarts to avoid a repeatedly failing request 
		ii. Isolate failures 
			1) avoid cascading failures 
		iii. Use shipping and proven components 
			1) stable version of software and hardware 
		iv. Implement inter-service monitoring and alerting 
			1) need to know when a dependent service is overloading 
		v. Dependent services require the same design point 
		vi. Decouple components 
			1) ensure that components can continue operation, perhaps in a degraded mode, during failures of other components. For example, maintain a session key and refresh it every N hours 
	d. Release Cycle and Testing 
		i. Invest in engineering
			1) Services that don't think big to start with will be scrambling to catch up later 
		ii. Support version roll-back
		iii. Maintain forward and backward compatibility
			1) Changes between components are all potential risks. Don't rip out support for old file formats until there is no chance of a roll back to that old format in the future
		iv. Single-server deployment
			1) The entire service must be easy to host on a single system --> for unit testing 
		v. Stress test for load
		vi. Perform capacity and performance testing prior to new releases 
			1) do this at the service level.
		vii. Build and deploy shallowly and iteratively
			1) get a skeleton version of the full service at the early stage
		viii. test with real data
		ix. Run system-level acceptance test
			1) sanity check
		x. test and develop in full environments
			1) use the same data collection and mining techniques used in production 
3. Graceful Degradation and Admission Control
	a. A big red switch. 
		i. a designed and tested action that can be taken when the service is no longer able to meet its SLA (?).
		ii. keep the vital processing progressing while shedding or delaying some non-critical workload. 
		iii. Determine what is minimally required if the system is in trouble, and implement and test the option to shut off the non-essential services when that happens
	b. Control admission
		i. if the current load cannot be processed on the system, admitting more workload --> an even worse user experience. Example: email, stop queuing -> do not accept more mail into the system 
		ii. Service premium customers over non-premium customers 
	c. Meter admission. 
		i. modification of the admission control point
		ii. be able to bring the system back up slowly. Ramp up. 1 user, 10 users, 100 users.
		iii. Ways to notify users
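
The admission-control and metered-admission ideas in section 3 can be sketched as follows. This is only an illustration under stated assumptions, not Hamilton's implementation: the AdmissionController class, the premium_reserve parameter, and ramp_schedule are hypothetical names, and "load" is simplified to a count of requests in flight.

```python
class AdmissionController:
    """Admit requests up to a capacity, holding some slots back for premium users."""

    def __init__(self, capacity, premium_reserve=0):
        self.capacity = capacity                # max requests in flight
        self.premium_reserve = premium_reserve  # slots only premium traffic may use
        self.in_flight = 0

    def try_admit(self, premium=False):
        """Return True if the request is admitted, False if it should be shed."""
        limit = self.capacity if premium else self.capacity - self.premium_reserve
        if self.in_flight >= limit:
            return False  # reject instead of queuing more work onto a full system
        self.in_flight += 1
        return True

    def release(self):
        """Mark one admitted request as finished."""
        self.in_flight = max(0, self.in_flight - 1)


def ramp_schedule(target, factor=10):
    """Metered admission: yield growing limits (1, 10, 100, ...) up to target."""
    limit = 1
    while limit < target:
        yield limit
        limit *= factor
    yield target
```

When bringing a service back up, an operator could step the controller's capacity through ramp_schedule(target), checking service health at each level before moving to the next.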

The conclusion: Spending time engineering the system at the beginning is worth it.

02 Dec 2016
Life should be more meaningful than watching hilarious Youtube videos

When I woke up this morning at 11am, I read an article saying that most successful businessmen wake up at 4 am each morning. OMG. They must be kidding. Coders do not get to sleep until 2 am each day.

After that, I reflected on what I should do for my last 6 months of college, with barely anything to do. Lying in bed, thinking about nothing, and watching YouTube videos is a good way to waste time. I tried it for the past week, but I cannot stand it anymore because it is so meaningless to waste time this way. You know, those videos are not really hilarious if I think through them carefully afterwards. Awkward.

Should I do something more interesting than repeatedly opening youtube? Maybe I should.

How about reading a computer science paper each day?

Seriously? I won’t persist, as I know myself. I re-opened YouTube and watched another recommended video. HAHAHAHAHAHA. Wait. It’s not funny. I feel so ashamed of myself, once again.

Let me just go back here and read the papers! To motivate myself, I will post a paper review/summary once a day. It’s going to be awesome and I know it.

Here we go.
