Whenever we can control how the data are collected, we should take advantage of it: for example, by applying sampling techniques in a poll, or by assigning participants to groups and treatments in a medical study.
Here we give an example using Orthogonal Arrays, which belong to the family of Experimental Designs.
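Let’s assume that Joe sometimes suffers from stomach ache during the night. His gastroenterologist suspects that his diet is responsible for these occasional symptoms. Let’s also assume that Joe’s diet includes:
Breakfast: Sandwich, Pancakes, Omelette, or Yogurt+Honey+Nuts
Beverage: Coffee or Orange Juice
Lunch: Pork, Fish, Chicken, or Salad
Dinner: Pasta, Rice, Milk+Cereals, or Pizza
Dessert: Ice-Cream or Nothing
Drink: Tea or Wine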
So there are 4 x 2 x 4 x 4 x 2 x 2 = 512 possible combinations. The doctor would like to detect which food(s) may cause the discomfort, and he plans to use an Orthogonal Array so that Joe only has to try a small subset of them. Assuming that there are no interactions between the meals, he will ask Joe to follow a diet built from such an array.
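As a quick check of the count of 512, the full set of combinations can be enumerated in base R. This is only a sketch: the variable names (`diet`, `n_combinations`, `full_factorial`) are made up for illustration, and the factor levels are the ones used in the `oa.design()` call later in the text.

```r
# Joe's diet options: three 4-level factors and three 2-level factors.
diet <- list(
  Breakfast = c("Sandwich", "Pancakes", "Omelette", "Yogurt+Honey+Nuts"),
  Beverage  = c("Coffee", "Orange Juice"),
  Lunch     = c("Pork", "Fish", "Chicken", "Salad"),
  Dinner    = c("Pasta", "Rice", "Milk+Cereals", "Pizza"),
  Dessert   = c("Ice-Cream", "Nothing"),
  Drink     = c("Tea", "Wine")
)

# Number of combinations = product of the number of levels per factor.
n_combinations <- prod(lengths(diet))

# The full factorial design: one row per possible daily menu.
full_factorial <- expand.grid(diet, stringsAsFactors = FALSE)

n_combinations
nrow(full_factorial)
```

Trying every row of `full_factorial` would take Joe well over a year, which is why a much smaller balanced subset is attractive.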
Question: Which are all the possible Orthogonal Arrays for this case?
Answer: Notice that we have 3 factors with 4 levels and 3 factors with 2 levels. Using the R package DoE.base, we can list the arrays that accommodate them.
library(DoE.base)
## the orthogonal arrays with 3 4-level factors and 3 2-level factors
show.oas(factors = list(nlevels = c(4, 2), number = c(3, 3)))
And we get:
5 resolution IV or more arrays found
name nruns lineage
10 L64.2.8.4.3 64
12 L64.2.6.4.4 64
23 L128.2.20.4.3 128
26 L192.2.36.4.3 192
29 L256.2.52.4.3 256
990 orthogonal arrays found,
the first 10 are listed
name nruns lineage
17 L16.2.6.4.3 16 4~5;:(4~1!2~3;)
18 L16.2.3.4.4 16 4~5;:(4~1!2~3;)
53 L32.2.22.4.3 32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
55 L32.2.19.4.4 32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
57 L32.2.16.4.5 32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
59 L32.2.15.4.3.8.1 32 4~8;8~1;:(4~1!2~3;)
60 L32.2.13.4.6 32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
61 L32.2.12.4.4.8.1 32 4~8;8~1;:(4~1!2~3;)
62 L32.2.10.4.7 32 4~8;8~1;:(8~1!2~4;4~1;)(4~1!2~3;)
63 L32.2.9.4.5.8.1 32 4~8;8~1;:(4~1!2~3;)
From the R output, we can see that 16 is the minimum number of runs for this experiment. The ID of the smallest array is L16.2.6.4.3, which tells you that it has 16 runs and can accommodate up to six 2-level factors and three 4-level factors.
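A rough way to see why no array smaller than 16 runs can work is to combine two necessary conditions: a main-effects model needs at least one run per parameter, and pairwise balance requires the number of runs to be a multiple of the product of the levels of every pair of factors. The sketch below uses base R only. The helper names (`gcd`, `lcm`, `lower_bound`) are made up for illustration, and the result is a lower bound, not a proof that an array of that size exists (here the catalogue output confirms that it does).

```r
# Number of levels of the six diet factors.
levels_per_factor <- c(4, 2, 4, 4, 2, 2)

# Parameters of a main-effects-only model: intercept + (levels - 1) per factor.
df_needed <- 1 + sum(levels_per_factor - 1)

# Products of the levels for every pair of factors (4, 8 or 16 here).
pairs <- combn(levels_per_factor, 2)
pair_products <- unique(pairs[1, ] * pairs[2, ])

# Least common multiple of those products: the run count must be a multiple of it.
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
lcm <- function(a, b) a / gcd(a, b) * b
min_multiple <- Reduce(lcm, pair_products)

# Smallest multiple that also provides enough runs for all main effects.
lower_bound <- ceiling(df_needed / min_multiple) * min_multiple

df_needed
min_multiple
lower_bound
```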
Question: What is the recommended diet for Joe?
Answer: The doctor could ask Joe to follow the diet below for the next 16 days. Notice that 16 is the minimum number of runs that we found for this experimental design.
OA <- oa.design(nruns = 16,
  factor.names = list(
    Breakfast = c("Sandwich", "Pancakes", "Omelette", "Yogurt+Honey+Nuts"),
    Beverage = c("Coffee", "Orange Juice"),
    Lunch = c("Pork", "Fish", "Chicken", "Salad"),
    Dinner = c("Pasta", "Rice", "Milk+Cereals", "Pizza"),
    Dessert = c("Ice-Cream", "Nothing"),
    Drink = c("Tea", "Wine")))
OA
And we get:
Breakfast Beverage Lunch Dinner Dessert Drink
1 Yogurt+Honey+Nuts Coffee Fish Milk+Cereals Ice-Cream Tea
2 Yogurt+Honey+Nuts Orange Juice Salad Pizza Nothing Tea
3 Omelette Orange Juice Chicken Milk+Cereals Ice-Cream Wine
4 Pancakes Coffee Fish Rice Nothing Wine
5 Omelette Orange Juice Fish Pasta Nothing Tea
6 Pancakes Coffee Chicken Pizza Ice-Cream Tea
7 Yogurt+Honey+Nuts Orange Juice Pork Rice Ice-Cream Wine
8 Pancakes Orange Juice Salad Pasta Ice-Cream Wine
9 Sandwich Coffee Pork Pasta Ice-Cream Tea
10 Sandwich Coffee Salad Milk+Cereals Nothing Wine
11 Yogurt+Honey+Nuts Coffee Chicken Pasta Nothing Wine
12 Sandwich Orange Juice Fish Pizza Ice-Cream Wine
13 Omelette Coffee Salad Rice Ice-Cream Tea
14 Omelette Coffee Pork Pizza Nothing Wine
15 Sandwich Orange Juice Chicken Rice Nothing Tea
16 Pancakes Orange Juice Pork Milk+Cereals Nothing Tea
Every row in the table above represents one day.
A good check is to confirm that the factor levels are balanced pairwise. Let’s take two factors as an example:
aggregate(Lunch~Breakfast+Dessert, OA, length)
And we get:
Breakfast Dessert Lunch
1 Sandwich Ice-Cream 2
2 Pancakes Ice-Cream 2
3 Omelette Ice-Cream 2
4 Yogurt+Honey+Nuts Ice-Cream 2
5 Sandwich Nothing 2
6 Pancakes Nothing 2
7 Omelette Nothing 2
8 Yogurt+Honey+Nuts Nothing 2
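The same check can be extended to every pair of factors. The sketch below is one possible way to do it in base R. To keep it self-contained, the 16-day design is typed in by hand from the table above instead of being regenerated with `oa.design()`, and the names `factor_pairs` and `balanced` are made up for illustration.

```r
# The 16-day design, typed in from the table above (one row per day).
OA <- data.frame(
  Breakfast = c("Yogurt+Honey+Nuts", "Yogurt+Honey+Nuts", "Omelette", "Pancakes",
                "Omelette", "Pancakes", "Yogurt+Honey+Nuts", "Pancakes",
                "Sandwich", "Sandwich", "Yogurt+Honey+Nuts", "Sandwich",
                "Omelette", "Omelette", "Sandwich", "Pancakes"),
  Beverage = c("Coffee", "Orange Juice", "Orange Juice", "Coffee",
               "Orange Juice", "Coffee", "Orange Juice", "Orange Juice",
               "Coffee", "Coffee", "Coffee", "Orange Juice",
               "Coffee", "Coffee", "Orange Juice", "Orange Juice"),
  Lunch = c("Fish", "Salad", "Chicken", "Fish",
            "Fish", "Chicken", "Pork", "Salad",
            "Pork", "Salad", "Chicken", "Fish",
            "Salad", "Pork", "Chicken", "Pork"),
  Dinner = c("Milk+Cereals", "Pizza", "Milk+Cereals", "Rice",
             "Pasta", "Pizza", "Rice", "Pasta",
             "Pasta", "Milk+Cereals", "Pasta", "Pizza",
             "Rice", "Pizza", "Rice", "Milk+Cereals"),
  Dessert = c("Ice-Cream", "Nothing", "Ice-Cream", "Nothing",
              "Nothing", "Ice-Cream", "Ice-Cream", "Ice-Cream",
              "Ice-Cream", "Nothing", "Nothing", "Ice-Cream",
              "Ice-Cream", "Nothing", "Nothing", "Nothing"),
  Drink = c("Tea", "Tea", "Wine", "Wine",
            "Tea", "Tea", "Wine", "Wine",
            "Tea", "Wine", "Wine", "Wine",
            "Tea", "Wine", "Tea", "Tea")
)

# All 15 unordered pairs of the 6 factors.
factor_pairs <- combn(names(OA), 2, simplify = FALSE)

# A pair is balanced when every combination of levels occurs equally often.
balanced <- vapply(factor_pairs, function(p) {
  counts <- table(OA[[p[1]]], OA[[p[2]]])
  length(unique(as.vector(counts))) == 1
}, logical(1))
names(balanced) <- vapply(factor_pairs, paste, character(1), collapse = " x ")

balanced
all(balanced)
```

A combination that never occurs shows up as a zero in the cross-table, so a missing cell is also reported as an imbalance.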
Question: What are the next steps?
Answer: Every day, Joe should write down how bad his stomach ache was during the night, on a scale from 0 to 10. The doctor then has the independent variables (the Xs) from the Orthogonal Array, and the dependent variable (Y) is the score reported by Joe. Finally, he can fit a regression or ANOVA model to find out which variables are statistically significant.
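To illustrate this last step, here is a minimal sketch of a main-effects ANOVA in base R. The scores are simulated, not Joe’s real data: the sketch simply pretends that Pizza adds 4 points and Wine adds 2 points to the nightly score. The `Score` column, the effect sizes and the noise level are all assumptions, and the design is typed in by hand from the 16-day table so that the block runs on its own.

```r
# The 16-day design, typed in from the diet table (one row per day).
OA <- data.frame(
  Breakfast = c("Yogurt+Honey+Nuts", "Yogurt+Honey+Nuts", "Omelette", "Pancakes",
                "Omelette", "Pancakes", "Yogurt+Honey+Nuts", "Pancakes",
                "Sandwich", "Sandwich", "Yogurt+Honey+Nuts", "Sandwich",
                "Omelette", "Omelette", "Sandwich", "Pancakes"),
  Beverage = c("Coffee", "Orange Juice", "Orange Juice", "Coffee",
               "Orange Juice", "Coffee", "Orange Juice", "Orange Juice",
               "Coffee", "Coffee", "Coffee", "Orange Juice",
               "Coffee", "Coffee", "Orange Juice", "Orange Juice"),
  Lunch = c("Fish", "Salad", "Chicken", "Fish",
            "Fish", "Chicken", "Pork", "Salad",
            "Pork", "Salad", "Chicken", "Fish",
            "Salad", "Pork", "Chicken", "Pork"),
  Dinner = c("Milk+Cereals", "Pizza", "Milk+Cereals", "Rice",
             "Pasta", "Pizza", "Rice", "Pasta",
             "Pasta", "Milk+Cereals", "Pasta", "Pizza",
             "Rice", "Pizza", "Rice", "Milk+Cereals"),
  Dessert = c("Ice-Cream", "Nothing", "Ice-Cream", "Nothing",
              "Nothing", "Ice-Cream", "Ice-Cream", "Ice-Cream",
              "Ice-Cream", "Nothing", "Nothing", "Ice-Cream",
              "Ice-Cream", "Nothing", "Nothing", "Nothing"),
  Drink = c("Tea", "Tea", "Wine", "Wine",
            "Tea", "Tea", "Wine", "Wine",
            "Tea", "Wine", "Wine", "Wine",
            "Tea", "Wine", "Tea", "Tea"),
  stringsAsFactors = TRUE
)

# HYPOTHETICAL scores: baseline 2, +4 after Pizza, +2 after Wine, plus noise.
set.seed(1)
OA$Score <- 2 + 4 * (OA$Dinner == "Pizza") + 2 * (OA$Drink == "Wine") +
  rnorm(nrow(OA), sd = 0.5)

# Main-effects model only, in line with the no-interaction assumption.
fit <- lm(Score ~ Breakfast + Beverage + Lunch + Dinner + Dessert + Drink,
          data = OA)
anova_table <- anova(fit)
anova_table
```

With 16 runs and 13 model parameters, only 3 degrees of freedom remain for the residuals, so the significance tests have limited power. Treating the 0 to 10 score as a numeric response is also a simplification.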
Credit: BecomingHuman By: George Pipis