Chapter 3 Data transformation

link to the data transformation file

Description:

The data sets in our group are downloaded in excel file to the folder for further analysis. With looking at all the data files of ours, here are the problems for analyzing: 1. All of these data files have sub-headers which is not eligible for data analysis using r 2. the hears name is not that clear to understand the columns 3. the rows which have some relationships with the other rows do not have specific group variables for easily using R to analyze them.

The procedure:

Our group use python to do the data transformation and data cleaning part.

Look at these data files, the actual missing values is marked with ‘-’. With this indication, the sub-headers and the separation rows are easily to be removed since the cell with nothing must be the space generated by the sub-headers or the separation row in the excel file. We could just drop these files and keep the actual missing values then deal with the missing values in different method based on the specific files.

In addition, we change the header of the columns in order to simplify the further analysis process since the current header could not be an useful indicator for using R to the analysis.

In the end, we also have add another extra column named ‘Group’ to most of the data files, which could be assigned group variables. With this action, the further analysis could be more effective, since we expect to do many group analysis and having such variable would save time to group them together.