You are doing research and planning only for the next two projects:

Project 3 will be a variation of the Project 2 script that identified ".exe" files. The script should:

    • Read from the command line: the directory to examine, and the type of file to look for ("EXE" was what we used last week)
    • Retrieve a listing of the files in the directory specified on the command line
    • Print the files that match the requested type
You will need the "os" and the "sys" modules - read the documentation for these modules and determine which functions you need to be able to complete the project.
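
As a starting point for that research, here is a minimal sketch of how the three steps might fit together. It assumes that `os.listdir`, `os.path.isfile`, `os.path.join`, and `os.path.splitext` are the functions you settle on, and that matching should ignore upper/lower case. The names `find_files` and `main` are hypothetical, not part of the assignment.

```python
import os
import sys


def find_files(directory, file_type):
    """Return the sorted names of regular files in `directory` whose
    extension matches `file_type` (case-insensitive, leading dot optional)."""
    wanted = "." + file_type.lower().lstrip(".")
    matches = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Skip subdirectories; compare only the extension part of the name.
        if os.path.isfile(full_path) and os.path.splitext(name)[1].lower() == wanted:
            matches.append(name)
    return sorted(matches)


def main(argv):
    """Expect argv to look like sys.argv: [script, directory, file_type]."""
    if len(argv) != 3:
        print("usage: python project3.py DIRECTORY FILETYPE", file=sys.stderr)
        return 1
    for name in find_files(argv[1], argv[2]):
        print(name)
    return 0
```

In a finished script you would call `main(sys.argv)` under an `if __name__ == "__main__":` guard, so that running `python project3.py C:\Windows EXE` prints the matching file names, one per line.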

Plan Project 4:

You have two data sources, i.e. text files that contain one piece of data per line
  • In IT: think about multiple log files, each file contains information about hosts identified by IP address
  • In science/tech fields: datasets from multiple sources, where the same data may appear in more than one dataset
  • In each case, one field in each data record is the "key" that identifies the data
  • The project 4 page contains two sample files with data from the OU curriculum process.
The goal of the project will be to write a script that
  • Opens multiple data files
  • Reads the files and
    • Merges the data into a single output file
    • Removes duplicate data
    • Sorts the output file by the data key
Figure out a way to write down the steps needed to implement this script in "meta-code" (pseudocode), i.e. describe what the script will do without writing Python code. Focus on how to solve the problem at hand, without worrying yet about how to make Python do what you need.
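
Once your meta-code is written down, it might later translate into Python roughly as follows. This is only a sketch under stated assumptions: the key is taken to be the text before the first comma on each line, and two records count as duplicates when they share a key, with the first one seen being kept. The real sample files may call for a different delimiter or duplicate rule. The names `record_key` and `merge_files` are hypothetical.

```python
def record_key(line, delimiter=","):
    """Assumption: the key is the first delimiter-separated field."""
    return line.split(delimiter, 1)[0].strip()


def merge_files(input_paths, output_path, delimiter=","):
    """Merge the input files into one output file, dropping records whose
    key was already seen and writing the result sorted by key.
    Returns the number of records written."""
    records = {}
    for path in input_paths:
        with open(path, encoding="utf-8") as infile:
            for line in infile:
                line = line.strip()
                if not line:
                    continue  # ignore blank lines
                key = record_key(line, delimiter)
                # setdefault keeps the first record seen for each key
                records.setdefault(key, line)
    with open(output_path, "w", encoding="utf-8") as outfile:
        for key in sorted(records):
            outfile.write(records[key] + "\n")
    return len(records)
```

Storing the records in a dictionary indexed by key handles duplicate removal and prepares the data for sorting in one step. The sort here is a plain string sort, so keys such as IP addresses would need a custom sort key to come out in numeric order.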