Merge branch 'structure'

2024-10-18 14:08:09 +02:00 · 2024-10-18 14:08:09 +02:00 · 602d9bb422
commit 602d9bb422
parent 9ab39d61c3 092ea67f40
2 changed files with 304 additions and 0 deletions
--- a/code/README.md
+++ b/code/README.md
@ -0,0 +1,301 @@
+# Writing a Good Python Script: A Primer
+
+This primer will guide you through best practices to write effective and clean
+Python scripts. Whether you're working on a data processing pipeline, a machine
+learning model, or a simple utility script, following these guidelines will
+help you create maintainable and readable code.
+
+## 1. Use a Declarative and Meaningful Script Name
+
+Choose a script name that clearly describes its purpose. This makes it easier
+for others (and yourself) to understand what the script does without reading
+the code.
+
+**Examples:**
+
+- `data_cleaning.py` instead of `script1.py`
+- `generate_report.py` instead of `run.py`
+
+## 2. Start with a Short Explanation (Docstring)
+
+At the beginning of your script, include a docstring that briefly explains what the script does. This helps users quickly grasp the script's functionality.
+
+```python
+"""
+This script loads raw data, cleans it by removing null values and duplicates,
+and saves the processed data to a new file.
+"""
+```
+
+## 3. Import All Required Packages at the Beginning
+
+List all your imports at the top of the script. This makes dependencies clear and simplifies maintenance.
+
+```python
+import sys                # Packages that are provided by Python
+from pathlib import Path
+import numpy as np        # Packages that are downloaded, specified in the requierements.txt
+import pandas as pd
+import my_module          # Modules that are written by yourself
+```
+
+## 4. Encapsulate Code in Functions and Classes
+
+Organize your code by wrapping functionality within functions or classes. This
+promotes code reuse, testing, and readability. Ideally, functions should do one
+thing and do it well. Classes can be used for more complex logic or when you need
+to maintain state. Clean functions and classes contain type hints and docstrings
+to explain their purpose and inputs/outputs.
+
+**Examples of Functions:**
+
+```python
+def load_data(file_path: str) -> pd.DataFrame:
+    """Loads data from a CSV file.
+
+    Parameters:
+    ----------
+    file_path : str
+        Path to the CSV file.
+
+    Returns:
+    -------
+    pd.DataFrame
+        Loaded data as a DataFrame.
+    """
+    return pd.read_csv(file_path)
+
+def clean_data(df: pd.DataFrame) -> pd.DataFrame:
+    """Cleans the DataFrame by removing null values and duplicates.
+
+    Parameters:
+    ----------
+    df : pd.DataFrame
+        Input DataFrame.
+          
+    Returns:
+    -------
+    pd.DataFrame
+        Cleaned DataFrame.
+    """
+    df = df.dropna()
+    df = df.drop_duplicates()
+    return df
+
+def save_data(df: pd.DataFrame, output_path: str) -> None:
+    """Saves the DataFrame to a CSV file.
+
+    Parameters:
+    ----------
+    df : pd.DataFrame
+        DataFrame to save.
+    output_path : str
+        Path to save the CSV file.
+    """
+    df.to_csv(output_path, index=False)
+```
+
+**Example of a Class:**
+
+```python
+class DataProcessor:
+    """A class for processing data."""
+
+    def __init__(self, file_path):
+        self.data = self.load_data(file_path)
+
+    def load_data(self, file_path):
+        return pd.read_csv(file_path)
+
+    def clean_data(self):
+        self.data.dropna(inplace=True)
+        self.data.drop_duplicates(inplace=True)
+
+    def save_data(self, output_path):
+        self.data.to_csv(output_path, index=False)
+```
+
+## 5. Define a `main()` Function
+
+Create a `main()` function that serves as the entry point of your script. This
+function should orchestrate the flow of your program.
+
+```python
+def main():
+    """Main function that orchestrates the data processing."""
+    input_file = 'data/raw/data.csv'
+    output_file = 'data/processed/clean_data.csv'
+
+    # Using functions
+    data = load_data(input_file)
+    clean_data = clean_data(data)
+    save_data(clean_data, output_file)
+
+    # Or using a class
+    # processor = DataProcessor(input_file)
+    # processor.clean_data()
+    # processor.save_data(output_file)
+
+    print("Data processing complete.")
+```
+
+## 6. Use the `if __name__ == "__main__":` Statement
+
+This is a common Python idiom that allows you to check if the script is being
+run as the main program. This ensures that the `main()` function is only called
+when the script is executed directly. If you execute the `main()` function
+directly, it will be executed when the module, or just parts of it, are
+imported in another script.
+
+So at the end of your script, add:
+
+```python
+if __name__ == "__main__":
+    main()
+```
+
+This checks if the script is being run as the main program and calls `main()` accordingly.
+
+## Putting It All Together
+
+Here's how your script might look when you combine all these best practices:
+
+```python
+"""
+This script loads raw data, cleans it by removing null values and duplicates, and saves the processed data to a new file.
+"""
+
+import os
+import sys
+import pandas as pd
+import numpy as np
+
+def load_data(file_path: str) -> pd.DataFrame:
+    """Loads data from a CSV file.
+
+    Parameters:
+    ----------
+    file_path : str
+        Path to the CSV file.
+
+    Returns:
+    -------
+    pd.DataFrame
+        Loaded data as a DataFrame.
+    """
+    return pd.read_csv(file_path)
+
+def clean_data(df: pd.DataFrame) -> pd.DataFrame:
+    """Cleans the DataFrame by removing null values and duplicates.
+
+    Parameters:
+    ----------
+    df : pd.DataFrame
+        Input DataFrame.
+          
+    Returns:
+    -------
+    pd.DataFrame
+        Cleaned DataFrame.
+    """
+    df = df.dropna()
+    df = df.drop_duplicates()
+    return df
+
+def save_data(df: pd.DataFrame, output_path: str) -> None:
+    """Saves the DataFrame to a CSV file.
+
+    Parameters:
+    ----------
+    df : pd.DataFrame
+        DataFrame to save.
+    output_path : str
+        Path to save the CSV file.
+    """
+    df.to_csv(output_path, index=False)
+
+def main():
+    """Main function that orchestrates the data processing."""
+    input_file = 'data/raw/data.csv'
+    output_file = 'data/processed/clean_data.csv'
+
+    data = load_data(input_file)
+    clean_data = clean_data(data)
+    save_data(clean_data, output_file)
+
+    print("Data processing complete.")
+
+if __name__ == "__main__":
+    main()
+```
+
+## Additional Tips
+
+- **Comment Your Code:** Use comments to explain non-obvious parts of your code. However, strive to write code that is self-explanatory.
+- **Follow PEP 8 Guidelines:** Adhere to the [PEP 8](https://www.python.org/dev/peps/pep-0008/) style guide for Python code to improve readability. To make this easy, use an auto-formatter like `black` or `ruff`.
+- **Use Meaningful Variable, Function and ClasE Names:** Choose names that convey their purpose. Avoid single-letter variable names except for simple iterators. Instead of `x` and `y` use e.g., `time` and `signal`.
+- **Handle Exceptions:** Use try-except blocks to handle potential errors gracefully.
+  
+  ```python
+  try:
+      data = load_data(input_file)
+  except FileNotFoundError:
+      print(f"Error: The file {input_file} was not found.")
+      sys.exit(1)
+  ```
+
+- **Use Logging Instead of Print Statements:** For larger scripts, consider using the `logging` module for better control over logging levels and outputs.
+  
+  ```python
+  import logging
+
+  logging.basicConfig(level=logging.INFO)
+
+  logging.info("Data processing complete.")
+  ```
+
+- **Parameterize Your Scripts:** Use command-line arguments or a configuration file to make your script more flexible.
+  
+  ```python
+  import argparse
+
+  def parse_arguments():
+      parser = argparse.ArgumentParser(description="Process and clean data.")
+      parser.add_argument('--input', required=True, help='Input file path')
+      parser.add_argument('--output', required=True, help='Output file path')
+      return parser.parse_args()
+
+  def main():
+      args = parse_arguments()
+      data = load_data(args.input)
+      clean_data = clean_data(data)
+      save_data(clean_data, args.output)
+  ```
+
+  - **Make Your Code Modular:** Break down your script into multiple files or
+  modules for better organization and reusability. For example, move data
+  processing functions that are used in multiple scripts to a separate module
+  called `data_processing.py`.
+  
+  - **Coding a figure:** If you are coding a figure, you can follow our [coding
+  a figure
+  guide](https://github.com/bendalab/plottools/blob/master/docs/guide.md).
+  Applying the same principles to your figure code will make it easier to
+  modify and reuse.
+  
+
+## Conclusion
+
+By following these best practices, you'll create Python scripts that are:
+
+- **Readable:** Clear structure and naming make your code easy to understand.
+- **Maintainable:** Encapsulation and modularity simplify updates and debugging.
+- **Reusable:** Functions and classes can be imported and used in other scripts.
+- **Robust:** Error handling ensures your script can handle unexpected situations gracefully.
+
+Remember, good coding practices not only make your life easier but also help
+others who may work with your code in the future. The effort you put into
+writing clean and effective scripts will pay off in the long run.
+
+Happy coding!
+
--- a/data/README.md
+++ b/data/README.md
@ -0,0 +1,3 @@
+# Tips and Tricks for the data directory
+
+The filename should contain information about the date