# Getting started with fishbook
Create a project folder and place the configuration file in it (see [configuration](./configuration.md) for more information). If the configuration file contains the user credentials, you will not be prompted to enter them manually with every import.
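For example, a minimal project setup could look like the following sketch (the folder name is arbitrary; the configuration file's name and format are described in the [configuration](./configuration.md) document):
```python
import os

# create a project folder; the configuration file goes in here
os.makedirs("my_fishbook_project", exist_ok=True)
os.chdir("my_fishbook_project")
```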
## Package contents
Fishbook has the following submodules:
1. frontend - classes for standard usage
* frontendclasses.py: classes that reflect entries in the database, such as *Dataset, Subject, Cell, RePro,* and *Stimulus*.
* relacsclasses.py: classes that reflect Relacs repro classes, such as *BaselineRecording, FileStimulus, FIcurve*.
* util.py: utility functions, mainly for internal use.
2. backend - classes for direct database interaction
* database.py: classes reflecting the database tables. These classes use the [datajoint](https://datajoint.io) definitions to define the database tables and their relations. Most users will not need them, but they are helpful for advanced database interactions.
## First steps
```python
import fishbook as fb
# the following command will find and load all (!) dataset entries in the database
datasets, count = fb.Dataset.find()
print(count)
# or use
print(fb.Dataset.datasetCount())
```
### Finding what you are looking for
Fetching all matches and converting them to Python objects of the *Dataset* class may take a while. You could loop over the resulting list of datasets and apply filters in Python. This works, but performs poorly; it is better to let the underlying database apply such restrictions.
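For illustration, the slow pure-Python variant might look like this (a sketch that assumes *Dataset* objects expose a `quality` attribute matching the `find()` keyword used below; treat the attribute name as an assumption):
```python
# naive approach: fetch everything, then filter in Python (slow)
all_datasets, _ = fb.Dataset.find()
good = [d for d in all_datasets if d.quality == "good"]  # attribute name assumed
```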
Again using the *Dataset* class, restrictions can be applied at the database level like this:
```python
# to test your restrictions you may want to set the test=True flag
_, count = fb.Dataset.find(quality="good", min_date="2018-01-01", max_date="2018-12-31", test=True)
print(count)
# if you indeed want to fetch the results
datasets, count = fb.Dataset.find(quality="good", min_date="2018-01-01", max_date="2018-12-31")
print(len(datasets))
```
You can use the *Dataset* class as an entry point to explore what is stored in the database about a dataset, e.g. which cell(s) were recorded and the subject information.
```python
d = datasets[0]
print(d)
cells = d.cells # get the cells recorded in this dataset
cell = cells[0]
print(cell)
s = cell.subject
print(s)
# in a given subject we may record several cells
print(len(s.cells))
```
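Conversely, the subject is an entry point to all cells recorded in it (using the same `cells` property shown above):
```python
# list every cell recorded in this subject
for c in s.cells:
    print(c)
```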
### Which RePros were run in a given dataset?
In a dataset we may run several repros:
```python
repros = d.repro_runs()
for r in repros:
    print(r)
```
In this example this leads to an output like the following. Two **Re**search **Pro**tocols were run: first the "BaselineActivity" repro and second the "ReceptiveField" repro.
```shell
RePro: BaselineActivity id: BaselineActivity_0
run: 0 on cell: 2018-01-10-ag
start time: 0.0 duration: 40.0796
RePro: ReceptiveField id: ReceptiveField_0
run: 0 on cell: 2018-01-10-ag
start time: 40.0796 duration: 236.42
```
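If you are only interested in a particular protocol, you can filter the runs, for example by matching on the printed name (a simple sketch that relies on the output format shown above):
```python
# keep only the ReceptiveField runs of this dataset
rf_runs = [r for r in d.repro_runs() if "ReceptiveField" in str(r)]
print(len(rf_runs))
```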
### Getting information about the stimuli
The last of these runs, the "ReceptiveField" repro, presented a set of stimuli we may be interested in:
```python
r = d.repro_runs()[-1]
stimuli = r.stimuli
print(len(stimuli)) # 90 in this example case
for stim in stimuli[:10]: # let's print the first 10
    print(stim)
```
An entry in the Stimuli table (see [ER-schema](database_layout.md) for details) is characterized by the stimulus id, the start time or start index (depending on the data source), and the stimulus duration. The stimulus also carries a text field with the stimulus settings. The id and the position index can be read directly:
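```python
# id and position of this stimulus presentation; these are the handles
# used later to locate the raw data in the recorded traces
print(stim.multi_tag_id)
print(stim.index)
```
The settings text field looks like this: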
```python
stim.settings
>> 'ReceptiveField-1:\n\tReceptiveField-1:\n\t\tModality: electric\n\t\tSamplingRate: 37.59398496240602+- -1.000\n\t\tdur: 0.0+- -1.000\n\t\tampl: 0.0+- -1.000\n\t\tdeltaf: 0.0+- -1.000\n\t\tfreq: 0.0+- -1.000\n\t\tx_pos: 0.0+- -1.000\n\t\ty_pos: 0.0+- -1.000\n\t\tz_pos: 0.0+- -1.000\n\tEOD Rate: 826.6917939137279 Hz\n\tEOD Amplitude: 0.4263797848004206 mV\n\tGlobalEField: 0.0 V\n\tGlobalEFieldAM: 0.0 V\n\tLocalEField: 0.0 V\n\tI: 0.0 V\n\ttime: 294.9933 s\n\tdelay: 0.0 s\n\tintensity: 0.039810717055349734 mV/cm\n\tdur: 1.0 \n\tampl: 0.04 \n\tdeltaf: 20.0 \n\tfreq: 854.7108299738921 \n\tx_pos: 142.5 \n\ty_pos: 2.0 \n\tz_pos: -15.0 \n'
# to convert it to yaml
import yaml
settings = yaml.safe_load(stim.settings.replace("\t", ""))
print(settings)
```
The output:
```shell
{'ReceptiveField-1': None,
'Modality': 'electric',
'SamplingRate': '37.59398496240602+- -1.000',
'dur': 1.0,
'ampl': 0.04,
'deltaf': 20.0,
'freq': 854.7108299738921,
'x_pos': 142.5,
'y_pos': 2.0,
'z_pos': -15.0,
'EOD Rate': '826.6917939137279 Hz',
'EOD Amplitude': '0.4263797848004206 mV',
'GlobalEField': '0.0 V',
'GlobalEFieldAM': '0.0 V',
'LocalEField': '0.0 V',
'I': '0.0 V',
'time': '294.9933 s',
'delay': '0.0 s',
'intensity': '0.039810717055349734 mV/cm'}
```
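Note that entries carrying a unit (for example `'EOD Rate': '826.69... Hz'`) are parsed as strings, whereas plain numbers come back as floats. A small sketch for pulling values out of the parsed settings:
```python
deltaf = settings["deltaf"]                        # already a float: 20.0
eod_rate = float(settings["EOD Rate"].split()[0])  # strip the 'Hz' unit
print(deltaf, eod_rate)
```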
### Finding the data
The *Stimulus* in conjunction with the *Dataset* provides all the information needed to get the data from the recorded traces.
The data files can be located via the *Dataset* class:
```python
print(d.data_source) # the absolute path from which the data was imported
print(d.data_host)  # the fully qualified host name
print(d.has_nix)  # whether the data is stored in a nix file
```
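Before opening anything, you may want to check whether the data is reachable from the machine you are working on; a minimal sketch using only the standard library:
```python
import os
import socket

# crude accessibility check: either we are on the data host itself,
# or the path is available locally (e.g. via a network mount)
on_host = socket.getfqdn() == d.data_host
accessible = on_host or os.path.exists(d.data_source)
print(accessible)
```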
Let's assume you have access to the dataset and it uses a [NIX](https://g-node.org/nix) file:
```python
import os
import glob
import nixio as nix
path = d.data_source
assert(os.path.exists(path))
if d.has_nix:
    nix_file = glob.glob(path + os.sep + "*nix")[0]  # there should only be one nix file
    mtag_id = stim.multi_tag_id
    position_index = stim.index
    nf = nix.File.open(nix_file, nix.FileMode.ReadOnly)
    block = nf.blocks[0]  # there is only a single block in relacs-written files
    mt = block.multi_tags[mtag_id]
    data = mt.retrieve_data(position_index, "V-1")[:]  # we are interested in the membrane voltage
    nf.close()
    print(data.shape)
```
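The same pattern scales to all stimuli of a repro run; a sketch under the same assumptions (a single nix file and a voltage trace named "V-1"):
```python
# collect the voltage segment of every stimulus presentation in this run
segments = []
nf = nix.File.open(nix_file, nix.FileMode.ReadOnly)
block = nf.blocks[0]
for stim in stimuli:
    mt = block.multi_tags[stim.multi_tag_id]
    segments.append(mt.retrieve_data(stim.index, "V-1")[:])
nf.close()
print(len(segments))
```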