API Documentation#
panama (re-exported from submodules)#
- panama.read_DAT(files: Path | str | list[Path] | None = None, glob: str | None = None, max_events: int | None = None, run_header_features: list[str] | None = None, event_header_features: list[str] | None = None, additional_columns: bool = True, mother_columns: bool = False, drop_mothers: bool = True, drop_non_particles: bool = True, noparse: bool = True) tuple[DataFrame, DataFrame, DataFrame]#
Read CORSIKA DAT files into pandas DataFrames. Exactly one of files or glob must be provided. Written for CORSIKA > 7.4; compatibility with other versions is not guaranteed, though they probably work approximately. All energies and masses are given in \(\mathrm{GeV}\), while lifetimes are given in \(\mathrm{ns}\). All other units follow the CORSIKA7 definitions; see the CORSIKA7 user guide.
- Parameters:
files (Path, str, list of Paths, or None) – A single DAT file or a list of DAT files to read into the dataframe. They must all have unique run_numbers and event_numbers. If None, glob must be provided.
glob – Globbing expression like path/to/corsika/output/DAT*. If None, files must be provided.
max_events (int | None) – Maximum number of events to read in. If None, read in everything.
run_header_features (list[str] or None) – Names of the run header fields to save, following the naming used by pycorsikaio. If None, a default list is used. (default: None)
event_header_features (list[str] or None) – Names of the event header fields to save, following the naming used by pycorsikaio. If None, a default list is used. (default: None)
additional_columns (bool) –
Whether to add (and calculate) additional columns not present in the standard CORSIKA output, to make life easier. They take minimal time to calculate. These include:
corsikaid
hadron_gen
n_obs_level
is_mother
pdgid
mass
energy
zenith
mother_columns (bool) – Whether to add columns related to the mother/grandmother output of the EHIST option. They take more time to calculate, since the rows depend on each other. (default: False)
drop_mothers (bool) – Whether to remove all mother rows. (default: True)
drop_non_particles (bool) – Whether to remove all rows that do not represent a real particle (such as muon additional information). (default: True)
noparse (bool) – Use the “noparse” feature of pycorsikaio, which in theory makes reading the CORSIKA files faster. (default: True)
- Returns:
- run_header: pandas.DataFrame
DataFrame with the information about each run
- event_header: pandas.DataFrame
DataFrame with the information about each event
- particles: pandas.DataFrame
DataFrame with the information about each particle
- Return type:
A tuple (run_header, event_header, particles)
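A minimal usage sketch, assuming panama is installed and CORSIKA output exists at the example path. The helpers `build_read_kwargs` and `load_showers` are hypothetical and not part of panama; only the `read_DAT` keyword arguments are taken from the signature above.

```python
def build_read_kwargs(files=None, glob=None, max_events=None):
    """Hypothetical helper (not part of panama): collect read_DAT arguments.

    Mirrors the documented rule that exactly one of `files` or `glob`
    must be provided.
    """
    if (files is None) == (glob is None):
        raise ValueError("Provide exactly one of `files` or `glob`.")
    kwargs = {"max_events": max_events, "mother_columns": True}
    if files is not None:
        kwargs["files"] = files
    else:
        kwargs["glob"] = glob
    return kwargs


def load_showers(**kwargs):
    """Read DAT files and unpack the three returned DataFrames."""
    import panama  # imported lazily so the helper above runs without panama

    run_header, event_header, particles = panama.read_DAT(
        **build_read_kwargs(**kwargs)
    )
    return run_header, event_header, particles


# Example call (requires panama and real CORSIKA output):
# run_header, event_header, particles = load_showers(
#     glob="path/to/corsika/output/DAT*", max_events=100
# )
```

Setting mother_columns=True here is only an illustration; it is needed when the mother information should be available afterwards, and it makes reading slower.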
- panama.get_weights(df_run: ~pandas.core.frame.DataFrame, df_event: ~pandas.core.frame.DataFrame, df: ~pandas.core.frame.DataFrame, model: ~fluxcomp.cosmic_ray_fluxes.CosmicRayFlux = <fluxcomp.cosmic_ray_fluxes.H3a object>, proton_only: bool = False, groups: dict[~particle.pdgid.pdgid.PDGID, tuple[int, int]] | None = None) DataFrame#
Returns a DataFrame with the correct weight for a given primary model. The DataFrame is indexed by the run and event index, so it can be assigned as a column to the particle DataFrame df.
The primary energy can span different energy regions, but they must not overlap; if they do, an error is raised.
- Parameters:
df_run (DataFrame) – The run dataframe (as returned by panama.read_DAT).
df_event (DataFrame) – The event dataframe (as returned by panama.read_DAT).
df (DataFrame) – The particle dataframe (as returned by panama.read_DAT).
model (CosmicRayFlux) – The cosmic ray primary flux model (an instance of CosmicRayFlux from the fluxcomp package; default: H3a).
proton_only (bool) – If True (default: False), only the weights for the proton pdgid are non-zero, and they refer to the all-nucleon flux.
groups (dict[PDGID, tuple[int, int]] or None) – Mapping from the primary PDGID in the Monte Carlo to an inclusive range of elements (represented by their atomic number) which will be summed up in the model to represent this element in the MC (values: tuple[Zmin, Zmax]).
- Returns:
weights (DataFrame) – A dataframe with the weights, labeled by the run and event index.
It can be used like this: df['weights'] = panama.get_weights(df_run, df_event, df)
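The reason one weight per event can be assigned to a per-particle table is that every particle row carries its run and event number. The sketch below illustrates that lookup with plain Python containers; it is not panama's implementation, the function name `attach_event_weights` is hypothetical, and the weight values are made up. With pandas, index-aligned column assignment is expected to perform the equivalent step, assuming the particle dataframe carries the run and event numbers in its index.

```python
def attach_event_weights(particles, event_weights):
    """Give every particle the weight of the (run, event) it belongs to."""
    return [
        dict(p, weight=event_weights[(p["run"], p["event"])])
        for p in particles
    ]


# One weight per event, keyed by (run_number, event_number); values are made up.
event_weights = {(0, 1): 2.5, (0, 2): 0.5}

# Particle rows; several particles can share one event.
particles = [
    {"run": 0, "event": 1, "pdgid": 13},
    {"run": 0, "event": 1, "pdgid": -13},
    {"run": 0, "event": 2, "pdgid": 14},
]

weighted = attach_event_weights(particles, event_weights)
```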
- panama.add_weight_prompt(df: DataFrame, prompt_factor: float, weight_col_name: str = 'weight_prompt', is_prompt_col_name: str = 'is_prompt') None#
Adds the column “weight_prompt” to df, setting a weight for every prompt particle; non-prompt particles get weight 1.
- Parameters:
df (DataFrame) – The particle dataframe (as returned by panama.read_DAT).
prompt_factor (float) – The number to put in the weight_prompt column.
weight_col_name (str) – The name to give the prompt weight column (default: 'weight_prompt').
is_prompt_col_name (str) – The name of the column which indicates the promptness of a particle (default: 'is_prompt').
- panama.add_weight_prompt_per_event(df: DataFrame, prompt_factor: float, weight_col_name: str = 'weight_prompt_per_event', is_prompt_col_name: str = 'is_prompt') None#
Adds the column “weight_prompt_per_event” to df, which is prompt_factor for every particle inside a shower that contains at least one prompt muon. For every other particle, it is 1.
- Parameters:
df (DataFrame) – The particle dataframe (as returned by panama.read_DAT).
prompt_factor (float) – The number to put in the weight_prompt_per_event column.
weight_col_name (str) – The name to give the prompt weight column (default: 'weight_prompt_per_event').
is_prompt_col_name (str) – The name of the column which indicates the promptness of a particle (default: 'is_prompt').
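The difference between the per-particle and the per-event prompt weight is easiest to see on a tiny example. The following is a plain-Python sketch of the two rules as described above, not panama's implementation; the function names and the row layout are hypothetical, and identifying muons by abs(pdgid) == 13 is an assumption.

```python
MUON_PDGID = 13  # PDG id of the muon; the antimuon is -13


def weight_prompt(particles, prompt_factor):
    """prompt_factor for prompt particles, 1.0 for all others."""
    return [prompt_factor if p["is_prompt"] else 1.0 for p in particles]


def weight_prompt_per_event(particles, prompt_factor):
    """prompt_factor for every particle in a shower with a prompt muon."""
    showers_with_prompt_muon = {
        (p["run"], p["event"])
        for p in particles
        if p["is_prompt"] and abs(p["pdgid"]) == MUON_PDGID
    }
    return [
        prompt_factor if (p["run"], p["event"]) in showers_with_prompt_muon else 1.0
        for p in particles
    ]


particles = [
    {"run": 0, "event": 1, "pdgid": 13, "is_prompt": True},
    {"run": 0, "event": 1, "pdgid": -13, "is_prompt": False},
    {"run": 0, "event": 2, "pdgid": 13, "is_prompt": False},
    {"run": 0, "event": 2, "pdgid": 14, "is_prompt": True},  # prompt, but no muon
]

per_particle = weight_prompt(particles, 10.0)
per_event = weight_prompt_per_event(particles, 10.0)
```

In the first shower the conventional antimuon is also reweighted by the per-event rule, because the shower contains a prompt muon; the second shower contains only a prompt neutrino and is left at weight 1.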
- class panama.CorsikaRunner(primary: dict[int, int], n_jobs: int, template_path: Path, output: Path, corsika_executable: Path, corsika_tmp_dir: Path, seed: None | int = None, save_std: bool = False, first_run_number: int = 0, first_event_number: int = 1)#
This class manages running multiple CORSIKA7 processes in parallel by splitting up the requested showers into batches and changing the initial seeds for CORSIKA7 for each batch. It also provides a progress bar by inspecting the stdout of CORSIKA7. To clean up temporary directories automatically, this class can be used in a with-statement. Otherwise, call corsika_runner.clean() to delete the temporary directories.
- __init__(primary: dict[int, int], n_jobs: int, template_path: Path, output: Path, corsika_executable: Path, corsika_tmp_dir: Path, seed: None | int = None, save_std: bool = False, first_run_number: int = 0, first_event_number: int = 1) None#
This class manages running multiple CORSIKA7 processes in parallel by splitting up the requested showers into batches and changing the initial seeds for CORSIKA7 for each batch. It also provides a progress bar by inspecting the stdout of CORSIKA7. The “parallelization” itself is handled by the operating system: if you are only allowed to use one core, nothing will run in parallel.
- Parameters:
primary (dict[int, int]) – Mapping from PDGID to the number of events with this primary. 10 proton and 20 helium-4 air showers would mean {2212: 10, 1000020040: 20}. (Use the proton pdgid, not the hydrogen-1 pdgid!) Conversion between pdgid and Corsika7ID is handled by the particle Python package. All primaries of one type are processed in parallel, and the different primaries are processed one after another. This ensures that the processes running in parallel at any given time take approximately the same amount of time.
n_jobs (int) – The number of parallel jobs to send to the operating system.
template_path (Path) – The path to the template of the CORSIKA7 card.
output (Path) – The path where the CORSIKA7 process will produce the output.
corsika_executable (Path) – The path to the CORSIKA7 executable.
corsika_tmp_dir (Path) – A temporary directory to symlink the CORSIKA7 executable into, since CORSIKA7 cannot be run in parallel from the same executable directly. The copied/symlinked files are deleted automatically when the runner is used as a context manager (with-statement); otherwise you have to call the clean() method. The directory itself is not deleted, only the subdirectory used inside it.
seed (None | int, optional) – The seed used to generate the seeds for the CORSIKA7 program. If None is given, the entropy source of the computer is used.
save_std (bool, optional) – Whether or not to save the standard output of the CORSIKA7 programs. If true, the output is available as “prim{pdgid}_job{jobid}.log” in the output folder.
first_run_number (int = 0, optional) – The run number the first run will get. All following runs will increment the run number by one.
first_event_number (int = 1, optional) – The event number the first event in each run will get.
- Raises:
ValueError – If the input is not consistent.
- clean() None#
Deletes the temporary directories; this is also called when the object is deleted. This method has to be called before a different CorsikaRunner with the same tmp_dir can be constructed. The object can no longer be used after calling this method.
- run(disable_pb: bool = False) None#
Start all the processes and wait for them to finish. The primary elements are run one after another.
- Parameters:
disable_pb (bool) – If True, disables the (tqdm) progressbar.
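The following sketch shows how the runner might be used in a with-statement, together with one plausible way of splitting showers into batches and drawing a seed per batch. Only the constructor arguments and run() come from the documentation above; `split_into_batches`, `batch_seeds` and `run_showers` are hypothetical names, the even split is an assumption about how batches could be formed, and the seed range is an assumption about valid CORSIKA7 seeds.

```python
import random


def split_into_batches(n_showers, n_jobs):
    """Distribute n_showers over n_jobs batches as evenly as possible."""
    base, extra = divmod(n_showers, n_jobs)
    return [base + (1 if i < extra else 0) for i in range(n_jobs)]


def batch_seeds(seed, n_jobs):
    """Draw one distinct, reproducible seed per batch.

    The range 1..900_000_000 is an assumption about valid CORSIKA7 seeds.
    """
    rng = random.Random(seed)  # seed=None lets Python pick its own entropy
    return rng.sample(range(1, 900_000_001), n_jobs)


def run_showers(template_path, output, corsika_executable, corsika_tmp_dir):
    """Usage sketch of the documented class; needs panama and a CORSIKA7 build."""
    import panama  # imported lazily so the helpers above run without panama

    with panama.CorsikaRunner(
        primary={2212: 10, 1000020040: 20},  # 10 proton and 20 helium-4 showers
        n_jobs=4,
        template_path=template_path,
        output=output,
        corsika_executable=corsika_executable,
        corsika_tmp_dir=corsika_tmp_dir,
        seed=137,
    ) as runner:  # assumes the context manager yields the runner itself
        runner.run()
```

Using the with-statement means the temporary directories are cleaned up even if run() raises, so clean() does not have to be called by hand.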
panama.prompt#
Functions to determine whether a particle is prompt, using several different definitions.
- panama.prompt.is_prompt_lifetime_limit(df_particles: DataFrame, lifetime_limit_ns: float = 0.00410356581640216) numpy.typing.ArrayLike#
Return a numpy array of prompt labels for the input dataframe differentiating it by the lifetime of the mother particle.
- Parameters:
df_particles (DataFrame) – Dataframe with the CORSIKA particles; additional_columns must have been enabled when running read_DAT.
lifetime_limit_ns (float) – The lifetime limit in nanoseconds above which a particle is considered conventional.
- Return type:
A numpy boolean array, True for prompt, False for conventional
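A plain-Python sketch of the lifetime criterion described above, not panama's implementation: a particle is labeled prompt when the lifetime of its mother does not exceed the limit. The function name is hypothetical, the treatment of a lifetime exactly at the limit is an assumption, and the lifetimes in the example are rounded literature values.

```python
DEFAULT_LIFETIME_LIMIT_NS = 0.00410356581640216  # default from the signature


def is_prompt_by_lifetime(mother_lifetimes_ns,
                          lifetime_limit_ns=DEFAULT_LIFETIME_LIMIT_NS):
    """True (prompt) if the mother lifetime is not above the limit."""
    return [t <= lifetime_limit_ns for t in mother_lifetimes_ns]


# Rounded mother lifetimes in ns: charged pion, charged kaon, D0, charged D.
mother_lifetimes_ns = [26.033, 12.38, 4.10e-4, 1.04e-3]
labels = is_prompt_by_lifetime(mother_lifetimes_ns)
```

With the default limit, pions and kaons end up conventional while the short-lived charmed mesons end up prompt.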
- panama.prompt.add_cleaned_mother_cols(df_particles: DataFrame) None#
Adds mother_lifetime_cleaned, mother_mass_cleaned and mother_energy_cleaned if not present in the dataframe.
- panama.prompt.is_prompt_lifetime_limit_cleaned(df_particles: DataFrame, lifetime_limit_ns: float = 0.00410356581640216) numpy.typing.ArrayLike#
Return a numpy array of prompt labels for the input dataframe differentiating it by lifetime of the mother particle. It considers the cleaned particle type of the mother.
- Parameters:
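df_particles (DataFrame) – Dataframe with the CORSIKA particles; additional_columns must have been enabled when running read_DAT.
lifetime_limit_ns (float) – The lifetime limit in nanoseconds above which a particle is considered conventional.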
- Return type:
A numpy boolean array, True for prompt, False for conventional
- panama.prompt.is_prompt_energy(df_particles: DataFrame, s: float = 2) numpy.typing.ArrayLike#
Return a numpy array of prompt labels for the input dataframe, differentiating it by the energy of the mother particle and considering the cleaned particle type of the mother.
- Parameters:
df_particles (DataFrame) – Dataframe with the CORSIKA particles; additional_columns must have been enabled when running read_DAT.
s (float) – Scaling factor: how much larger the decay length has to be compared to the interaction length.
- Return type:
A numpy boolean array, True for prompt, False for conventional
- panama.prompt.is_abs_id_not_in(df_particles: DataFrame, pdgids: list[int], pdgid_col: str) numpy.typing.ArrayLike#
Return a numpy array which is true if abs(pdgid_col) is not in pdgids, false otherwise.
- Parameters:
df_particles (DataFrame) – dataframe with the corsika particles, additional_columns have to be present when running read_DAT
pdgids (list[int]) – list of ints with the pdgids to check
pdgid_col (str) – column to check if abs value is not equal to any value in pdgids
- Return type:
A numpy boolean array, True if abs value of col is not in pdgids, False otherwise
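The check described above can be sketched in plain Python as follows. This is not panama's implementation; the function name is hypothetical and the pdgid list in the example (charged pion 211 and charged kaon 321) is only an illustration.

```python
def abs_id_not_in(ids, pdgids):
    """True where abs(id) is not contained in pdgids, False otherwise."""
    excluded = set(pdgids)
    return [abs(i) not in excluded for i in ids]


# Mother ids: pi-, K+, D+, mu+
mother_ids = [-211, 321, 411, -13]
mask = abs_id_not_in(mother_ids, [211, 321])
```

Taking the absolute value means particles and antiparticles are treated alike, so only positive pdgids need to be listed.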
- panama.prompt.is_prompt_pion_kaon(df_particles: DataFrame) numpy.typing.ArrayLike#
Return a numpy array of prompt labels for the input dataframe differentiating it by the pdgid (cleaned) of the mother particle. If the mother is a pion or a kaon it is not prompt, otherwise it is.
- Parameters:
df_particles (DataFrame) – Dataframe with the CORSIKA particles; additional_columns must have been enabled when running read_DAT.
pion_kaon_pdgids
- Return type:
A numpy boolean array, True for prompt, False for conventional
- panama.prompt.is_prompt_pion_kaon_wrong_pdgid(df_particles: DataFrame) numpy.typing.ArrayLike#
Return a numpy array of prompt labels for the input dataframe differentiating it by the pdgid (uncleaned) of the mother particle. If the mother is a pion or a kaon it is not prompt, otherwise it is.
- Parameters:
df_particles (DataFrame) – Dataframe with the CORSIKA particles; additional_columns must have been enabled when running read_DAT.
- Return type:
A numpy boolean array, True for prompt, False for conventional
- panama.prompt.is_prompt_pion_kaon_grandmother(df_particles: DataFrame) numpy.typing.ArrayLike#
Return a numpy array of prompt labels for the input dataframe differentiating it by the pdgid (cleaned) of the mother particle. If the mother is a pion or a kaon it is not prompt, otherwise it is.
- Parameters:
df_particles (DataFrame) – Dataframe with the CORSIKA particles; additional_columns must have been enabled when running read_DAT.
- Return type:
A numpy boolean array, True for prompt, False for conventional
- panama.prompt.is_prompt_energy_wrong_pdgid(df_particles: DataFrame, s: float = 2) numpy.typing.ArrayLike#
Return a numpy array of prompt labels for the input dataframe differentiating it by energy of the mother particle (uncleaned).
- Parameters:
df_particles (DataFrame) – Dataframe with the CORSIKA particles; additional_columns must have been enabled when running read_DAT.
s (float) – Scaling factor: how much larger the decay length has to be compared to the interaction length.
- Return type:
A numpy boolean array, True for prompt, False for conventional
panama.read#
Functions concerning input of CORSIKA7 DAT files.
- panama.read.read_DAT(files: Path | str | list[Path] | None = None, glob: str | None = None, max_events: int | None = None, run_header_features: list[str] | None = None, event_header_features: list[str] | None = None, additional_columns: bool = True, mother_columns: bool = False, drop_mothers: bool = True, drop_non_particles: bool = True, noparse: bool = True) tuple[DataFrame, DataFrame, DataFrame]#
Read CORSIKA DAT files into pandas DataFrames. Exactly one of files or glob must be provided. Written for CORSIKA > 7.4; compatibility with other versions is not guaranteed, though they probably work approximately. All energies and masses are given in \(\mathrm{GeV}\), while lifetimes are given in \(\mathrm{ns}\). All other units follow the CORSIKA7 definitions; see the CORSIKA7 user guide.
- Parameters:
files (Path, str, list of Paths, or None) – A single DAT file or a list of DAT files to read into the dataframe. They must all have unique run_numbers and event_numbers. If None, glob must be provided.
glob – Globbing expression like path/to/corsika/output/DAT*. If None, files must be provided.
max_events (int | None) – Maximum number of events to read in. If None, read in everything.
run_header_features (list[str] or None) – Names of the run header fields to save, following the naming used by pycorsikaio. If None, a default list is used. (default: None)
event_header_features (list[str] or None) – Names of the event header fields to save, following the naming used by pycorsikaio. If None, a default list is used. (default: None)
additional_columns (bool) –
Whether to add (and calculate) additional columns not present in the standard CORSIKA output, to make life easier. They take minimal time to calculate. These include:
corsikaid
hadron_gen
n_obs_level
is_mother
pdgid
mass
energy
zenith
mother_columns (bool) – Whether to add columns related to the mother/grandmother output of the EHIST option. They take more time to calculate, since the rows depend on each other. (default: False)
drop_mothers (bool) – Whether to remove all mother rows. (default: True)
drop_non_particles (bool) – Whether to remove all rows that do not represent a real particle (such as muon additional information). (default: True)
noparse (bool) – Use the “noparse” feature of pycorsikaio, which in theory makes reading the CORSIKA files faster. (default: True)
- Returns:
- run_header: pandas.DataFrame
DataFrame with the information about each run
- event_header: pandas.DataFrame
DataFrame with the information about each event
- particles: pandas.DataFrame
DataFrame with the information about each particle
- Return type:
A tuple (run_header, event_header, particles)
- panama.read.add_mother_columns(df_particles: DataFrame, pdgids: list[int] | None = None) None#
Adds the information from the mother and grandmother rows to the columns of the daughter particle.
The implementation looks complicated because different rows in the table depend on each other, and doing this in a numpy-friendly way is not trivial (we do not want to iterate through the rows with Python loops). It is therefore done via a shifted index array.
- Parameters:
df_particles (DataFrame) – the particle dataframe with additional columns from read_DAT
pdgids (list[int] | None) – The unique pdgids in the dataframe. If None, they are calculated.
panama.run#
Classes handling the parallel execution of CORSIKA7 processes.
- class panama.run.CorsikaJob(corsika_executable: Path, corsika_copy_dir: Path, card_template: str)#
This class handles the execution and monitoring of a single CORSIKA7 process. Usually there should be no need to use this class directly; use CorsikaRunner instead.
- __init__(corsika_executable: Path, corsika_copy_dir: Path, card_template: str) None#
- Parameters:
corsika_executable (Path) – Path of the CORSIKA7 executable.
corsika_copy_dir (Path) – The path to where the original executable will be symlinked, so it can be run multiple times in parallel. CORSIKA7 for some reason does not allow running the same executable multiple times.
card_template (str) – The string containing a valid CORSIKA7 run card with additional python-like templates (e.g. {emin}). The template will be formatted when calling start.
- clean() None#
Cleans the temporary directory.
- property is_finished: bool#
Returns True if the process is not running, False otherwise
- start(corsika_config: dict[str, str], save_std: Path | None = None) None#
Starts the CORSIKA7 process with the given parameters, if it is not already running.
- Parameters:
corsika_config (dict[str, str]) – The template values which will be filled in the template corsika card.
save_std (Path | None, optional) – If provided, the std output of CORSIKA7 will be saved to this path.
- Raises:
RuntimeError – If the process is already running.
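Since the card template uses Python-style placeholders such as {emin}, filling it in can be pictured as a call to str.format with the corsika_config mapping. The sketch below is an illustration under that assumption, not the code of start(); `render_card` is a hypothetical name, the card is only a fragment, and every placeholder other than {emin} is made up for the example.

```python
# Fragment of a run card with placeholders; only {emin} is taken from the
# documentation above, all other placeholder names are hypothetical.
CARD_TEMPLATE = """RUNNR   {run_idx}
EVTNR   {first_event_idx}
NSHOW   {n_show}
PRMPAR  {primary}
ERANGE  {emin} {emax}
SEED    {seed} 0 0
EXIT
"""


def render_card(card_template, corsika_config):
    """Fill the placeholders of the template with the given string values."""
    return card_template.format(**corsika_config)


corsika_config = {
    "run_idx": "0",
    "first_event_idx": "1",
    "n_show": "10",
    "primary": "14",
    "emin": "1e2",
    "emax": "1e5",
    "seed": "12345",
}
card = render_card(CARD_TEMPLATE, corsika_config)
```

A missing key raises a KeyError in str.format, which is a convenient early signal that the template and the configuration do not match.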
- poll() int | None#
Returns how many showers finished since last poll or None if the process is finished.
- Returns:
n_update
- Return type:
The number of showers finished since last poll or None if process is finished
- join() int#
Waits for the CORSIKA7 process to finish, if it is running.
- Returns:
n_update
- Return type:
The number of finished events in the last output.
- Raises:
RuntimeError – If the process is already finished.
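The poll() contract (a count of newly finished showers, or None once the process is done) lends itself to a simple progress loop. The sketch below demonstrates that loop against a stand-in object; `FakeJob` and `drain` are hypothetical and are not part of panama, and a real loop would normally sleep briefly between polls.

```python
class FakeJob:
    """Hypothetical stand-in that mimics the documented poll() contract."""

    def __init__(self, updates):
        self._updates = list(updates)

    def poll(self):
        # Number of showers finished since the last poll, or None when done.
        return self._updates.pop(0) if self._updates else None


def drain(job):
    """Poll until the job reports it is finished; return the total count."""
    total = 0
    while (n_update := job.poll()) is not None:
        total += n_update  # a progress bar would be advanced by n_update here
    return total


finished = drain(FakeJob([0, 2, 3, 0, 1]))
```

Note that a poll returning 0 only means that no new shower finished since the previous poll, which is why the loop tests against None rather than truthiness.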
- class panama.run.CorsikaRunner(primary: dict[int, int], n_jobs: int, template_path: Path, output: Path, corsika_executable: Path, corsika_tmp_dir: Path, seed: None | int = None, save_std: bool = False, first_run_number: int = 0, first_event_number: int = 1)#
This class manages running multiple CORSIKA7 processes in parallel by splitting up the requested showers into batches and changing the initial seeds for CORSIKA7 for each batch. It also provides a progress bar by inspecting the stdout of CORSIKA7. To clean up temporary directories automatically, this class can be used in a with-statement. Otherwise, call corsika_runner.clean() to delete the temporary directories.
- __init__(primary: dict[int, int], n_jobs: int, template_path: Path, output: Path, corsika_executable: Path, corsika_tmp_dir: Path, seed: None | int = None, save_std: bool = False, first_run_number: int = 0, first_event_number: int = 1) None#
This class manages running multiple CORSIKA7 processes in parallel by splitting up the requested showers into batches and changing the initial seeds for CORSIKA7 for each batch. It also provides a progress bar by inspecting the stdout of CORSIKA7. The “parallelization” itself is handled by the operating system: if you are only allowed to use one core, nothing will run in parallel.
- Parameters:
primary (dict[int, int]) – Mapping from PDGID to the number of events with this primary. 10 proton and 20 helium-4 air showers would mean {2212: 10, 1000020040: 20}. (Use the proton pdgid, not the hydrogen-1 pdgid!) Conversion between pdgid and Corsika7ID is handled by the particle Python package. All primaries of one type are processed in parallel, and the different primaries are processed one after another. This ensures that the processes running in parallel at any given time take approximately the same amount of time.
n_jobs (int) – The number of parallel jobs to send to the operating system.
template_path (Path) – The path to the template of the CORSIKA7 card.
output (Path) – The path where the CORSIKA7 process will produce the output.
corsika_executable (Path) – The path to the CORSIKA7 executable.
corsika_tmp_dir (Path) – A temporary directory to symlink the CORSIKA7 executable into, since CORSIKA7 cannot be run in parallel from the same executable directly. The copied/symlinked files are deleted automatically when the runner is used as a context manager (with-statement); otherwise you have to call the clean() method. The directory itself is not deleted, only the subdirectory used inside it.
seed (None | int, optional) – The seed used to generate the seeds for the CORSIKA7 program. If None is given, the entropy source of the computer is used.
save_std (bool, optional) – Whether or not to save the standard output of the CORSIKA7 programs. If true, the output is available as “prim{pdgid}_job{jobid}.log” in the output folder.
first_run_number (int = 0, optional) – The run number the first run will get. All following runs will increment the run number by one.
first_event_number (int = 1, optional) – The event number the first event in each run will get.
- Raises:
ValueError – If the input is not consistent.
- clean() None#
Deletes the temporary directories; this is also called when the object is deleted. This method has to be called before a different CorsikaRunner with the same tmp_dir can be constructed. The object can no longer be used after calling this method.
- run(disable_pb: bool = False) None#
Start all the processes and wait for them to finish. The primary elements are run one after another.
- Parameters:
disable_pb (bool) – If True, disables the (tqdm) progressbar.
panama.weights#
Functions to add weights to a CORSIKA dataframe read in by read_DAT. If using the suggested flux definitions from fluxcomp, all fluxes are given in units of \((\mathrm{m^2}\ \mathrm{s}\ \mathrm{sr}\ \mathrm{GeV})^{-1}\).
- panama.weights.get_weights(df_run: ~pandas.core.frame.DataFrame, df_event: ~pandas.core.frame.DataFrame, df: ~pandas.core.frame.DataFrame, model: ~fluxcomp.cosmic_ray_fluxes.CosmicRayFlux = <fluxcomp.cosmic_ray_fluxes.H3a object>, proton_only: bool = False, groups: dict[~particle.pdgid.pdgid.PDGID, tuple[int, int]] | None = None) DataFrame#
Returns a DataFrame with the correct weight for a given primary model. The DataFrame is indexed by the run and event index, so it can be assigned as a column to the particle DataFrame df.
The primary energy can span different energy regions, but they must not overlap; if they do, an error is raised.
- Parameters:
df_run (DataFrame) – The run dataframe (as returned by panama.read_DAT).
df_event (DataFrame) – The event dataframe (as returned by panama.read_DAT).
df (DataFrame) – The particle dataframe (as returned by panama.read_DAT).
model (CosmicRayFlux) – The cosmic ray primary flux model (an instance of CosmicRayFlux from the fluxcomp package; default: H3a).
proton_only (bool) – If True (default: False), only the weights for the proton pdgid are non-zero, and they refer to the all-nucleon flux.
groups (dict[PDGID, tuple[int, int]] or None) – Mapping from the primary PDGID in the Monte Carlo to an inclusive range of elements (represented by their atomic number) which will be summed up in the model to represent this element in the MC (values: tuple[Zmin, Zmax]).
- Returns:
weights (DataFrame) – A dataframe with the weights, labeled by the run and event index.
It can be used like this: df['weights'] = panama.get_weights(df_run, df_event, df)
- panama.weights.add_weight_prompt(df: DataFrame, prompt_factor: float, weight_col_name: str = 'weight_prompt', is_prompt_col_name: str = 'is_prompt') None#
Adds the column “weight_prompt” to df, setting a weight for every prompt particle; non-prompt particles get weight 1.
- Parameters:
df (DataFrame) – The particle dataframe (as returned by panama.read_DAT).
prompt_factor (float) – The number to put in the weight_prompt column.
weight_col_name (str) – The name to give the prompt weight column (default: 'weight_prompt').
is_prompt_col_name (str) – The name of the column which indicates the promptness of a particle (default: 'is_prompt').
- panama.weights.add_weight_prompt_per_event(df: DataFrame, prompt_factor: float, weight_col_name: str = 'weight_prompt_per_event', is_prompt_col_name: str = 'is_prompt') None#
Adds the column “weight_prompt_per_event” to df, which is prompt_factor for every particle inside a shower that contains at least one prompt muon. For every other particle, it is 1.
- Parameters:
df (DataFrame) – The particle dataframe (as returned by panama.read_DAT).
prompt_factor (float) – The number to put in the weight_prompt_per_event column.
weight_col_name (str) – The name to give the prompt weight column (default: 'weight_prompt_per_event').
is_prompt_col_name (str) – The name of the column which indicates the promptness of a particle (default: 'is_prompt').