It is nice to be able to avoid typing the same thing over and over, like converting pt into units of GeV in ATLAS, or defining a new computed column in your data model. There are two methods for doing this in the prototype:
- Creating a new column in the data model using pandas-like notation.
- Defining a regular expression that can lead to a new column.
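import sys
# Let the notebook import hep_tables from the parent directory
sys.path = ['..'] + sys.path
from hep_tables import xaod_table, make_local
from func_adl import EventDataset
import matplotlib.pyplot as plt
plt.rcParams.update({'figure.figsize':(7,5), 'figure.dpi':100})
# An ATLAS Z->ee Monte Carlo sample (DAOD_STDM3 derivation)
dataset = EventDataset('localds://mc15_13TeV:mc15_13TeV.361106.PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee.merge.DAOD_STDM3.e3601_s2576_s2132_r6630_r6264_p2363_tid05630052_00')
df = xaod_table(dataset)
# Give the "Electrons" collection a short name so it can be reused below
df['all_ele'] = df.Electrons("Electrons")
# A computed column: electron pT converted from MeV to GeV
df.all_ele['mypt'] = df.all_ele.pt / 1000.0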
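To make the idea concrete, here is a plain-Python picture of what the computed column holds. It is only a sketch with made-up pT values, not how hep_tables evaluates the expression: each event carries a variable-length list of electrons, and mypt applies pt / 1000.0 to every one of them while keeping the per-event structure.

```python
# Hypothetical electron pT values in MeV, one inner list per event.
# Events can have zero or more electrons, so the structure is jagged.
ele_pt_mev = [
    [45200.0, 31800.0],
    [],
    [52100.0],
]

# The computed column applies the same scalar expression, pt / 1000.0,
# to every electron in every event, keeping the event boundaries.
mypt = [[pt / 1000.0 for pt in event] for event in ele_pt_mev]

print(mypt)  # [[45.2, 31.8], [], [52.1]]
```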
We can then use the new columns as if they were regular leaves or collections of the event:
%%time
pts = make_local(df.all_ele.mypt)
plt.hist(pts.flatten(), range=(0, 100), bins=50)
plt.xlabel('Electron $p_T$ [GeV]')
_ = plt.ylabel('PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee')
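The histogram call above first flattens the per-event structure and then bins the result. A small numpy sketch of those two steps, again with hypothetical values, shows what range=(0, 100) and bins=50 mean: bins 2 GeV wide, with anything outside 0 to 100 GeV left out (plt.hist behaves the same way with these arguments, since it relies on numpy.histogram).

```python
import numpy as np

# Hypothetical per-event electron pT in GeV (jagged: one list per event).
pts = [[45.2, 31.8], [], [52.1, 101.0]]

# Flatten to a 1-D array of all electrons, like pts.flatten() above.
flat = np.array([pt for event in pts for pt in event])

# range=(0, 100) with bins=50 gives 2 GeV-wide bins; values outside are dropped.
counts, edges = np.histogram(flat, bins=50, range=(0, 100))

print(edges[1] - edges[0])  # 2.0
print(counts.sum())         # 3: the 101 GeV electron falls outside the range
```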
As we will see in the next section, a computed column can also be a lambda expression.
Defining and redefining leaves
In ATLAS, all object $p_T$'s (electrons, jets, muons, tracks, etc.) are in units of MeV. It would be nice to redefine every reference to pt to be in GeV, so let's define a new leaf, ptgev, that always returns pt in GeV no matter where it is used in the data model.
from dataframe_expressions import define_alias
# Define a leaf 'ptgev' that converts pt from MeV to GeV
define_alias('', 'ptgev', lambda o: o.pt / 1000.0)
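The alias body is just an ordinary Python function of the object it is attached to, which is why one definition can serve electrons, jets, muons and so on: it only needs a pt leaf. A trivial sketch, using types.SimpleNamespace stand-ins rather than real xAOD objects:

```python
from types import SimpleNamespace

# The same alias body as above: it only assumes the object has a .pt attribute.
ptgev = lambda o: o.pt / 1000.0

# Hypothetical stand-ins for an electron and a jet, both with pT in MeV.
electron = SimpleNamespace(pt=45200.0)
jet = SimpleNamespace(pt=120500.0)

print(ptgev(electron), ptgev(jet))  # 45.2 120.5
```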
%%time
pts = make_local(df.all_ele.ptgev)
plt.hist(pts.flatten(), range=(0, 100), bins=50)
plt.xlabel('Electron $p_T$ [GeV]')
_ = plt.ylabel('PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee')
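The arguments to define_alias actually allow quite a bit more (even abusive) power. The first is the root, the second is the new leaf name, and the third is the function to be applied. For example, using '.' as the root lets us define a shortcut for the electron collection directly on the event: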
define_alias('.', 'eles', lambda e: e.Electrons("Electrons"))
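Judging from the two calls in this notebook, an empty root lets an alias attach to any object (df.all_ele.ptgev), while '.' attaches it to the top-level event (df.eles). The toy model below is a hedged sketch of that lookup rule only. Node, ALIASES and define_toy_alias are hypothetical names, and this is not how dataframe_expressions is implemented; the real library builds expressions that are evaluated later rather than computing values on attribute access.

```python
# Toy model of alias lookup. Node, ALIASES and define_toy_alias are
# hypothetical names; this is not how dataframe_expressions works internally.
ALIASES = {}  # maps (root, name) -> function taking the object


def define_toy_alias(root, name, func):
    """Register func under name; root '' = any object, '.' = event root only."""
    ALIASES[(root, name)] = func


class Node:
    def __init__(self, is_root=False, **fields):
        self.__dict__.update(fields)
        self.__dict__['_is_root'] = is_root

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for non-fields.
        if self.__dict__.get('_is_root') and ('.', name) in ALIASES:
            return ALIASES[('.', name)](self)
        if ('', name) in ALIASES:
            return ALIASES[('', name)](self)
        raise AttributeError(name)


define_toy_alias('', 'ptgev', lambda o: o.pt / 1000.0)
define_toy_alias('.', 'eles', lambda e: e.electrons)

event = Node(is_root=True, electrons=[Node(pt=45200.0), Node(pt=31800.0)])

print([ele.ptgev for ele in event.eles])  # [45.2, 31.8]
print(hasattr(event.eles[0], 'eles'))     # False: '.' aliases apply only at the root
```

The design choice mirrored here is that real leaves always win and aliases are only consulted as a fallback, so an alias can never shadow data that is actually in the event.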
%%time
pts = make_local(df.eles.ptgev)
plt.hist(pts.flatten(), range=(0, 100), bins=50)
plt.xlabel('Electron $p_T$ [GeV]')
_ = plt.ylabel('PowhegPythia8EvtGen_AZNLOCTEQ6L1_Zee')
The help for define_alias gives a few more details, if you are interested:
help(define_alias)