Overview
========

Purpose
-------

Advanced techniques for a new generation of hardware will drive higher data rates for experiments at NSLS2 beamlines. The large data volumes and data acquisition rates require high-performance logging and storage systems. A typical experiment involves not only the raw data from detectors but also additional data from the beamline, such as the experiment owner, beamline_id, energy, motor positions, and wavelength. To date, this information has largely been held separately and manipulated individually.

metadataStore is a service that records this sort of metadata for beamline experiments. It was designed and implemented with the needs of various experiments in mind, so it is flexible enough to serve different experimental setups. metadataStore can be used independently or embedded in the dataBroker, which takes an integrated approach that blends different data resources together and makes them available to data analysis and visualization clients.

Technology
----------

The metadataStore backend database is MongoDB, a NoSQL database chosen for its performance and flexibility. The implementation language is Python (v2.7), and the MongoDB Python driver (pymongo) performs database operations.

Downloading/Installing
----------------------

Requirements
^^^^^^^^^^^^

Python (version 2.7.X), pymongo (version 2.6+), six, Distutils, Git

Installation
^^^^^^^^^^^^

.. automodule:::: ../../metadataStore/dataapi/commands.py

MongoDB Installation
%%%%%%%%%%%%%%%%%%%%

*Step 1*::

    % sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

*Step 2*::

    % sudo apt-get update

*Step 3*::

    % sudo apt-get install -y mongodb-org

*Step 4*::

    % sudo apt-get install -y mongodb-org=2.6.1 mongodb-org-server=2.6.1 mongodb-org-shell=2.6.1 mongodb-org-mongos=2.6.1 mongodb-org-tools=2.6.1

metadataStore Installation
%%%%%%%%%%%%%%%%%%%%%%%%%%

*Step 1*:

metadataStore is available from its git repository: https://github.com/arkilic/metadataStore

Clone this repository::

    % git clone https://github.com/arkilic/metadataStore

*Step 2*:

metadataStore includes a setup.py script. With Distutils, building and installing a module distribution is a single command run from a terminal::

    % python setup.py install

If you do not have sudo access to the machine, or do not want to install this package for all users::

    % python setup.py install --user

Getting Help
------------

metadataStore can be embedded within other applications or used interactively.
If used within IPython, the built-in ``help`` function provides information about routines::

    >>> from metadataStore.userapi.commands import search
    >>> help(search)
    search(scan_id=None, owner=None, start_time=None, beamline_id=None, end_time=None, data=False, header_id=None, tags=None, num_header=50)

    Provides an easy way to search Header entries inserted in metadataStore
    Usage:
    search(scan_id=s_id)
    search(scan_id=s_id, owner='ark*')
    search(scan_id=s_id, start_time=datetime.datetime(2014, 4, 5))
    search(scan_id=s_id, start_time=datetime.datetime(2014, 4, 5), owner='arkilic')
    search(scan_id=s_id, start_time=datetime.datetime(2014, 4, 5), owner='ark*')
    search(scan_id=s_id, start_time=datetime.datetime(2014, 4, 5), owner='arkili.')

Schema
------

The metadataStore service consists of four collections: header, beamline_config, event_descriptor, and event.

**A Sample Header**::

    {'status': 'In Progress',
     'beamline_id': None,
     'tags': ['CSX_Experiment1', 'CSX_Experiment2'],
     'start_time': datetime.datetime(2014, 9, 16, 13, 7, 58, 299000),
     'scan_id': 3315,
     'custom': {},
     'owner': u'arkilic',
     'end_time': datetime.datetime(2014, 9, 16, 13, 7, 58, 299000),
     'header_versions': [],
     'event_descriptors': {
         'event_descriptor_0': {
             'data_keys': ['motor5', 'motor4', 'motor3', 'list_of_1k', 'motor1', 'motor2'],
             'tag': 'experimental',
             'descriptor_name': 'scan',
             'header_id': ObjectId('5418362efa44833ca9b08d08'),
             'event_type_id': 12,
             '_id': ObjectId('5418362efa44833ca9b08d0a'),
             'events': {
                 'event_0': {'descriptor_id': ObjectId('5418362efa44833ca9b08d0a'),
                             'description': None,
                             'header_id': ObjectId('5418362efa44833ca9b08d08'),
                             'seq_no': 3,
                             'owner': 'arkilic',
                             '_id': ObjectId('5418362efa44833ca9b08d0c'),
                             'data': {u'motor5': 36, u'motor4': 71, u'motor3': 55, u'list_of_1k': [12.3, 34.5, 45.3], u'motor1': 44, u'motor2': 35}},
                 'event_1': {'descriptor_id': ObjectId('5418362efa44833ca9b08d0a'),
                             'description': None,
                             'header_id': ObjectId('5418362efa44833ca9b08d08'),
                             'seq_no': 1,
                             'owner': u'arkilic',
                             '_id': ObjectId('5418362efa44833ca9b08d0b'),
                             'data': {}}},
             u'type_descriptor': {}}},
     'configs': {'config_0': {'header_id': ObjectId('5418362efa44833ca9b08d08'),
                              '_id': ObjectId('5418362efa44833ca9b08d09'),
                              'config_params': {}}},
     '_id': ObjectId('5418362efa44833ca9b08d08')}

Header
^^^^^^

Each run stands for a specific set of operations performed by data collection routines. An experiment is a collection of these runs. The metadataStore service does not group specific sets of runs into experiments; that is done within the dataBroker service (currently in development). Users and developers of metadataStore can either use the dataBroker API or build simple applications on top of metadataStore that keep track of sets of scan_id values to define specific experiments.

Each header consists of the following keys:

- *_id*: (bson.ObjectId) primary key for the header entry
- *scan_id*: user/data collection defined unique identifier describing a given run
- *start_time*: (datetime) run header initialization timestamp
- *end_time*: (datetime) run header close timestamp
- *tags*: (list) list of strings that assigns user/data collection defined tags to each header
- *owner*: (str) data collection or system defined user info
- *header_versions*: (list) keeps track of header versions
- *beamline_id*: (str) descriptor for the beamline
- *custom*: (dict) field for custom information
- *configs*: (dict) field that holds information regarding beamline configurations
- *event_descriptors*: (dict) field that holds event definitions and events

Beamline Config
^^^^^^^^^^^^^^^

The 'configs' key inside each header holds name-value pairs and/or nested dictionaries with name-value pairs for a given run.

Beamline Config Keys:

- *_id*: (bson.ObjectId) unique identifier for the entry in the beamline_config collection
- *header_id*: (bson.ObjectId) foreign key pointing back to the header
- *config_params*: (dict) data collection/user scripts can populate this dictionary with information of their choice

Event Descriptor
^^^^^^^^^^^^^^^^

Data related to a given run is captured inside 'events'. Event descriptors serve as event containers and define the contents of each event. In other words, an event descriptor holds the actual run metadata under its 'events' key.

Event Descriptor Keys:

- *_id*: (bson.ObjectId) unique identifier for the entry in the event_descriptor collection
- *header_id*: (bson.ObjectId) foreign key pointing back to the header
- *event_type_id*: (int) event type integer descriptor generated by data collection routines
- *tag*: (list) list of strings that assigns a user/data collection defined tag to each event descriptor
- *descriptor_name*: (str) event type string descriptor
- *type_descriptor*: (dict) defines fields and field data types for a given event type

Event
^^^^^

Metadata for each run is captured inside events. The set of events that belongs to a run header can be accessed through its event descriptors, as described above. Data analysis and visualization tools that need this metadata can obtain it by parsing the run header. For the sample header above::

    >>> from metadataStore.userapi.commands import search
    >>> headers = search(owner='arkilic', data=True)
    >>> my_event_0 = headers['header_0']['event_descriptors']['event_descriptor_0']['events']['event_0']
    >>> my_event_0
    {'descriptor_id': ObjectId('5418362efa44833ca9b08d0a'),
     'description': None,
     'header_id': ObjectId('5418362efa44833ca9b08d08'),
     'seq_no': 3,
     'owner': 'arkilic',
     '_id': ObjectId('5418362efa44833ca9b08d0c'),
     'data': {u'motor5': 36, u'motor4': 71, u'motor3': 55, u'list_of_1k': [12.3, 34.5, 45.3], u'motor1': 44, u'motor2': 35}}

There is also a series of utility libraries that allow manipulation of query results. See the utilities section for more information.

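
Parsing a run header by hand can be wrapped in a small helper. The following is only a minimal sketch, not part of metadataStore: ``iter_events`` is a hypothetical function, and ``sample_header`` is a hand-written stand-in for one header dictionary (with the ObjectId fields omitted), assuming events sit under the 'events' key of each event descriptor as in the sample header.

```python
from __future__ import print_function


def iter_events(header):
    """Yield (descriptor_name, event) pairs from one header dictionary.

    Hypothetical helper: descriptors are visited in key order and the
    events of each descriptor are yielded in ascending seq_no order.
    """
    descriptors = header.get('event_descriptors', {})
    for _, descriptor in sorted(descriptors.items()):
        events = descriptor.get('events', {}).values()
        for event in sorted(events, key=lambda ev: ev['seq_no']):
            yield descriptor.get('descriptor_name'), event


# Stand-in for a single header; a real one would come from search(..., data=True).
sample_header = {
    'scan_id': 3315,
    'event_descriptors': {
        'event_descriptor_0': {
            'descriptor_name': 'scan',
            'events': {
                'event_0': {'seq_no': 3, 'data': {'motor1': 44, 'motor2': 35}},
                'event_1': {'seq_no': 1, 'data': {}},
            },
        },
    },
}

ordered = list(iter_events(sample_header))
for name, event in ordered:
    print(name, event['seq_no'], event['data'])
```

Sorting on ``seq_no`` rather than on the 'event_N' key names is a deliberate choice in this sketch, since the key names in the sample header do not follow the sequence order.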

Event Keys:

- *header_id*: (bson.ObjectId) foreign key pointing back to the header
- *descriptor_id*: (bson.ObjectId) foreign key pointing to the event descriptor
- *seq_no*: (int) defines an event's order within the set of events in an event descriptor
- *owner*: (str) event owner
- *description*: (str) string field that describes an event (optional)
- *data*: (dict) stores the metadata as a Python dictionary

Tutorial
--------

Start here for a quick overview

Examples
--------

Examples of how to perform specific tasks
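
One such task is grouping runs into experiments, which metadataStore leaves to the dataBroker or to applications built on top of it. The following is a minimal sketch of such an application, under the assumption that an experiment is just a named set of scan_id values. ``ExperimentIndex`` and its methods are hypothetical names, not part of the metadataStore API, and nothing here touches the database.

```python
from collections import defaultdict


class ExperimentIndex(object):
    """Hypothetical in-memory map from an experiment name to its scan_ids."""

    def __init__(self):
        self._scans = defaultdict(set)

    def add_run(self, experiment, scan_id):
        """Record that the run identified by scan_id belongs to experiment."""
        self._scans[experiment].add(scan_id)

    def scan_ids(self, experiment):
        """Return the sorted scan_ids recorded for experiment."""
        return sorted(self._scans.get(experiment, ()))

    def experiments_for(self, scan_id):
        """Return the sorted names of experiments that contain scan_id."""
        return sorted(name for name, ids in self._scans.items()
                      if scan_id in ids)


index = ExperimentIndex()
index.add_run('CSX_Experiment1', 3315)
index.add_run('CSX_Experiment1', 3316)
index.add_run('CSX_Experiment2', 3315)

csx1_scans = index.scan_ids('CSX_Experiment1')
owners_of_3315 = index.experiments_for(3315)
```

Each scan_id returned by ``scan_ids`` could then be passed to ``search(scan_id=...)`` to retrieve the matching run header.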