grepros

grep for ROS bag files and live topics

Plugins

grepros supports loading custom plugins, mainly for additional output formats.

Load one or more Python modules or classes as plugins:

--plugin some.python.module some.other.module.Class

Specifying --plugin someplugin together with --help will include the plugin's options in the printed help.
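
For example, to see the MCAP plugin's write options listed in the help text:

--plugin grepros.plugins.mcap --help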

There are a number of built-in plugins not loaded by default:

embag

--plugin grepros.plugins.embag

Use the embag library for reading ROS1 bags.

Significantly faster, but the library tends to be unstable.
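
A representative invocation, grepping ROS1 bags under a directory via embag ("pattern" and the path are placeholders):

grepros --plugin grepros.plugins.embag "pattern" --path path/to/bags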

mcap

--plugin grepros.plugins.mcap

Read or write messages in MCAP format.

Requires mcap, and mcap_ros1_support or mcap_ros2_support.

In ROS2, messages grepped from MCAP files can only be published to live topics if the same message type packages are locally installed.
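
For example, to grep an MCAP file directly ("pattern" and the path are placeholders):

grepros --plugin grepros.plugins.mcap "pattern" --path path/to/my.mcap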

Write bags in MCAP format:

--plugin grepros.plugins.mcap \
--write path/to/my.mcap [format=mcap] [overwrite=true|false] \
        [rollover-size=NUM] [rollover-count=NUM] [rollover-duration=NUM] \
        [rollover-template=STR]

If the file already exists, a unique counter is appended to the name of the new file, e.g. my.2.mcap, unless overwrite=true is specified.

Specifying format=mcap is not required if the filename ends with .mcap.

More on rollover.
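
A representative full invocation, rolling over to a new output file every 100 MB and keeping at most 10 files (values illustrative; rollover-size is assumed here to be given in bytes):

grepros --plugin grepros.plugins.mcap \
--write path/to/my.mcap rollover-size=100000000 rollover-count=10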

parquet

--plugin grepros.plugins.parquet \
--write path/to/my.parquet [format=parquet] [overwrite=true|false] \
        [column-name=rostype:value] [type-rostype=arrowtype] \
        [idgenerator=callable] [nesting=array|all] [writer-argname=argvalue]

Write messages to Apache Parquet files (columnar storage format, version 2.6), each message type to a separate file, named path/to/package__MessageType__typehash/my.parquet for package/MessageType (typehash being the MD5 hashsum of the message type definition).
Adds fields _topic string() and _timestamp timestamp("ns") to each type.
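
For example, std_msgs/Bool messages would go to path/to/std_msgs__Bool__&lt;typehash&gt;/my.parquet, with a column for the data field plus the _topic and _timestamp fields above.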

If a file already exists, a unique counter is appended to the name of the new file, e.g. package__MessageType__typehash/my.2.parquet, unless overwrite=true is specified.

Specifying format=parquet is not required if the filename ends with .parquet.

Requires pandas and pyarrow.

By default, message IDs are only added when populating nested message types, as field _id string() with UUID content. To explicitly add ID columns:

--write path/to/my.parquet idgenerator="itertools.count()"

Column type is auto-detected from the produced ID values: int64 or float64 for numeric values, string for anything else (non-numeric values are cast to string).
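
With the itertools.count() generator above, for instance, produced IDs are integers, so the _id column is typed int64.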

Supports adding supplementary columns with fixed values to Parquet files:

--write path/to/my.parquet column-bag_hash=string:26dfba2c
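
Several fixed columns can be added by repeating the option with different names (names and values here are illustrative):

--write path/to/my.parquet column-bag_hash=string:26dfba2c column-robot=string:alpha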

Supports custom mapping between ROS and pyarrow types with type-rostype=arrowtype:

--write path/to/my.parquet type-time="timestamp('ns')"
--write path/to/my.parquet type-uint8[]="list(uint8())"

Time/duration types are flattened into separate integer columns secs and nsecs, unless they are mapped to pyarrow types explicitly, like:

--write path/to/my.parquet type-time="timestamp('ns')" type-duration="duration('ns')"

Supports passing additional arguments to pyarrow.parquet.ParquetWriter, as:

--write path/to/my.parquet writer-argname=argvalue

For example, specifying no compression:

--write path/to/my.parquet writer-compression=null

The value is interpreted as JSON if possible, e.g. writer-use_dictionary=false.

To recursively populate nested array fields:

--write path/to/my.parquet nesting=array

E.g. for diagnostic_msgs/DiagnosticArray, this would populate files with the following schemas:

diagnostic_msgs__DiagnosticArray = pyarrow.schema([
  ("header.seq",          pyarrow.int64()),
  ("header.stamp.secs",   pyarrow.int32()),
  ("header.stamp.nsecs",  pyarrow.int32()),
  ("header.frame_id",     pyarrow.string()),
  ("status",              pyarrow.string()),   # [_id from "diagnostic_msgs/DiagnosticStatus", ]
  ("_topic",              pyarrow.string()),
  ("_timestamp",          pyarrow.int64()),
  ("_id",                 pyarrow.string()),
  ("_parent_type",        pyarrow.string()),
  ("_parent_id",          pyarrow.string()),
])

diagnostic_msgs__DiagnosticStatus = pyarrow.schema([
  ("level",               pyarrow.int16()),
  ("name",                pyarrow.string()),
  ("message",             pyarrow.string()),
  ("hardware_id",         pyarrow.string()),
  ("values"",             pyarrow.string()),   # [_id from "diagnostic_msgs/KeyValue", ]
  ("_topic",              pyarrow.string()),   # _topic from "diagnostic_msgs/DiagnosticArray"
  ("_timestamp",          pyarrow.int64()),    # _timestamp from "diagnostic_msgs/DiagnosticArray"
  ("_id",                 pyarrow.string()),
  ("_parent_type",        pyarrow.string()),   # "diagnostic_msgs/DiagnosticArray"
  ("_parent_id",          pyarrow.string()),   # _id from "diagnostic_msgs/DiagnosticArray"
])

diagnostic_msgs__KeyValue = pyarrow.schema([
  ("key"                  pyarrow.string()),
  ("value",               pyarrow.string()),
  ("_topic",              pyarrow.string()),   # _topic from "diagnostic_msgs/DiagnosticStatus"
  ("_timestamp",          pyarrow.int64()),    # _timestamp from "diagnostic_msgs/DiagnosticStatus"
  ("_id",                 pyarrow.string()),
  ("_parent_type",        pyarrow.string()),   # "diagnostic_msgs/DiagnosticStatus"
  ("_parent_id",          pyarrow.string()),   # _id from "diagnostic_msgs/DiagnosticStatus"
])

Without nesting, array field values are inserted as JSON with full subtype content.
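
For example, the status column of a DiagnosticArray row would then hold a JSON array of DiagnosticStatus objects, along the lines of (content illustrative):

[{"level": 0, "name": "cpu", "message": "OK", "hardware_id": "", "values": []}]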

To recursively populate all nested message types:

--write path/to/my.parquet nesting=all

E.g. for diagnostic_msgs/DiagnosticArray, this would, in addition to the above, populate:

std_msgs__Header = pyarrow.schema([
  "seq",                  pyarrow.int64()),
  "stamp.secs",           pyarrow.int32()),
  "stamp.nsecs",          pyarrow.int32()),
  "frame_id",             pyarrow.string()),
  "_topic",               pyarrow.string()),   # _topic from "diagnostic_msgs/DiagnosticArray"
  "_timestamp",           pyarrow.int64()),    # _timestamp from "diagnostic_msgs/DiagnosticArray"
  "_id",                  pyarrow.string()),
  "_parent_type",         pyarrow.string()),   # "diagnostic_msgs/DiagnosticArray"
  "_parent_id",           pyarrow.string()),   # _id from "diagnostic_msgs/DiagnosticArray"
])

sql

--plugin grepros.plugins.sql \
--write path/to/my.sql [format=sql] [overwrite=true|false] \
        [nesting=array|all] [dialect=clickhouse|postgres|sqlite] \
        [dialect-file=path/to/dialects.yaml]

Write SQL schema to output file: a CREATE TABLE statement for each message type, and a CREATE VIEW statement for each topic.

If the file already exists, a unique counter is appended to the name of the new file, e.g. my.2.sql, unless overwrite=true is specified.

Specifying format=sql is not required if the filename ends with .sql.

To create tables for nested array message type fields:

--write path/to/my.sql nesting=array

To create tables for all nested message types:

--write path/to/my.sql nesting=all

More on nested messages.

A specific SQL dialect can be specified (defaults to sqlite):

--write path/to/my.sql dialect=clickhouse|postgres|sqlite
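
A representative full invocation, writing a PostgreSQL-flavored schema for all bags under a directory (paths are placeholders):

grepros --plugin grepros.plugins.sql \
--write path/to/my.sql dialect=postgres --path path/to/bags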

Additional dialects, or updates for existing dialects, can be loaded from a YAML or JSON file:

--write path/to/my.sql dialect=mydialect dialect-file=path/to/dialects.yaml

More on SQL dialects.

Writing your own

A plugin can implement a number of interface methods recognized by grepros; these are supported but not required, and covered in the grepros API documentation.

Plugins are free to modify grepros internals, like adding command-line arguments to grepros.main.ARGUMENTS or adding sink types to grepros.outputs.MultiSink.

grepros also provides convenience methods for plugin authors, likewise covered in the API documentation.
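
As a sketch of what a plugin module can look like, the example below defines a custom sink that counts matched messages. This is illustration only: the Sink base class is taken from grepros.outputs as referenced above, but the method names and signatures used (emit, close) are assumptions, so consult the API documentation for the actual interface.

# my_plugin.py: minimal sketch of a custom sink plugin,
# loadable with --plugin my_plugin.
# Interface details below are assumptions, not confirmed grepros API.
from grepros import outputs

class CountSink(outputs.Sink):
    """Counts all matched messages, reporting the total on close."""

    def __init__(self, args):
        super(CountSink, self).__init__(args)
        self._count = 0

    def emit(self, topic, msg, stamp=None, match=None, index=None):
        # Invoked for each matched message (signature assumed).
        self._count += 1

    def close(self):
        print("Messages matched: %s" % self._count)
        super(CountSink, self).close()

Such a sink class could then be registered with grepros.outputs.MultiSink, as mentioned above.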