On smaller projects, especially if you are manipulating domain models in memory, you may find yourself increasingly replicating in code many aspects that are already covered well (and robustly) by enterprise database frameworks and APIs:
- For Java (and Jakarta EE) this includes the Java Persistence API (JPA), with implementations such as EclipseLink, Hibernate, and ObjectDB, plus higher-level frameworks such as Spring Data JPA built on top.
- For Python this includes the SQLAlchemy SQL toolkit and Object Relational Mapper (ORM), and Django ORM.
Using such a database framework comes with a cost, and when you first start a smaller project you may feel it is not worth the extra effort. The advantage of embracing a database framework from the very start, however, is that you won't then have to refactor heaps of code if you later decide you need a fully-fledged database framework after all.
Some "tells" that you may need to upgrade to use a fully-fledged database framework include:
- You find yourself using low-level SQL commands (even if via an SQL library that protects against SQL injection).
- You find yourself performing "wiring" gymnastics for relationships between "pseudo entity classes" that may be better handled as actual database entities (with automatic synchronisation of changes to the database).
- You find yourself coding the equivalent of queries (as opposed to having a query method that is an entry point to an actual query engine). (See the sketch following this list.)
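As a hypothetical illustration of all three "tells" (in Python, using the standard-library sqlite3 module; the Customer/Order names and schema are invented for illustration), hand-rolled persistence code tends to look like this:

```python
# Hypothetical sketch of the three "tells": low-level SQL, hand-wired
# relationships, and hand-coded query logic. Schema and class names are
# invented for illustration.
import sqlite3

class Customer:
    def __init__(self, customer_id, name):
        self.customer_id = customer_id
        self.name = name
        self.orders = []  # relationship maintained by hand

class Order:
    def __init__(self, order_id, total):
        self.order_id = order_id
        self.total = total

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (?, ?)", (42, 99.5))

# Tell 1: low-level SQL (parameterised, so injection-safe, but still low-level).
rows = conn.execute(
    "SELECT id, total FROM orders WHERE customer_id = ?", (42,)
).fetchall()

# Tells 2 and 3: "wiring" gymnastics and hand-coded query logic that an
# entity manager or ORM session would otherwise do for you.
customer = Customer(42, "Alice")
for order_id, total in rows:
    customer.orders.append(Order(order_id, total))
```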
Those from a Java background may be more familiar with the term entity. In JPA, the `@Entity` annotation marks a Plain Old Java Object (POJO) as an entity, with additional annotations (on either the class or its fields) indicating relationship attributes and database storage options.
Those from a Python background may be more familiar with the SQLAlchemy term object model, or in the Django ORM usually just model. Python does not have a direct equivalent of Java @annotations. Instead, SQLAlchemy uses `Mapped`, `mapped_column`, `relationship`, etc. to achieve the equivalent (as sketched below), and the Django ORM uses `models.Model`, `models.ForeignKey`, etc.
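A minimal sketch of what that looks like with SQLAlchemy's 2.0-style declarative mapping (the Customer and Order entities are illustrative, not from any particular project):

```python
# Minimal SQLAlchemy 2.0 declarative mapping sketch; entity names are
# illustrative. Mapped, mapped_column and relationship play the role that
# JPA annotations play in Java.
from sqlalchemy import ForeignKey
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column, relationship

class Base(DeclarativeBase):
    pass

class Customer(Base):
    __tablename__ = "customers"
    id: Mapped[int] = mapped_column(primary_key=True)
    name: Mapped[str]
    # One-to-many: the ORM keeps both sides of the relationship in sync.
    orders: Mapped[list["Order"]] = relationship(back_populates="customer")

class Order(Base):
    __tablename__ = "orders"
    id: Mapped[int] = mapped_column(primary_key=True)
    total: Mapped[float]
    customer_id: Mapped[int] = mapped_column(ForeignKey("customers.id"))
    customer: Mapped["Customer"] = relationship(back_populates="orders")
```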
Independent of the language, it's the need for that "extra database-mapping stuff" that can be discouraging, and might make you hesitate to upgrade a simpler application to a full database-mapping framework. But with that "extra stuff" comes a heap of very useful goodies, especially:
- Built-in facility for easily adding and removing entities from collections, with automated synchronisation with the database (in JPA via an `EntityManager`, in SQLAlchemy via a `Session`; see the sketch after this list).
- A well-defined query language that enables you to work in terms of objects and relationships instead of tables.
- Automated wiring of related objects on load from database.
- Options for efficient in-memory use (such as a choice of lazy loading for relationship fields).
- SQL (if using a relational database) is all generated and executed for you, and is injection-safe.
- The best IDEs, such as IntelliJ IDEA or PyCharm, have built-in support or plugins to help guide you when creating your entity model (aka object model) and managing that "extra stuff" for the database mapping.
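Reusing the illustrative Customer/Order mapping from the earlier sketch, here is roughly what those goodies buy you in practice:

```python
# Sketch of the "goodies" above, reusing the illustrative Customer/Order
# mapping from the earlier sketch. All SQL is generated, executed and
# parameterised (injection-safe) for you.
from sqlalchemy import create_engine, select
from sqlalchemy.orm import Session

engine = create_engine("sqlite://")   # in-memory SQLite; no server needed
Base.metadata.create_all(engine)      # schema generated from the entities

with Session(engine) as session:
    alice = Customer(name="Alice")
    alice.orders.append(Order(total=99.50))  # relationship wiring handled
    session.add(alice)                       # the session tracks the graph
    session.commit()                         # changes synchronised to the DB

    # Query in terms of objects and relationships, not tables and joins;
    # related Order objects are wired up (lazily, by default) on load.
    stmt = select(Customer).where(Customer.name == "Alice")
    for customer in session.scalars(stmt):
        print(customer.name, [order.total for order in customer.orders])
```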
For Java, the Java Persistence API (JPA) offers a very stable target, with one possible complication: if you are using Hibernate (which is JPA-compliant), your JPQL queries will still work, but JPQL is only a subset of the Hibernate Query Language (HQL). For the most part, though, your entity-model coding effort is future-proofed and not bound to a particular implementation.
With Python, there are far more choices, and heated debates about which ORM, if any, is the "de facto industry standard" (none of them are). If you choose a particular object model and ORM technology, you are far more committed to it. This might (sometimes perhaps rightly) cause some developers to hesitate to use any ORM technology at all, at least if you don't absolutely need a database, but picking one is surely better than reinventing wheels in your own code.
The takeaway: there are few compelling cases for not using an actual database and entity (or object model) mapping framework from the very start of all but the simplest projects, even if at first the project only uses a simple file-based database such as SQLite (and even if concurrent use is not required). Doing so helps you work "out of your entity or object domain model", giving a cleaner architecture with more reuse and less reinventing of wheels.
TIP: For Java and Jakarta EE consider ObjectDB
If you have a smaller application (or even a larger full enterprise web application) and hesitate to use JPA because you don't want to have to deal with an additional database install such as MariaDB, have a look at ObjectDB, which is a super fast JPA-compliant pure object database. It's just a single JAR accompanied by some configuration options, backed by a simple database file. It can also be used easily in simpler Java applications (need not be a full Jakarta EE web application).
TIP: For Python have a look at SQLModel (combines SQLAlchemy and Pydantic)
CAVEAT: Best suited to simpler "flat" data structures, not highly complex object models with many relationships.
SQLModel was developed by the same team as FastAPI. It combines Pydantic and SQLAlchemy, endeavours to minimise replication of code between your Pydantic model and your SQLAlchemy object model, and is of course compatible with FastAPI, so is worth a look if you are using FastAPI for REST API development with Pydantic validation.
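A minimal sketch, adapted from the patterns in the SQLModel documentation (the Hero model is illustrative):

```python
# One class is both the Pydantic model (validation) and the SQLAlchemy
# table mapping, so there is no duplicated model code.
from typing import Optional
from sqlmodel import Field, Session, SQLModel, create_engine, select

class Hero(SQLModel, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str
    secret_name: str

engine = create_engine("sqlite:///heroes.db")  # simple file-based database
SQLModel.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Hero(name="Deadpond", secret_name="Dive Wilson"))
    session.commit()
    heroes = session.exec(select(Hero)).all()  # validated Hero instances
```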
TIP: For Python: HDF5 and Pandas, primarily suited to large datasets
If you don't need a full-blown entity (aka object model) database-mapping framework, and you want something better than just using spreadsheets, have a look at the file-based HDF5, which Pandas can easily read as DataFrames. You can organise datasets in groups and use a `/path/to/resource` syntax to access them. Datasets can be stored in a single HDF5 file, so maintenance overhead is low and backing up is easy.
HDF5 is weak in the querying area, but the Python PyTables project makes it easier to manipulate HDF5 data tables and array objects and heavily leverages NumPy. Or you can always just load entire data sets as DataFrames and use your Pandas skills.
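A hedged sketch of the Pandas route (the file, group and column names are illustrative; `to_hdf`/`read_hdf` require the PyTables package to be installed):

```python
# Store and retrieve DataFrames in a single HDF5 file via pandas
# (PyTables under the hood); names here are illustrative.
import pandas as pd

df = pd.DataFrame({"sensor": ["a", "b", "c"], "reading": [1.5, 2.7, 0.9]})

# Group-style /path/to/resource keys organise datasets within one file.
# format="table" plus data_columns makes the column queryable on disk.
df.to_hdf("store.h5", key="/experiments/run1", format="table",
          data_columns=["reading"])

# Load a whole dataset back as a DataFrame...
run1 = pd.read_hdf("store.h5", "/experiments/run1")

# ...or push a simple condition down to PyTables rather than loading it all.
high = pd.read_hdf("store.h5", "/experiments/run1", where="reading > 2")
```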