Weka: Out of memory

As a fan of Java, I was introduced to Weka when the time came to pick up machine learning skills, and I instantly loved it. It was the first machine learning library I could actually use, and its rich support for visualizations meant that I could get a sneak peek under the hood. Weka also lets you do exploratory data analysis and run multiple algorithms on your datasets. It was a major part of how I learned machine learning concepts.

From time to time I hear users complain about memory problems in Weka and call it useless for any serious machine learning work. While I would gladly agree with a researcher who genuinely means it when they say Weka is not scalable, I tend to disagree with novice users who dismiss it for its poor memory management.

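Often enough, the real cause of memory problems with Weka is that people are not trying hard enough.
Generally, in terms of memory requirements, most machine learning algorithms fall into one of four quadrants, depending on whether they need a little or a lot of memory at training time and at test time: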

Figure: Training time vs. Test time

Here are two ways to get the most out of Weka, one harder than the other:

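1. Increase the memory available to the JVM

Launch Weka with a larger heap. Here -Xmx sets the maximum heap size (4 GB), -Xms sets the initial heap size (3 GB), and -jar tells java to run the jar file:

java -Xmx4g -Xms3g -jar weka.jar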
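To confirm that the flags actually reached the JVM, a small sketch like the one below can help. HeapInfo is a hypothetical helper, not part of Weka; it only prints the limits reported by java.lang.Runtime. maxMemory() reflects -Xmx, although it may report slightly less than the requested value, and totalMemory() starts near the -Xms value.

```java
// HeapInfo.java: hypothetical helper (not part of Weka) that prints
// the heap limits the running JVM actually received.
public class HeapInfo {

    // Converts a byte count to whole mebibytes for readability.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory(): the most the heap may grow to (roughly what -Xmx asked for).
        long max = rt.maxMemory();
        // totalMemory(): heap currently reserved by the JVM (starts near -Xms).
        long total = rt.totalMemory();
        // freeMemory(): unused part of the currently reserved heap.
        long free = rt.freeMemory();
        System.out.println("max heap   (MiB): " + toMiB(max));
        System.out.println("total heap (MiB): " + toMiB(total));
        System.out.println("used heap  (MiB): " + toMiB(total - free));
    }
}
```

With JDK 11 or later it can be run straight from source using the same flags, e.g. java -Xmx4g -Xms3g HeapInfo.java, and the reported maximum should be close to 4096 MiB.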

2. Use the API: train using the GUI, but test via the API

Most machine learning models are trained on relatively small datasets, because labeled data is scarce. However, they often need to be run on much larger datasets. If you use Weka for both training and testing, loading the large test set can exhaust the heap and you end up with an out-of-memory error.

In cases where you can train your algorithm and run cross-validation without trouble but cannot test it on a large dataset, consider using the API to apply the trained model to one instance at a time, so the full test set never has to sit in memory.
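The following is a minimal, JDK-only sketch of the one-instance-at-a-time idea, not Weka's own API: the Model interface, the toy threshold model, and the headerless numeric CSV format are all hypothetical stand-ins. It reads one row, predicts, and moves on, so memory use stays flat no matter how many rows the file has. In Weka itself the usual pattern is along the lines of loading the model saved from the GUI with weka.core.SerializationHelper.read(...), reading the test file incrementally with ArffLoader's getStructure() and getNextInstance(...), and calling classifyInstance(...) on each row; check the API documentation for your Weka version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// StreamingScorer.java: JDK-only sketch of applying a trained model one instance at a time.
public class StreamingScorer {

    // Hypothetical stand-in for a trained classifier: maps one row of features
    // to a class index. In Weka this role would be played by a deserialized Classifier.
    interface Model {
        int predict(double[] features);
    }

    // Parses one comma-separated line into a feature vector (numeric columns only).
    static double[] parseRow(String line) {
        String[] parts = line.split(",");
        double[] features = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            features[i] = Double.parseDouble(parts[i].trim());
        }
        return features;
    }

    // Reads rows one at a time and tallies predictions per class. Only the current
    // row is held in memory, so heap use does not grow with the number of rows.
    static int[] scoreStream(Reader source, Model model, int numClasses) {
        int[] counts = new int[numClasses];
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.isBlank()) continue; // skip empty lines
                counts[model.predict(parseRow(line))]++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Toy "trained" model (hypothetical): class 1 if the first feature exceeds 0.5.
        Model threshold = f -> f[0] > 0.5 ? 1 : 0;
        // Small in-memory sample; in practice this would be a FileReader over a large file.
        Reader data = new StringReader("0.2,1.0\n0.9,3.1\n0.7,0.0\n");
        int[] counts = scoreStream(data, threshold, 2);
        System.out.println("class 0: " + counts[0] + ", class 1: " + counts[1]);
    }
}
```

Running it prints "class 0: 1, class 1: 2". The design choice that matters is that scoreStream never collects the rows into a list; swapping the StringReader for a reader over a multi-gigabyte file changes the running time, not the memory footprint.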

