How to deal with the order of features (sequences)?

2018-06-24 10:57:59

Assume we have the following sequence database, which is then converted with one-hot encoding (rows are sequences, columns are positions):

   1  2  3  4
0  A  B  C  D
1  B  A  D  NA
2  A  D  C  NA

One-hot encoded:

A  B  C  D
1  1  1  1
1  1  0  1
1  0  1  1
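For reference, a minimal plain-Python sketch of how a table like the one above can be produced. The helper name `presence_encode` and the use of `None` for the NA cells are assumptions made for illustration, not part of any library.

```python
# Hypothetical helper: one indicator column per item, ignoring position.
NA = None  # stands in for the NA cells of the sequence table

sequences = [
    ["A", "B", "C", "D"],
    ["B", "A", "D", NA],
    ["A", "D", "C", NA],
]

def presence_encode(rows, items):
    """Return one 0/1 row per sequence: 1 if the item occurs anywhere in it."""
    return [[1 if item in row else 0 for item in items] for row in rows]

items = ["A", "B", "C", "D"]
encoded = presence_encode(sequences, items)
for row in encoded:
    print(*row)

# Two sequences that differ only in order collapse to the same row.
same = presence_encode([["A", "B"], ["B", "A"]], items)
print(same[0] == same[1])
```

The last check compares the sequences A, B and B, A: both end up with identical rows, because this encoding records only whether an item occurs, not where.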

In fact, the real data also contains co-occurring items (several items at the same position):

   1    2    3   4
0  A,B  C    D
1  B    A,D  NA
2  A    D    C   NA
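A rough sketch of how rows like these could be read into an ordered structure, with one set of items per position. The function `parse_row` is hypothetical, and the sketch assumes that cells are separated by whitespace, that co-occurring items are joined by commas, and that NA only appears as trailing padding.

```python
def parse_row(line):
    """Split a row such as "A,B C D" into an ordered list of item sets."""
    steps = []
    for cell in line.split():
        if cell == "NA":  # assumption: NA is trailing padding, so skip it
            continue
        steps.append(set(cell.split(",")))
    return steps

raw_rows = ["A,B C D", "B A,D NA", "A D C NA"]
parsed = [parse_row(row) for row in raw_rows]
for steps in parsed:
    # Sort the items inside each step only to get a stable printout.
    print([sorted(step) for step in steps])
```

The outer list keeps the order of the positions, while each inner set holds the items that occur together at one position.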

Problem:

When the sequential data is converted with one-hot encoding, one key piece of information is lost: the order (sequence) of the items in the dataframe. Since I would like to make predictions based on the sequence of actions (A, B, C, D), I am unsure how to solve this problem.

Or: Is an LSTM able to deal with this data?
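To make the question concrete, here is one possible order-preserving encoding, sketched in plain Python: each position gets its own multi-hot vector, and short sequences are padded with all-zero steps. The name `encode_steps`, the fixed length of 4 steps, and zero padding are assumptions for illustration only.

```python
ITEMS = ["A", "B", "C", "D"]

def encode_steps(sequence, items, n_steps):
    """Encode a list of item sets as n_steps multi-hot vectors, in order."""
    index = {item: i for i, item in enumerate(items)}
    encoded = []
    for step in sequence[:n_steps]:
        vec = [0] * len(items)
        for item in step:
            vec[index[item]] = 1  # co-occurring items share one step
        encoded.append(vec)
    # Assumption: pad short sequences with all-zero steps at the end.
    while len(encoded) < n_steps:
        encoded.append([0] * len(items))
    return encoded

data = [
    [{"A", "B"}, {"C"}, {"D"}],
    [{"B"}, {"A", "D"}],
    [{"A"}, {"D"}, {"C"}],
]
tensor = [encode_steps(seq, ITEMS, n_steps=4) for seq in data]

# Unlike the flat encoding, A then B and B then A now differ.
ab = encode_steps([{"A"}, {"B"}], ITEMS, n_steps=2)
ba = encode_steps([{"B"}, {"A"}], ITEMS, n_steps=2)
print(ab != ba)
```

The result is a nested list of shape (sequences, time steps, items), here 3 x 4 x 4. As far as I understand, this is the kind of layout that recurrent layers such as an LSTM usually take as input, with the all-zero padding steps typically masked out.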