Inserting into an unordered_set with custom hash function

I think, Andy Prowl perfectly fixed the problems with your code. However, I would add the following member function to your Interval, which describes what makes two intervals identical:

std::string getID() const { return std::to_string(b) + " " + std::to_string(e) + " " + std::to_string(proteinIndex); }

Please note, that I also followed Andy Prowl's suggestion and renamed the members begin to b and end to e. Next, you can easily define the hash and comparison functions by using lambda expressions. As a result, you can define your unordered_set as follows:

auto hash = [](const Interval& i){ return std::hash<std::string>()(i.getID()); };
auto equal = [](const Interval& i1, const Interval& i2){ return i1.getID() == i2.getID(); };
std::unordered_set<Interval, decltype(hash), decltype(equal)> test(8, hash, equal);

Finally, for reasons of readability, I converted your for loop into a range-based for loop:

std::list<Interval> concat {{1, 2, false, 3, 4}, {2, 3, false, 4, 5}, {1, 2, true, 7, 4}};

for (auto const &i : concat)
    test.insert(i);

for (auto const &i : test)
    std::cout << i.b << ", " << i.e << ", " << i.updated << std::endl;

Output (I just printed first three members of each Interval):

2, 3, 0
1, 2, 0

As you can see, there are only two intervals printed. The third one ({1, 2, true, 7, 4}) was not inserted to test, because its b, e, and proteinIndexare equal to that of the first interval ({1, 2, false, 3, 4}).

Code on Ideone


First problem:

You are passing string as the second template argument for your instantiation of the unordered_set<> class template. The second argument should be the type of your hasher functor, and std::string is not a callable object.

Perhaps meant to write:

unordered_set<Interval, /* string */ Hash> test;
//                      ^^^^^^^^^^^^
//                      Why this?

Also, I would suggest using names other than begin and end for your (member) variables, since those are names of algorithms of the C++ Standard Library.

Second problem:

You should keep in mind, that the hasher function should be qualified as const, so your functor should be:

struct Hash {
   size_t operator() (const Interval &interval) const {
   //                                           ^^^^^
   //                                           Don't forget this!
     string temp = to_string(interval.b) + 
                   to_string(interval.e) + 
                   to_string(interval.proteinIndex);
     return (temp.length());
   }
};

Third problem:

Finally, if you want std::unordered_set to be able to work with objects of type Interval, you need to define an equality operator consistent with your hash function. By default, if you do not specify any type argument as the third parameter of the std::unordered_set class template, operator == will be used.

You currently do not have any overload of operator == for your class Interval, so you should provide one. For example:

inline bool operator == (Interval const& lhs, Interval const& rhs)
{
    return (lhs.b == rhs.b) && 
           (lhs.e == rhs.e) && 
           (lhs.proteinIndex == rhs.proteinIndex); 
}

Conclusion:

After all the above modifications, your code becomes:

#include <string>
#include <unordered_set>
#include <list>

using namespace std;

struct Interval {
  unsigned int b;
  unsigned int e;
  bool updated;   //true if concat.  initially false
  int patternIndex;  //pattern index. valid for single pattern
  int proteinIndex;   //protein index.  for retrieving the pattern
};

bool operator == (Interval const& lhs, Interval const& rhs)
{
    return (lhs.b == rhs.b) && (lhs.e == rhs.e) && (lhs.proteinIndex == rhs.proteinIndex); 
}

struct Hash {
   size_t operator()(const Interval &interval) const {
     string temp = to_string(interval.b) + to_string(interval.e) + to_string(interval.proteinIndex);
     return (temp.length());
   }
};

int main()
{
   unordered_set<Interval, Hash> test;
  
  list<Interval> concat;
  for(list<Interval>::iterator i = concat.begin(); i != concat.end(); ++i){
    test.insert(*i);
  }

}